Why cloud integration tools can be used on any cloud platform

Ada Lee

October 22, 2020•11 min

When choosing cloud integration tools, many people believe that they must choose a tool that resides in the same cloud platform as their data. However, that is not the case. Many organizations adopt a multi-cloud strategy and move data between clouds. This is no different for cloud integration.

So, when it comes to FME Cloud, the Safe Software hosted version of FME Server, let us be clear:

No matter what cloud your data resides in, FME Cloud can still be the best option for deploying FME Server.

With FME Cloud, you get the full functionality of FME Server, but we handle all the server maintenance and backups and ensure uptime for you. While FME Cloud infrastructure is hosted on AWS, these behind-the-scenes details are less important than you may think. When connecting a cloud integration tool to other clouds for data consumption, what really matters are two key considerations: performance and security.

How to Choose Cloud Integration Tools

Performance

When executing integration tasks involving large amounts of data, the time it takes to transfer the data over the network and the time it takes to run the translation are both critical in determining the overall runtime of the integration task.

For FME Server, whether you deploy and host FME Server yourself or leverage FME Cloud, the same codebase will be used. That leaves network latency as a key variable when deciding if using FME Cloud and moving your data over the internet will be fast enough compared to deploying FME Server right next to your data and managing it yourself.

Let’s run some benchmarking tests and explore if network latency is an issue you need to be concerned with.

Benchmarking

Below we benchmark a simple process that involves downloading data from three cloud data storage services: Amazon S3, Azure Storage and Google Cloud Storage. The tests are broken down into two sections:

Varying the cloud provider but keeping the geographic region static.
Varying the geographic region but keeping the cloud provider static.

Cloud Platform

How does changing the cloud platform where the data resides but keeping the geographic location similar impact run time?

I created an FME Workspace with FME Desktop that downloads a 48 GB file from the cloud storage service and then re-uploads it to the same cloud storage service. All of the data centres are located less than 1,000 km from each other. For example, in the case of Azure:

A 48 GB CSV file resides on Azure Blob Storage.
An FME Workspace is run on FME Server that downloads the 48 GB file. Once that has finished downloading, the Workspace writes the CSV file back to the same storage container on Azure.
The Workspace was run twice. See the table below for the average translation time.
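The procedure above can be approximated outside FME with a small timing harness. This is a minimal sketch, not the FME Workspace used for the benchmark: `download` and `upload` are hypothetical callables that you would implement with your storage provider's SDK, and the stand-ins below do no network I/O.

```python
import time
from statistics import mean
from typing import Callable


def time_round_trip(download: Callable[[], None],
                    upload: Callable[[], None],
                    runs: int = 2) -> float:
    """Return the average seconds taken to download, then re-upload, over `runs` runs."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        download()  # e.g. fetch the 48 GB CSV from the storage container
        upload()    # e.g. write the same file back to the same container
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Hypothetical stand-ins so the sketch runs without cloud credentials.
def fake_download() -> None:
    time.sleep(0.01)


def fake_upload() -> None:
    time.sleep(0.01)


average_seconds = time_round_trip(fake_download, fake_upload, runs=2)
print(f"average round trip: {average_seconds:.3f} s")
```

Timing the download and upload together mirrors how the benchmark reports a single runtime per storage service.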

Location of the 48 GB CSV file	Workspace runtime (FME Server on FME Cloud, AWS us-west-2, Oregon)
AWS S3 (us-west-2, Oregon)	17 minutes, 18 seconds
Azure Blob Storage (Premium) (West US 2, Washington)	13 minutes, 27 seconds
Google Cloud Storage (us-west1 Oregon)	18 minutes, 3 seconds

In each test, almost 100 GB of data was moved between FME Cloud and the cloud storage service in less than 20 minutes! Even more surprising, two of the three storage services reside on a different cloud platform than FME Cloud, and yet the speed of the test is similar for all of them. What this tells us is that the public internet infrastructure between cloud providers in the same locale is fast.
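To put the runtimes in the table above into network terms, the sketch below converts them into an average throughput. It assumes decimal gigabytes and that each run moved 96 GB (48 GB down plus 48 GB up). The function name is my own, and because the runtime also includes the Workspace's processing, the result understates the raw network speed.

```python
def effective_throughput_mbps(gigabytes_moved: float, minutes: int, seconds: int) -> float:
    """Average throughput in megabits per second (decimal units: 1 GB = 8,000 megabits)."""
    elapsed_seconds = minutes * 60 + seconds
    return gigabytes_moved * 8_000 / elapsed_seconds


# 48 GB downloaded + 48 GB uploaded per run (assumes decimal gigabytes).
DATA_MOVED_GB = 96

# Runtimes copied from the table above as (minutes, seconds).
runtimes = {
    "AWS S3 (us-west-2)": (17, 18),
    "Azure Blob Storage (West US 2)": (13, 27),
    "Google Cloud Storage (us-west1)": (18, 3),
}

for service, (minutes, seconds) in runtimes.items():
    mbps = effective_throughput_mbps(DATA_MOVED_GB, minutes, seconds)
    print(f"{service}: {mbps:.0f} Mbps")
```

All three services work out to roughly 700 to 950 Mbps sustained, which is why the choice of cloud platform makes little difference within one region.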

In my testing, the biggest consideration was actually the network bandwidth limit of the instance, which varies widely between instance types. For example, the FME Cloud instance we used was a Standard instance (AWS m5.large), which has a network bandwidth limit of up to 10 Gbps. For many instance types the limit is much lower, which dramatically reduces download and upload speeds.
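A quick way to see why the instance's bandwidth limit matters is to compute the best-case transfer time at a given link speed. The sketch below is simple arithmetic under the assumption of a fully saturated link and decimal units. The 1 Gbps and 0.5 Gbps values are illustrative and are not the limits of any particular instance type.

```python
def min_transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Lower bound on transfer time if the link were fully saturated (decimal units)."""
    return gigabytes * 8 / link_gbps


DATA_MOVED_GB = 96  # 48 GB down + 48 GB up, as in the benchmark

for link_gbps in (10, 1, 0.5):
    seconds = min_transfer_seconds(DATA_MOVED_GB, link_gbps)
    print(f"{link_gbps} Gbps link: at least {seconds / 60:.1f} minutes")
```

Real transfers take longer than this floor, but the calculation shows how quickly a lower bandwidth limit becomes the dominant cost.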

Geographic Location

So moving data between cloud platforms in the same geographic region doesn’t impact the download speed substantially. How does changing the geographic location of the data but keeping the cloud platform constant impact run time?

In this test we’ll use exactly the same workflow, except this time we will move the data all over the world. We’ll keep FME Server running on FME Cloud in us-west-2, but we’ll read the data located on Azure Blob Storage from Eastern USA, Europe and Australia.

Azure Storage region	Workspace runtime (FME Server on FME Cloud, AWS us-west-2, Oregon)
Azure – Western USA	13 minutes 27 seconds
Azure – Eastern USA	43 minutes 45 seconds
Azure – Europe	1 hour 26 minutes 44 seconds
Azure – Australia	1 hour 28 minutes 31 seconds

As you can see, as the distance between the data and FME Server increases, the time it takes to read and write the data over the network increases substantially. This is not unique to Azure: I have done similar tests on AWS and Google, and the results on either of those clouds would be similar.
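The size of the effect is easier to judge as a ratio. The sketch below takes the runtimes from the table above and expresses each one as a multiple of the Western USA runtime. The helper name `to_seconds` is my own.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Runtimes copied from the table above.
runtimes = {
    "Azure - Western USA": to_seconds(minutes=13, seconds=27),
    "Azure - Eastern USA": to_seconds(minutes=43, seconds=45),
    "Azure - Europe": to_seconds(hours=1, minutes=26, seconds=44),
    "Azure - Australia": to_seconds(hours=1, minutes=28, seconds=31),
}

baseline = runtimes["Azure - Western USA"]
slowdown = {region: secs / baseline for region, secs in runtimes.items()}

for region, factor in slowdown.items():
    print(f"{region}: {factor:.1f}x the Western USA runtime")
```

Reading from Eastern USA takes a little over 3 times as long as the same-region run, and reading from Europe or Australia takes roughly 6.5 times as long.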

Security

When a Workspace running on FME Cloud connects to any service or database, the data will travel over the public internet, so it is imperative that the data is secure. There are many security requirements that must be accounted for when working with APIs and databases. The two key ones I want to focus on are networking, and identity and access management.

Cloud Services

Almost all of the services provided by AWS, Azure, and Google Cloud can be accessed via an API. These APIs can be configured to be accessible to the public internet or private to your cloud network. If you are using FME Cloud, the API needs to be publicly accessible, so let’s have a look at the considerations for working with a public API.

* Note: Beyond the cloud platforms, almost all APIs are public (e.g. Salesforce, Jira, Open Data Portals etc) and since FME supports the key authentication protocols (e.g. OAuth, token) we can connect to nearly all of them.

Networking

For any API, public or not, data should always be encrypted in transit via HTTPS. If the data is sent over HTTP, attackers could eavesdrop on it or manipulate it. Fortunately, encryption is now standard, so every API you use should be HTTPS. HTTPS is also enabled by default on FME Cloud and we manage the certificate for you.
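If you script against APIs yourself, one cheap safeguard is to reject any endpoint that is not HTTPS before sending data. The following is a minimal sketch using only the Python standard library. The function name and the example URLs are hypothetical, and it is not part of FME.

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS; otherwise raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url}")
    return url


# Hypothetical endpoints for illustration only.
print(require_https("https://api.example.com/v1/items"))

try:
    require_https("http://api.example.com/v1/items")
except ValueError as error:
    print(error)
```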

Another feature to look out for is enabling firewall rules to limit access to a specific IP range. This helps protect networks from unauthorized access. Many of the services on the cloud platforms (e.g. Azure Storage and AWS S3) support this. On FME Cloud you can assign a static IP to the FME Server instance so we can also support this workflow.
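Conceptually, an IP-based firewall rule is a membership test against a list of allowed ranges. The sketch below illustrates that check with Python's `ipaddress` module. The ranges are reserved documentation addresses standing in for, say, the static IP of an FME Server instance, and real firewall rules are configured in the cloud platform rather than in code like this.

```python
import ipaddress

# Hypothetical allowlist: one static IP and one office range (documentation addresses).
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.10/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True if the source IP falls inside any allowed range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_RANGES)


for candidate in ("203.0.113.10", "198.51.100.42", "192.0.2.1"):
    print(candidate, "allowed" if is_allowed(candidate) else "blocked")
```

A static IP is what makes this workable: without one, the source address could change and the rule would have to cover a much wider range.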

Identity and access management

The next consideration is limiting who has access to the API via a good authentication and authorization gateway. All of the APIs on the cloud platforms provide a way for a user to authenticate with the API securely via credentials. This means you can connect securely from FME Server no matter if it is deployed on FME Cloud or on-premises.
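As a rough illustration of credential-based access, the sketch below builds (but does not send) an HTTPS request that carries a bearer token. The URL, the environment variable name and the function name are hypothetical. The cloud platforms often use their own signing schemes, so treat this as the general shape of token authentication rather than the exact mechanism of any one API.

```python
import os
import urllib.request


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build a request that presents a bearer token in the Authorization header."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Read the secret from the environment rather than hard-coding it.
# EXAMPLE_API_TOKEN is a hypothetical variable name; the fallback is a placeholder.
token = os.environ.get("EXAMPLE_API_TOKEN", "placeholder-token")
request = build_authenticated_request("https://api.example.com/v1/items", token)
print(request.get_method(), request.full_url)
```

Because the credential travels with each request, the same approach works whether the client runs on FME Cloud or on-premises.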

So what are the limitations of connecting to an API from outside of the cloud provider infrastructure? Well, they are advanced security features that larger enterprises utilize:

Private endpoints are not supported – Private endpoints limit API access to the customer’s virtual network (e.g. VPC on AWS) and the data will not leave the cloud provider’s infrastructure. If your organization adopts this configuration, you would need to deploy FME Server within the network rather than using FME Cloud.
You cannot delegate authentication – All cloud providers have a native identity and access management service (e.g. Azure Active Directory or AWS IAM). This allows end-users who are in a trusted environment (e.g. VNet on Azure and VPC on AWS) to interact with services without providing credentials. This is supported for FME Server if you host it yourself but not for FME Cloud.

In summary, even with these limitations, interacting with a public API is secure when it is encrypted and properly authenticated, and you still have granular control over who has access. If you have very strict security and privacy requirements, then you may need to deploy FME Server into your own cloud account rather than using FME Cloud.

Databases

FME supports all of the main cloud databases. Interacting with these databases over the public internet is secure and supported with FME. Let’s have a look at the considerations.

Networking

As with APIs, encrypting the connection between FME and the database is critical when transmitting data over the public internet. SSL/TLS is supported by all major databases. FME supports communicating over SSL, but you will need to do some minor configuration which varies slightly by database.
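To show what an encrypted, verified connection means on the client side, the sketch below builds a TLS context with Python's standard `ssl` module. This is a generic illustration and not FME configuration. The function name is my own, and how such a context is passed to a database driver varies by driver.

```python
import ssl


def build_verified_tls_context() -> ssl.SSLContext:
    """Create a client TLS context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()  # certificate and hostname checks are on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


context = build_verified_tls_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```

Verifying the server certificate matters as much as encryption itself, because an unverified connection can be encrypted to the wrong party.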

When making a database public, you will need to define firewall rules to limit which IPs can access the database. FME Cloud supports static IPs, so you can allow only the address of your FME Server instance. It also lets you define inbound network firewall rules to control the protocols, ports, and source IP ranges that are allowed to reach the FME Server instance.

Identity and access management

For authentication, there are many options and it depends on the database. For interacting with a database over the public network there are two main options available:

Password: Usernames and encrypted passwords can be defined by the database administrator.
Two-way TLS Certificates: Some databases support certificates being used for authentication. A client certificate is generated that contains basic information about the client and the server uses it to verify the client on connection. It can be used to authenticate automated systems like FME Server.
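For the certificate option, the client presents its own certificate in addition to verifying the server's. The sketch below shows the idea with Python's `ssl` module. The function name and file paths are hypothetical, and the certificate and key would be issued by your database administrator. It is a generic client-side illustration, not FME configuration.

```python
import ssl


def build_mutual_tls_context(client_cert_path: str, client_key_path: str) -> ssl.SSLContext:
    """Create a TLS context that verifies the server and presents a client certificate."""
    context = ssl.create_default_context()  # verifies the server's certificate
    # Load the client certificate and private key that the server will check.
    context.load_cert_chain(certfile=client_cert_path, keyfile=client_key_path)
    return context


if __name__ == "__main__":
    try:
        # Hypothetical file names; replace with the files issued for your client.
        build_mutual_tls_context("client-cert.pem", "client-key.pem")
        print("client certificate loaded")
    except OSError as error:
        print(f"could not load client certificate: {error}")
```

Because no interactive login is involved, this style of authentication suits unattended systems such as a server running scheduled jobs.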

As with the cloud services, it is easy to open up the database to the public internet so FME Cloud can access it. However, depending on the sensitivity of the data, this might not be feasible. If that is the case, then deploying FME Server yourself on the cloud platform in the same network as your database is the solution. If you are able to open the data up to the public internet, then the technology exists on both the database and FME to securely connect and transmit data.

Conclusion

If you are trying to decide between hosting FME Server yourself or Safe hosting and managing FME Server for you on FME Cloud, focus on the benefits and limitations that FME Cloud offers rather than concerning yourself with where it exists in the cloud.

With regard to performance, don’t worry about moving data between cloud providers so long as you locate both the data and FME Server in the same part of the world.

On the security side of things, as long as your organization doesn’t have strict security rules that require integrating FME directly with the cloud provider’s native identity management service, then the security features FME Cloud provides should be sufficient. If not, no problem. You can deploy FME Server directly into your cloud account.
