Skip to content

Spatial Data in the Cloud – Part 3, Data Warehouses

In part 3 of our Spatial Data in the Cloud blog series, we'll take a look at the different data warehouses that support spatial data types and explore different use cases to demonstrate how FME can enable you to extract and load data using the spatial SQL functions.

In the previous posts, we looked at cloud-native relational databases and NoSQL databases. Next up, data warehouses.

Data warehouses (Online Analytical Processing (OLAP) databases) are interactive data analysis tools for large datasets. They enable you to store large volumes of data cheaply and provide an interface where you can run fast complex queries across the data. They excel at analyzing data and are typically poor at writing data with uploads often done in bulk. 

 

Spatial Support

With an increasing accumulation of spatial data across all business systems, geospatial analysis has become a key requirement for any modern analytics platform. Until recently, none of the cloud data warehouses supported geometry. That has now changed with Google Big Query, Snowflake, and AWS Redshift adding support for spatial data. This means you can start thinking about how you can leverage location to help improve your decision making!

 

Database Geospatial Support FME Support
AWS Athena

Documentation

Type of Support: Native

Spatial Types: Geometry 

Supported Types: Points, Lines, Polygons, Multipoint, Multilines, Multipolygons and Geometry Collections 

Spatial Functions: Over 30 functions

Run queries on S3 datasets. Results are also stored on S3
AWS Redshift

Documentation

Type of Support: Native

Spatial Types: Geometry and 2D support only

Supported Types: Points, Lines, Polygons, Multilines and Multipolygons

Spatial Functions: Over 40 functions

Read, write and query
Synapse Analytics Type of Support: No support for spatial types
Google Big Query

Documentation

Type of Support: Native

Spatial Types: Geography

Supported Types: Points, Lines, Polygons.  

Spatial Functions: Within, Distance, Contains, Touches, Covers

Read, write and query
Snowflake Type of support: Native

Spatial types: Geography

Supported types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection, Feature, FeatureCollection 

Spatial Functions: Over 30 functions

Read, write and query

As you can see, almost all listed data warehouses support spatial data types natively which is a huge deal. There are, however, limits with each database only supporting either geography or geometry types. Since the data you are pulling into the data warehouse is likely from many different systems, you will need to clean and reproject the data before loading. FME can help with this.

 

Use Cases

The main use-case for data warehousing is to unify disparate data sources into a single location so complex queries can be run to ask questions of the data. Even large complex queries run against huge datasets will run in seconds—and relatively cheaply—this means you can afford trial and error to get the right query. Since the interface is in SQL, it is easy for analysts and GIS people who are familiar with SQL to construct queries. Compare this to something like MapReduce which is much more complicated to use and queries can take days.

Here are a few common scenarios:

  • Load subsets of data from NoSQL and relational databases into the data warehouse. All of the data warehouses have limitations on the supported spatial types and coordinate systems, so you can use FME to reproject data before loading it in.
  • Extract subsets of data by providing areas of interest, either from existing datasets or by deriving new filters within FME. These spatial filters provide a fast way to reduce data volumes if you want to do further analysis or visualization on a specific portion of the data.
  • Let the data warehouse do the heavy lifting of complex analysis, then store the results in a relational database and serve the results to clients from there.
  • Gain spatial insight directly from your Data Lake Stores. Leave data in your cloud storage and then using a data warehouse service, link to the external data source (Snowflake, Athena, Redshift, and BigQuery all support this) and then query using a standard SQL interface. This means you don’t need to lift, shift or copy the data to query it.

 

Conclusion

You can use FME to help with all of the use-cases defined above. FME enables you to extract and load data from data warehouses using the spatial SQL functions. On loading, you can pull disparate data sources together so you can run custom analytics across all of your data silos.

If you want to have a play around with a Data Warehouse but don’t have a huge dataset, BigQuery has a public dataset (with a spatial component) which contains all of the trips taxis have taken in NYC since 2009.

Safe product icons
Reach out and get started with FME today

Real change is just a platform away.