NetCDF Meets GIS: Geospatial Architecture for Scientific Data
Applying geospatial system architecture principles to gridded scientific data services results in easier, more flexible access for end-users. It’s complicated – but FME handles the heavy lifting.
“Weather forecast for tonight: dark.”
George Carlin’s not-so-subtle dig at the accuracy of weather forecasting sums up how a lot of us feel when we’re caught unprepared in a downpour after the forecast called for sun. But if you take a moment to consider what goes into creating that forecast (and how much better forecasts are today than when we were kids!), the science behind it is pretty impressive work. And the volume and complexity of the data are daunting.
Ryan M. Kelly is intimately familiar with this data. Already a specialist in numerical weather prediction and ocean prediction modeling, he had been exploring how GIS principles and techniques could be applied to model output. That interest eventually drew him back to school to pursue a GIS degree at Johns Hopkins University. It was there that he first encountered FME and started thinking about how these complex data sources could be made more accessible.
What is multidimensional gridded data?
NetCDF is one of many scientific data formats (SDFs) designed to store large volumes of complex, multidimensional data in a platform-independent manner. Picture a cube, if you will, full of points that have x, y, and z coordinates; now attach attributes to those points, and then watch those attributes change over time.
Common scenarios for this type of data are found in meteorology and oceanography, where temperature, pressure, composition, and a multitude of other factors are constantly in flux. Datasets range from relatively small to petabytes in size, and they can be analyzed in countless ways, from taking horizontal or vertical slices through them to generating streamlined 2D and 3D visualizations.
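To make the "cube that changes over time" idea concrete, here is a minimal sketch in plain Python. It assumes a single variable stored as a 4D grid indexed `grid[t][z][y][x]`, which mirrors the (time, level, latitude, longitude) dimension order many NetCDF variables use. The values are synthetic, and the helper names (`make_grid`, `horizontal_slice`, `vertical_slice`, `time_series`) are hypothetical, not part of any NetCDF library.

```python
from typing import List

# Hypothetical 4D layout: grid[t][z][y][x] -> one value per cell.
Grid4D = List[List[List[List[float]]]]


def make_grid(nt: int, nz: int, ny: int, nx: int) -> Grid4D:
    """Build a synthetic 'air temperature' cube (made-up values, for illustration only)."""
    return [[[[20.0 - 6.5 * z + 0.1 * t + 0.01 * (x + y)
               for x in range(nx)]
              for y in range(ny)]
             for z in range(nz)]
            for t in range(nt)]


def horizontal_slice(grid: Grid4D, t: int, z: int) -> List[List[float]]:
    """A map-like plane: every (y, x) cell at one time step and one vertical level."""
    return [row[:] for row in grid[t][z]]


def vertical_slice(grid: Grid4D, t: int, y: int) -> List[List[float]]:
    """A cross-section: every (z, x) cell along one grid row at one time step."""
    return [level[y][:] for level in grid[t]]


def time_series(grid: Grid4D, z: int, y: int, x: int) -> List[float]:
    """How a single cell's value changes across all time steps."""
    return [step[z][y][x] for step in grid]


if __name__ == "__main__":
    cube = make_grid(nt=3, nz=4, ny=5, nx=6)
    plane = horizontal_slice(cube, t=1, z=2)   # 5 rows x 6 columns
    section = vertical_slice(cube, t=0, y=3)   # 4 levels x 6 columns
    print(len(plane), len(plane[0]), len(section), len(section[0]))
    print(time_series(cube, z=0, y=0, x=0))
```

In practice a library such as netCDF4 or xarray would read the real file and handle the indexing; the point here is only that a "slice" is just fixing one or more of the four dimensions.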
THREDDS Data Servers
As often happens with specialized, technical datasets, SDFs have tended to live in silos. Though the infrastructure behind (and access to) this information is evolving, Ryan saw an opportunity to improve the situation considerably by applying geospatial system architecture principles to it, leapfrogging ahead in data usability and interoperability.
Much of this type of scientific data is now served through THREDDS Data Servers (TDS). The technology is a product of Unidata, a federally funded community program at UCAR (the University Corporation for Atmospheric Research) in the United States that develops tools and software to improve the distribution of geoscience data across the education and research communities.
A TDS provides a catalog platform, metadata publishing, and data access services for scientific datasets. This data includes multidimensional formats like NetCDF, HDF, and GRIB – and a TDS can handle massive quantities of data. Ryan’s goal was to integrate THREDDS services into an FME-powered workflow, which would openly distribute the data for wider usage and easier access.
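As an illustration of what "data access services" can look like from the client side, the sketch below builds a request URL for the TDS NetCDF Subset Service (NCSS), which lets a client ask the server for only a bounding box, time, and vertical level of a gridded dataset. This is an assumption-laden sketch, not how FME itself connects: the host, dataset path, and variable names are hypothetical, and the `/thredds/ncss/grid/` path prefix follows recent TDS versions (older servers may use a different prefix), so check the server's catalog for the exact service URL.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def ncss_grid_url(base: str, dataset_path: str, variables: Iterable[str],
                  north: float, south: float, east: float, west: float,
                  time_iso: str, vert_coord: Optional[float] = None,
                  accept: str = "netcdf") -> str:
    """Build an NCSS grid-subset URL (sketch; path prefix may differ by TDS version)."""
    # One 'var' parameter per requested variable.
    params = [("var", v) for v in variables]
    # Geographic bounding box, requested time, and desired output format.
    params += [("north", north), ("south", south), ("east", east), ("west", west),
               ("time", time_iso), ("accept", accept)]
    # Optional vertical level, expressed in the dataset's vertical coordinate units.
    if vert_coord is not None:
        params.append(("vertCoord", vert_coord))
    return (f"{base.rstrip('/')}/thredds/ncss/grid/"
            f"{dataset_path.lstrip('/')}?{urlencode(params)}")


if __name__ == "__main__":
    # Hypothetical server and dataset, for illustration only.
    print(ncss_grid_url("https://thredds.example.org", "models/forecast/best",
                        ["air_temperature", "relative_humidity"],
                        north=50.0, south=40.0, east=-100.0, west=-110.0,
                        time_iso="2024-01-01T00:00:00Z", vert_coord=2.0))
```

Pushing the subsetting to the server like this means the client downloads only the slice it needs rather than the full archive.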
System Architecture
Ryan’s system architecture diagram will look familiar to FME users – a service on the input side, FME and a spatial (cloud!) database in the middle, and a variety of formats pushed out to various clients as output. The unusual aspect of this strategy is applying spatial queries to multidimensional source data that isn’t typically manipulated this way.
By approaching it in this manner, an end user can efficiently subset the source data – for example, taking a horizontal slice through a NetCDF dataset using a geographical bounding polygon, and pulling out a grid of air temperature and relative humidity values at a specific height, at a specific time.
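The horizontal-slice example above can be sketched in a few lines. This is a hypothetical, in-memory stand-in for what the architecture does with spatial queries in a database: it keeps grid cells at one height and one time whose centers fall inside a bounding polygon, using the standard even-odd (ray casting) point-in-polygon test. The `GridCell` fields and helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)


@dataclass(frozen=True)
class GridCell:
    """One grid cell center with the attributes we want to pull out (hypothetical schema)."""
    lon: float
    lat: float
    height_m: float
    time: str            # ISO 8601 timestamp
    air_temp_c: float
    rel_humidity: float


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Even-odd rule: count how many polygon edges a ray running east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def horizontal_subset(cells: List[GridCell], polygon: List[Point],
                      height_m: float, time: str) -> List[GridCell]:
    """Keep cells at one height and time whose centers fall inside the polygon.

    Height is compared exactly, assuming heights are stored as discrete model levels.
    """
    return [c for c in cells
            if c.height_m == height_m and c.time == time
            and point_in_polygon(c.lon, c.lat, polygon)]


if __name__ == "__main__":
    cells = [GridCell(float(x), float(y), h, t, 15.0 - 0.0065 * h, 60.0)
             for x in range(5) for y in range(5)
             for h in (2.0, 10.0)
             for t in ("2024-01-01T00:00:00Z", "2024-01-01T06:00:00Z")]
    square = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
    for c in horizontal_subset(cells, square, 2.0, "2024-01-01T00:00:00Z"):
        print(c.lon, c.lat, c.air_temp_c, c.rel_humidity)
```

A spatial database does the same filtering with indexed geometry predicates, which is what makes the approach scale to large archives.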
One of the strengths of this approach is that it brings SQL queries to a data type that isn’t usually handled that way. In addition to defining horizontal and vertical clip boundaries with spatial queries, users can generate pixel statistics to pass along for further analysis and visualization.
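Here is a minimal sketch of the "SQL plus pixel statistics" idea, using Python's built-in `sqlite3` module. It assumes grid cells have been flattened into a hypothetical `grid_cells` table (one row per cell, level, and time step). A spatial database would typically clip with a polygon intersection predicate; this sketch substitutes a plain bounding-box filter so it runs anywhere.

```python
import sqlite3
from typing import Tuple


def load_demo(conn: sqlite3.Connection) -> None:
    """Create a hypothetical flattened grid table and fill it with synthetic values."""
    conn.execute("""CREATE TABLE grid_cells (
                        lon REAL, lat REAL, height_m REAL, time TEXT, air_temp_c REAL)""")
    rows = [(float(x), float(y), 2.0, "2024-01-01T00:00:00Z", 10.0 + x + y)
            for x in range(5) for y in range(5)]
    conn.executemany("INSERT INTO grid_cells VALUES (?, ?, ?, ?, ?)", rows)
    # An index on the filter columns helps once the table holds many time steps and levels.
    conn.execute("CREATE INDEX idx_cells ON grid_cells (time, height_m, lon, lat)")


def pixel_stats(conn: sqlite3.Connection, west: float, south: float,
                east: float, north: float, height_m: float, time: str) -> Tuple:
    """Count, min, max, and mean air temperature for one horizontal slice inside a box."""
    sql = """
        SELECT COUNT(*), MIN(air_temp_c), MAX(air_temp_c), AVG(air_temp_c)
        FROM grid_cells
        WHERE lon BETWEEN ? AND ?
          AND lat BETWEEN ? AND ?
          AND height_m = ?
          AND time = ?
    """
    return conn.execute(sql, (west, east, south, north, height_m, time)).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_demo(conn)
    print(pixel_stats(conn, 1.0, 1.0, 2.0, 2.0, 2.0, "2024-01-01T00:00:00Z"))
    conn.close()
```

Because the statistics come back from a single query, they can be handed straight to the next step in the workflow rather than computed by scanning the source files.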
And of course, they then have the option to transform the data any number of ways that FME can, including writing out to the formats shown here (GeoJSON, JSON, GeoTIFF, or WMS) plus any number of other appropriate FME-supported formats, with the opportunity to enrich it with other geographic datasets.
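As one example of the output side, the sketch below turns a subset of grid cells into a GeoJSON FeatureCollection using only the standard `json` module. It assumes each cell is represented by its center point and a dictionary of attribute values; writing cells as polygons instead would be an equally valid design choice. Note that GeoJSON orders coordinates as longitude, then latitude. The function name is hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple

# (lon, lat, attributes) for one grid cell center.
CellRecord = Tuple[float, float, Dict[str, float]]


def cells_to_geojson(cells: Iterable[CellRecord]) -> str:
    """Serialize grid cell centers as GeoJSON Point features (lon, lat order)."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        }
        for lon, lat, props in cells
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


if __name__ == "__main__":
    subset = [
        (-105.0, 40.0, {"air_temp_c": 12.5, "rel_humidity": 48.0}),
        (-104.5, 40.0, {"air_temp_c": 12.9, "rel_humidity": 46.0}),
    ]
    print(cells_to_geojson(subset))
```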
What’s Next
With THREDDS servers currently hosting data from atmospheric, climate, and ocean simulations all over the globe (and new servers being added at an increasing rate), this is one “big data” source that is growing rapidly.
“With an architecture built on an FME framework that has already demonstrated direct connectivity to a TDS, it opens up a new pathway for professionals that may need environmental or remotely sensed data for active projects,” says Ryan. “This type of approach for multidimensional formats is unique in that it can be applied to any sizable data stack, whether it is several gigabytes of data for a small application or one that is terabytes or petabytes in scale for the purpose of serving a persistent archive.”
If you’re interested in experimenting with this sort of data yourself, a catalog of THREDDS servers with data from many countries is available.
More on Weather –
On the blog: Aviation Weather: MeteoSwiss Enhances SIGMET & AIRMET Alerts with Maps
From the 2014 FME UC: Real Time Lightning Alerts from The Weather Network
Are you a student who is interested in applying FME to your research project? Check out the FME Grant Program!