The right spatial data, in the right context, can be tremendously valuable to decision makers. A common challenge in getting value from data is that it's stored all over the place: in different applications, databases, and formats, and with different data models and coordinate systems. Here, I'll summarize a few common challenges in moving data into, between, and out of spatial databases.
Why move data in or out of spatial databases?
- … to create a data warehouse. In this situation, data is maintained in a number of separate databases (or perhaps files), imported into a central database where it can be analyzed, and then perhaps published in various forms.
- … to migrate to a central data store. Similar to the previous scenario, data is consolidated from different sources into a common database. The data is then maintained and edited in the central store.
- … to exchange data between different organizations and users.
- … and many other reasons.
The more relevant, current, and high-quality data is integrated and made accessible to the tools that manipulate and analyze it, the greater the opportunity to extract value. Let's look at a few potential hurdles on the way to this goal:
Different Formats
There are many reasons you might need to transform data between formats, including integrating data managed by different GIS, CAD, 3D, or database systems, or converting data to or from interchange (e.g. GML) or visualization (e.g. KML) formats.
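As a small illustration of the output side of a format translation, the sketch below writes a single named point as KML using Python's standard-library `xml.etree.ElementTree`. The function name `point_to_kml` and the sample coordinates are hypothetical; a real translation would also carry attributes, styles, and other geometry types, and would read its input from some source format.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _kml(tag: str) -> str:
    """Qualify a tag name with the KML 2.2 namespace."""
    return f"{{{KML_NS}}}{tag}"


def point_to_kml(name: str, lon: float, lat: float) -> str:
    """Serialize one named lon/lat point as a minimal KML document (sketch)."""
    # Emit KML elements in the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_kml("kml"))
    document = ET.SubElement(kml, _kml("Document"))
    placemark = ET.SubElement(document, _kml("Placemark"))
    ET.SubElement(placemark, _kml("name")).text = name
    point = ET.SubElement(placemark, _kml("Point"))
    # KML expects WGS84 coordinates ordered longitude,latitude[,altitude].
    ET.SubElement(point, _kml("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(point_to_kml("Survey marker", -123.1207, 49.2827))
```

Note that coordinate order is one of the small details a format translation has to get right: KML always lists longitude first, whereas some other formats and coordinate reference systems put latitude first.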
Different Schemas or Data Models
Translating between data models is considerably more interesting and difficult than translating between formats. This affects both the spatial and non-spatial components of the data. The following are a few examples of transformations between different spatial representations:
- (x,y) coordinates stored in numeric columns → point geometry stored in a geometry column
- area feature attributes associated with an inside point → attributes associated with area features directly
- line segments as separate geometries → line segments as distance offsets along common linear features
- complex shapes described parametrically (e.g. lat/long, orientation, radius, etc. to describe a region of airspace) → complex shapes described geometrically (e.g. as polygons or 3D volumes)
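Three of the representation changes listed above can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not a production approach: all function names are hypothetical, coordinates are assumed to be planar (an airspace circle defined in lat/long with a radius in nautical miles would first need a projection or a geodesic calculation), polygons are single rings with no holes, and the even-odd point-in-polygon test does not treat points exactly on a boundary specially.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # closed ring: first position repeated at the end


def xy_columns_to_wkt(row: Dict[str, float], x_col: str = "x", y_col: str = "y") -> str:
    """(x,y) values in numeric columns -> a point geometry, written as WKT."""
    return f"POINT ({row[x_col]} {row[y_col]})"


def circle_to_ring(cx: float, cy: float, radius: float, segments: int = 32) -> Ring:
    """Parametric circle (centre + radius) -> closed polygon ring approximating it.

    Assumes planar coordinates in the same units as `radius`.
    """
    ring = [
        (cx + radius * math.cos(2 * math.pi * i / segments),
         cy + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def point_in_ring(p: Point, ring: Ring) -> bool:
    """Even-odd (ray casting) test: does point p fall inside the closed ring?"""
    px, py = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def attach_label_attributes(
    areas: Dict[str, Ring], labels: List[Tuple[Point, dict]]
) -> Dict[str, Optional[dict]]:
    """Attributes held on inside (label) points -> attributes on the areas directly.

    Areas with no label point inside them map to None so they can be flagged.
    """
    result: Dict[str, Optional[dict]] = {}
    for area_id, ring in areas.items():
        result[area_id] = next(
            (attrs for pt, attrs in labels if point_in_ring(pt, ring)), None
        )
    return result
```

The number of segments in `circle_to_ring` is a trade-off between fidelity to the original parametric shape and the size of the resulting geometry; a real conversion would usually choose it from a maximum allowed deviation.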
Non-spatial challenges include levels of normalization (some formats are highly normalized, breaking data into hundreds of tables or layers, and others are not), as well as differences in attribute domains (e.g., a “pond” might mean different things in different datasets).
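A sketch of both non-spatial issues, assuming records are plain dictionaries: `denormalize` flattens a normalized lookup table onto each feature, and `map_domain` translates coded values into a target domain. The lookup table, the code values, and the function names are all hypothetical. Because a source value such as "pond" may not mean the same thing as the target's "pond", values without an agreed mapping are set aside for review rather than guessed.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical lookup table from a normalized source: surface-type id -> fields.
SURFACE_TYPES: Dict[int, Record] = {
    1: {"surface": "open water"},
    2: {"surface": "wetland"},
}

# Hypothetical mapping from source feature codes to the target attribute domain.
WATER_CODES: Dict[str, str] = {"POND": "pond", "RES": "reservoir", "LAKE": "lake"}


def denormalize(features: List[Record], lookup: Dict[Any, Record], key: str) -> List[Record]:
    """Flatten: copy the looked-up row's fields onto each feature record.

    If a field exists in both, the lookup table's value wins.
    """
    return [{**feature, **lookup.get(feature[key], {})} for feature in features]


def map_domain(
    records: List[Record], field: str, mapping: Dict[Any, Any]
) -> Tuple[List[Record], List[Record]]:
    """Translate `field` values into the target domain.

    Returns (translated, unmapped); unmapped records are kept for manual review.
    """
    translated: List[Record] = []
    unmapped: List[Record] = []
    for rec in records:
        value = rec.get(field)
        if value in mapping:
            translated.append({**rec, field: mapping[value]})
        else:
            unmapped.append(rec)
    return translated, unmapped
```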
Different Coordinate Systems
There is often a need to reproject data into coordinate systems appropriate for different uses. A common example is for spatial data infrastructures where coordinate values may be reprojected from local or national coordinate systems to global or continental ones (e.g. lat/long ETRS89 in Europe).
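In practice, reprojecting between national grids and ETRS89 involves ellipsoid parameters and sometimes datum-transformation grids, so it is best delegated to a library such as PROJ (or its Python binding, pyproj). To show what a projection actually computes, the sketch below uses the much simpler spherical Web Mercator (EPSG:3857) equations instead. It is a stand-in for illustration, not an ETRS89 transformation, and the function names are hypothetical.

```python
import math
from typing import Tuple

# Sphere radius used by Web Mercator (EPSG:3857), in metres.
R = 6378137.0


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Forward projection: geographic degrees -> spherical Web Mercator metres.

    Undefined at the poles (y grows without bound as |lat| approaches 90).
    """
    lam = math.radians(lon)
    phi = math.radians(lat)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Inverse projection: spherical Web Mercator metres -> geographic degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(-3.7038, 40.4168)
print(web_mercator_to_lonlat(x, y))
```

The round trip through the forward and inverse functions should recover the input to within floating-point error, which is a cheap sanity check worth applying to any reprojection step.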
The Need for Data Validation
Care must be taken when aggregating or transforming data to ensure that data quality is maintained or improved. In some cases, quality issues can be resolved automatically; in others, suspect features can be flagged for manual correction. The following are a few examples of quality and consistency properties one might verify:
- individual geometries are valid (e.g. using the OGC simple and/or valid predicates, checking for self-intersections, etc.)
- the area is fully covered with no slivers between adjacent geometries, there are no spikes, and the overall topology is consistent
- relationships between different layers are consistent
- required attributes are present and within their valid domains
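A sketch of the first and last checks in the list, under simplifying assumptions: polygons are single rings of planar (x, y) tuples with no holes, the self-intersection test is a brute-force O(n²) pass over non-adjacent edges, and the attribute domains are hypothetical. The OGC validity rules cover considerably more (holes, multi-part geometries, and more), so a production check would normally rely on a geometry engine such as GEOS rather than code like this. Each check returns messages instead of fixing anything, in line with flagging suspect features for manual correction.

```python
from typing import Any, Dict, List, Set, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (>0 counter-clockwise, 0 collinear)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within_box(a: Point, b: Point, c: Point) -> bool:
    """For collinear c: does c lie within the bounding box of segment ab?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1-p2 and q1-q2 cross or touch."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or overlapping cases: an endpoint lies on the other segment.
    return ((d1 == 0 and _within_box(q1, q2, p1))
            or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1))
            or (d4 == 0 and _within_box(p1, p2, q2)))


def ring_self_intersects(ring: List[Point]) -> bool:
    """Brute-force check of every pair of non-adjacent edges in a closed ring."""
    edges = list(zip(ring, ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):  # j = i + 1 shares a vertex with i, so skip it
            if i == 0 and j == n - 1:
                continue  # first and last edges share the closing vertex
            if segments_intersect(*edges[i], *edges[j]):
                return True
    return False


def validate_feature(feature: Dict[str, Any], required: Dict[str, Set[Any]]) -> List[str]:
    """Return a list of problems found; an empty list means no issues were flagged."""
    problems: List[str] = []
    ring = feature["ring"]
    if len(ring) < 4:
        problems.append("ring has fewer than 4 positions")
    elif ring[0] != ring[-1]:
        problems.append("ring is not closed")
    elif ring_self_intersects(ring):
        problems.append("ring self-intersects")
    attrs = feature.get("attributes", {})
    for name, domain in required.items():
        if name not in attrs:
            problems.append(f"missing attribute: {name}")
        elif attrs[name] not in domain:
            problems.append(f"{name}={attrs[name]!r} is outside its domain")
    return problems
```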
At Safe, our passion is data. Our goal is to give you access to your data when, where, and how you need it, and we regularly address challenges such as these and others to make that possible.