Guide to creating, using, and maintaining open data portals
What are open data portals?
Vast collections of data exist online, full of possibilities and available for anyone to use. The value that open data provides goes beyond community participation and improved products. Having data readily available on taxes, housing prices, crime, navigation, energy efficiency, and more opens up a whole new level of global transparency and accountability.
Governments, businesses, research organizations, and others have embraced the open data movement and have seen enormous benefits.
Open data is free, accessible data that anyone can use for any purpose—and public interest in it is exploding. With it, we have access to information about the places, businesses, and organizations we care about. It means transparency. It enables collaboration, innovation, and scientific and technological advancement.
Who Produces Open Data?
All Levels of Government
Governments are currently the main providers of open data and have been at the forefront of the open data movement. Data is provided at all levels of government (e.g. city, state, and federal) in the UK, USA, Canada, and various other countries.
The reason?
- A drive to increase transparency and accountability
- The immense benefits citizens get from innovative applications and services that use open data
Many governments are now mandated (e.g. by the EU’s INSPIRE directive) to provide open data to their citizens, and a variety of other countries, states, provinces, and cities are going down the same path. We’re seeing the importance of this more and more, especially since COVID-19 has required everyone around the world to contribute and work together towards a common goal. Sharing data is a huge part of that process.
The Tri-County Health Department has done a fantastic job of not only managing their COVID-19 data, but also publishing it so that it’s accessible to anyone, anywhere.
As we move towards smart cities and gain the ability to measure and control almost any physical object, data volumes will increase exponentially. This makes it critical to keep the cost of entry low for developers.
Non-Governmental Organizations
NGOs have always paid attention to the democratization of data, since they work to provide better public services and to plan development projects. Relief efforts can also benefit significantly from open data.
With efforts like crowdsourcing and geomapping, NGOs and other non-profit groups can be helped through the power of data. We saw this with the Nepal earthquake in 2015, where 4,000 mappers mapped out 13,199 miles of road and 110,681 buildings within 48 hours of the natural disaster. These maps allowed aid groups to make rescue plans and find the safest and fastest routes to reach those needing help.
Many NGOs are also starting to produce open data themselves. For example, the Hewlett Foundation and the Gates Foundation share where they spent their aid money for increased transparency. Hundreds of organizations have also published data about their operations and spending via the International Aid Transparency Initiative.
Academic Institutions
Academic institutions are also opening up their data. Fuelled by the success of sharing data on Alzheimer’s, a movement towards more transparent research practices is beginning. Because sharing research data involves many complexities, progress is slow. Despite this, more and more funding agencies and academic publishers are supporting and even mandating data sharing.
Openly accessible research data is also at the heart of the EU’s €80 billion Horizon 2020 program, which gives further hope of dramatic progress in the near future.
One of the greatest successes of open data was the Human Genome Project, in which the human genomic sequence information was made public almost immediately. As a result, a genome can now be mapped in a few hours for less than $1,000.
Private Companies
Even though governments are the main providers of open data, a significant proportion of corporations are starting to realize that producing open datasets can improve their bottom line. Benjamin Herzberg of the World Bank Institute calls this new frontier the Open Private Sector.
Why Does Open Data Matter?
The debate over whether or not this data should be made available to the public is still ongoing. Traditionally, many governments have viewed the sale of data as a revenue stream. Other governments view the infrastructure costs of freely opening data as prohibitive. Lastly, the raw data may not be in a form that is easily shared.
As data lovers, we believe that data should never be locked inside applications or formats. Data should be free to use whenever, wherever, and however it’s needed. By opening data, knowledge can be shared and new innovations can be made. With open data, citizens have the option to examine data and answer questions they may have, researchers and journalists can gather and analyze data to tell stronger stories, and developers can use data to build applications.
Creating Your Own Open Data Portal
If you want to make your data public, there are a few things to consider before creating your own open data portal.
1. Only Share Good Quality Data
Making decisions based on bad data could have extreme ramifications. Before sharing a dataset with the public, check all aspects of it for completeness, correctness, consistency, and compliance. This includes validating geometry, attributes, standards compliance, format-specific issues like XML / JSON structure, and more.
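To make the checks above more concrete, here is a minimal Python sketch of a pre-publication quality report. It is only an illustration: the required field names (`id`, `name`, `latitude`, `longitude`) are a hypothetical schema, and a real portal would check against its own standards and likely use a dedicated validation tool.

```python
import json

# Hypothetical schema for illustration; replace with your dataset's real fields.
REQUIRED_FIELDS = ("id", "name", "latitude", "longitude")


def check_record(record):
    """Return a list of problems found in one record (empty list means clean)."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    # Completeness: every required attribute must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Correctness: coordinates must fall inside the valid lat/long range.
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


def check_dataset(json_text):
    """Validate JSON structure first, then each record; return {index: problems}."""
    try:
        records = json.loads(json_text)
    except json.JSONDecodeError as exc:
        return {"structure": [f"invalid JSON: {exc.msg}"]}
    if not isinstance(records, list):
        return {"structure": ["expected a JSON array of records"]}

    report = {}
    seen_ids = set()
    for index, record in enumerate(records):
        problems = check_record(record)
        # Consistency: identifiers should be unique across the dataset.
        record_id = record.get("id") if isinstance(record, dict) else None
        if record_id is not None:
            key = repr(record_id)
            if key in seen_ids:
                problems.append("duplicate id")
            seen_ids.add(key)
        if problems:
            report[index] = problems
    return report
```

An empty report means the dataset passed these particular checks. It does not prove the data is correct, only that it is complete and plausible according to the rules you wrote down.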
2. Offer Format Choices
Data is nothing if no one can read it.
By definition, open data should be easy for the public to use. Offer a choice with respect to format and remember that data should be both machine readable and human readable. Here are some formats we recommend for open data portals:
Format | Why we recommend it |
CSV | A tabular format that’s easily read by humans. Excel is also a good one to offer for the same reason. |
Shapefile | A widely used spatial data format. It’s consistently the most popular GIS format in our usage stats. |
XML | It’s machine readable and offers the user a lot of power and flexibility for tabular data. |
KML | It’s instantly viewable in a web environment and is the format of choice for Google Maps and Google Earth. |
JSON | Like XML, it’s machine readable and flexible, plus it’s a format commonly used by APIs to transfer data over the web. |
GeoJSON | Like JSON, it’s flexible, machine readable, and commonly used by APIs to transfer data over the web, but it also stores spatial data. |
Other useful formats to consider:
- GML – it’s a widely used OGC format
- AutoCAD DXF/DWG – for CAD users
- PDF – it looks nice and is easily shareable. Note this should be a supplementary format and not your central focus, as PDF is less useful for people intending to work with and analyze the data on their own.
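Offering several formats does not have to mean maintaining several copies by hand. As a rough illustration, the Python sketch below turns a CSV of point locations into GeoJSON using only the standard library. The column names `latitude` and `longitude` are assumptions about the input file, and the coordinates are assumed to already be in WGS 84.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lon_field="longitude", lat_field="latitude"):
    """Convert CSV rows with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pull the coordinates out of the row; what remains becomes the attributes.
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = "name,latitude,longitude\nCity Hall,49.26,-123.11\n"
    print(json.dumps(csv_to_geojson(sample), indent=2))
```

The same pattern (read once, write many) applies to the other formats in the list, although Shapefile, KML, and CAD formats generally need a dedicated library or tool rather than hand-written code.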
3. Update Datasets Frequently
Open data or not, using old data is problematic when trying to make informed decisions. So, it’s important that data in an open data portal is updated regularly.
A great way to set this up is to connect your open data platform to your master database. This way your data will be integrated directly instead of being duplicated in two locations, avoiding issues that may occur when updates are needed in the future.
To do this, start with FME. Synchronize your portal with your database using transformers like the ChangeDetector that can watch for updated fields in your database. Then, use Automations in FME Server to ensure your portal is updated as soon as any changes behind the scenes take place.
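For readers curious about what change detection involves, here is a small Python sketch of the general idea: compare the previous snapshot of a table with the current one and report inserted, updated, and deleted records. This is not how FME’s ChangeDetector is implemented. It is a simplified stand-in, and the `id` key column is an assumption about your data.

```python
import hashlib
import json


def row_fingerprint(row):
    """Hash a record's content so two versions can be compared cheaply."""
    canonical = json.dumps(row, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous, current, key="id"):
    """Compare two snapshots (lists of dicts) and classify records by key."""
    old = {row[key]: row_fingerprint(row) for row in previous}
    new = {row[key]: row_fingerprint(row) for row in current}
    return {
        "inserted": sorted(k for k in new if k not in old),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
        "deleted": sorted(k for k in old if k not in new),
    }


if __name__ == "__main__":
    before = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
    after = [{"id": 1, "status": "closed"}, {"id": 3, "status": "open"}]
    print(detect_changes(before, after))
```

Only the records in the three lists need to be pushed to the portal, which keeps each update small even when the master database is large.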
4. Provide Projection Options for Spatial Data
When it comes to spatial data, users should be able to choose their projection. Local (e.g. State Plane or British National Grid) and global projections should be provided. For global projections, we recommend:
- WGS 84 Lat/Lng (EPSG: 4326)
- Spherical Mercator (EPSG: 3857)
When using FME to manage open data portals, you can provide coordinate system choices by making a published parameter. There are a few ways to go about selecting a coordinate system. One option is to use the Reprojector transformer, which uses the CS-Map reprojection engine (others, such as PROJ, Gtrans, and Esri, are also available).
5. Choose an Open Data Portal Delivery Solution
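To show what a reprojection between the two global options involves, here is a short Python sketch of the standard spherical Mercator formulas that convert WGS 84 longitude/latitude (EPSG:4326) to Spherical Mercator (EPSG:3857) and back. It is an illustration only and does not reproduce any particular reprojection engine. Local projections such as State Plane or British National Grid involve datum shifts and should be left to a proper reprojection library.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS 84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0
# Spherical Mercator is only defined between roughly 85.06 degrees south and north.
MAX_LATITUDE = 85.06


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Spherical Mercator metres."""
    if not -MAX_LATITUDE <= lat_deg <= MAX_LATITUDE:
        raise ValueError("latitude is outside the Spherical Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Convert Spherical Mercator metres back to longitude/latitude in degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


if __name__ == "__main__":
    print(wgs84_to_web_mercator(-123.1, 49.25))
```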
Now that your data is ready to be shared, you’ll need a platform to provide the data.
Each of the solutions presented here offers its own set of strengths for data publishers. The world is always creating more options and each one offers something new. So, it’s important to always do your own research to find the best solution for you and your team.
ArcGIS Open Data
(Licensing: Commercial, Deployment Model: SaaS)
Configure your own branded open data site with ArcGIS Server or ArcGIS Online. ArcGIS Open Data will be of particular interest if you currently use Esri within your organization.
Esri has made it very easy to create and configure an open data site, allowing you to focus on your strategy, policy, and adoption rather than technical and operational concerns.
Examples: Open Data DC, City of Burnaby
The City of Langley uses ArcGIS Online along with FME to supply open data to their citizens in a streamlined way. This allows them to maintain their open data via the ArcGIS Online REST API. Learn more about how they did this by viewing their presentation.
CKAN
(Licensing: Open Source, Deployment Model: Self-hosted with SaaS offerings based on the CKAN technology)
CKAN is a leading open source data portal with over 300 open source data management extensions. It is a powerful platform best suited for large organizations, as it is relatively complex to set up and maintain.
Examples: US Government Open Data, UK Open Data, Government of Canada Open Data
The City of Surrey is a great example of a government that utilizes both CKAN and FME to manage their open data portal. By using both these technologies, they’re able to supply datasets that can be downloaded in any format and any projection.
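Part of CKAN’s appeal for developers is that every portal exposes the same Action API. As a hedged illustration, the Python sketch below builds a `package_search` request and lists the downloadable resources from a response. The portal address `https://data.example.org` is a placeholder, and the sketch parses a response you have already fetched rather than making the network call itself.

```python
import json
from urllib.parse import urlencode


def build_search_url(portal_url, query, rows=10):
    """Build a CKAN Action API dataset search URL for the given portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def list_downloads(response_text):
    """Extract (dataset title, format, url) tuples from a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    downloads = []
    for dataset in payload["result"]["results"]:
        for resource in dataset.get("resources", []):
            downloads.append(
                (dataset.get("title", ""), resource.get("format", ""), resource.get("url", ""))
            )
    return downloads


if __name__ == "__main__":
    # Placeholder portal address; substitute a real CKAN site to try it out.
    print(build_search_url("https://data.example.org", "parks", rows=5))
```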
Socrata
(Licensing: Commercial, Deployment Model: SaaS)
Examples: NYC OpenData, Washington State
Socrata themselves used FME to help with the Police Foundation in Washington D.C. They needed to automate the ingress and centralization of various police data sources into an open data portal for the Task Force on 21st Century Policing. Learn more about how Socrata used FME for the task.
Amazon Web Services
(Licensing: Commercial, Deployment Model: PaaS)
Flexible, pay-as-you-go pricing on both AWS and FME makes it a great option if you are conscious of your spending, but still want rich functionality.
Notes on architecture:
- Vector data: Stored in PostGIS/SQL Server Spatial RDS Database. FME can connect to almost any data store, so you can leave your data where it is
- Raster/LiDAR data: Stored in AWS S3 with the footprint stored in the vector database for quick querying
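The footprint idea in the notes above can be sketched in a few lines of Python. Each large raster or LiDAR file stays in object storage, while a small record holding its bounding box and storage key is kept somewhere quick to query. In this sketch an in-memory list stands in for the vector database, and the S3 keys are hypothetical. In practice the lookup would be a spatial query against PostGIS or SQL Server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding box of one raster/LiDAR tile plus the key of the file in storage."""
    s3_key: str  # hypothetical object key, e.g. "lidar/tile_a.laz"
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(footprint, min_x, min_y, max_x, max_y):
    """Return True when the footprint's box overlaps or touches the query box."""
    return (
        footprint.min_x <= max_x
        and footprint.max_x >= min_x
        and footprint.min_y <= max_y
        and footprint.max_y >= min_y
    )


def find_tiles(footprints, bbox):
    """List the storage keys of every tile whose footprint meets the query box."""
    min_x, min_y, max_x, max_y = bbox
    return [fp.s3_key for fp in footprints if intersects(fp, min_x, min_y, max_x, max_y)]


if __name__ == "__main__":
    index = [
        Footprint("lidar/tile_a.laz", 0, 0, 10, 10),
        Footprint("lidar/tile_b.laz", 10, 0, 20, 10),
    ]
    print(find_tiles(index, (5, 5, 12, 8)))
```

Because only the small footprint records are searched, a user’s request can be narrowed to a handful of files before anything large is read from storage.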
The Arkansas GIS Office open data portal is hosted in the cloud using Amazon and supported by FME. With 7+ terabytes of data to migrate, they started with FME and continue to use it as an ongoing automation tool. Learn more about how Arkansas Geographic Information Systems Office did it.
Free Hosting
(Licensing: Commercial Free, Deployment Model: SaaS)
If you are looking for data visualization or analysis then look to the previous solutions, but if all you want is a simple file catalog service, free hosting may be the way to go.
Free hosting with services like DataHub.io, FTP, Google Drive, and GitHub is a good place to start. Cloud file storage solutions can be used to store and serve large volumes of data. Uploading the data is simple, and a basic web interface can be built on top of the storage system to provide further context.
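One lightweight way to provide that context is to publish a small catalog file next to the data. The Python sketch below walks a folder and records each file’s name, format, size, and checksum as JSON, which a simple web page could then display. The catalog layout is our own invention for illustration, not a standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_catalog(folder):
    """Describe every file under the folder as a list of catalog entries."""
    root = Path(folder)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                "file": path.relative_to(root).as_posix(),
                "format": path.suffix.lstrip(".").lower(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return entries


if __name__ == "__main__":
    # Describe the current directory; point this at your data folder instead.
    print(json.dumps(build_catalog("."), indent=2))
```

The checksum lets users confirm that a download completed correctly, and a changed checksum is an easy signal that a dataset has been updated.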
If collaboration with users is important, look at GitHub. Several people have successfully used it to host open data, and by uploading GeoJSON, you can even visualize the data on a map.
In all cases, FME can be used to sync the storage services with the master database to ensure the hosted data is up to date.
6. Automate the Process
Now that you have all the pieces of your data portal put together, you’re going to want to keep it functioning. You could do it manually, but with the amount of data being collected, processed, and stored these days, performing these tasks manually will get overwhelming. Fast.
Using FME is a great way to get things done the way you want them, when you want them. Build workflows that connect to your master database, standardize and validate all sorts of datasets, connect them to the platform of your choice using pre-built connectors or APIs, and then use Automations to ensure that when an update happens elsewhere, it’s reflected in your portal.
Better yet, use FME for custom open data access like map-based data distribution. Your FME workflows can function behind the scenes so that when a user selects a specific area of interest, the dataset is clipped using their exact shape. This saves them the effort of transforming data themselves, providing an even better, fully functioning service. This is the kind of automation that goes above and beyond and can really WOW your users.
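To give a feel for what clipping to an area of interest means, here is a deliberately simplified Python sketch that keeps only the point features falling inside a user-drawn polygon, using the classic ray-casting test. It is not FME’s clipping logic, it ignores holes and points lying exactly on the boundary, and clipping lines or polygons would need a real geometry library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(features, area_of_interest):
    """Keep the GeoJSON-style point features that fall inside the area of interest."""
    return [
        feature
        for feature in features
        if point_in_polygon(*feature["geometry"]["coordinates"][:2], area_of_interest)
    ]


if __name__ == "__main__":
    aoi = [(0, 0), (10, 0), (10, 10), (0, 10)]
    features = [
        {"properties": {"name": "inside"}, "geometry": {"type": "Point", "coordinates": [5, 5]}},
        {"properties": {"name": "outside"}, "geometry": {"type": "Point", "coordinates": [15, 5]}},
    ]
    print([f["properties"]["name"] for f in clip_points(features, aoi)])
```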
The City of Surrey uses CKAN and FME to automate how they supply data via their open data portal. To learn more, watch their presentation.
Unleash the Data!
We’re going to be seeing a lot more open data in the world due to its overwhelming popularity. Plus, there are no excuses when it comes to the technical side of things. With cloud and automation tools like FME, it’s straightforward and cheap (and fun!) to create an open data portal.
Of course, there will be demand for higher quality data, not just more of it. Open data must be easy to find, use, and collaborate on. We also expect to see open data become normalized so it’s easier to compare cities globally.
With free access to data, citizens will be able to engage with the community at large and innovate based on their own interests. Developers and analysts are the obvious beneficiaries, but even citizens with no knowledge of open data can use it. It might be to research the crime rate before purchasing a new home, to find out where their tax dollars go, or to find out how much members of parliament make.
By keeping open data alive, new trends will start to develop on a global level. The possibilities are endless.