“Hacking” Open Data
I recently had the pleasure of attending the Vancouver Open Data Hackathon on behalf of Safe Software. For those of you from outside Vancouver, the city recently passed an initiative committing it to “…freely share with citizens, businesses and other jurisdictions the greatest amount of data possible while respecting privacy and security concerns”. This has led to the creation of the Vancouver Open Data Catalogue, which recently reached version 2.0 and has been receiving a lot of attention. The catalogue is still growing, with new datasets being added all the time. Luckily for those of us in the geo business, the majority of these datasets are spatial.
The hackathon was an opportunity to bring together hackers (in the traditional sense), people with ideas for mash-ups (such as David Eaves), and representatives from the City of Vancouver’s GIS group to discuss applications that could be created with the City’s data. VanTrash, a service that sends you reminders on trash day so that you don’t have to navigate the City’s notoriously complicated paper schedules, is probably the best-known Vancouver mash-up to date, and the idea came out of an earlier hackathon. It was interesting to talk to people who want to work with open data but generally don’t have much of a spatial background, and also good to see quite a bit of interest in GIS from nerds not of the geo type. There were a lot of questions about data formats and about why some of the data was not in latitude/longitude. The most popular type of data to work with seemed to be transit data, perhaps because our local transit agency recently opened its schedule and route data.
The UK Government has also recently introduced an open data portal, but unlike the Vancouver portal, the UK portal offers a SPARQL query interface. It is quite interesting to see the two different approaches to distributing data. Many of the people I’ve spoken with who provide free data are not interested in providing an advanced query tool. Only time will tell whether people prefer to use the query tool or to simply download the full datasets, which is how most data catalogues work. The complexity of SPARQL may encourage those not familiar with it to download the full datasets and build their own applications, and for most mash-ups, creators are probably better off hosting a copy of the data themselves anyway. While neither method of distributing data is necessarily more correct, it will be interesting to see which one attracts more organizations and users.
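
For readers who have not seen a SPARQL interface before, here is a rough sketch of what asking a portal a question looks like, as opposed to downloading a whole file. It only builds the request URL and does not contact any server. The endpoint address and the function name are made up for illustration and are not the UK portal's actual address; the one detail taken from the SPARQL protocol itself is that a GET request carries the query text in a parameter named `query`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real portal address.
ENDPOINT = "https://example.org/sparql"


def build_sparql_url(endpoint: str, limit: int = 10) -> str:
    """Return a GET URL that asks a SPARQL endpoint for a few raw triples."""
    # A deliberately generic query: fetch any subject/predicate/object rows.
    query = (
        "SELECT ?s ?p ?o\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )
    # The SPARQL protocol sends the query text in the "query" parameter.
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(build_sparql_url(ENDPOINT, limit=5))
```

Even this trivial query assumes you know how the data is modelled as triples, which hints at why many mash-up authors may find a plain file download the easier starting point.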