[Update 5th May: I thought five was too few. See #6 from Southampton]
Data.gov.uk now contains a list of over 3200 different datasets. One of those is the data.gov.uk directory of datasets itself, which allows anyone to take and display information about UK open government data. Over the last few days I’ve come across a number of places that have done just that, so here’s a quick round-up of some of the different places you can find listings of Data.gov.uk datasets (plus one I built to explore the data for myself):
#1 Data.gov.uk
The authoritative source of data, currently listing 3290 records and allowing records to be explored by search, by public body or by common tags. The directory runs on the CKAN software. The main CKAN site doesn’t yet list all of the data.gov.uk datasets, but a number are slowly making it into CKAN.
Data.gov.uk also includes an automatically created Wiki Page for each dataset where annotations and details of uses of the data can be added (although this does not appear to be widely used at present).
#2 Guardian World Data Search
The Guardian website contains a search across ‘World Government Data’ that includes records from data.gov.uk. The site currently lists 5294 United Kingdom open government data resources, which include a large share of both the Data.gov.uk and London Datastore datasets. However, even all of data.gov.uk and the London Datastore together don’t add up to 5294, so there must be some other datasets in there too…
#3 Info Chimps
InfoChimps describe their mission as ‘…to increase the world’s access to structured data’, and the current InfoChimps website brings together large collections of both open and commercial data. The Data.gov.uk collection on InfoChimps currently lists 2485 datasets, and provides each with an information page allowing comments to be added to it.
#4 CKAN Data Dump
Not exactly browse-friendly, but you can find an archive of the entries on the data.gov.uk CKAN platform at http://ckan.net/dump/ in JSON and CSV. Interestingly the ‘daily’ dump appears to have frozen around February 2010, and the latest files are from April, which I assume is when the last change to Data.gov.uk was made, although even this only holds 3240 records as against data.gov.uk’s current 3290.
#5 My mash-up
I’ve been building quite a few mash-ups with Exhibit lately – and until the release of GridWorks it remains one of the best tools out there for quick-and-dirty exploration of data. So I’ve put together a quick Exhibit display of Data.gov.uk datasets.
The meta-data for datasets is far from uniform, with many different strings for ‘Update Frequency’ and few overlapping tags, so I’ve processed the latest JSON dump to try and make fields like ‘Update Frequency’ and ‘Geographical Coverage’ more uniform for better faceted browsing. I’ve also taken inspiration from this post on the OKF blog by Francis Irving and extracted details of the domains which datasets are served up from.
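To give a flavour of the sort of clean-up involved, here is a minimal Python sketch of the two steps described above: collapsing free-text ‘Update Frequency’ strings into a handful of labels, and pulling out the domain each dataset is served from. It is an illustration only, not the code behind the mash-up. The field names `update_frequency` and `urls`, the keyword list and the output labels are all assumptions made for the example, and the real dump may well be structured differently.

```python
from urllib.parse import urlparse

# Hypothetical keyword -> label rules; first match wins, so order matters.
FREQUENCY_KEYWORDS = [
    ("annual", "Annually"),
    ("year", "Annually"),
    ("quarter", "Quarterly"),
    ("month", "Monthly"),
    ("week", "Weekly"),
    ("daily", "Daily"),
    ("day", "Daily"),
]


def normalise_frequency(raw):
    """Map a free-text update frequency onto a small set of labels."""
    text = (raw or "").strip().lower()
    if not text:
        return "Unknown"
    for keyword, label in FREQUENCY_KEYWORDS:
        if keyword in text:
            return label
    return "Other"


def extract_domain(url):
    """Return the host part of a URL, without any leading 'www.'."""
    host = urlparse(url.strip()).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def clean_record(record):
    """Return a copy of a record with uniform fields for faceted browsing.

    Assumes (hypothetically) that each record is a dict with an
    'update_frequency' string and a 'urls' list.
    """
    cleaned = dict(record)
    cleaned["update_frequency"] = normalise_frequency(record.get("update_frequency"))
    domains = {extract_domain(u) for u in record.get("urls", [])}
    cleaned["domains"] = sorted(d for d in domains if d)
    return cleaned
```

Simple keyword matching like this will misfile some values (a string such as ‘6 monthly’ ends up under ‘Monthly’), which is part of why the cleaning still needs a manual check.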
#6 PSI Catalogues Aggregator
Built by a team at Southampton University supervised by Prof Nigel Shadbolt, this includes a range of tools to explore an RDF representation of datasets from Data.gov.uk (amongst other sites), and to explore the tags and themes included in datasets and when they first appeared in the data stores. The PSI Catalogues Aggregator listings for Data.gov.uk appear to contain 2879 datasets, suggesting again that a slightly different copy of the data is being used, or that some other filtering is in place… The PSI Catalogues Aggregator includes data from opsi.gov.uk alongside data.gov.uk, although I’m not entirely sure at this stage where the opsi.gov.uk data is coming from… (PSI Catalogues Aggregator found via Open Data Blog and added 5th May)
#7 World CKAN (Update 6th May)
Another addition, this time ‘World CKAN’, an experimental RDF version of CKAN with a SPARQL endpoint that you can query. For example, run this query on the endpoint to get a list of all the datasets:
Issues for the study: Cached or live?
Building my own mash-up and exploring the directories from other people raised a number of interesting issues (not least some methodological issues about the role of this sort of ‘action research’ in the Open Data Impacts study). No site seems to return exactly the same number of datasets as another, which suggests each operates from a different copy of the data. In my case, I initially planned to build the mash-up to update from the CKAN daily dump of data.gov.uk, but (a) it appears this isn’t being updated at the moment and (b) I suspect automated updates may break the fairly manual data-cleaning I currently have in place, so I’m using a static cached copy of the data.
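To put rough numbers on that, here is a small Python sketch comparing the counts quoted in this post against the 3290 records on Data.gov.uk itself. The figures are the ones listed above (the Guardian total is left out because it covers more than Data.gov.uk); the function name `shortfall` is just something I’ve made up for the illustration.

```python
# Dataset counts quoted in this post.
AUTHORITATIVE_COUNT = 3290  # Data.gov.uk itself

MIRROR_COUNTS = {
    "InfoChimps": 2485,
    "CKAN data dump": 3240,
    "PSI Catalogues Aggregator": 2879,
}


def shortfall(mirror_counts, authoritative):
    """Return how many records each copy is missing relative to the source."""
    return {name: authoritative - count for name, count in mirror_counts.items()}


if __name__ == "__main__":
    for name, missing in shortfall(MIRROR_COUNTS, AUTHORITATIVE_COUNT).items():
        share = 100 * MIRROR_COUNTS[name] / AUTHORITATIVE_COUNT
        print(f"{name}: {missing} fewer records ({share:.1f}% of the source)")
```

A gap in the counts only shows that the copies differ, not why: it could be down to the age of the cached copy, filtering, or both.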
I’ve added some questions to the Open Data Impacts survey that will hopefully be out in the next week or so to ask about caching of data to explore this issue more…
Thanks to @epsiplatform for a pointer to their list of data catalogues at http://www.epsiplatform.eu/psi_data_catalogues