Four thoughts on improving open government data

Hadley Beeman has been doing some fantastic work developing conversations and ideas to meet many of the challenges that current open government data users are encountering.

You can read her latest update on the project over here. What I meant to be a short comment turned into a blog-length set of reflections, so I have reposted it here for future reference.

Data is not just for developers

(A small mantra I’m going to end up repeating lots over the next few months just to try and rebalance things a bit).

There are lots of cases where people are actually happy with an Excel spreadsheet with some numbers in it. In fact, I suspect they are happier with the spreadsheet than with a flashy website whose new query interface they have to learn. That interface also sits between them and the government data, which has an interesting psychological impact on how much they feel they can trust the numbers.

For example, a charity applying for funding might want the fairly ‘pretty’, and entirely programmatically inaccessible, DfE spreadsheet so that it can browse educational attainment statistics across London boroughs and decide what performance targets to set itself in a funding bid.

So any of the diagrams here needs a direct arrow from the original raw data to re-users, along with some thought about how re-users can get the little bits of extra information they need to work out and understand a spreadsheet. Even if people end up finding data through an API-enabled web interface that searches across datasets, they may want to go back to the original file to be confident they’ve understood the context of the data and its creation.

The social infrastructure around data-use matters as much as the technical infrastructure

There are always some datasets that are tricky to work with, and aspects of working with data that cannot be 100% automated. In existing use of data.gov.uk data, the collaboration people have gone through to make sense of a dataset has not only been about finding out what categories mean. It has also helped them understand all sorts of other things about the dataset, and find people who can help them, share code, share insights and so on.

In building on a technical infrastructure to help wider groups of people make use of data, we also need to think about how the social infrastructure is widened.

It’s not just formatting that’s the challenge

Hadley’s post mentioned that “Differences in formatting and, most importantly, undefined codes and values mean that much of the published data can’t be analysed or compared to any other data.”

However, the obstacle to comparison is often not the file format or schema of the data but the very nature of the datasets: for example, one collected by school year and another by calendar year.

Either comparison needs normalisation (a technical solution: include some sort of standard normalisation modules in the architecture) or metadata needs to help people understand which comparisons are and are not possible.
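To make the normalisation option a little more concrete, here is a minimal sketch of what one such module might do for the school-year versus calendar-year case. Everything in it is my own assumption rather than part of any existing platform: the function name `school_year_to_calendar_year` is hypothetical, school years are assumed to start in September, and the figures are assumed to be totals (counts) spread evenly across the twelve months. That last assumption will not hold for many real indicators, and the approach makes no sense for rates or percentages, which is exactly where the metadata route has to take over.

```python
from collections import defaultdict


def school_year_to_calendar_year(values, start_month=9):
    """Apportion school-year totals to calendar years by month overlap.

    values: dict mapping the calendar year in which a school year starts
            (e.g. 2010 for 2010/11) to a total for that school year.
    start_month: month in which the school year starts (9 = September).

    ASSUMPTION: the quantity is spread evenly across the twelve months.
    """
    # With a September start, 4 months (Sep-Dec) fall in the first
    # calendar year and 8 months (Jan-Aug) fall in the second.
    months_in_first = 12 - (start_month - 1)
    months_in_second = 12 - months_in_first

    calendar_totals = defaultdict(float)
    for start_year, total in values.items():
        calendar_totals[start_year] += total * months_in_first / 12
        calendar_totals[start_year + 1] += total * months_in_second / 12
    return dict(calendar_totals)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(school_year_to_calendar_year({2010: 120.0, 2011: 240.0}))
```

Note that the first and last calendar years in the result are only partly covered by the input, so they are not comparable with a genuine calendar-year figure. A real module would need to flag that, which again is a job for metadata.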

Developers like bulk data

In all the cases I looked at for ODI, even when data was available through APIs, people liked to grab the full dataset, cache it locally and work with it there.

Any architectural plans would need to be sure they work for people grabbing bulk data.
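The grab-and-cache pattern is simple enough to sketch. The snippet below is only an illustration of the habit described above, not anything a particular project uses: the function name `fetch_bulk`, the `data_cache` directory and the `opener` parameter are all hypothetical, and any URL passed in is a placeholder. It downloads the whole file once and afterwards works from the local copy.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_bulk(url, cache_dir="data_cache", opener=urllib.request.urlopen):
    """Download the full file at `url` once, then reuse the local copy.

    `opener` is any callable that takes a URL and returns a readable,
    context-managed binary stream. It is a parameter only so that the
    network call can be swapped out (an assumption of this sketch).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after the last path segment of the URL.
    filename = Path(urlparse(url).path).name or "dataset"
    local_path = cache_dir / filename

    if not local_path.exists():
        # Write to a temporary name first so that an interrupted
        # download is never mistaken for a complete cached file.
        part_path = local_path.with_name(local_path.name + ".part")
        with opener(url) as response, open(part_path, "wb") as out:
            shutil.copyfileobj(response, out)
        part_path.replace(local_path)

    return local_path
```

The design point for architects is the first branch: once the file is cached, the publisher never sees another request from this user, so a stable download URL for the complete dataset matters more to them than a rich query API.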

2 Comments

  1. IngridK

    Data is not just for developers…public sector organisations – and obviously I’m particularly thinking of local government – can use this information, too – and the more open and ‘mashable’ it is, the more use local gov and local partnerships can get out of it and the less time, money and effort they’ll have to spend finding, cleaning, combining and re-using the data.

  2. Hugh Barnard

    One interesting area [for me, I live on an estate in east London] is the provision by local government of soft real-time data for fuel and electricity use, for example.

    Our ad-hoc group wants this to do some ‘greening’ and it represents a new set of [not very hard] challenges for the borough. Currently a lot of this data is captured in very proprietary formats to suit ‘closed’ building management systems.

    We’re also interested in soft real-time for some of the maintenance tasks, an approach that would save them a lot of time on the phone and duplication. Again, easy to do but hard to change the culture of secrecy amongst the officials.
