What supports the sustainable re-use of open data?

There’s a general perception that, for most open datasets, it’s still pretty tricky to do really useful things with them. Whether it’s the need for data to be better cleaned and annotated, the need for shared source code to manipulate complex datasets, or the challenge of working out which tools can help make sense of a particular dataset, many projects and ideas are emerging to address the perceived challenges of working with open data, particularly open government data. Discussions around the proposed Public Data Corporation are also raising questions about what infrastructure and support for public data re-use this new institution should provide.

I’ve recently been working with Dr George Kuk (open source expert at Nottingham Uni) on a paper on ‘Open Data Complementarities’. It brings together his studies of Ordnance Survey and Rewired State hack-days with the cases and survey data from my MSc research to identify the different activities, artefacts and actors that make for sustainable open data re-use.

You can find a draft of the paper here. One element I want to draw out in particular, as a contribution to discussions on making open data use easier and more sustainable, is the framework of ‘interlocking sequences’ of open data use that we identified.


As the diagram above shows, uses of open government data tend to involve a ‘stack’ of processes: improving the quality of the data (1); providing cleaned data dumps for download, or APIs for working with the data (2); complementing the data with metadata, linkages* or standardised keys (3); generating mash-ups or apps from the data by writing code or configuring tools (4); and potentially integrating the results into an existing system (5). Each step creates both material and social artefacts: cleaner data, APIs, source code, social capital and communities. How persistent, accessible and open these artefacts are around a given dataset affects its potential for sustainable and successful re-use.
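As a rough illustration of steps (1) and (2), here is a minimal Python sketch that cleans a small raw extract and re-publishes it as a tidy CSV dump. The column names (`authority_code`, `amount`), the codes and the kinds of messiness are all hypothetical, chosen only to show the sort of work these steps involve; they are not taken from any real dataset or from the paper.

```python
import csv
import io

# Hypothetical raw extract: stray whitespace, lower-case codes,
# currency symbols, thousands separators and a missing amount.
RAW = """authority_code,amount
 a001 ,"£1,200.50"
A002,300
A003,
"""

def clean_rows(raw_text):
    """Step (1), sketched: tidy key codes, parse amounts, drop unusable rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        code = row["authority_code"].strip().upper()
        amount_text = row["amount"].replace("£", "").replace(",", "").strip()
        if not code or not amount_text:
            continue  # skip rows that cannot be used without further work
        cleaned.append({"authority_code": code, "amount": float(amount_text)})
    return cleaned

def to_csv_dump(rows):
    """Step (2), sketched: serialise cleaned rows as a downloadable CSV dump."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["authority_code", "amount"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(to_csv_dump(clean_rows(RAW)))
```

Once published, the cleaned dump itself becomes one of the persistent artefacts that later steps, and other re-users, can build on without repeating the cleaning.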

So what does that mean in practice? Well, it helps highlight again why data alone is not enough. It can help us reflect on which datasets are likely to generate, on their own, an environment of artefacts that complements and supports their re-use, and which datasets are unlikely to see such an environment emerge naturally, so that investment in the re-use environment may be needed. It also helps in assessing the value chain of particular datasets: measuring the final outcomes of releasing a dataset is notoriously difficult, but in a number of cases it is easier to track the emergence and persistence of some of the artefacts around it.

We’re thinking about ways to develop and test the model further, and hope that getting a paper into journals focussed on information systems more broadly may generate some discussion and debate on how open data can drive sustainable public service innovation. We’re also very open to comments and reflections from practitioners working on the supply and use of open data on how we can refine the model.


*Note that we say ‘linkable’ data rather than linked data: linkable data can simply share common keys and remain in formats such as CSV, whereas linked data involves specific formats, models and conventions.
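To make the distinction concrete, here is a minimal Python sketch of ‘linkable’ data: two CSV datasets, assumed to come from different publishers, that can be combined simply because they share a key column. The datasets, the `authority_code` column and the `join_on_key` helper are hypothetical illustrations, not part of the paper; no RDF, URIs or linked data tooling is involved.

```python
import csv
import io

# Two hypothetical CSV datasets that share a common key, "authority_code".
SPENDING_CSV = """authority_code,spend
A001,1200.5
A002,300
A004,75
"""

POPULATION_CSV = """authority_code,population
A001,92000
A002,140000
A003,88000
"""

def join_on_key(left_text, right_text, key):
    """Inner-join two CSV texts on a shared key column (hypothetical helper)."""
    # Index the right-hand dataset by key for constant-time lookups.
    right_index = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    joined = []
    for row in csv.DictReader(io.StringIO(left_text)):
        match = right_index.get(row[key])
        if match is not None:  # keep only keys present in both datasets
            joined.append({**row, **match})
    return joined

# Usage: spend per head, possible only because both files use the same keys.
for row in join_on_key(SPENDING_CSV, POPULATION_CSV, "authority_code"):
    per_head = float(row["spend"]) / int(row["population"])
    print(row["authority_code"], round(per_head, 4))
```

An inner join is used here so that keys missing from either file (A003 and A004) simply drop out; in practice, how unmatched keys are handled is exactly where standardising keys (step 3) pays off.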


