[Summary: Rough ideas around tools and conventions for open data re-use….]
The context
My dissertation research was focussed on the re-users of open government data: people who had their hands directly on a dataset. However, any model of open data leading to change also has to have end-users: the people using things that have been created through the re-use of open government data.
And, as Dave Briggs rightly points out (and as I’ve touched upon before, and Tony Hirst has also explored), it’s important for end-users not only that data is open, but that the re-use of the data is open and accountable as well, particularly in civic contexts. In practice, whilst many re-uses of open government data are not deliberately closed (e.g. the source code may well be accessible on GitHub or Google Code), for the non-technical end-user it can be pretty hard to work out whether the data has been manipulated, cleaned and combined with other data, and when the dataset was last updated. As Chris Thorpe notes, and as I found in putting together diagrams of many open government data uses, a lot of use of open government data involves caching bulk datasets, and the risk of stale data hanging around, particularly after rapid-prototyping of apps, can become a big problem. Couple these issues with some identified in four thoughts on improving open government data, and the sketch of an idea around opening up the data workflow, as well as the dataset, starts to emerge.
The problems
Opening up the workflow of data-use should address a range of problems:
- Letting end-users know the version of data in the application/visualisation/report they are viewing and if there is more up-to-date data available.
- Making it possible for end-users to find data re-use from dataset listings (solving some of the track-back problem). (E.g. if you’re looking at the COINS listing on data.gov.uk you should be able to easily find and choose between different interfaces available onto that data).
- Allowing end-users to easily see how data has been processed / manipulated / queried in a given application/visualisation/report (e.g. seeing some sort of clear ‘workflow’ of steps the data has been through).
- Giving re-users an incentive to be transparent about their workflow of data use.
- Giving re-users a really easy way to articulate how they have used data, and what steps they have been through in doing so.
- Helping notify re-users when datasets are changed and their application/visualisation/report might need updating.
- Providing hooks to share ideas / code / annotations and best-practice ideas on particular parts of open data workflows (e.g. suggestions about how to best normalise particular data; brief information on checking a given dataset for statistical significance; pointers to existing APIs and converted copies of a given dataset).
Whilst Hadley Beeman is heading up a fantastic project to work on some of the technical infrastructure around open data use – I see that there could be a complementary project here focussed on light-weight tools, conventions and design principles to help ensure end-users and re-users can get the most out of even messy and non-integrated open data, by being transparent and open about how it’s done.
A sketch of the solution*
The diagram to the right sketches out some of my thinking about a possible tool that might help here – and also hopefully points to some of the practices that would be good to encourage…
Using a Yahoo Pipes sort of workflow metaphor, a tool could allow re-users of data to quickly specify (by drag-and-drop or simple syntax) ‘input datasets’, ‘operations on those datasets’ and ‘outputs’ from the process. Each part of the workflow could be annotated and made ‘social’. This tool would not be about actually processing any data: just representing how it has been processed and used.
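To make the 'simple syntax' option a little more concrete, here is a very rough sketch of how a workflow of inputs, operations and outputs might be written down, with annotations attached to each step. Everything in it is an assumption for illustration: the `Step` and `Workflow` names, the fields, and the example steps are hypothetical and not part of any existing tool. In keeping with the idea above, it only describes a workflow and processes no data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One part of the workflow (hypothetical structure)."""
    kind: str   # "input", "operation" or "output"
    label: str  # human-readable description of the step
    notes: List[str] = field(default_factory=list)  # free-text annotations


@dataclass
class Workflow:
    """An ordered description of how data was used; it processes nothing."""
    title: str
    steps: List[Step] = field(default_factory=list)

    def add(self, kind: str, label: str) -> Step:
        if kind not in ("input", "operation", "output"):
            raise ValueError(f"unknown step kind: {kind}")
        step = Step(kind, label)
        self.steps.append(step)
        return step

    def describe(self) -> str:
        """Render the workflow as plain text an end-user could read."""
        lines = [self.title]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. [{step.kind}] {step.label}")
            for note in step.notes:
                lines.append(f"   note: {note}")
        return "\n".join(lines)


# Illustrative example only: the steps below are made up.
workflow = Workflow("Hypothetical spending visualisation")
workflow.add("input", "COINS bulk download (example dataset)")
cleaning = workflow.add("operation", "Removed rows with empty values")
cleaning.notes.append("Annotation from another re-user could go here")
workflow.add("output", "Interactive chart")
print(workflow.describe())
```

A plain ordered list of steps is the simplest thing that could work; a real tool would probably need a graph, so that several input datasets can feed into one operation.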
To work, it would need to provide value to both the re-user of data and the end-user. In the sketch I’ve suggested that the value to the re-user could come from getting notifications when datasets are updated, and from being able to see how others have used a particular dataset by looking at the workflow steps commonly connected with it. For the end-user (and re-user), a simple widget / graphic could be provided to embed / include in the product of open data use, giving details of its transparent and accountable workflow (and possibly changing to show when newer data may be available / allowing easy access to other things built with a given dataset).
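The 'newer data may be available' part of the widget could, in principle, be a very small check. The sketch below assumes that a re-user has recorded the date of each dataset copy they used, and that a catalogue can report the date of the latest published copy. Both of those are assumptions, as are the function names and the example dates, which are invented for illustration.

```python
from datetime import date
from typing import Dict, List


def stale_inputs(used: Dict[str, date], latest: Dict[str, date]) -> List[str]:
    """Return names of datasets whose latest published copy is newer
    than the copy the re-user recorded using."""
    return sorted(
        name
        for name, used_on in used.items()
        if name in latest and latest[name] > used_on
    )


def widget_text(used: Dict[str, date], latest: Dict[str, date]) -> str:
    """Short message a hypothetical embedded widget could display."""
    stale = stale_inputs(used, latest)
    if not stale:
        return "Data is up to date"
    return "Newer data available for: " + ", ".join(stale)


# Hypothetical dates, for illustration only.
used = {"COINS": date(2010, 6, 4), "Example register": date(2010, 8, 1)}
latest = {"COINS": date(2010, 9, 1), "Example register": date(2010, 8, 1)}
print(widget_text(used, latest))
```

The same comparison could drive the notification to re-users: whenever the list of stale inputs is non-empty, the person who registered the workflow gets a message.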
Social rather than systems and systematic
I’m not suggesting that a tool is the solution to bringing about transparent and accountable workflows for open government data. But it might be a useful catalyst for creating social conventions about end-user transparent data re-use.
Where next?
I’m not sure. I’ve put together some draft proposals to get some funding to do more research around open data re-use, but being research proposals, they will take a long time (at least in open data terms) to work through the system. So:
- Who might be interested in helping develop these ideas further? And even prototyping something?
- Could any elements of the ideas above be fitted into tools like CKAN and Data.gov.uk as they redevelop over the coming months / years (e.g. improving the app directory on data.gov.uk, or encouraging more annotation of datasets on CKAN)?
Do drop any ideas in the comments below or feel free to drop me a line.
*I was planning to tidy up these ideas a bit more before blogging, but with a mountain of other work (and the necessity of taking a holiday) looming, I thought it best to blog now and tidy later if there is interest in taking any of these ideas forward in any way…
Hi – some very useful ideas. Worth noting that the LGA Group, CIPFA, SOCITM, LEGSB now have a transparency working group and we’re looking at lightweight guidance and metadata that can help with that. We’re keen that open data is as useful as possible, both for the citizens and local public services themselves – so it’s absolutely vital for comparison purposes for people to understand what, when and how data was compiled.
For more info on what we’re doing the Local Open Data Community is a good place to start (CoP sign-in required).
Enjoy your hols!
Fab. Thanks Ingrid. Will keep an eye on what’s developing. That fits really well at the start point in the workflow (getting an understanding of the original dataset). Is any work going on around how that meta-data is presented to end-users?
One part of the open data workflow type ideas I think will be important to crack is the presentation of all this meta-data in a non-overwhelming and very accessible way…
What kind of end users? How you access the data and what developers choose to show you will determine how info is presented. But to be honest…I don’t know exactly.
Hi Tim,
Really useful diagram. I’ve passed this to our technical lead on Open Kent http://www.slideshare.net/localinnovation/open-kent who I think this will really help. In speaking to local entrepreneurs/developers, many start from the innovative/useful service they could help develop with us, but they stay away from open data because of the expertise needed to make sense of and re-use it in the first place, which is why we’ve gone for a user rather than data-led approach. Showing them the steps of how you can open up data like you’ve done is really useful. I’ll add our six steps to open data when I get into the office so you can compare.
Note to self: The data provenance information here http://danpaulsmith.com/gov/orgvis/# goes quite a way towards a good practice model for showing the provenance of the information.