Data and Trust: Raw data now? Or only after rigorous review?

I’ve just come across a report from Science Wise on an ‘Open Data dialogue’ they held earlier this year. The dialogue brought together 40 demographically diverse members of the UK public to discuss how they felt about the application of open data policies to research data. Whilst the dialogue was centred on research data rather than government data, it appears it also touched on public datasets such as health, crime and education statistics, and so the findings have a lot of relevance for the open government data movement as well as the research field. The report from the dialogue is available as a PDF here.

One of the key findings that jumped out at me was amongst a list of ‘8 key principles [that the public identified] that could be used to promote more effective open data policies’, and stated the view that ‘Data should be checked for inaccuracies before being made open’ (a number of the other points, such as ‘Raw data should include full details explaining what the data relates to, how it was collected, who collected it and how formatted’, were also interesting in providing further basis for the Five Stars of Open Data Engagement, but I’ve written about that plenty already). The interesting thing about the idea that data should be checked before being made open is that it runs counter to the call for ‘Raw Data Now’ commonly heard in open data advocacy, where the argument is made that putting data out will allow errors to be spotted and fixed.

The reason, the Science Wise report explains, that the public in the dialogue were reluctant to accept this was to do with trust (though it should be noted this is something Science Wise were explicitly interested in exploring in their dialogues, so this was a considered, but not unprompted, response from participants). With lots of data out there, subject to different interpretations, and potentially inaccurate, trust in the data, and in the work based upon it, may be eroded. Although building trust is one of the reasons often given for openness, the idea that openness can in fact undermine trust is not a new one: see for example Grimmelikhuijsen on Linking transparency, knowledge and citizen trust in government, and Archon Fung speaking at the Open Data Research Network’s Boston workshop. What some of this past work on trust and openness does suggest, however, is that this is an area open to empirical research to test the claims in either direction.

For example, studies could be constructed to ask:

  • Does putting out ‘Raw Data Now’ actually lead to errors being (a) spotted; (b) fixed in the source data; and (c) corrections propagated so the impact of the errors is minimised in work based on the data?
  • Is research or policy based on open data, where that data has been used by third parties, more or less trusted than comparable research or policy without the underlying data being open? What are the confounding factors in either direction?

The call for ‘raw data now’ may be as much strategic (an attempt to head off objections to releasing data) as anything else, but it will take work to understand when it’s a strategy with a short-term gain and longer-term risks, or when it makes sense to pursue.

Perhaps to end with an image:

[Image: WhatDoWeWant]

