Open Data, Land, Gender

[Summary: very rough and speculative notes in response to a land coalition online dialogue]

The Land Coalition is hosting an online dialogue until 20th February looking at “using online platforms to increase access to open data and share best practices of monitoring women’s land rights”. It’s an interesting topic for a dialogue, particularly given that one of the most widely cited cases used to highlight the potential downsides of open data relates to the digitisation of land records and their exploitation to the detriment of poor landholders. However, as platforms like the LandMatrix (aggregating land investment reports from research and advocacy groups across the world) and Open Development Cambodia demonstrate, open data is also being used by citizens to monitor land rights issues.
In this post I share a few quick thoughts on the broad theme of open data, land and gender.

Open data and land

The dialogue asks about how online platforms are contributing to the opening of land data. There are three broad sources of data I can see:

Official data – where governments have well-managed land ownership databases, citizens may be able to secure the ongoing publication of this data in open forms as part of national open government data programmes. In the United Kingdom we’ve recently seen the Land Registry place data online, detailing land sale transactions as CSV and linked data; and publicly owned land is a commonly featured dataset on local open data portals in the UK. However, this data may be tricky to use directly, and intermediaries are needed to make it accessible. In Kirklees, Who Owns My Neighbourhood presents an interesting approach to using official data, combining it with social features for citizens to input local knowledge and news about publicly owned plots of land: making official land data more ‘social’.

Crowdsourced data – in many cases there may not be an official source for the data activists want, or there may be limited prospect of getting access to the official data. Here a range of ‘crowdsourcing’ approaches exist. The LandMatrix approach uses researchers, and works to verify reports before sharing them. Other approaches might use tools like PyBossa to crowdsource the extraction of structured information from semi-structured documents, or to split the analysis of records into micro-tasks. The OpenStreetMap platform may also be able to act as a source of data, allowing tags to be applied to land. Tools like CrowdMap (based on the Ushahidi platform) make it possible to collate reports submitted on a range of platforms, including by phone, and to verify reports, although the challenge with any crowdmapping project is recruiting people to submit data.

Inferred data – at one of the RHOK Hack Days I took part in at Southampton I was interested to hear about a group’s project using satellite data to work out crop types on plots of land. I suspect there are ways this data could be used to detect changes in land use that might also indicate changes in ownership – such as the conversion of land from multiple crops to large-scale agribusiness.

Using land data

Having open data on land ownership and land rights is only one part of the story. As the Bhoomi case illustrates, the regulatory framework around the data matters: is a dataset taken as authoritative, or are documents or other customary practices able to override the descriptions held in data? Does the data model through which land ownership and rights are described capture the subtlety and nuance of land use practices (see Srinivasan’s field note for a discussion of the need to mash-up multiple schemas of data to get a view of complex land practices)? And what intermediaries are active to help citizens mobilise land records to secure their rights, rather than those records being only truly accessible to private actors with technical and financial capital?

In the ongoing Land Coalition dialogue I’m interested to learn more about the cases of how data on land rights is being mobilised to create change: whether at the level of global advocacy, where big numbers may matter most; or at the level of individual struggles over ownership, access and rights, where detailed, accurate and timely data on particular plots is likely to be most important.

Open data and women’s land rights

I will admit to knowing very little about the specific issues around women’s land rights. However, in making the connection between open data and women’s land rights I did want to briefly explore whether a focus on digital platforms and open data introduces any particular gender issues. For example, whilst statistics on mobile phone penetration in developing countries suggest widespread access to mobile devices, there is a significant gender gap in mobile ownership and access, with women much less likely to have control of a handset than men. Gender issues may also arise in relation to the culture and practices around open data.

In a recent First Monday article, Joseph Reagle suggests that the ‘free culture’ movement associated with open source software and open knowledge products like Wikipedia possesses a gender gap that is potentially even greater than that of the very gender-unequal general computing culture from which it arose. Reagle argues that the ideas of ‘openness’ current in these communities can be used to dismiss concerns about gender gaps, and to paint them as an issue of choice, rather than highlighting the wider structural factors that lead to the massive underrepresentation of women in online free software and open knowledge construction. For example, Reagle points to the “double shift” of women’s time, and the ways in which the ‘free time’ used to contribute to the creation of open culture, whether through evenings away from work, or hack days and other events, is unequally distributed between women and men.

Does this critique carry across to open data? It is apparent that the open data field is far from gender equal – at least in terms of advocates for open data, and the creators of tools, platforms and analysis built upon data – although whether it is male dominated to the extent of fields such as open source contribution is yet to be measured. In part any gender imbalance may be attributed to the connections between the open data community and the open source and free culture communities, which already have a significant gender imbalance. However, we should also be open to deeper issues of epistemology: whether the very notion of resolving questions of ownership or fact through datasets, rather than through processes of dialogue, is itself gendered. How far advocacy to open up datasets moves into advocacy for the primacy of data over other ways of knowing, and how data is used and interpreted, has a bearing on whether gendered systems of power are being reinforced or challenged.

An ongoing discussion…

The above remarks are just some first thoughts on the topic. The Land Portal dialogue is running for another week, and I’m looking forward to spending time looking at what others are saying to better understand how open data and land can connect in constructive and positive ways.

I hope we might also develop some lines of the gender discussion more in upcoming work of the Open Data in Developing Countries project.

Notes on open government data evaluation and assessment frameworks

The evaluation of open data initiatives has become an increasingly pressing concern for many. As open data initiatives have proliferated, there have been a number of attempts to develop assessment, monitoring and measurement frameworks that can inform policy, and that will support comparative assessment of different open data efforts, or that can guide the creation of new initiatives. In this post I look at a number of the frameworks that have been put forward, or are currently in development. This post is part of my thinking aloud in planning for some common research tools in the Exploring the Emerging Impacts of Open Data in Developing Countries project, and in putting together a methods section for my PhD.

My working notes for this post, with a short summary of each of the frameworks described can be found here.

What is being measured?

The frameworks I explored fall into three broad categories:

  • Readiness assessments – looking at whether the conditions exist for an open data initiative to be started or successful. This category includes the Web Foundation Open Government Data Feasibility Studies and the World Bank Open Data Readiness Assessment.
  • Evaluating implementation – looking at whether existing initiatives, or organisations, meet some criteria for ‘good’ open data implementation. This was the largest group, including the Five Stars of Linked Open Data (Berners-Lee, 2010); the Open Data Census [LINK]; the Open Data Index (Farhan, D’Agostino, & Worthington, 2012); mOGD-I; MELODA (Garcia, 2011); the State of Open Data method (Braunschweig, Eberius, Thiele, & Lehner, 2012); the assessment of open budgetary data in Brazil (Craveiro, Santana, & Alburquerque, 2013); Grading the Government’s Data Publication Practices (Harper, 2012); and the Data Openness Index and Government Data Openness Index (Murillo, 2012).
  • Impact assessment – none of the frameworks I looked at explicitly address impact (though there are a number of studies that have developed methods to try and quantify economic impacts of open data (Vickery, 2011)), but a few frameworks in development do seek to make connections between implementation and different kinds of potential open data impacts (Jetzek, Avital & Bjorn-andersen, 2012; Huber, 2012).

The frameworks I explored operate at a number of different levels. Readiness assessments tend to operate at the country level, although the World Bank suggest their Open Data Readiness Assessment can also be applied at sub-national levels.

Implementation assessments may target a variety of:

  • Individual datasets
  • Open data portals
  • Individual institutions
  • Open data initiatives
  • Whole countries

A number of frameworks generate aggregate assessments of initiatives, portals or institutions by aggregating up numerical scores for the ‘openness’ of datasets belonging to that parent entity. For example, MELODA, and a recent implementation of the Five Stars of Open Data, assign scores to institutions based on an average of the scores assigned to their individually published datasets.
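To make this aggregation concrete, here is a minimal sketch of the approach. The function name and the simple averaging rule are my own illustration of the general method described above, not code from MELODA or any actual Five Stars implementation:

```python
# Hypothetical illustration: aggregate dataset-level 'openness' star
# ratings (e.g. 0-5 stars per dataset) into a single institutional
# score by taking the mean across the institution's datasets.

def institution_score(dataset_stars):
    """Return the average star rating of an institution's datasets."""
    if not dataset_stars:
        return 0.0  # no published datasets: nothing to average
    return sum(dataset_stars) / len(dataset_stars)

# An institution with three datasets rated 1, 3 and 5 stars:
print(institution_score([1, 3, 5]))  # → 3.0
```

Note that an average of this kind conceals the distribution: an institution with one 5-star and one 1-star dataset scores the same as one with two 3-star datasets.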

How does measurement take place?

There are a number of non-mutually exclusive approaches to measurement, including:

  • Survey of technical features – identifying a list of features that datasets or data portals should possess, and carrying out an automated, or manual, survey of whether these features are in place. These approaches are generally agnostic as to the subject of the data, but are interested in whether datasets are machine readable, openly licensed and well catalogued (Braunschweig et al., 2012; Garcia, 2011), as is the 5 Stars of Linked Open Data.
  • Specific dataset checklist – these approaches determine a short list of particularly important datasets and ask about whether these are available, and then conduct a technical assessment of these particular datasets. The Open Data Index, and Open Data Census both adopt this approach.
  • Domain specific assessments – Harper’s grading of US departments dataset publication practices identifies ideal features of specific datasets, and evaluates them against these (Harper, 2012). For example, where a standard exists for representation of a particular kind of data, it would judge a department higher where it adopts this standard.
  • Added value features – The Open Data Index, and the proposed mOGD-I model include questions on whether applications have been built on top of data, or whether there are accompanying tools around datasets. The readiness assessments also consider the capacity of states to support and stimulate activities that might increase uptake and use of open data.
  • Features of the environment – the readiness assessments major on this, describing social, technical, legal, political, economic and organisational contexts for open data.
  • Expert surveys – most assessment frameworks draw to a degree on survey methods, even though some attempt to automate elements. In most cases a single informant is used.

Some frameworks look to generate a single number that can be used to rank the subject of analysis, as in the case of the Open Data Index, MELODA, or implementation of the 5-stars of open data model. Other frameworks present a multi-dimensional assessment of their subject, either omitting aggregation altogether, or providing aggregation along a number of dimensions such as legal, organisation, technical etc. 

What does all this mean for the ODDC project?

In the Exploring the Emerging Impacts of Open Government Data in Developing Countries research project there are a number of things we want to try and understand.

  1. How does the context that an open data initiative operates within affect the use of data in governance processes?
  2. How do the technical features of an open data initiative affect the use of data in governance processes?

The first question draws upon the sort of data that might feature in a readiness assessment. The second draws upon the sort of data gathered in an implementation assessment. Like Huber (2012), and Jetzek et al. (2012), we hypothesise that the way an open data initiative is implemented may be slanted towards particular kinds of data re-use, and thus impacts. By trying to connect context, implementation and impacts, we will be looking to both draw upon, and inform the further development of, evaluation frameworks.

Within the project we need to be able to perform evaluation at two levels:

  • The macro level – as we build upon learning from the Web Index to refine methods of generating country-level indicators that can inform an assessment of the extent to which a country has capacity to benefit from open data, and the extent to which this is being realised.
  • The case level – as the individual qualitative cases in developing countries generate comparable descriptions of how open data has been used.

The development of the macro level framework will be an ongoing task over the next year, but with the individual cases kicking off very soon, there is some immediate work to be done to develop two resources: a simple contextual questionnaire for describing the environment in a country or city; and a dataset assessment tool that can be applied at the level of individual datasets, collections of datasets, or intermediary platforms.

Hopefully a further iteration of working through the frameworks listed in this post will inform the development of these. As I get started on this task I would welcome pointers to any resources I have missed.


Berners-Lee, T. (2010, July). Linked Data – Design Issues. Retrieved from

Braunschweig, K., Eberius, J., Thiele, M., & Lehner, W. (2012). The State of Open Data: Limits of Current Open Data Platforms. WWW2012. Retrieved from

Craveiro, G. da S., Santana, M. T. De, & Alburquerque, J. P. de. (2013). Assessing Open Government Budgetary Data in Brazil. ICDS 2013.

Farhan, H., D’Agostino, D., & Worthington, H. (2012). Web Index 2012. Retrieved from

Garcia, A. A. (2011). Methodology for Releasing Free Data (MELODA) (pp. 1–15). Retrieved from

Harper, J. (2012). Grading the Government’s Data Publication Practices.

Huber, S. (2012). The fitness of OGD for the creation of public value. In P. Parycek, N. Edelmann, & M. Sachs (Eds.), CeDEM12 – Proceeding of the Conference for E-Democracy and Open Government. CeDEM.

Jetzek, T., Avital, M., & Bjørn-Andersen, N. (2012). The Value of Open Government Data: A Strategic Analysis Framework. Orlando. Retrieved from

Murillo, M. J. (2012). Including all audiences in the government loop: From transparency to empowerment through open government data.

Vickery, G. (2011). Review of Recent Studies on PSI Re-use and Related Market Developments. Paris.


Exploring incentives for transparency in developing countries

[Summary: brief reflections on the dynamics of transparency in developing countries]

Doug Hadden of FreeBalance (developers of Public Financial Management software) has posed the question “What are the Incentives for Transparency in Developing Country Governments?”. Doug notes that many of their developing country customers have been interested in implementing transparency portals, and transparency has been a major topic of conversation at their annual user group meeting.

My initial draft of a comment became rather long, so here are a few reflections in reply to that question by way of a blog post.

Framing the question

First, we need to identify whether a distinction between developed and developing countries has particular relevance to this question. There are three main areas where the distinction could be drawn: degree of political freedoms and democracy; levels of corruption; and state capacity and effectiveness. Malesky et al. note that we might expect the dynamics of transparency initiatives to be different in more authoritarian regimes. We might anticipate both that authoritarian governments have less incentive to pursue transparency, and that if transparency is pursued, it is less likely to be effective in changing policy and implementation outcomes, further undermining the case for its adoption. A similar incentive issue may exist for regimes with high levels of corruption. If political elites are seen to be corrupt, then it may be surprising to see those elites adopt and pursue transparency policies. Lastly, on the question of state effectiveness, it might be argued that it is surprising for a democratic state with limited capacity to adopt transparency as a policy instrument over other available public sector reforms. In his chapter in Corruption and Democracy in Brazil, Bruno Speck discusses the importance of empowered audit and oversight institutions to ensuring effective use of public finance. Transparency may be a means by which actors outside the state can put things on the agenda of empowered institutions, but without effective state mechanisms to enforce compliance with laws once problems are identified, it may look to be a flawed policy tool. All these distinctions (levels of freedom, corruption and state capacity) may have some degree of correlation with the development status of a country, though the line is not clear cut.
Alexandru Grigorescu’s paper on international organisations and government transparency points to one further distinction worth noting: the higher levels of involvement of international organisations in developing countries.

Secondly, we need to identify what sort of transparency we are talking about. David Heald suggests we need to distinguish four directions of transparency: upwards (hierarchical relationships, where the superior can see the actions of the subordinate); downwards (where the ruled can see the behaviour and results of their rulers, or agencies can see behaviour up the management chain); outwards (where agents inside an organisation can see what is happening outside it); and inwards (where those outside can observe what is happening inside the organisation). Using these categories we can interrogate how a particular transparency initiative is functioning. For example, a transparency portal may be giving inwards and upwards transparency on government; but it may not only be giving new insights to citizens: it may also be allowing agencies who previously struggled to get hold of information, due to bureaucratic blocks in mid-level agencies or departments, to more effectively access the information they need to do their jobs. It is also important to answer the question ‘transparency of what?’. Transparency of outdated information, or of information with little political salience, is dramatically different from releasing up-to-date information on the most recent public spending, such as occurs through Brazil’s transparency portal.

With these distinctions in mind, what might some of the incentives for transparency be? All the following are hypothesis only, and more work would be needed to track down data to explore them more, or studies that might look at these effects in more depth.

1) The figleaf
Starting with a sceptical suggestion: publishing low-salience information with a large fanfare can be a good way to gain attention and initial credibility without facing high political costs. Similarly, in regimes with low state effectiveness, where corrupt activity isn’t captured in the data, or where there are no balancing audit and reconciliation mechanisms such as exist in the Extractive Industries Transparency Initiative, the potential credibility gain from developing a transparency initiative outweighs the potential risks. With growing international focus on transparency initiatives, the reputational payoff from adopting an initiative may be high right now, and may allow other, more substantive, reforms to be sidelined.

2) International and external pressure
Less sceptically, we might see transparency initiative adoption as a genuine measure by governments, but primarily taking place in response to international pressure or funding. This might be from international agencies, as donors fund and require transparency and governance reforms. Aid Transparency portals in particular may come down to pressure from donors to have accountability on how funds are being spent. Or it might be from business, and markets, as assessments of doing business in a country are affected by the degree of transparency.

3) Bottom up citizen and political pressure
Citizens may be demanding transparency. Certainly in the global development of Right to Information legislation, bottom up citizen pressure has played a significant role. Where democratic mechanisms are operating, then citizen pressure can provide incentives for greater transparency. Similarly, as Francis Maude often states, political parties in opposition are often advocates of transparency.

4) Improving information flow
Effective states need to process a lot of information, and to transfer it between many different organisations and agencies. Doing this inside the state, in access-controlled ways, through person-to-person relationships can be complex and costly, and can involve building lots of interoperable IT systems. By contrast, with open data, you place data online in a standard format, and then anyone who needs it can come and take a copy (or so the theory goes; note that the feedback loops from the previous person-to-person relationships fall out of the picture here). Publishing data transparently can get around bottlenecks in information exchange. This may be particularly important when public services are being delivered by lots of non-state actors who could not be brought inside government systems in any case.

This is certainly part of the idea behind the International Aid Transparency Initiative, which seeks to ensure aid receiving governments and agencies can get a view of available resources without having to spend considerable labour requesting and reconciling information from many different sources. Here, the goal is efficiency through outwards and horizontal transparency, and other forms of upwards transparency and visibility of data to citizens may be a by-product.

5) Addressing principal-agent problems
Principal-agent problems concern the challenge a principal (e.g. the government) faces in motivating an agent (e.g. a contractor) to act in the interests of the principal, rather than in the agent’s self-interest. There are all sorts of principal-agent problems at work in government: for example, the citizen as principal, trying to get government as agent to act in their interests; central government as principal, trying to get an implementing agency to act in its interest; or a donor as principal, trying to get a government to act in its interest. Transparency can play a role in all of these, though the form the transparency takes can vary.

Governments are not monolithic. Corruption benefits certain actors in government, and not others. Transparency can be a policy that one area of government uses to secure the behaviour of another, by allowing parties outside of government to provide the scrutiny or political pressure needed to address an issue. The nature of transparency mandates is interesting to explore here. Transparency in one area of government can also empower another. For example, both the UK and China have sought to increase the transparency of local government. This may increase citizen oversight of government, but it can also increase upwards transparency of the periphery to the centre, strengthening central government capacity.

Exploring further
This post has taken a fairly general view of some of the dynamics that might be in play in a decision to adopt a transparency initiative. There are undoubtedly other significant dynamics I’ve missed. And going with my own point on distinguishing both the type and subject of data being made more transparent, any more detailed account is likely to need to be about transparency in particular domains rather than general.

Of course, looking back I suspect I may have misread Doug’s question, which could have been asking more for arguments that can be used to convince governments to adopt transparency, rather than an analytical look. However, I hope some persuasive arguments in favour of transparency can also be distilled from the above.

Grigorescu, A. (2003). International Organizations and Government Transparency: Linking the International and Domestic Realms. International Studies Quarterly, 47, 643–667.

Malesky, E., Schuler, P., & Tran, A. (2012). The Adverse Effects of Sunshine: A Field Experiment on Legislative Transparency in an Authoritarian Assembly. American Political Science Review, 106(4). doi:10.1017/S0003055412000408

Power, T. J., & Taylor, M. M. (2011). Corruption and Democracy in Brazil: The struggle for accountability. University of Notre Dame.

Of nonsensical numbers: openness score

[Summary: A brief critique of the 'openness score']

A recent Cabinet Office press release, picked up by Kable Government Computing, states that: “The average openness score for all departments is 52%”.

What’s an openness score, I hear you ask? Well, apparently it’s “based on the percentage of the datasets published by each department and its arms-length bodies that achieve three stars and above against the Five Star Rating for Open Data set out in the Open Data White Paper”. That is, it’s calculated by an algorithm that looks over all the datasets published by a department and checks what format the files linked to are in.

This seems to display both a category mistake on the part of the Cabinet Office, and a rather worrying lack of statistical literacy and awareness of how such a number might be gamed.

On the category mistake: the openness score appears to equate openness with file format – but ‘openness’ in general is not equivalent to the use of an ‘open’ file format. Firstly, even data in a machine-readable format can be non-open and effectively non-machine-readable depending on how it is formatted: a garbled CSV is, to all intents and purposes, less accessible and open than a well-formatted Excel file. Secondly, openness is not just a technical concept, and is not just about data (I’ve commented on that in more detail here). To take the number of well-formatted datasets as a proxy for departmental openness is reductive and narrow in the extreme. This may just be an issue of communication, such that the Cabinet Office should be talking about an ‘open data score’ rather than an ‘openness score’, but as an input into narratives on open government this risks creating confusion, and again muddles the relationship between the openness of government in general, and open data.

On nonsensical numbers: even as an ‘open data score’, the current number is practically meaningless, as it is just the share of a department’s datasets that happen to be machine readable. The score can be increased by removing non-machine-readable datasets, and is skewed according to how many datasets a department publishes. A department publishing two datasets, one machine readable and one not, gets a score of 50%. If they publish an extra dataset, full of meaningful information but not yet machine readable, their score drops to 33%. This means the score is not only misleading, but potentially creates perverse incentives that run counter to the very notion of Tim Berners-Lee’s 5-star rating of open data (which, I should remind readers, is not a rigorously designed set of criteria, but something Tim prepared just before a conference presentation as a rough heuristic for how data should be opened), which calls for people to put data online as a first step, even if it can’t be made machine readable right away.
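The perverse incentive is easy to see if the scoring rule is written out. The sketch below is my own model of the score as described (a simple percentage of machine-readable datasets), not the Cabinet Office’s actual code:

```python
# Illustrative model of the 'openness score': the percentage of a
# department's published datasets that are machine readable (3+ stars).

def openness_score(machine_readable, total):
    """Percentage of the department's datasets that are machine readable."""
    return 100 * machine_readable / total

print(openness_score(1, 2))  # one of two datasets machine readable → 50.0
print(openness_score(1, 3))  # publish one more useful, non-machine-readable dataset: score falls to ~33
print(openness_score(1, 1))  # instead, delete the non-machine-readable dataset → 100.0
```

Publishing more data lowers the score; deleting data raises it, which is exactly the wrong incentive structure for an ‘openness’ measure.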

An openness score constructed in this way potentially incentivises less data publishing, not more.

I hope whoever came up with the idea of the openness score is encouraged to go back to the drawing board and to think both about its design, and about how it is communicated.


Data and Trust: Raw data now? Or only after rigorous review?

I’ve just come across a report from Science Wise on an ‘Open Data dialogue’ they held earlier this year. The dialogue brought together 40 demographically diverse members of the UK public to discuss how they felt about the application of open data policies to research data. Whilst the dialogue was centred on research data rather than government data, it appears it also touched on public datasets such as health, crime and education statistics, and so the findings have a lot of relevance for the open government data movement as well as the research field. The report from the dialogue is available as a PDF here.

One of the key findings that jumped out at me was amongst a list of ‘8 key principles [the public identified] that could be used to promote more effective open data policies’, and stated the view that ‘Data should be checked for inaccuracies before being made open’ (a number of the other points, such as ‘Raw data should include full details explaining what the data relates to, how it was collected, who collected it and how formatted’, were also interesting in providing further basis for the Five Stars of Open Data Engagement, but I’ve written about that plenty already). The interesting thing about the idea that data should be checked before being made open is that it runs counter to the call for ‘Raw Data Now’ commonly heard in open data advocacy, where the argument is made that putting data out will allow errors to be spotted and fixed.

The reason, the Science Wise report explains, that the public in the dialogue were reluctant to accept this was to do with trust (though it should be noted that this is something Science Wise were explicitly interested in exploring in their dialogues, so it was a considered, but not unprompted, response from participants). With lots of data out there, subject to different interpretations, and potentially inaccurate, trust in the data, and in work based upon it, may be eroded. Although building trust is one of the reasons often given for openness, the idea that openness can in fact undermine trust is not a new one: see for example Grimmelikhuijsen on linking transparency, knowledge and citizen trust in government, and Archon Fung speaking at the Open Data Research Network’s Boston workshop. What some of this past work on trust and openness does do, however, is suggest this is an area open to empirical research to test the claims in either direction.

For example, studies could be constructed to ask:

  • Does putting out ‘Raw Data Now’ actually lead to errors being (a) spotted; (b) fixed in the source data; and (c) corrections propagated so the impact of the errors is minimised in work based on the data?
  • Is research or policy based on open data, where that data has been used by third parties, more or less trusted than comparable research or policy without the underlying data being open? What are the confounding factors in either direction?

The call for ‘raw data now’ may be as much strategic (an attempt to head off objections to releasing data) as anything else, but it will take work to understand when it’s a strategy with short-term gains and longer-term risks, and when it makes sense to pursue.


How might open data contribute to good governance?

Below is the pre-print full text of an article of mine forthcoming in the 2012/13 edition of the Commonwealth Governance Handbook.

You can find a PDF copy over here.

How might open data contribute to good governance?

Access to information is increasingly recognised as a fundamental component of good governance. Citizens need access to information on the decision-making processes of government, and on the performance of the state to be able to hold governments to account. States often require disclosure of information from public and private bodies, making use of targeted transparency1 to regulate the actions of both public and private actors.

Conventionally, access to information has involved access to documents: to published reports and print-outs. However, over the last few years an open data movement has emerged, seeking to move beyond static documents, and asking for direct access to raw datasets from governments (and from other institutions). This movement wants access to data in ways that allows it to be searched, sorted, remixed, visualised and shared through the Internet. Governments have been encouraged to establish open data initiatives and data portals, providing online access to data on everything from national budgets, to school performance, health statistics and aid spending. This article considers the potential implications of open data for democratic governance.

What is open data?

Open data can be formally defined as data that is accessible, machine readable, and openly licensed. In practice, that means data: that can be downloaded from the Internet; that can be manipulated in standard software; and where the user is not prohibited in any way from sharing the data further.

An example may help illustrate this: imagine a national budget that is released in a printed report, made up of hundreds of different tables, each with a slightly different layout. To compare this budget to actual spending, or to see a breakdown of funds by categories different from those the publisher has chosen to present, citizens would have to re-type all the data into a spreadsheet manually. For a budget this could be weeks upon weeks of laborious work. Even once done, citizens might find that the data is covered by copyright that prohibits their wider use of the information. With open data, these barriers are removed: original spreadsheets of budget information should be published, and the intellectual property license applied to the data should permit citizens to use the data as they choose – including for promoting transparency and accountability, and even to support commercial enterprises, perhaps based on providing market intelligence to others.
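A small sketch may make the contrast concrete. With a budget published as a machine-readable CSV, re-aggregating by any category takes a few lines of code, where a printed table would need weeks of retyping (the column headings, departments and figures below are all invented for illustration):

```python
import csv
from collections import defaultdict
from io import StringIO

# A hypothetical machine-readable budget release; departments,
# categories and figures are all invented for illustration.
budget_csv = StringIO("""department,category,amount
Education,Staff,1200000
Education,Buildings,450000
Health,Staff,2100000
Health,Equipment,800000
""")

# Total spending by category, regardless of how the publisher
# chose to lay out their own tables.
totals = defaultdict(int)
for row in csv.DictReader(budget_csv):
    totals[row["category"]] += int(row["amount"])

print(dict(totals))  # {'Staff': 3300000, 'Buildings': 450000, 'Equipment': 800000}
```

The same few lines would work just as well for a real budget of hundreds of thousands of rows – which is exactly the scale at which machine-readability starts to matter.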

Open data advocates argue that, by freeing public data (which has commonly already been paid for by citizens through taxation) for re-use, technically skilled developers can build applications and visualisations that support citizens to access it more effectively, and a wide community of innovators can use the data in ways that bring social and economic value the government could never have imagined.

The rise of open data

Although the current open data movement draws upon diverse roots,2 it really burst onto the policy scene in 2009, when US President Barack Obama signed a Memorandum on Transparency and Open Government as one of his first acts in office, leading to the creation of the data.gov platform hosting hundreds of federal datasets for public access. This US move was quickly followed by the UK, launching data.gov.uk in early 2010 and starting a programme of open data reforms across government that continued and was expanded under a new administration from mid 2010 onwards. In April 2010 the World Bank launched an open data portal, providing free access to hundreds of economic and social indicators, and in July 2011, with World Bank support, Kenya launched its own open data portal, becoming one of the first developing countries to have a national government open data platform. India has also launched a trial version of data.gov.in, bringing open government data to the world’s largest democracy.

Open data has also been a key topic in the Open Government Partnership (OGP)3, co-chaired in 2012/13 by the United Kingdom. Seven Commonwealth countries (United Kingdom, Canada, Ghana, South Africa, Malta, Trinidad and Tobago, and Kenya) are amongst those who joined the Open Government Partnership in its first year. The OGP is a multilateral initiative, jointly run by governments and civil society, focusing on transparent, effective and accountable government. The founding declaration of the OGP highlights the importance of technologies in driving more open government:

“New technologies offer opportunities for information sharing, public participation, and collaboration. We intend to harness these technologies to make more information public in ways that enable people to both understand what their governments do and to influence decisions.”

Of the 45 OGP national action plans delivered by July 2012, analysis by Global Integrity4 found that ‘open data’ related commitments were amongst the most common, with countries pledging to create open data portals, or launch open data related programmes of activity.

Open data and governance

A number of connections can be drawn between open data and governance. Open data can drive greater transparency and accountability. It could lead to greater inclusion of citizens in decision-making. And it can support innovation, both in processes of governance, and in the delivery of public services. Let us explore these connections in more detail.

Modern democracy is based upon the idea that governments, institutions and officials in power can be held to account for their decisions and particularly for their use of public funds. Increasingly it is recognised that citizens should also be able to exercise rights to call companies to account for their actions, including their use of natural resources. However, parliaments, citizens and civil society can only exercise their right to call power to account when they have access to transparent, accessible information – and in modern complex states, this may require access to open data.

Open data can allow information from many different sources to be brought together, and patterns to be found. Instead of searching through boxes of papers, with open data accountability activists or watchdog organisations may be able to find out more easily where money is being spent, how government is performing in different regions, or which companies are the worst polluters in a region. In the UK the government has required all local councils to publish open data on their spending transactions over £500, allowing anyone with an Internet connection to see where money is being spent, and which organisations are receiving public funds. Journalists have been some of the most frequent users of this data, but it has also been drawn upon by individual citizens and local campaign groups.

Platforms like OpenSpending go a step further in seeking to make data accessible and to promote citizen engagement with key issues like national budget decisions. OpenSpending shows budget and spend data from governments through interactive graphs and a searchable database. The platform now contains budget data for Nigeria, India, Kenya, South Africa and the UK amongst others – all input by a network of volunteers working with datasets and documents from their governments.

The emerging power of open data can also be seen in projects like the International Aid Transparency Initiative (IATI) that has created a common data standard for information on aid activities. In the past, aid receiving governments have had to rely on regularly requesting data from the donors operating in their countries to find out what projects are funded where, and citizens have had to search across the websites of many different donors to find out about projects in their country, often finding that only limited information was publicly available. Now, over 50% of official Overseas Development Assistance is published in the IATI standard format, giving governments and citizens up-to-date access to information on who is giving what to whom.5 The data is far from perfect (it is still early days for IATI), but because it is published as open data, third-parties can build upon it, adding extra information such as geographic locations of projects, and ‘mashing up’ the data into visualisations and other products that make it accessible to a wide range of groups. IATI points to the potential of open data to support good governance across borders – and to promote transparency of multilateral institutions that often seem opaque and distant to citizens in any particular country.
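Part of what makes this third-party building-upon possible is that IATI data arrives in a common, machine-readable structure. As a sketch only – the fragment below is a heavily simplified, hypothetical imitation of an IATI activity file, not the real schema – extracting who is funding what where becomes a routine parsing task:

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily simplified fragment in the spirit of an IATI
# activity file; the real standard defines many more elements.
xml_data = """
<iati-activities>
  <iati-activity>
    <title>Rural water supply</title>
    <recipient-country code="KE"/>
    <budget><value currency="USD">500000</value></budget>
  </iati-activity>
  <iati-activity>
    <title>School construction</title>
    <recipient-country code="GH"/>
    <budget><value currency="USD">750000</value></budget>
  </iati-activity>
</iati-activities>
"""

root = ET.fromstring(xml_data)
activities = []
for activity in root.findall("iati-activity"):
    activities.append((
        activity.find("recipient-country").get("code"),
        activity.findtext("title"),
        activity.findtext("budget/value"),
    ))

print(activities)
```

Because every publisher uses the same element names, the same script works across dozens of donors – which is precisely what made the pre-IATI situation of searching each donor’s website so laborious by comparison.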

Meeting the challenges

Few strong arguments can be made against the idea that governments should open up access to data. However, open data policies have not been entirely uncontroversial. Firstly, there are questions over whether opening access to data simply ‘empowers the already empowered’6 – as the technical skills required to work with datasets can be relatively advanced. Secondly, concerns have been raised that open data policies can be politically manipulated, with governments choosing to selectively release data that serves their interests, using open data as an instrument of state deregulation and marketisation of public services7. Thirdly, as the International Records Management Trust have highlighted, you can only open up data if you have it – and data can only be effectively used for accountability purposes when it is reliable. As such, open data for governance relies upon good records management, which remains a weakness in many countries8. Fourthly, some in the Right to Information (RTI) movement have expressed concern that open data policies, which are often based on voluntary proactive publication of data by government, might displace a focus on the need for RTI legislation which ensures citizens’ rights to demand information that is then reactively shared.

These issues can, to an extent, be addressed by recognising that open data needs to be about more than just publishing datasets on the Internet. Open data policy should sit as a complement to, not a replacement of, RTI legislation. And open data advocates need to recognise that adopting open data policies also requires investment in capacity building to ensure citizens, civil society, and a new generation of technically-skilled civic activists and intermediaries, can take raw data and turn it into transparent information that supports efforts on accountability and democratic inclusion. The iHub in Nairobi, Kenya has been responding to this challenge by creating an ‘incubator’ to develop the skills and focus of potential open data users9. And in the UK, participants at the 2012 UK GovCamp conference articulated a series of principles for ‘Open Data Engagement’ highlighting the need for open data policy to be demand led, and for governments to see open data as an opportunity for greater collaboration with citizens, rather than just as a one-way route to push out information10.

Whether open data initiatives will fully live up to the high expectations many have for them remains to be seen. However, it is likely that open data will come to play a part in the governance landscape across many Commonwealth countries in coming years, and indeed, could provide a much needed tool to increase the transparency of Commonwealth institutions. Good governance, pro-social and civic outcomes of open data are not inevitable, but with critical attention they can be realised11.


1 Fung, A., Graham, M., & Weil, D. (2007). Full Disclosure: The Perils and Promise of Transparency (p. 282). Cambridge University Press.

2 Including, amongst other roots, advocacy for Public Sector Information (PSI) regulation liberalisation in the 1990s and early 21st Century; long established and more recent Right to Information (RTI) campaigns; e-government programmes; and Access to Knowledge campaigns that emerged in response to a global tightening of intellectual property regimes.



5 See and find the data at

6 Gurstein, M. (2011). Open data: Empowering the empowered or effective data use for everyone? First Monday, 16(2). Retrieved from

7 Bates, J. (2012). “This is what modern deregulation looks like” : co-optation and contestation in the shaping of the UK’s Open Government Data Initiative. The Journal of Community Informatics, 8(2). Retrieved from

8 See



11 Davies, T. (2010). Open data, democracy and public sector reform: A look at open government data use from data.gov.uk. Practical Participation. Retrieved from

Notes on a National Information Infrastructure

In October the Advisory Panel on Public Sector Information (APPSI) released a short discussion paper proposing a ‘National Information Framework’ to build an ‘information infrastructure’ for the United Kingdom. The idea of an information infrastructure draws upon the metaphor of physical infrastructure – noting that governments play a role in planning, investing in, and strategically co-ordinating projects like road, rail, electricity and water supplies, working with a range of private and public stakeholders – and suggesting that similar levels of co-ordination could be applied to the resources that power an information economy.

APPSI’s vision of a National Information Framework incorporates not only datasets, but also skills, standards, meta-data, directories, tools and guidance to ensure the country has the necessary resources and processes to capitalise on available information. It notes that not all PSI is open data, and, whilst welcoming the impetus that an initial ‘transparency’ push in the open data movement gave to efforts to open up PSI (which had perhaps not seen the same dynamism under attempts to implement the European Public Sector Information directive), is not overly concerned with whether data is strictly open data or not, but does emphasise the importance of ‘core datasets’ being free at the point of use. One of APPSI’s core contentions is that the push towards open data has lacked a strategic approach, and that the mandates/terms of reference of those groups that have emerged recently, such as the Data Strategy Board, Open Data User Group, and the Open Data Institute, are too limited to support a wide-reaching vision addressing not only specific datasets, but also the wider structures, skills and investment needed to make the most of data.

The concept of an ‘information infrastructure’ is a very useful one: both to support more strategic thinking about how data resources are made available, and at the same time to support critical analysis of which groups may win or lose from the approaches taken. Whether that critical analysis would support the sort of designed infrastructure APPSI are aiming for (they note that it would require both top-down and bottom-up components, but there is a strong sense in the report of an interest in the top-down) is something to explore. Let us look at a few properties of infrastructures worth noting:

  • The design of an infrastructure (or historical accidents in its creation) has a big impact on what can be built upon it: take for example UK railways, where the track gauge and low bridges rule out running continental-style double-decker trains as a way to respond to overcrowding. Choices made about infrastructure can leave a long legacy.
  • National infrastructures often involve a mix of public and private investment: and the UK experience, again with railways, and with private-public partnership building projects for schools and hospitals, shows this does not always lead to a net gain for citizens.
  • Infrastructure decisions are political: choosing where to build a road or an airport can be an important decision to bring economic development to an area, but such decisions can also be made not based on demand or need, but on political horse-trading over projects, popularity, votes and support.
  • Infrastructures require governance and regulation: this was perhaps a missing term in APPSI’s list – but many infrastructures have complex governance systems – from shareholdings in joint public-private companies, to user-groups and watchdogs. The power and responsibility of infrastructures requires strong checks and balances.
  • Infrastructures can be justified on results, or provided in principle: when I was in Norway earlier this year I was struck by hearing about the right of citizens to choose where to live across the vast and remote lands of the North, and to have infrastructure provided to their homes supported by the state. Here, the provision of infrastructure was not on the basis of economic return, but was based on the responsibility of the state towards citizens. By contrast, in many places an infrastructure would only be provided where there was a return to be seen. Infrastructures can be built to serve social value, but often the assessment criterion in a ‘cost benefit analysis’ is the financial bottom line.

As my own work turns to look also at the emergence of open data initiatives and activities across a range of different developing world countries, the infrastructure framing also highlights some useful perspectives and experiences to draw upon – noting that the creation of national infrastructures has often been a space of considerable contestation, corruption and challenge, whilst at the same time being generally recognised as essential for development.

Whilst APPSI do not appear to be terribly strong on outreach and engagement to really spark a debate in the UK over a National Information Framework, I certainly hope this paper does generate more discussion and thought about going beyond ad-hoc dataset releases, and also generates some innovative thinking about how to do that in a way that is not just about central control and planning.


Aside: will a shift to using private data undermine the availability of public data?

Paragraph 29 of the APPSI report talks about the fact that governments may come to collect less data, with more of our national information infrastructure coming from private companies. This raises a risk: governments may face a different price for data depending on whether it is for private use within government, or whether it is to be more widely shared and made available. It could end up a lot cheaper to buy in data just for internal use in policy making, with it very expensive to release that data in ways that would let citizens and other actors in the governance arena scrutinise and hold government to account on policy making. If government access to data for decision making did take this turn, then open policy making may become comparatively more expensive, and be undermined.

It is important then for us to be alert when discussions emerge about discontinuing state collection of certain data in favour of buying in that data from private providers. If the comparison is between the cost of collecting the data and the cost of buying in the data just for government use, then the calculations need to be carefully questioned – the substitution is not like-for-like. In the status quo where government collects data, it can (and increasingly does) share more widely the data upon which policy is based; where government becomes only the holder of a license for limited use of a commercially provided dataset, the situation is very different.

Right to Data Code of Practice Consultation

[Summary: notes on another open data consultation response]

As if to provide plenty of opportunities for procrastination from working on my PhD, government is providing a constant stream of open data related consultations right now. Next up, a consultation on the Code of Practice to be issued concerning the ‘Right to Data’ introduced by the 2012 Protection of Freedoms Act.

This one is hosted online, and takes the form of a copy of the current draft with space for paragraph-by-paragraph commenting.

I’ve added in a few responses.

There are also some wider issues the guidance should perhaps address explicitly, such as the additional attention to privacy detail required when releasing data; and there is probably space for the guidance (and consultation) to be written in clearer English without sacrificing the legal detail that may be required of it.
It will certainly be interesting to see how the Right to Data plays out in coming years…

Opening the National Pupil Database?

Cross posted from my personal blog.

[Summary: some preparatory notes for a response to the National Pupil Database consultation]

The Department for Education are currently consulting on changing the regulations that govern who can gain access to the National Pupil Database (NPD). The NPD holds detailed data on every student in England, going back over ten years, and covering topics from test and exam results, to information on gender, ethnicity, first language, eligibility for free school meals, special educational needs, and detailed information on absences or school exclusions. At present, only a specified list of government bodies are able to access the data, with the exception that it can be shared with suitably approved “persons conducting research into the educational achievements of pupils”. The DFE consultation proposes opening up access to a far wider range of users, in order to maximise the value of this rich dataset.

The idea that government should maximise the value of the data it holds has been well articulated in open data policies, and in the open data white paper, which suggests open data can be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement”. However, the open data movement has always been pretty unequivocal in the claim that ‘personal data’ is not ‘open data’ – yet the DFE proposals seek to apply an open data logic to what is fundamentally a personal, private and sensitive dataset.

The DFE is not, in practice, proposing that the NPD is turned into an open dataset, but it is consulting on the idea that it should be available not only for a wider range of research purposes, but also to “stimulate the market for a broader range of services underpinned by the data, not necessarily related to educational achievement”. Users of the data would still go through an application process, with requests for the most sensitive data subject to additional review, and users agreeing to hold the data securely: but the data, including individual-level records that could easily be de-anonymised, would still be given out to a far wider range of actors, with increased potential for data leakage and abuse.

Consultation and consent

I left school in 2001 and further education in 2003, so as far as I can tell, little of my data is captured by the NPD – but, if it were, it would have been captured based not on my consent to it being handled, but simply on the basis that it was collected as an essential part of running the school system. The consultation documents state that “The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices”. Yet it would be hard to argue this constitutes informed consent for the data to now be shared with commercial parties for uses far beyond the delivery of education services.

In the case of the NPD, it would appear particularly important to consult with children and young people on their views of the changes – as it is, after all, their personal data held in the NPD. However the DFE website shows no evidence of particular efforts being taken to make the consultation accessible to under 18s. I suspect a carefully conducted consultation with diverse groups of children and young people would be very instructive to guide decision making in the DFE.

The strongest argument for reforming the current regulations in the consultation document is that, in the past, the DFE has had to turn down requests to use the data for research which appears to be in the interests of children and young people’s wellbeing. For example, “research looking at the lifestyle/health of children; sexual exploitation of children; the impact of school travel on the environment; and mortality rates for children with SEN”. It might well be that, consulted on whether they would be happy for their data to be used in such research, many children, young people and parents would be happy to permit a wider wording of the research permissions for the NPD, but I would be surprised if most would happily consent to just about anyone being able to request access to their sensitive data. We should also note that, whilst some of the research DFE has turned down sounds compelling, this does not necessarily mean this research could not happen in any other way: nor that it could not be conducted by securing explicit opt-in consent. Data protection principles that require data to only be used for the purpose it was collected cannot just be thrown away because they are inconvenient, and even if consultation does highlight that people may be willing for some wider sharing of their personal data for good, it is not clear this can be applied retroactively to data already collected.

Personal data, state data, open data

The NPD consultation raises an important issue about the data that the state has a right to share, and the data it holds in trust. Aggregate, non-disclosive information about the performance of public services is data the state has a clear right to share and is within the scope of open data. Detailed data on individuals that it may need to collect for the purpose of administration, and generating that aggregate data, is data held in trust – not data to be openly shared.

However, there are many ways to aggregate or process a dataset – and many different non-personally-identifying products that could be built from it. Many of these, government will never have the need to create – yet they could bring social and economic value. So perhaps there are spaces to balance the potential value in personally sensitive datasets with the necessary primacy of data protection principles.

Practice accommodations: creating open data products

In his article for the Open Data Special Issue of the Journal of Community Informatics I edited earlier this year, Rollie Cole talks about ‘practice accommodations’ between open and closed data. Getting these accommodations right for datasets like the NPD will require careful thought and could benefit from innovation in data governance structures. In early announcements of the Public Data Corporation (now the Public Data Group and Open Data User Group), there was a description of how the PDC could “facilitate or create a vehicle that can attract private investment as needed to support its operations and to create value for the taxpayer”. At the time I read this as exploring the possibility that a PDC could help private actors with an interest in public data products that were beyond the public task of the state, but were best gathered or created through state structures, to pool resources to create or release this data. I’m not sure that’s how the authors of the point intended it, but the idea potentially has some value around the NPD. For example, if there is a demand for better “demographic models [that can be] used by the public and commercial sectors to inform planning and investment decisions” derived from the NPD, are there ways in which new structures, perhaps state-linked co-operatives, or trusted bodies like the Open Data Institute, can pool investment to create these products, and to release them as open data? This would ensure access to sensitive personal data remained tightly controlled, but would enable more of the potential value in a dataset like NPD to be made available through more diverse open aggregated non-personal data products.

Such structures would still need good governance, including open peer-review of any anonymisation taking place, to ensure it was robust.
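One concrete check such peer review might apply – my own illustration, not a method proposed anywhere in the consultation – is k-anonymity: verifying that every combination of quasi-identifying attributes in a released product occurs at least k times, so that no row can be traced back to a small group of pupils. A minimal sketch, with invented pupil-level rows:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values occurs
    at least k times, so no released row pins down a small group."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

# Hypothetical pupil-level rows: released at too fine a grain, this
# combination of attributes could single out individuals.
rows = [
    {"year": 2011, "region": "North", "sen": "Y"},
    {"year": 2011, "region": "North", "sen": "Y"},
    {"year": 2011, "region": "South", "sen": "N"},
]

# The South/SEN-N combination appears only once, so the release fails.
print(is_k_anonymous(rows, ["year", "region", "sen"], k=2))  # False
```

k-anonymity is only one of several properties a reviewer would want to test (it does not, for instance, guard against attribute disclosure within a group), which is partly why open peer review of the whole anonymisation process matters.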

The counter argument to such an accommodation might be that it would still stifle innovation, by leaving some barriers to data access in place. However, the alternative, of DFE staff assessing each application for access to the NPD, and having to make a decision on whether a commercial re-use of the data is justified, and the requestor has adequate safeguards in place to manage the data effectively, also involves barriers to access – and involves more risk – so the counter argument may not take us that far.

I’m not suggesting this model would necessarily work – but I introduce it to highlight that there are ways to increase the value gained from data without just handing it out in ways that inevitably increase the chance it will be leaked or mis-used.

A test case?

The NPD consultation presents a critical test case for advocates of opening government data. It requires us to articulate more clearly the different kinds of data the state holds, to be much more nuanced about the different regimes of access that are appropriate for different kinds of data, and to consider the relative importance of values like privacy over ideas of exploiting the value in datasets.

I can only hope DFE listen to the consultation responses they get, and give their proposals a serious rethink.


Further reading and action: Privacy International and Open Rights Group are both preparing group consultation inputs, and welcome input from anyone with views or expert insights to offer.

Digital landscapes: effective open data takes more than a single CSV

Notes from data exploration. I’ve been finding myself tinkering with interesting open datasets quite a lot recently – but often never quite getting to write anything up before I have to move on to other work. So, rather than lose the learning, I’m going to try and keep notes on this blog of some of those explorations. First up, data from the UK Digital Landscape research.


Alongside the new Government Digital Strategy launched this week, Cabinet Office has published Digital Landscape Research undertaken by 2CV.

Interestingly, the report is published natively for the web, with the PDF option being secondary (you can read all the technical details of how the Digital Strategy and report pages are put together here), and at a number of points refers to the dataset published alongside it.

Getting direct access to the raw data underlying government commissioned research is surely a good thing. As I was reading through I started to make notes of some questions I had that I hoped looking at the data might be able to answer.

For example, section 5 of the report sets out ‘Groupings of people who do not use government services or information online’, plots them on axes of positive and negative perceptions of the Internet, and of skills, and then offers descriptive statistics about these groups. However, the report says nothing of how much of the population these groups may represent. Knowing more about the size, and socio-demographics, of these groups beyond the few factors presented in the report would be extremely interesting. Equally, it would be interesting to see more of how these clusters have been constructed, and how the axes have been developed.

However, when I looked at the data it turned out not to be the raw data, but table upon table of cross-tabulations, all set out in a ten-thousand-row CSV. I’ve not combed through every row – but from searching on possible variable names, it seems that, even amongst all these cross-tabs, the clusters – the most substantive presentation of results in the report – are nowhere to be seen.
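The kind of search described above can be sketched in a few lines of Python: scanning a large stacked cross-tab CSV for rows mentioning candidate variable names. The file contents and search terms here are illustrative stand-ins, not taken from the actual dataset.

```python
import csv
import io

def find_variable_rows(csv_text, terms):
    """Return (line_number, row) pairs whose cells mention any search term."""
    matches = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        joined = " ".join(row).lower()
        if any(term.lower() in joined for term in terms):
            matches.append((i, row))
    return matches

# Tiny stand-in for a ten-thousand-row file of stacked cross-tabulations.
sample = "Q1 Occupation,Count\nBanking,120\nRetail,95\nCluster membership,?\n"
print(find_variable_rows(sample, ["cluster"]))
```

In practice you would stream the real file with `open(...)` rather than an in-memory string; the point is that even a crude keyword scan is quicker than combing ten thousand rows by eye.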

Being able to browse the cross-tabs that are there is interesting (for example, discovering that the only options in the survey for ‘Occupation’ seemed to be ‘Advertising/Marketing/Market Research; Public Relations; Government Department/Agency; Web design/content provider; Banking; Retail; Other’ – suggesting a rather skewed view of what the population does; and seeing that under-16s are ignored by the research) – but without proper raw data, and a codebook that shows what each variable means, it’s very hard to use this data for anything more than grabbing the odd extra statistic out of context for use in a presentation.

I did hope that perhaps the dataset had been listed on the government data portal along with a little more meta-data, but alas a search there hasn’t turned up anything. I also looked to see if there was anywhere I could ask the authors of the report for more information, or somewhere I could flag the challenges of using the data to others: but again, nothing to be seen. (In fact, as elegant as it is, the online presentation of this report lacks any information on the authors, who is responsible for it, or who to correspond with. A ‘Contact’ link with clear details of the different routes for readers to respond would be a valuable addition to these templates in future.)

All this goes to highlight for me again that open data involves more than just putting a dataset online.

  • Data structure matters – a lot of open data advocacy ends up focussed on format, but not enough attention is paid to structure. In practice, the main use to which the cross-tabulations provided alongside the Digital Landscape Research can be put is picking out statistics by hand, rather than machine processing, and for this a formatted Excel sheet with tabs for each section of the research, or even a PDF, would be just as functional from a user’s perspective. This isn’t a defence of PDF publishing (and, as long as meaning isn’t conveyed through formatting, the availability of libraries for reading Excel files and its ability to present more user-friendly information mean I’m not averse to seeing well-structured data published in Excel), but rather to make the point that structure is what really matters for data re-use. Consistent columns and rows make for much easier analysis.
  • There is more than one dataset – In any research project the same underlying data might be expressed in a number of different datasets, and it may be appropriate to share any number of these for different audiences. Cross-tabulations that are the product of analysis are useful for many users; the underlying raw survey data, with a row for each response, and a column for each variable may be useful for others; so too might the table of derived variables (e.g. clusters) that has been used in producing the analysis. ‘Publish the data’ doesn’t need to mean one dataset, but could mean publishing data from various stages of research and analysis to meet different needs.
  • Meta-data matters – Without a code book, how can anyone know what the variables mean?
  • Show your working – Between the raw data (if it was shared) and report would still sit a lot of analysis. To really promote opportunities for open data to enable secondary analysis, a researcher would need to be able to see the SPSS, STATA or R commands used to generate conclusions. Sharing source alongside data is part of putting data in context.
  • Be social – If I were to make a cleaned-up version of this dataset, structured to more easily support exploration and analysis – how would anyone else find it? Datasets need to be embedded within spaces that support conversation and collaboration. At the very least, for government that should mean listing them on the government data portal and linking to them from there, where there’s a minimal comment feature. But really, it should involve a lot more focus on supporting proper open data engagement.
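The second and third points above can be illustrated together. With raw response-level data (one row per response, one column per variable) plus a codebook, any of the published cross-tabulations could be reproduced on demand – and the working shown. This is a minimal sketch; the variables, codes and values are hypothetical, not taken from the survey.

```python
import pandas as pd

# A hypothetical codebook: it maps coded values to human-readable labels,
# which is exactly the meta-data missing from the published CSV.
codebook = {
    "uses_gov_online": {0: "No", 1: "Yes"},
    "age_band": {1: "16-24", 2: "25-44", 3: "45+"},
}

# Hypothetical raw survey data: one row per response, one column per variable.
raw = pd.DataFrame({
    "uses_gov_online": [1, 0, 1, 1, 0, 1],
    "age_band": [1, 3, 2, 2, 3, 1],
})

# Decode using the codebook, then derive the kind of cross-tab the
# published file contains - making the analysis reproducible.
decoded = raw.replace(codebook)
print(pd.crosstab(decoded["age_band"], decoded["uses_gov_online"]))
```

Publishing the response-level table, the codebook, and the handful of lines that derive each published table would let secondary analysts both check the report’s figures and ask their own questions of the data.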

I started out planning to write a post on my explorations of the Digital Landscape Research data. Unfortunately there’s still a way to go before government is publishing truly engaging data by default. However, it’s still fantastic that the data underlying the research was published at all, and the team behind it deserve credit for negotiating with their research supplier for the data to be released in this way. Perhaps some part of the digital capacity building planned in the Digital Strategy will focus on all the steps and skills involved in open data publication, to help build on these positive steps in future.

Open Data in Developing Countries

The focus of my work is currently on the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project with the Web Foundation.

MSc – Open Data & Democracy