Opening the National Pupil Database?

Cross posted from my personal blog.

[Summary: some preparatory notes for a response to the National Pupil Database consultation]

The Department for Education are currently consulting on changing the regulations that govern who can gain access to the National Pupil Database (NPD). The NPD holds detailed data on every student in England, going back over ten years, and covering topics from test and exam results to information on gender, ethnicity, first language, eligibility for free school meals, special educational needs, and detailed information on absences or school exclusion. At present, only a specified list of government bodies are able to access the data, with the exception that it can be shared with suitably approved “persons conducting research into the educational achievements of pupils”. The DFE consultation proposes opening up access to a far wider range of users, in order to maximise the value of this rich dataset.

The idea that government should maximise the value of the data it holds has been well articulated in open data policies, and in the open data white paper, which suggests open data can be an “effective engine of economic growth, social wellbeing, political accountability and public service improvement”. However, the open data movement has always been pretty unequivocal that ‘personal data’ is not ‘open data’ – yet the DFE proposals seek to apply an open data logic to what is fundamentally a personal, private and sensitive dataset.

The DFE is not, in practice, proposing that the NPD is turned into an open dataset, but it is consulting on the idea that it should be available not only for a wider range of research purposes, but also to “stimulate the market for a broader range of services underpinned by the data, not necessarily related to educational achievement”. Users of the data would still go through an application process, with requests for the most sensitive data subject to additional review, and users agreeing to hold the data securely. But the data, including easily de-anonymised individual-level records, would still be given out to a far wider range of actors, with increased potential for data leakage and abuse.

Consultation and consent

I left school in 2001 and further education in 2003, so as far as I can tell, little of my data is captured by the NPD – but if it were, it would have been captured not on the basis of my consent to it being handled, but simply because it was collected as an essential part of running the school system. The consultation documents state that “The Department makes it clear to children and their parents what information is held about pupils and how it is processed, through a statement on its website. Schools also inform parents and pupils of how the data is used through privacy notices”. Yet it would be hard to argue that this constitutes informed consent for the data now to be shared with commercial parties, for uses far beyond the delivery of education services.

In the case of the NPD, it would appear particularly important to consult children and young people on their views of the changes, as it is, after all, their personal data held in the NPD. However, the DFE website shows no evidence of any particular effort to make the consultation accessible to under-18s. I suspect a carefully conducted consultation with diverse groups of children and young people would be very instructive in guiding decision making in the DFE.

The strongest argument in the consultation document for reforming the current regulations is that, in the past, the DFE has had to turn down requests to use the data for research which appears to be in the interests of children and young people’s wellbeing. For example, “research looking at the lifestyle/health of children; sexual exploitation of children; the impact of school travel on the environment; and mortality rates for children with SEN”. If they were asked whether they would be happy for their data to be used in such research, many children, young people and parents might well agree to a wider wording of the research permissions for the NPD. But I would be surprised if most would happily consent to just about anyone being able to request access to their sensitive data. We should also note that, whilst some of the research DFE has turned down sounds compelling, this does not necessarily mean the research could not happen in any other way, nor that it could not be conducted by securing explicit opt-in consent. Data protection principles that require data to be used only for the purpose for which it was collected cannot just be thrown away because they are inconvenient. Even if consultation does show that people may be willing to allow some wider sharing of their personal data for social good, it is not clear this can be applied retroactively to data already collected.

Personal data, state data, open data

The NPD consultation raises an important issue about the data that the state has a right to share, and the data it holds in trust. Aggregate, non-disclosive information about the performance of public services is data the state has a clear right to share and is within the scope of open data. Detailed data on individuals that it may need to collect for the purpose of administration, and generating that aggregate data, is data held in trust – not data to be openly shared.

However, there are many ways to aggregate or process a dataset, and many different non-personally identifying products that could be built from it. Government will never need to create many of these, yet they could bring social and economic value. So perhaps there are spaces to balance the potential value in personally sensitive datasets with the necessary primacy of data protection principles.
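To make the idea of a non-disclosive product a little more concrete, here is a minimal Python sketch of one simple technique. It publishes aggregate counts while suppressing small cells that could point to individuals. The threshold of 10, the field names and the records are all hypothetical assumptions for illustration. None of them is drawn from the NPD or from any DFE rules.

```python
from collections import Counter

# Hypothetical threshold: any count below this is withheld (set to None).
SUPPRESSION_THRESHOLD = 10


def suppressed_counts(records, key, threshold=SUPPRESSION_THRESHOLD):
    """Count records per value of `key`, suppressing small cells."""
    counts = Counter(record[key] for record in records)
    return {value: (n if n >= threshold else None) for value, n in counts.items()}


# Made-up illustrative records, not real pupil data.
pupils = [{"area": "North"}] * 25 + [{"area": "South"}] * 4
table = suppressed_counts(pupils, "area")
print(table)  # → {'North': 25, 'South': None}
```

Suppression on its own is not a complete disclosure-control method. Small cells can sometimes be recovered by subtracting published figures from totals, which is one reason any such process needs review.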

Practice accommodations: creating open data products

In his article for the Open Data Special Issue of the Journal of Community Informatics I edited earlier this year, Rollie Cole talks about ‘practice accommodations’ between open and closed data. Getting these accommodations right for datasets like the NPD will require careful thought and could benefit from innovation in data governance structures. In early announcements of the Public Data Corporation (now the Public Data Group and Open Data User Group), there was a description of how the PDC could “facilitate or create a vehicle that can attract private investment as needed to support its operations and to create value for the taxpayer”. At the time I read this as exploring the possibility that a PDC could help private actors with an interest in public data products that were beyond the public task of the state, but were best gathered or created through state structures, to pool resources to create or release this data. I’m not sure that’s how the authors of the point intended it, but the idea potentially has some value around the NPD. For example, if there is a demand for better “demographic models [that can be] used by the public and commercial sectors to inform planning and investment decisions” derived from the NPD, are there ways in which new structures, perhaps state-linked co-operatives, or trusted bodies like the Open Data Institute, can pool investment to create these products, and to release them as open data? This would ensure access to sensitive personal data remained tightly controlled, but would enable more of the potential value in a dataset like NPD to be made available through more diverse open aggregated non-personal data products.

Such structures would still need good governance, including open peer-review of any anonymisation taking place, to ensure it was robust.
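One check such a review might include is k-anonymity: the size of the smallest group of records that share the same combination of potentially identifying attributes. The Python below is a minimal sketch using made-up rows and hypothetical field names. k-anonymity is only one measure among several, and passing it does not on its own make a release safe.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share the
    same values for every quasi-identifier (0 for an empty dataset)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


# Made-up rows with hypothetical field names.
rows = [
    {"birth_year": 1996, "district": "AB1", "sen": False},
    {"birth_year": 1996, "district": "AB1", "sen": True},
    {"birth_year": 1997, "district": "AB2", "sen": False},
]
print(k_anonymity(rows, ["birth_year", "district"]))  # → 1
```

A k of 1 means at least one record is unique on those fields. In this example, anyone who knew the third pupil’s birth year and district could learn their SEN status.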

The counter argument to such an accommodation might be that it would still stifle innovation, by leaving some barriers to data access in place. However, consider the alternative, in which DFE staff assess each application for access to the NPD, deciding whether a commercial re-use of the data is justified and whether the requestor has adequate safeguards in place to manage the data effectively. That also involves barriers to access, and involves more risk, so the counter argument may not take us that far.

I’m not suggesting this model would necessarily work. I introduce it to highlight that there are ways to increase the value gained from data without simply handing it out in ways that inevitably increase the chance it will be leaked or misused.

A test case?

The NPD consultation presents a critical test case for advocates of opening government data. It requires us to articulate more clearly the different kinds of data the state holds, to be much more nuanced about the different regimes of access that are appropriate for different kinds of data, and to consider the relative importance of values like privacy over ideas of exploiting value in datasets.

I can only hope DFE listen to the consultation responses they get, and give their proposals a serious rethink.


Further reading and action: Privacy International and Open Rights Group are both preparing group consultation inputs, and welcome input from anyone with views or expert insights to offer.

Digital landscapes: effective open data takes more than a single CSV

Notes from data exploration. I’ve been finding myself tinkering with interesting open datasets quite a lot recently, but often never quite getting to writing anything up before I have to move on to other work. So, rather than lose the learning, I’m going to try to keep notes on this blog of some of those explorations. First up, data from the UK Digital Landscape research.


Alongside the new Government Digital Strategy launched this week, Cabinet Office has published Digital Landscape Research undertaken by 2CV.

Interestingly, the report is published natively for the web, with the PDF option being secondary (you can read all the technical details of how the Digital Strategy and report pages are put together here), and at a number of points refers to the dataset published alongside it.

Getting direct access to the raw data underlying government commissioned research is surely a good thing. As I was reading through I started to make notes of some questions I had that I hoped looking at the data might be able to answer.

For example, section 5 of the report sets out ‘Groupings of people who do not use government services or information online’. It plots them on axes for positive and negative perceptions of the Internet, and for skills, and then offers descriptive statistics about these groups. However, the report says nothing about how much of the population these groups may represent. Knowing more about the size and socio-demographics of these groups, beyond the few factors presented in the report, would be extremely interesting. Equally, it would be interesting to see more of how these clusters have been constructed, and how the axes have been developed.

However, when I looked at the data it turned out not to be the raw data, but table upon table of cross-tabulations, all set out in a ten-thousand-row CSV. I’ve not combed through every row – but from searching on possible variable names, it seems that even amongst all these cross-tabs, the clusters are nowhere to be seen, although they form the most substantive presentation of results in the report.

Being able to browse the cross-tabs that are there is interesting (for example, discovering the only options in the survey for ‘Occupation’ seemed to be ‘Advertising/Marketing/Market Research; Public Relations; Government Department/Agency; Web design/content provider; Banking; Retail; Other’ – suggesting a rather skewed view of what the population does; and seeing that under 16 year olds are ignored by the research) – but without proper raw data, and a codebook that shows what each variable means, it’s very hard to use this data to do anything more than grab the odd extra statistic out of context for use in a presentation.

I did hope that perhaps the dataset had been listed on along with a little more meta-data, but alas a search there hasn’t turned up anything. I also looked to see if there was anywhere I could ask the authors of the report for more information, or anywhere I could share the challenges of using the data with others: but again, nothing to be seen. (In fact, as elegant as it is, the online presentation of this report lacks any information on the authors, who is responsible for it, or who to correspond with. A ‘Contact’ link with clear details of the different routes for readers to respond would be a valuable addition to these templates in future.)

All this goes to highlight for me again that open data involves more than just putting a dataset online.

  • Data structure matters – a lot of open data advocacy ends up focussed on format, but not enough attention is paid to structure. In practice, the main use for the cross-tabulations provided alongside the Digital Landscape Research is picking out statistics by hand rather than machine processing. For that purpose, a formatted Excel sheet with a tab for each section of the research, or even a PDF, would be just as functional from a user’s perspective. This isn’t a defence of PDF publishing. (As long as meaning isn’t conveyed through formatting, I’m not averse to seeing well-structured data published in Excel, given the availability of libraries for reading Excel files and Excel’s ability to present more user-friendly information.) Rather, the point is that structure is what really matters for data re-use. Consistent columns and rows make for much easier analysis.
  • There is more than one dataset – In any research project the same underlying data might be expressed in a number of different datasets, and it may be appropriate to share any number of these for different audiences. Cross-tabulations that are the product of analysis are useful for many users; the underlying raw survey data, with a row for each response, and a column for each variable may be useful for others; so too might the table of derived variables (e.g. clusters) that has been used in producing the analysis. ‘Publish the data’ doesn’t need to mean one dataset, but could mean publishing data from various stages of research and analysis to meet different needs.
  • Meta-data matters – Without a code book, how can anyone know what the variables mean?
  • Show your working – Between the raw data (if it was shared) and report would still sit a lot of analysis. To really promote opportunities for open data to enable secondary analysis, a researcher would need to be able to see the SPSS, STATA or R commands used to generate conclusions. Sharing source alongside data is part of putting data in context.
  • Be social – If I were to make a cleaned up version of this dataset, structured to more easily support exploration and analysis – how would anyone else find it? Datasets need to be embedded within spaces that support conversation and collaboration. At the very least, for government that should mean listing them on and linking to them there where there’s a minimal comment feature. But really, it should involve a lot more focus on supporting proper open data engagement.
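To illustrate the structure point, here is a minimal Python sketch that turns a CSV of stacked cross-tabulations into consistent long-format rows, which can then be sorted, filtered and aggregated. It does not reproduce the actual layout of the Digital Landscape CSV. The layout assumed here (a title line, a header line, data lines, then a blank separator) and the sample figures are hypothetical.

```python
import csv
import io


def tidy_crosstabs(text):
    """Convert stacked cross-tabs into long-format tuples of
    (table_title, row_label, column_label, value)."""
    tidy = []
    title = header = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            title = header = None          # a blank line ends the current table
        elif title is None:
            title = row[0]                 # first line of a block is its title
        elif header is None:
            header = row[1:]               # second line holds the column labels
        else:
            label, *values = row
            for column, value in zip(header, values):
                tidy.append((title, label, column, int(value)))  # assumes integer counts
    return tidy


# Hypothetical layout: title line, header line, data lines, blank separator.
sample = """Age by region
,North,South
16-24,12,9
25-34,20,15

Occupation by region
,North,South
Retail,5,7
"""
records = tidy_crosstabs(sample)
print(records[0])  # → ('Age by region', '16-24', 'North', 12)
```

In this shape, each value carries its table, row and column labels with it. It can therefore be filtered or joined without anyone having to read the file by eye.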

I started out planning to write a post on my explorations of the Digital Landscape Research data. Unfortunately there’s still a way to go before government is publishing truly engaging data by default. However, it’s still fantastic to see the positive step of publishing the data underlying the research, and the team behind it deserve credit for negotiating with the research supplier for the data to be published in this way. Perhaps some part of the digital capacity building planned in the Digital Strategy will focus on all the steps and skills involved in open data publication, to help build on these positive steps in future.

Complexity and complementarity – why more raw material alone won’t necessarily bring open data driven growth

[Summary: reflections on an open data hack day, complexity, and complements to open data for economic and social impact]

“Data is the raw material of the 21st Century”.

It’s a claim that has been made in various forms by former US CIO Vivek Kundra (PDF), by large consultancies and tech commentators, and that is regularly repeated in speeches by UK Cabinet Office Minister Francis Maude, mostly in relation to the drive to open up government data. This raw material, it is hoped, will bring about new forms of economic activity and growth. There is certainly evidence to suggest that for some forms of government data, particularly ‘infrastructural’ data, moving to free and open access can stimulate economic activity. But, for many open data advocates, the evidence is not showing the sorts of returns on investment, or even the ‘gold rush’ of developers picking over data catalogues to exploit newly available data that they had expected.

At a hack event held at the soon-to-be-launched Open Data Institute in London this week, a number of speakers highlighted the challenge of getting open data used: the portals are built, but the users do not necessarily come. Poor data quality, poor meta-data, inaccessible language, and the difficulty of finding the wheat amongst the chaff of data were all diagnosed as part of the problem, and some interesting interfaces and tools were developed to try to improve data description and discovery. Yet these diagnoses and solutions are still based on linear thinking: when a dataset is truly accessible, then it will be used, and economic benefits will flow.

Owen Barder identifies the same sort of linear thinking in much of the macro-economic international development policy of the 70s and 80s, in his recent Development Drums podcast lecture on complexity and development. The lecture explores how countries with similar levels of ‘raw materials’, in terms of human and physical capital, could have had such different growth rates over the last half century. The answer, it suggests, lies in the complexity of economic development, where we need not just raw materials but diverse sets of skills and supply chains, frameworks, cultures and practices. Making the raw materials available is rarely enough for economic growth. And this is something that open data advocates focussed on economic returns from data need to grapple with.

Thinking about open data use as part of a complex system involves paying attention to many different dimensions of the environment around data. Jose Alonso highlights “the political, legal, organisation, social, technical and economic” as all being important areas to focus on. One way of grounding notions of complexity in thinking about open data use, which I was introduced to while working on a paper with George Kuk last year, is the concept of ‘complementarity’. Essentially, A complements B if A and B together are more than the sum of their parts. For example, a mobile phone application and an app store are complements: the software in one needs the business model and delivery mechanisms of the other in order to be used.
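One simple way to write this definition down is shown below. It assumes a hypothetical value function V, which gives the value generated by a set of resources, and it assumes that an empty set of resources produces no value:

```latex
V(\{A, B\}) > V(\{A\}) + V(\{B\}), \qquad V(\varnothing) = 0
```

In the app example, V({app}) and V({app store}) might each be close to zero on their own, while V({app, app store}) is not. This is an illustrative formalisation, not a formula taken from the work cited here.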

The challenge then is to identify all the things that may complement open data for a particular use. More importantly, it is to identify all the processes already out there in the economy to which certain open datasets are a complement. Whilst the example above of complements appears at first glance technological (apps and app stores), behind it are economic, social and legal complementarities, amongst others. The economic background includes investors, payment processing services, app store business models and remittance to developers. Oftentimes it also includes stable jobs for developers in an existing buoyant IT industry, which allow them either to work on apps for fun in their spare time, or to leave work with enough capital to take a risk on building their own applications. Developer meet-ups, online fora, clear licensing of data, freedom from fear of state censorship of the applications built, and so on contribute to the social and legal background. These parts of the complex landscape generally cannot be centrally planned or controlled, but equally they cannot be ignored when we are asking why the provision of a raw material has not brought about the anticipated use.

I am starting work on the ‘Exploring the Emerging Impacts of Open Data in the South’ project with the Web Foundation and IDRC. In that work, understanding the possible complements of open data for economic, political and social use may provide one route to exploring two questions. First, which countries and contexts are likely to see strong returns from open data policy? Second, what strategies can states, donors and communities adopt to increase their chances of gaining the potential benefits, and avoiding the possible pitfalls, of greater access to open data? Perhaps for future Open Data Institute hack days, this thinking can also encourage more action to address the complex landscape in which open data sits, rather than just linear extensions of existing data platforms in the hope that the users will eventually come.

Key perspectives on open data research: Boston workshop

As part of the Open Data Research network projects I’m involved in I had the pleasure of participating in a workshop at Harvard a few weeks back, where Felipe Heusser had brought together a stellar panel involving Archon Fung, David Eaves, Susan Crawford and Yochai Benkler to explore key perspectives on open data research.

Their contributions, and the discussion from the floor, highlighted the need to think deeply and critically about how open data is being used, and what is driving the supply of data. The session was recorded on the room’s built-in AV system, and the Berkman Center for Internet and Society team have kindly put together a video with all the slides, as well as making it available as a podcast. The recording is embedded below, and the audio only is on SoundCloud here.

Three new #opendata papers

Just in time for the Open Knowledge Festival in Helsinki we’ve added one new paper, and two new field notes to the Open Data Special Issue of the Journal of Community Informatics.

Simon McGinnes and Kasturi Muthu Elandy, in their article Unintended Behavioural Consequences of Publishing Performance Data: Is More Always Better?, question the mechanisms by which data may be able to bring about change, bringing thinking from complex systems to bear on understanding the impacts of open data.

Anne Thurston from the International Records Management Trust (who are running a consultation right now on records management in open data) contributes a field note outlining some of the ways in which effective use of open data requires trustworthy records.

And Asne Kvale Handlykken brings lessons from the adoption of open source into the open data debate, with a field note from research into the politics of Free/Libre/Open Source Software (FLOSS) in the context of contemporary South Africa. Asne’s research highlights that movements for openness may be met by countermovements, and that commitments on paper may not lead to implementation.

You can find an overview of all the other papers in the issue here or head straight to the homepage of the issue.

We first published the issue in April ahead of the Open Government Partnership, and have since then continued to work with a number of authors to add key perspectives to the issue, and complete the review process for some of the articles. Just two more articles to come, and then we’ll be finalising the issue.

(Note: there seem to be some layout issues with the journal system for some of the articles right now – hoping to get this fixed soon – but all articles should be readable…)

Ten building blocks of an open data initiative

This is the second in a short series of research notes that aim to draw out practical points from my current PhD explorations of the impacts of open data on inclusive governance. This one draws on a piece I’m working on about taking a broad reading of what it means to have an open data initiative, and focusses on ‘Ten building blocks of an open data initiative’. It has also been written to help highlight different areas that proposals in the Exploring the Emerging Impacts of Open Data in Developing Countries project might look at. As before, it’s here as a blog post, and you can download a copy as a PDF.

Ten Building Blocks of an Open Data Initiative

Successful open data initiatives involve more than just putting datasets online. This research note draws on an exploration of a number of different open data projects around the world to highlight key building blocks that open data initiatives commonly involve.

What is an open data initiative?

Open data initiatives work to make data more accessible and re-useable. There is no set template for an open data initiative, and they will vary according to their setting and scope. Initiatives can have a range of goals, from promoting transparency and accountability, to stimulating innovation and economic growth. Open data initiatives can be led by governments proactively releasing data they hold, or they may be led by private actors or civil society who collate relevant public data in ways that make it more accessible. Initiatives can focus on a specific topic (e.g. environmental statistics, transport data or international aid flows) collating data from across multiple jurisdictions, or initiatives might be centred on a particular geographical area or institution, seeking to increase access to all the datasets relating to a particular region, state or government body.

Building blocks

The following building blocks are not sequential. Open data initiatives may assemble them in different orders and different ways.

1.     Leadership and bureaucratic support

In her exploration of open government data initiatives in the UK and US, Hogge found that both “a top level mandate” from senior politicians, and an “engaged and well-resourced ‘middle layer’ of skilled government bureaucrats”, were essential to secure the release of open data (Hogge, 2010). Open data initiatives often need to overcome ingrained cultural barriers to sharing data, and to shift long-established practices of data handling. This benefits from strong senior leadership, and from skilled officers with the capacity and freedom to develop open data practices in agile, responsive and sustainable ways.

2.     Datasets

Datasets are at the heart of any open data initiative. Open datasets need to be accessible (usually online), technically open, and legally open (Eaves, 2009). Many open data principles also highlight the importance of datasets being as authoritative, as timely (published soon after collection), and as raw (granular data, shared prior to aggregation or analysis) as possible (Malamud & O’Reilly, 2007).

To be technically open a dataset should be in a non-proprietary format (e.g. CSV instead of Excel). It should be possible to load a well-structured open dataset into freely available software or online tools and to then sort, filter, search and manipulate it in a wide range of ways.
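As a minimal illustration of why technically open formats matter, the Python below loads a small CSV using only the standard library, then filters and sorts it. The column names and figures are made up for the example.

```python
import csv
import io

# A tiny, made-up open dataset in CSV (hypothetical columns and values).
CSV_TEXT = """school,region,pupils
Ashfield,North,420
Brookside,South,310
Castleview,North,515
"""


def load(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load(CSV_TEXT)
# Filter to one region, then sort by pupil numbers, largest first.
north = sorted(
    (r for r in rows if r["region"] == "North"),
    key=lambda r: int(r["pupils"]),
    reverse=True,
)
print([r["school"] for r in north])  # → ['Castleview', 'Ashfield']
```

The same file opens just as easily in a spreadsheet or an online tool, which is the practical meaning of a non-proprietary format.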

Initiatives might be able to export open datasets directly from existing internal databases and information systems. However, in many cases useful open datasets are the product of combining data from different systems and sources, and may need some manual preparation to remove personal information, or to make sure codes and field names used in the data are intelligible.
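Below is a minimal Python sketch of that kind of manual preparation. It drops personal fields and replaces internal codes and field names with intelligible ones. Every field name and code here is a hypothetical example, not taken from any real system.

```python
# Hypothetical field names and code list, for illustration only.
PERSONAL_FIELDS = {"contact_name", "contact_phone"}
FIELD_RENAMES = {"sch_typ": "school_type", "n_pup": "pupils"}
SCHOOL_TYPE_LABELS = {"AC": "Academy", "CO": "Community school"}


def prepare_for_release(record):
    """Drop personal fields, rename cryptic fields and decode type codes."""
    cleaned = {
        FIELD_RENAMES.get(field, field): value
        for field, value in record.items()
        if field not in PERSONAL_FIELDS
    }
    if "school_type" in cleaned:
        code = cleaned["school_type"]
        cleaned["school_type"] = SCHOOL_TYPE_LABELS.get(code, code)  # keep unknown codes as-is
    return cleaned


internal = {"contact_name": "J. Smith", "contact_phone": "01234 567890",
            "sch_typ": "AC", "n_pup": 420}
print(prepare_for_release(internal))  # → {'school_type': 'Academy', 'pupils': 420}
```

Dropping named fields is not the same as anonymisation. The remaining fields may still identify people in combination, so this step is a starting point rather than a safeguard.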

There are many different kinds of data (Davies, 2012a) and the availability of data to open often depends on the quality of digital data collection and good records management practices. Where an initiative is based around processing and republishing data, it is important for it to show the provenance of the data it is using and to link back to the original source of the data wherever possible.

Some open data initiatives select datasets to publish based on supply, working with ‘low hanging fruit’ and opening first those datasets that are easiest to open (Hogge, 2010). Others may adopt a demand-driven approach, surveying citizens (Both, 2012) or articulating use-cases to help decide which datasets to release first (Shkabatur, 2012).

3.     Licences

A license sets out explicitly what someone who accesses a dataset can do with it. Datasets can be covered by a range of copyright and intellectual property laws. Without an explicit license, a user does not know if they have the legal permissions to share data further, to combine it with other data, or to build a commercial service off the back of a dataset.

Open data advocates emphasise the importance of licenses that have minimal restrictions, requiring at most attribution of the source of the data. They commonly oppose licenses that impose restrictions on commercial re-use of data (OKF – Open Knowledge Foundation, 2006). Incompatible licenses make it difficult to combine datasets, and so simple, permissive licenses are preferred.

An open data initiative may develop a licensing framework of its own, or it might adopt one of the standard open database licenses that have been developed and apply it to all its datasets (Hatcher & Waelde, 2007).

4.     Data standards

A data standard describes the fields a particular dataset can contain, how they should be represented, and what conventions should be used for sharing dates, locations, categories and other common elements in a dataset. There are widely used standards for many kinds of data, from the General Transit Feed Specification (GTFS) for representing public transport timetables, to the International Aid Transparency Initiative (IATI) standard for representing data on aid flows and development projects.

When data is published using data standards it is easier to compare datasets from different providers, and to re-use existing tools with the data. However, standards also limit what can be expressed in a dataset (for example, when their code lists do not map to the codes and categories used locally), so any open data initiative has to make careful choices about which data standards it will use.
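The Python sketch below shows both sides of this. It converts a local date format into ISO 8601, and it maps local categories onto a standard code list, where some local categories have no equivalent. The code values are GTFS route_type values for tram, rail and bus. The local category names and the local date format are hypothetical assumptions.

```python
from datetime import datetime

# Local categories mapped to GTFS route_type values (0 = tram, 2 = rail, 3 = bus).
# The local category names themselves are hypothetical.
LOCAL_TO_ROUTE_TYPE = {"Tram": 0, "Train": 2, "Bus": 3}


def to_iso_date(local_date):
    """Convert a hypothetical local DD/MM/YYYY date into ISO 8601 YYYY-MM-DD."""
    return datetime.strptime(local_date, "%d/%m/%Y").date().isoformat()


def map_category(local_category):
    """Return the standard code, or None where the code list has no equivalent."""
    return LOCAL_TO_ROUTE_TYPE.get(local_category)


print(to_iso_date("03/09/2012"))    # → 2012-09-03
print(map_category("Bus"))          # → 3
print(map_category("Dial-a-ride"))  # → None
```

The `None` case is where the careful choice comes in. The publisher has to decide whether to force a local category into the nearest standard code, or to carry it alongside the standard in some other way.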

Standards also vary in their technical complexity. A CSV spreadsheet standard generates data which can be opened in spreadsheet software. More expressive standards like JSON and XML require specialist skills to use, but can support the creation of richer intermediary tools for accessing data.
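To show the difference in expressiveness, here is a minimal Python sketch of one hypothetical project with several transactions, written both as nested JSON and as flat CSV. The fields are invented for illustration and do not follow the IATI schema or any other real standard.

```python
import csv
import io
import json

# A hypothetical project with several transactions: naturally nested data.
project = {
    "id": "P-001",
    "title": "Rural water supply",
    "transactions": [
        {"date": "2012-01-15", "value": 50000},
        {"date": "2012-06-30", "value": 25000},
    ],
}

# JSON keeps the one-to-many structure intact.
as_json = json.dumps(project, indent=2)

# CSV needs one flat row per transaction, repeating the project fields.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "title", "date", "value"])
for t in project["transactions"]:
    writer.writerow([project["id"], project["title"], t["date"], t["value"]])
as_csv = buffer.getvalue()
print(as_csv)
```

The JSON keeps the one-to-many relationship explicit, while the CSV repeats the project fields on every row. Both are workable, but they suit different users and tools.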

Selecting or setting standards, and converting data into common standards is a key part of many open data initiatives.

5.     Data portals

A data portal provides access to open datasets, hosting meta-data that describes them, and allowing visitors to search for relevant datasets.

Open data initiatives commonly provide their own data portals using specialist data catalogue software, although some may choose to curate a list of datasets in a common public data portal like

Some data portals provide extra features, such as datastores and APIs[1], lists of interfaces and applications built with data, discussion and comment features, and in-built tools to visualise data. These should be offered in addition to downloadable datasets, not instead of them.
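At its simplest, the search a portal offers is a match over meta-data records. The Python below is a minimal in-memory sketch, not the API of any real catalogue software such as CKAN. The entries and example.org URLs are hypothetical. Each record keeps a direct download URL, so search sits alongside access to the data itself.

```python
# A minimal, hypothetical catalogue: each entry is dataset meta-data.
CATALOGUE = [
    {"title": "School census 2012", "tags": ["education"], "format": "CSV",
     "url": "https://example.org/school-census.csv"},
    {"title": "Bus timetables", "tags": ["transport"], "format": "GTFS",
     "url": "https://example.org/bus-gtfs.zip"},
]


def search(catalogue, term):
    """Return entries whose title or tags contain the term (case-insensitive)."""
    term = term.lower()
    return [
        entry for entry in catalogue
        if term in entry["title"].lower()
        or any(term in tag.lower() for tag in entry["tags"])
    ]


for hit in search(CATALOGUE, "transport"):
    print(hit["title"], hit["url"])  # each hit still links to the download
```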

6.     Interpretations, interfaces and applications

With open data third parties can provide their own interpretations or analysis in static reports and publications; they can build interfaces and visualisations of data to show trends and patterns in it; and they can create interactive applications that provide useful functionality.

In many cases, open data initiatives may provide or fund interpretations, interfaces and applications that make it easier to gain access to extracts from raw datasets, show what the data contains, or demonstrate the potential of the data. Showing visualisations and applications based on data can be important to secure political support for open data initiatives, and developing uses of their own data allows members of an initiative to identify opportunities to improve how it is published.

7.     Outreach and engagement

Just putting data online is not enough to get it used. Outreach, community building and engagement are required. The 5 stars of open data engagement explain that an open data initiative should: be demand driven; put data in context; support conversations around data; build capacity, skills and networks; and lead to collaboration on data as a common resource (Davies, 2012b).

Conferences, trainings, workshops, online Q&A sessions, ‘hack days’ and ‘app competitions’ are all methods that open data initiatives have adopted to support engagement. The US programme also supports the development of thematic communities of practice around clusters of datasets. Hack days and app competitions, which challenge developers to come up with prototypes using datasets, often produce striking results in a short period of time that show the potential of data, but only rarely lead to sustainable tools and services based on data.

8.     Capacity building

ICT access, a supporting information infrastructure, digital literacy, and technical skills (Rahemtulla et al., 2011; Shkabatur, 2012) are all commonly required to make effective use of open data (Gurstein, 2011). Even for professionals such as statisticians or managers who work with data on a day-to-day basis, the tools and approaches commonly used in open data may be unfamiliar.

Capacity building often needs to take place on both supply and use sides of an open data initiative. Those managing data inside an organisation need to develop their capacity to provide well-structured, well-managed data, clearly documented with meta-data. Potential users of open data need to be made aware of the data that is available, and of different ways it can be used. Capacity building might also involve paying attention to skills for using data in research, campaigning, policy making or scrutinising government.

9.     Feedback loops

Open data may attract many forms of feedback, from suggested corrections to the data itself, to annotations and additional contextual information, and substantive feedback on the issue the data describes (for example, feedback highlighting that the spending reported in a dataset has not reached its intended recipients). Open data initiatives need to establish channels through which they can accept and work with feedback, either enhancing the data they hold, or taking action on the basis of feedback. Initiatives need to make clear to their users what they will do with the feedback they receive.

10.  Policy and legislative lock-in

Most current open data initiatives are voluntary. Governments or institutions involved in them could stop providing data at any time. Some initiatives have been exploring ways to develop a statutory footing by creating ‘right to data’ legislation, or writing open data clearly into contracts and policies.




Both, W. (2012). Open Data – what the citizens really want. The Journal of Community Informatics, 8(2). Retrieved from

Davies, T. (2012a). Untangling the data debate: definitions and implications. Oxford. Retrieved from

Davies, T. (2012b). Supporting open data use through active engagement. Using Open Data: Policy modelling, citizen empowerment, data journalism (pp. 1–5). Brussels: W3C.

Eaves, D. (2009, November). Three Laws of Open Data (International Edition). Retrieved June 1, 2010, from

Gurstein, M. (2011). Open data: Empowering the empowered or effective data use for everyone? First Monday, 16(2). Retrieved from

Hatcher, J., & Waelde, C. (2007). Open Data Commons: Licenses. Retrieved June 1, 2010, from

Hogge, B. (2010). Open Data Study. Transparency and Accountability Initiative. Retrieved from

Malamud, C., & O’Reilly, T. (2007, December). 8 Principles of Open Government Data. Retrieved June 1, 2010, from

OKF – Open Knowledge Foundation. (2006). Open Knowledge Definition. Retrieved March 4, 2010, from

Rahemtulla, H., Kaplan, J., Gigler, B.-S., Cluster, S., Kiess, J., & Brigham, C. (2011). Open Data Kenya: Case study of the Underlying Drivers, Principle Objectives and Evolution of one of the first Open Data Initiatives in Africa. Retrieved from

Shkabatur, J. (2012). Towards Open Government for Enhanced Social Accountability (How To Note). Retrieved from To – Open Government DRAFT.pdf

[1] API stands for Application Programming Interface. APIs provide a way for computers to query and fetch sub-sets of a dataset over the Internet programmatically.

Announcing ‘emerging impacts of open data in developing countries’ call for proposals

This is a quick post to point to the call for proposals, just issued by the Web Foundation and the Canadian International Development Research Centre (IDRC) over on the site, for developing-country-led research into the emerging impacts of open data in developing countries.

I’m rather excited that I’ll be working as research co-ordinator on the project over the next couple of years, helping bring together the research network, supporting the different case studies that the project funds, and trying to draw together some of the learning from across different cases.

So – if you know cases of open data in action, from government initiatives to grassroots projects, in developing countries, please do forward on details of the call, or let me know so I can drop them a line.

And expect to hear quite a bit more about the project here over the coming few years…


Developing research plans for a critical development perspective on open data

Cross posted from the Open Data Research blog. Add any comments over there.

Building on the outcomes from the ‘Fostering a Critical Development Perspective on Open Government Data’ workshop held in Brasilia in April, and interactions with Jose Manuel Alonso (Web Foundation) and Fernando Perini (IDRC), I’ve been sketching out some initial ideas for a research agenda to take forward the exploration of how open data initiatives can work positively for development.

We’re planning to have a proper public draft of a research agenda ready to share during July, and we’ll be discussing it during one of the workshops at the 2012 and World Bank International Open Government Data Conference (July 10th – 12th, Washington DC), as well as through other face-to-face forums and here online. Based on these discussions we plan to share a ‘call for proposals’ in late July or early August, inviting researchers in the Global South to propose case study research looking into local, national or regional open data initiatives and their development impacts. Then in October we’ll be meeting at the Berkman Centre at Harvard to review proposals received, and to put together a team of expert mentors who can offer support to the successful research teams. Working with the selected research projects, we plan to establish a research network focused on exploring the impacts of open data in development contexts, with a series of online and face-to-face research network activities over the next two to three years. We hope to make many of those network activities open to the wider community also, and to be sharing regular updates at

That much we know. However, as we develop the research outline we’ve got a lot of different elements to consider, and so I thought it would be useful to share some of the current (draft) thinking, open to your feedback. You can add comments to this post (look for the comments link), or drop me a line: with any thoughts, feedback or questions.

Setting the scope: Towards a framework for researching the use of Open Data to secure better governance in developing country contexts 

Open Data is potentially a very broad field, and so we need a conceptual framework to help identify the areas we will focus on, and how we will map out the field.

The Brasilia workshop highlighted that it is important not only to look at open data from the supply side, when governments or other institutions launch open data initiatives, but to also look from the demand side, at cases where grassroots groups, or intermediary organisations, are seeking access to data in order to secure some set of development outcomes and to improve some system of governance. In order to understand the full potential of open data in development it is important for us to explore cases that start both from the supply and from the demand side.

Much of the existing discourse focuses on Open Government Data. This includes an implicit assumption that all the data relevant to securing outcomes of interest comes from government, yet in practice this is rarely the case, particularly in development contexts where a wide range of government, NGO, international agency and private actors may be involved in processes of development. Key datasets relevant to governance may come from many different types of institution, or even be crowdsourced from citizens, rather than being officially ‘government data’ (that is, data generated by or owned by government).

Whilst governments are likely to remain central actors in our exploration, our core interest is not Open Government Data (as traditionally defined), but the use of open data for governance. Here, we describe some of the key elements of this framework.

A focus on open data

Firstly, in setting our scope we need to define in detail what we mean by open data. As open data is a new phenomenon, we are primarily interested in data which meets the Open Definition (that is, data that is accessible, and technically and legally open), but we also want to recognise that relevant datasets might be open by degree, or in more partial ways (Smith & Elder, 2010), and so we should not exclude from our view data that meets some, but not all, of the Open Definition criteria. A minority of such cases in our fieldwork may help to highlight how technical and legal openness and easy accessibility influence the use of data in governance processes.

There are many different kinds of data that our enquiry may touch upon, including data about government operations (from financial budget and expenditure data, to voting records and meta-data about legislation and legislative processes, and performance data on public service delivery), data released through ‘targeted transparency’ policies (Fung et al., 2007) about companies and markets (from environmental performance and safety statistics, to regular financial reports from listed companies), and data about citizens (such as census records, educational and health statistics, migration data and other socio-economic information).

The open datasets considered in the research may be ‘big data’, suitable for large-scale statistical analysis, or might be small-scale datasets, from which individual actors can directly extract relevant facts. The datasets will vary in terms of comprehensiveness and currency, with some datasets providing a backward look at past policy implementation or public service performance, and others providing ‘real-time’ information, for example, on public transportation.

We also need to understand open data in connection with the technical components of open data initiatives beyond raw datasets, from data structures and data standards, to data catalogues and platforms, and wider frameworks relating to copyright and privacy issues in open data. Many of these broader components are being developed on a global level, driven by leading open data initiatives. In our exploration it will be necessary to explore and understand how these components create opportunities and challenges for the implementation of open data initiatives in developing countries.

A focus on the process of governance

The concept of governance is used in a wide range of contexts, from global financial markets to community-based water management. In general terms, governance can be defined as “the process of decision-making and the process by which decisions are implemented (or not implemented)” (UN ESCAP, n.d.). Lynn et al. (2000) describe governance as referring to “the means of achieving direction, control, and coordination of wholly or partially autonomous individuals and organisations on behalf of interests to which they jointly contribute.”

Political governance may refer to different parts of the political system of a country, from the definition and monitoring of budgets in national or local institutions of government, to decision making in the legislative process and election processes. Economic governance is often related to the specific firms and markets that are regulated by the government (such as financial and extractive industries) and/or services that governments deliver, subcontract or subsidize, such as transport, education and health. Social governance is generally focused on specific social issues, such as the empowerment of particular marginalized groups, including women, illiterate groups, youth, poor communities or racial minorities. In order to understand how open data initiatives are embedded in political, economic and social contexts, we intend to explore in greater detail the process of governance in different areas, and the changes that open data initiatives are creating.

In the first phase of this research effort, it is likely that we will need to focus on some specific governance themes and issues (for example, budget monitoring; land ownership; local community empowerment) in order to build a more comparable and in-depth understanding of specific domains. Although we are likely to highlight a few areas in the open call, we would also like to treat the proposals we receive as a reflection of the interests of the research community.

A focus on emerging outcomes

Open data can impact on governance in a range of different ways. Arguments in favor of open data highlight a number of different outcomes that the release of open data may lead to. These include:

  • Supporting greater transparency and accountability 
  • Stimulating innovation, efficient markets and economic development 
  • Promoting inclusion and empowerment of groups often excluded from the policy process

The extent to which open data initiatives realize these emerging outcomes will depend on a wide range of factors – relating to the qualities of the open data initiative, and a range of wider contextual factors. Identifying the factors that enable or constrain realization of these emerging outcomes in developing country contexts is at the heart of our research.

The emerging outcomes of open data have been put forward as both intrinsically and instrumentally valuable. For example, greater transparency, like greater democracy, is often taken to be an intrinsic good in society, although it is also treated as instrumentally valuable in promoting better governance or a range of other social goods (Fung et al., 2007). As well as looking at how far greater transparency and accountability result from open data, or at the dynamics of open data promoting inclusion and empowerment, in each case the research explores we will also be asking about any substantive development impacts that were anticipated from the open data initiative. Cases will explore the evidence on how far those impacts have been realized, and the enabling and constraining factors in moving from the emerging open data outcomes to creating change in specific areas, or on specific issues.

Diverse, but comparable cases

Putting these together, we can think about our cases in terms of the nature of the open data used, its interaction with governance systems in developing countries, and its emerging outcomes and impacts. This should give us a broad framework within which cases may fit, allowing us to cluster and explore comparisons between cases.

Applying the framework

Based on this general framework, we intend to find a range of cases (likely between 5 and 10), and to construct a mixture of flexible, and more formal, tools to capture data within them – supporting cross-case analysis, whilst ensuring that local research also generates value for the local context.

The idea is to announce an open call for proposals with the following characteristics:

1) Case study centered on the governance of relevant developmental issues

Whilst we anticipate using a range of methods at the overall project level, the bulk of the research will happen through in-depth case studies, conducted over time, in settings across the developing world. Case studies are well suited to “investigate a contemporary phenomenon in depth and within its real-life context”, coping well with “technically distinctive situation[s] in which there will be many more variables of interest than data points” (Yin, 2009).

Based on the general framework suggested above, the cases should be centered on: (i) understanding the characteristics of open data currently available within the specific case, and its relation to global initiatives that aim for greater data openness on this specific issue; (ii) understanding the context and structure of governance in the specific area of development, including a diagnosis of some of the main strengths and weaknesses of the governance area under study; and (iii) an in-depth analysis of the emerging outcomes of open data initiatives within the specific area of governance, paying special attention to the factors that foster or hamper greater transparency and accountability, better economic efficacy and efficiency, and greater inclusion and empowerment of marginalized groups. The focus of case studies should be on describing and analyzing the complex reality of the open data initiatives in focus, rather than on performing program evaluations of those initiatives.

Case studies should employ multiple forms of data collection, and we anticipate cases will use a mix of documentary analysis, interviews, surveys, participant observation and action research to explore a specific supply or demand driven open data initiative. We’re particularly interested in finding ways to visualize the networks of actors, datasets, technical artifacts and other elements involved in open data initiatives, and to identify data collection and analysis methods that help us deal with the complexity of interactions involved in open data in developing countries.

2) Policy and intervention oriented (at the local and global level)

Weight will be given to cases with strong policy relevance, in terms of both their local and global application. Case studies may take place in a single country, or take a comparative approach to look at a particular issue of open data in governance (e.g. budget monitoring) across a number of countries. Case research is expected to take place over 6 to 12 months for the first round of supported projects.

Projects may include elements of action learning, or applied research, with the project research team engaged directly or indirectly in the open data initiative being studied.

We also anticipate that projects will have a strategy for local dissemination of findings to input into local policy and practice debates.

3) Mentoring and peer-support

We will be looking to provide each project with a mentor and an offer of peer-support within the research network. This may focus on the practicalities of an open data initiative (where an expert mentor could offer support to the development of an initiative), or may focus on support for the research, data collection and analysis process.

The structure of the mentorship scheme will be addressed in another future note.

4) Beyond individual case studies: building cross-cutting M&E indicators

We will also be looking to develop some other cross-cutting data collection instruments and forms of analysis to help us answer key research questions. These might include policy reviews of national open data policies; gathering metrics on dataset use and re-use; secondary analysis of documented cases of data use; and drawing on existing indicators or current data collection efforts (such as the Open Data Census or the Web Index) to build up a more global picture of the developmental impacts of open data.

In addition to exploring the different contextual factors within each particular case of open data use, we are interested in how wider national (or regional) contextual factors may impact upon the realisation of developmental impacts from open data. The Web Foundation have looked at six key sets of contextual factors for an open data initiative: political, legal, organizational, technical, social and economic. These categories can guide us in developing appropriate cross-cutting measures, and descriptive accounts, of the context in which each open data initiative is operating.

The design of this part of the study will be addressed in a future note.

Case studies in summary

Taken together then, we would be looking for proposals that include:

  • the development of case studies of open data use in developing countries that start from both ‘supply’ and ‘demand’ sides, placing a central focus on data used in a specific area of governance 
  • in depth case studies exploring an open data initiative (supply or demand driven), its interaction with existing or emerging governance structures and highlighting observed outcomes (positive and negative). They may include analysis of the networks of agents, artefacts, agendas and actions involved in creating, using and mobilising open data over time.
  • a focus on cases in progress (open data initiatives already implemented or under implementation) involving datasets that meet (to a large extent) the Open Definition. We are looking for cases that can inform policy in developing countries around the adoption of open data for good political, economic and social governance and developmental impact.
  • the overall project will cover a variety of types of data, processes of change, and domains of impact. Each case will have a specific focus, but we anticipate that together they will cover a range of political, economic and social impacts, processes of change, and types of data.

We need your views

Everything above is incomplete and draft. It’s not fixed, and will be evolving throughout the coming month(s). We will be talking to people, reviewing the literature, and testing out lots of possible ideas, and so we would welcome your thoughts and input too.

If you have any ideas, questions or suggestions (from pointers to relevant literature, to substantive points on the outline above), then please do get in touch. You can leave a comment on this post, or e-mail

Thank you for taking the time to read and contribute to this project. Keep an eye on the front page of the website for further updates and announcements.




Fung, A., Graham, M., & Weil, D. (2007). Full Disclosure: The Perils and Promise of Transparency (p. 282). Cambridge University Press.

Joshi, A., & Houtzager, P. P. (2012). Widgets or Watchdogs? Public Management Review, 14(2), 37-41.

Lynn, L. E., Heinrich, C. J., & Hill, C. J. (2000). Studying Governance and Public Management: Challenges and Prospects. Journal of Public Administration Research and Theory, 10, 233-261.

Smith, M., & Elder, L. (2010). Open ICT ecosystems transforming the developing world. Information Technologies and International Development, 6(1), 65-71. Retrieved from

Yin, R. K. (2008). Case Study Research: Design and Methods (p. 240). Sage Publications, Inc.

Fostering a Critical Development Perspective on Open Government Data – Brasilia Workshop Report

Around a month back I took part in a workshop in Brasilia titled ‘Fostering a Critical Development Perspective on Open Government Data’, organised by IDRC, the Web Foundation and the Berkman Centre ahead of the Open Government Partnership meeting.

It was a really productive day of discussions, exploring potential research issues, and research strategies to look at the potential impact of open government data policies and practices being adopted in developing country contexts.

The full workshop write-up has just been released (PDF) and we’ve put together a small tumblr blog to feature some related content and detail the ongoing process of taking forward ideas from the workshop into a full research framework and proposal for further collaborative research.

For me, the workshop was particularly valuable in highlighting:

  • The connections between contemporary OGD action, and previous work on using transparency as a tool of advocacy and policy in the environmental movement and Right to Information movements;
  • The connections (and distinctions) between ‘political’, ‘economic’ and ‘developmental’ impacts of open government data;
  • The need for any comprehensive analysis of OGD impacts to be able to start both from the ‘supply side’ of governments releasing data, and from the ‘demand side’ of citizens facing key challenges where getting access to data is adopted as one amongst a number of strategies to achieve change, or of companies seeking to innovate, and thus seeking access to data that can support them;
  • The need to develop a richer notion of ‘open governance data’ to inform research and avoid being artificially restricted to only studying those datasets specifically published as open data by governments;
  • The need to articulate how different theories of change for open data in development allow individuals to be ‘beneficiaries’, ‘partners’ or ‘leaders’ in the development process: whether they are treated as objects or subjects of development.

It also highlighted the complexity of the current OGD landscape, and the wide range of possible routes that might be taken to research open data impacts. I’ll be working over the coming months with the World Wide Web Foundation and IDRC to develop some plans for the next steps of the research, so any comments, feedback or reflections on the report are very welcome indeed.

Open data: embracing the tough questions

Two open data related publications I’ve been working on have made it to the web in the last few days. Having spent a lot of the last few years working to support organisations to explore the possibilities of open data, I feel these represent a more critical strand of exploring OGD, trying to embrace and engage with, rather than avoid, the tough questions. I’m hoping, however, they both offer something to the ongoing and unfolding debate about how to use open data in the interests of positive social change.

Special Issue of JoCI on Open Government Data

The first is a Special Issue of the Journal of Community Informatics on Open Government Data (OGD) bringing together four new papers, five field notes, and two editorials that critically explore how Open Government Data policies and practices are playing out across the world. All the papers and notes draw upon empirical study and grassroots experiences in order to explore key challenges of, and challenges to, OGD.

Nitya Raman’s note on “Collecting data in Chennai City and the limits of Openness” and Tom Demeyer’s account of putting together an application competition in Amsterdam explore some of the challenges of accessing and opening up government datasets in very different contexts, highlighting the complex realities involved in securing ongoing access to reliable government data. Papers from Sharadini Rath (on using government data to influence local planning in India), and Fiorella De Cindo (on designing deliberative digital spaces), explore the challenges of taking open data into civic discussions and policy making, recognising the role that platforms, politics and social dynamics play in enabling, and putting the brakes on, open data as a tool to drive change. A field note from Wolfgang Both and a point of view note from Rolie Cole on “The practice of open data as opposed to its promise” highlight that any OGD initiative involves choices about the data to prioritise, and the compromises to make between competing agendas when it comes to opening data. Shashank Srinivasan’s note on Mapping the Tso Kar basin in Ladakh, using GIS systems to represent the Changpa tribal people’s interaction with the land, also draws attention to the key role that technical systems and architectures play in making certain information visible, and the need to look for the data that is missing from official records.

Unlike many reports and white papers on OGD out there, which focus solely on potential positive benefits, a number of the papers in the issue also take the important step of looking at the potential for OGD to cause harm, or for OGD agendas to be co-opted against the interests of citizens and communities. Bhuvaneswari Raman’s paper, “The Rhetoric of Transparency and its Reality: Transparent Territories, Opaque Power and Empowerment”, puts power front and centre of an analysis of how the impacts of open data may play out, and Jo Bates’ paper, “‘This is what modern deregulation looks like’: co-optation and contestation in the shaping of the UK’s Open Government Data Initiative”, questions whether UK open data policy has become a fig-leaf for the marketisation of public services and neoliberal reforms in the state.

These challenges to open government data, questioning whether OGD does (or even can?) deliver on promises to promote democratic engagement and citizen empowerment are, well, challenging. Advocates of OGD may initially want to ignore these critical cases, or to jump straight to sketching ‘patches’ and pragmatic fixes that route around these challenges. However, I suspect the positive potential of OGD will be closer when we more deeply engage with these critiques, and when in the advocacy and architecture of OGD we find ways to embrace tough questions of power and local context.

(Zainab and I have tried to provide a longer summary weaving together some of these issues in our editorial essay here, although we see this very much as the start, rather than end-point, of an exploration…)

More to come: I’ve been working on the journal issue for just over a year with my co-editor Zainab Bawa, at the invitation of Michael Gurstein, who has also been fantastically supportive in us publishing this as a ‘rolling issue’. That means we’re going to be adding to the issue over the coming months, and this is just the first batch of papers, available to start feeding into discussions and debates now, particularly ahead of the Open Government Partnership meeting in Brasilia next week, where IDRC, the Berkman Centre and the World Wide Web Foundation are hosting a discussion to develop future research agendas on the impacts of Open Government Data.

ICT for or against development? Exploring linked and open data in development

The second publication is a report I worked on last year with Mike Powell and Keisha Taylor for the IKM Emergent programme, under the title “ICT for or against development? An introduction to the ongoing case of Web 3” (PDF). The paper asks whether the International Development sector has historically adopted ICT innovations in ways that empower the subjects of development and deliver sustainable improvements for those whose lives “are blighted by poverty, ill-health, insecurity and lack of opportunity”, and looks at where the opportunities and challenges might lie in the adoption of open and linked data technologies in the development sector. It’s online as a PDF here, and summaries are available in English, Spanish and French.

Open Data in Developing Countries

The focus of my work is currently on the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project with the Web Foundation.

MSc – Open Data & Democracy
