Open data and privacy

Cross-posted from the Open Data Research network site.

On 1st August two IDRC research networks came together for a web meeting to explore Open Data and Privacy. The Privacy in the Developing World network and the Open Data in Developing Countries network set out to ask whether open data and protecting privacy are inherently in tension, or whether the two can be complementary, and to identify particular issues that might come up around privacy and open data in the developing world. This post shares and develops some of the themes discussed in the meeting.


Open data is generally defined as data made accessible, in formats that can be manipulated by computers (allowing the creation of new interfaces, mash-ups and other data analysis), and without restrictions on how the data can be re-used. In essence, open data asks those who hold data (usually governments) to give up formal control over how it is used, with the idea that this allows greater scrutiny of governments, and unlocks potential for innovation with the data.

Privacy, by contrast, is concerned with control over information: who can access it, and how it is used. As Daniel Solove notes[1], this has many dimensions, from concerns about intrusive information collection, through to the risks of exposure, increased insecurity or interference in their decisions that individuals or communities face when their ‘private’ information is widely known. Privacy is generally linked to individuals, families or community groups, and is a concept that is often used to demarcate a line between a ‘private’ and ‘public’ sphere. Article 12 of the Universal Declaration of Human Rights states “No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation”. It has been argued that privacy is a western concept, only relevant to industrialised societies – yet work by Privacy International has found privacy concerns to be widespread across developing countries, and legal systems across the world tend to recognise privacy as a concern, even if the depth of legal rights to privacy and their enforcement varies. It is worth noting, though, that few of the countries covered by the ODDC project have strong privacy protection laws in place.

Different kinds of data

One of the starting points of discussion around open data and privacy is to work out which kinds of ‘data’ might fall within the focus of each. In the context of open government data, we might think about three broad categories:

  • Infrastructural data – data held about the state of the world – for example, describing the land, transport networks, structures of government, weather measurements and so on. There are very few privacy concerns about this data (though in some states security concerns may restrict the extent to which it is shared, such as geographic border and water flow data for the rivers of Northern India).
  • Public service data – data about the activities of government – ranging from the locations of public services and their budgets through to public registers, and detailed performance statistics on schools, hospitals and other facilities. This last set can fall into a grey area, as such statistics are often built up by aggregating records about individual users of public services, and it is not always clear who the records are about. For example, is the medical record of an operation and its outcome data about the patient, or about the doctor?
  • Personal data – data about individuals, and usually things that an individual would have a legitimate right to manage access to – such as information on their sexuality or their health.

In the Web Meeting, Sam Smith noted that the framers of the ‘Open Definition’, taken as a basis for much open data advocacy, were focussed specifically on non-personal data, and that open data advocates tend to make clear that they are not talking about data that could reveal private information about individuals. However, as the categories above show, the dividing line between public and personal data is not always clear.

Kinds of data - infrastructure; public service; personal

This classification does make clear, however, that there are some kinds of data (the infrastructural data) where applying open data should be, from the privacy perspective at least, uncontroversial. How much the data in the middle category matters to the kinds of outcomes sought from open data policy interventions then becomes an important question to ask.

It is worth noting that because of the political popularity of open data policy, there has been a tendency for other policies relating to data to be presented under an open data banner in some countries. For example, policies on the restricted sharing of medical records with pharmaceutical companies (through secure data sharing rather than as open data) were included in UK open data measures in 2011. These policies clearly need to be considered distinctly from open data policies, and their implications also weighed carefully.

Opening data disrupts past privacy practice

Steve Song offered an input into the Web Meeting focussed on the online publication of a dataset and mash-up map showing the location of registered gun owners in the wake of a school shooting in Connecticut. The register of gun ownership had long been a public document, but it had been in the form of documents that could be inspected rather than as a dataset. The conversion of this public register into open data which could be easily mapped created a strong backlash: law enforcement officials worried that their addresses had been revealed online, and those with and without guns expressed concerns that the information could be used by burglars to target particular houses. The accuracy of the record was also questioned, and it was suggested that much of the information was misleading or wrong.

This case illustrated how turning existing ‘public records’ into open data might change some of the balances around privacy that have been struck by the practical difficulties that currently exist in accessing those records. Previously the ‘data’ had been hidden in plain view: but no-one had been encouraged to use it in ways that might give rise to concerns. Thought may be needed, then, not only when things previously secret are made public, but also when public records are turned into more easily manipulated and processed open data. Steve noted that this may be particularly important in contexts of ethnic or communal tensions: imagine, for example, how voter registers might be used as data where ethnicity can be inferred from a voter’s name, and where an election is contested on ethnic lines.

In the United Kingdom, the recent Shakespeare Review of Public Sector Information[2] has proposed shifting the legal responsibility for misuse of data from the person who publishes the data onto the person who abuses the data – suggesting a model in which privacy laws would control (ab)use of data rather than access to it. However, such a model is tricky to envisage in a world where data can cross borders easily, there is little harmonisation of privacy laws, and harms from privacy violations can also cross borders.

Privacy as an excuse? Open as a general principle?

One of the key concerns raised in the meeting was that if arguments for open data are applied as a ‘general rule’ without sensitivity to the kinds of data in question, there are significant risks that privacy rights might be undermined. Yet, transparency and open data advocates are often concerned that ‘data protection’, or ‘protecting privacy’ might be used as excuses not to release data, or to only release data in aggregated forms that don’t permit detailed analysis of what government is doing. Neither can necessarily be used as a principle that trumps the other.

In his review of open data and privacy for the UK Government, Kieron O’Hara noted[3] that, even within open data advocacy, different groups have different requirements for what counts as good quality open data for their purposes (§2.1.4). For example, transparency campaigners may be happy with crime data covering the general geographic area that a particular official is responsible for, whereas entrepreneurial re-users of data might want data down to the individual street and house level to feed into risk models for insurers, or to use in route-planning applications.

In our web meeting Steve Song suggested that by developing a clearer picture of the kinds of impacts open data can have, and the ways in which it might be used (a central theme in ODDC explorations), we will be better able to have informed debate about the trade-offs between privacy and open data. This again moves away from the simple rhetorical message of ‘open everything’, and ‘raw data now’ that many open data advocates have pushed for – and suggests that deeper debate will be needed over the sharing of datasets that fall into the grey areas between public and personal. Such a debate will need to engage with questions of whether open data is being used to support public goods or private gains, and with nationally and culturally specific judgements about how to manage trade-offs between public good and personal or community privacy. For example, in some countries, personal tax records are considered public and are published, yet in others, these are judged to be private data.

The question of corporate confidentiality was also raised in the web meeting discussions. Although corporate confidentiality is conceptually distinct from privacy, it is another principle that might sometimes be found to be in tension with a drive towards open data, and can become the grounds of excuses for not releasing data. Distinguishing when privacy or corporate confidentiality are being used as excuses for not releasing data, or when they are based in serious and valid concerns, will be important for open data advocacy.

In practice, it wasn’t clear from web meeting participants’ experience whether privacy is actually being used as grounds for restricting access to data in developing countries, or whether privacy is being adequately considered in decisions about opening data. This will be a key issue to track in future research to better understand how potential tensions between open data and privacy are playing out in practice.

Open data, privacy and power

At the Asia regional meeting of the ODDC project, one participant noted the curious overlap between participants in the Data Meet community (often involved in pushing for open data), and those organising ‘Crypto Parties’, teaching each other about privacy protection software. How have these individuals reconciled campaigning for both open data and privacy? If they are pushing for a balance between the two, how is such a balance to be struck? One possible way to understand the compatibility of pushing for both privacy and open data is through the lens of power and autonomy. Activists may be interested in seeking maximum autonomy from the state through protecting their privacy, and maximum control over the state, through the ability to see what the state is doing through open data, and to work with state-collected data. Such a political position might be associated with the libertarianism of some open source geek cultures, but may also have different roots and political slants around the world.

The power-based analysis might also help in determining which kinds of entrepreneurial uses of open data are desirable or not. Cases where entrepreneurs act as intermediaries in ways that enhance the autonomy of citizens (for example, providing public transport planning applications to help citizens move more freely through space, or informational applications that help citizens to collaborate and co-create or claim access to public services) may be seen as positive, whereas commercial open data re-use that leads to interference in individuals’ decisions through targeting of advertising, or that drives discriminatory pricing of services and insurance, might be seen as having a negative impact on individual autonomy (although the negative effect may only be felt by some segments of the population such as minority or marginalised groups). The question, however, would remain of how such potential negative uses of open data should be governed, particularly in developing world contexts where legal frameworks vary widely. Serious abuses of open data (whether to incite community tensions, or affect individuals through discriminatory pricing) could be outlawed, but where they have not been, what should those releasing data consider?


By the end of the web meeting we had opened up many more issues than we had resolved, but we had established that there can be a productive dialogue between privacy and open data, and that more work is needed to explore how the two concepts together are unfolding in both the developed and the developing world.

