A diverse open data discourse?

[Summary: on the need for greater diversity in the open data discourse in 2014]
2014 is going to see a lot of Open Data conferences and events around the world, particularly in developing countries, where open data has become part of the donor discourse. And a lot of these events are likely to be packed full of anecdotes and examples of open data applications and websites drawn from the USA and Europe, and presenters whose main contributions to open data have been made in the leading cities of high-tech stable democracies with decades or centuries of systematic governance data collections and records. The stories they can tell are often inspiring – and can spark many ideas about how government could be done differently, or how citizens can use data to drive bottom-up change. But the stories they tell should not be taken as templates to be transferred and applied in different countries without consideration of the vastly different contexts.
As the Open Data Barometer demonstrated, many developing countries don’t have consistently collected state datasets just waiting to be opened up, and may have much smaller technology communities to draw upon in mediating raw data into useful platforms and products. In the Open Data in Developing Countries research network we spent time in Cape Town a few weeks ago discussing the need to split apart the packaged definition of open data offered by most high-profile advocates, recognising that for much of the data in the South it may make sense to focus on just one or two of ‘Proactively Published’, ‘Machine readable’, and ‘Legal permissions to use’ in the first instance, working progressively towards increased openness of data, rather than treating open data as a binary all-or-nothing state. The importance of adapting open data ideas to local contexts has been a key theme throughout the emerging research findings: but it’s not one we often hear from conference platforms.
In the conferences next year then, we need to be hearing more voices from those who have been grappling with open data from African, Asian and Latin American perspectives, as well as those from all continents who have been exploring open data outside the capital cities and with grassroots groups. Shifting the practices of the state, of its interlocutors, of citizens and businesses to be ‘open by default’, and ensuring the net-gains that can bring are fairly distributed, is not a simple task – and it’s one that even leading advocates of open data have only taken the first steps towards. Where we are bringing examples across country contexts in presentations, we need to do more to distill and express the theories of change behind open data impacts, and to open that up so that different countries can work out how to fit the open data vision and agenda into their local political, technical and social realities. And we need to explore the different theories of change emerging across different sectors and countries to understand how the core idea of open data can be assembled in many different ways to bring about change. Getting more diverse voices onto the podium in 2014 is a good way to start that.

Open Data and Improving Governance: issues of measurement

I was speaking at the Institute for International Economic Policy this afternoon at George Washington University at a conference on “Known-Knowns and Unknowns about the Internet: Measuring the Economic, Social, and Governance Impact of the Web”. The input I offered, for a session under the title ‘Has the Internet Helped Citizens and Policymakers Improve Governance? Are the effects measure-able? What new innovations might be helpful?’ was around how we approach measuring the impacts of open data. Below are the notes I prepared for the talk: the actual delivery voyaged off this a bit – but the version below is probably a bit more clear and concise.

You can also find a recording from the whole panel session here, including a fantastic talk from my Web Foundation colleague Bhanupriya Rao on No-tech, low-tech and high-tech for transparency.

Open Data and Improving Governance: issues of measurement

This talk will focus in on Open Data on the Internet, and through that explore one route by which the Internet is involved in changing governance. It will look at three issues: definitions; the role of measurement; and emerging impacts from a recent study of open data – the Open Data Barometer.

Definitions: Open Data

We need to start by defining our terms: what do we mean by open data, and by governance and, as a result, what kind of measurement makes sense? It is important to have a strong and focussed analytical definition of open data, to avoid it becoming an all-encompassing idea. Definitions are important in developing measurement. The session titles at the Known Knowns and Unknowns conference use terms of ‘web’ and ‘Internet’ interchangeably – yet for many kinds of intervention these are not one and the same.

Similarly, we see a lot of confusion in the open data field between ideas of open data, big data, linked data and so forth – and so drilling down into an analytical concept of open data is an important starting point for both research and practice. It is particularly important to do this with data, as just about anything can be represented as data – and if we’re not careful we end up confusing form and content. For example, suppose we attribute the creation of a host of mobile phone apps that use open transport data, and that generate economic impacts for both consumers and producers, to the openness of the data alone, rather than at least partially to the fact it is data about transport and, moreover, usually data about transport in an urban centre with good public transport systems. We then end up anticipating that the next dataset ‘opened up’ might see similar returns and impacts just by virtue of being open data. But if that next dataset is data on cattle movements from a department of agriculture, we might not see quite so many smart-phone apps emerging.

When we disentangle ‘dataset form’ from dataset function and subject matter we can more intelligently ask about the potential impacts.

So. What is the form of open data? There are three elements we use in our operational definition in the ODDC research network:

1) Proactive Publishing – the idea that governments (or other parties) should put data online without being asked for it;

2) Machine readability – the idea that the data should be possible to process with a computer – not just read on screen – but possible to sort, sift through, filter and generally manipulate without high technical barriers. In practice this means using standard file formats which can be accessed without expensive software, and which maintain the granularity of the data.

3) Permission to re-use – the idea that there should not be legal restrictions to prevent someone re-sharing or re-using the data they have been given access to. Often government data is placed under copyrights or IP protections that prohibit re-use, and so the open data movement has advocated for the use of clear license statements that, at most, require re-users to attribute the source of the data and that place no other restrictions on those that wish to work with a dataset.

It is important in our definitions to be aware of different legal and cultural practices around the world, understanding open data as a socio-technical construct. For example, in some countries government data is assumed to be open regardless of the presence or absence of a license statement; in others, state data is copyright by default, and explicit licenses are needed to give re-users the certainty that they have permission to build upon and market products that use the data.
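To make the three-element definition concrete, here is a minimal sketch of how a dataset's openness could be recorded as three separate properties rather than a single yes/no flag. The class name, field names and the example dataset are hypothetical illustrations, not part of the ODDC definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetOpenness:
    """The three elements of the operational definition, tracked separately."""
    proactively_published: bool  # put online without being asked
    machine_readable: bool       # standard format that can be sorted and filtered
    reuse_permitted: bool        # no legal restriction beyond attribution

    def elements_met(self) -> int:
        """Count how many of the three elements this dataset satisfies."""
        return sum((self.proactively_published,
                    self.machine_readable,
                    self.reuse_permitted))

    def is_fully_open(self) -> bool:
        """Open data in the full sense requires all three elements."""
        return self.elements_met() == 3


# Hypothetical example: a PDF report published online under an open licence.
pdf_report = DatasetOpenness(
    proactively_published=True,
    machine_readable=False,
    reuse_permitted=True,
)
print(pdf_report.elements_met(), pdf_report.is_fully_open())
```

Recording the elements separately is what allows a dataset to be described as partly open, and progress towards full openness to be tracked, instead of treating openness as all-or-nothing.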

Definitions: Governance

The second important definitional pre-requisite to address the questions in focus in this session is for ‘governance’. Wikipedia here demonstrates the ability of a crowdsourced product to provide the best concise definition, describing governance as concerned with “decisions that define expectations, grant power, or verify performance.”

Now – it might be possible to construct a predominantly descriptive or positivist account of how open data improves the verification of government performance, against pre-set objectives or rules: but any discussion of how the Internet and open data improve governance with respect to decisions that define expectations and grant power is necessarily normative. Deciding when we have an improvement in the setting of expectations or the granting of power involves taking a stand over what counts as improvement. Whilst we may be able to agree at one level about negatives of which the removal constitutes improvement (things like extreme corruption, for example), when it comes to a positive vision of what open data should do to governance we find quite divergent views.

Let me illustrate this point by setting out three distinct theories of change for how open data might affect how power is assigned, each belonging to different traditions of political theory.

  • Firstly, there is the idea that open data enables citizen oversight of their governments and addresses information asymmetries – enabling citizens as voters to better control elected officials as their agents in power. Here, the electoral mechanisms of governance in an electoral democracy, or indeed, the pressures of public opinion in a constrained autocracy, are left unchanged, but the ability of journalists, pressure groups and citizens to punch through the veil of secrecy around state decisions can drive decisions more in-line with citizen interests.
  • Secondly, we have the idea of a consumer-democracy – one in which citizens engage in governance through individual consumption choices and selection of public services. This is the theory of change prioritised by David Cameron in his recent speech at the Open Government Partnership – where it is argued that, through open data, citizens can gain more detailed, and personalised, information on public services, and can make more informed choices about which services to select from the ‘marketplace’ of services – thus using market mechanisms to drive better services. Here, ‘governance’ happens through the operation of the market and distributed choices of individuals.
  • Lastly, we have the idea of co-production and more collaborative governance in which open data supports groups made up of citizens, civil society organisations and entrepreneurs to work together with each other, and with the state, to improve policy making and practice. Here, governance is improved when it is more inclusive, and when more people participate in determining the outcome of collectively held power.

These are of course not the only theories of change – but they outline the divergent ways in which we might approach the question of what ‘improved governance’ is. Indeed – policy makers and citizens might have very different ideas in any situation of what improved governance looks like: we might hypothesise that more transparent and responsive services are compatible with more efficient and low-cost services, but whether this is true is an empirical question.

Measurement

The reason for this detour into political theories is not simply to problematise the term of governance, but also to highlight that our approach to measurement also involves taking a stand on normative questions, and to provide a basis for outlining the stand I propose taking.

As Bhanupriya has outlined, many of the kinds of improved governance we want to see involve empowering and enabling groups at the grassroots to engage in policy processes, both acting locally, and speaking out for shared national and global frameworks that better allow them to act locally in the ways that meet their needs. These are not about top-down governance, or enabling policy makers to better control service delivery with a bird’s-eye view.

The site of acting then, if we are to have actionable measurements on open data and governance, is not simply at the level of policy – but is also at the level of grassroots practice. The measurements we make need to allow grassroots groups to both understand ways of engaging with open data as a resource for improving governance locally, and to make appropriate and effective claims on national and international actions to support them with the data they need for better decision making and monitoring of implementation.

ODDC and The Open Data Barometer

Now – having said all this – we might feel that it adds too much complexity to the process of developing measurement frameworks, and that research into impacts of open data on governance is then necessarily solely a space for action research – with no general measurement possible. But this would not engage fully with the problematisation. Global measurements will be made – and so we should work to make sure that where they are, they are sensitive to the practitioner need at the grassroots, and are a resource for practice – whilst also enabling cross-cutting and comparative global research that can illuminate macro-level trends and feed into national and local policy and practice.

That framed our exploration with the Open Data Barometer, a study launched by Sir Tim Berners-Lee at the Open Government Partnership meeting in London a few weeks ago that takes a multidimensional look at the readiness of 77 states to secure benefits from open data, the implementation of open data policy via the proxy of dataset publication, and emerging impacts of open data by the proxy of media and academic coverage of it.

It was a pilot study, but one we hope provides strong foundations for future work to understand the governance effects and impacts of opening data. Methodologically it uses an expert survey, combined with a number of secondary indicators – used to create sub-indices and an overall Barometer index number for countries to support overall comparison, and comparison along a number of different dimensions. I want to highlight three key considerations and points of learning from the development of the Barometer:

  • We build on learning from qualitative work to look at different aspects of readiness. For the last year the Web Foundation have been running a research network on Exploring the Emerging Impacts of Open Data in Developing Countries – which you can find at www.opendataresearch.org – and in this we’ve been working with research partners across the developing world to look at the use of open data in affecting governance. Through this, the importance of a number of different aspects of government readiness has been emphasised, including the importance of RTI laws alongside open data laws; this qualitative work has also brought up issues around the importance of civil society intermediaries. Working with these qualitative insights we sought to find indicators and expert survey questions that would help us understand appropriate aspects of the context around open data in different countries.
  • We distinguish different kinds of data. Prior studies of open data publication have used a list of datasets based on those felt to be important in London and Washington, rather than looking for datasets that represent the breadth of government activity, and the breadth of theories of change about how open data operates. We put together a list of 14 dataset categories and asked our expert researchers to assess whether this data was available, online, machine readable, openly licensed and so on. In our analysis we cluster these datasets according to those most likely to be used as part of an ‘accountability stack’, those most often used in ‘innovation’, and those with a strong impact on ‘social policy’.
  • We look at impact based on asking for narratives of change. This speaks to the question of whether effects of open data are measurable. Right now, that measurement is very difficult. Conceptually, open data can be used to achieve a wide range of impacts, so if we had gone in trying to look for one particular kind of data use – data in participatory budgeting for example – we would have risked missing lots of other potential impacts of open data. We’re still looking to find better methods here – but the approach we took was to ask our expert researchers to look for media mentions of open data having effects in a variety of domains: political, economic, environmental and so on, and to rate the breadth and depth of impacts cited.
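As a rough illustration of how sub-indices can be combined into a single index number for comparison, here is a minimal sketch. It is not the Barometer's actual method: the unweighted mean, the 0-10 scale for every sub-index, and the country names and scores are all assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical sub-index scores on a 0-10 scale; these are NOT Barometer data.
scores = {
    "Country A": {"readiness": 8.0, "implementation": 6.0, "impact": 2.0},
    "Country B": {"readiness": 5.0, "implementation": 3.0, "impact": 1.0},
    "Country C": {"readiness": 6.5, "implementation": 6.5, "impact": 0.5},
}


def overall_index(sub_indices: dict[str, float]) -> float:
    """Unweighted mean of the sub-indices (an assumed, illustrative weighting)."""
    return mean(sub_indices.values())


# Rank countries by overall index, highest first.
ranking = sorted(scores, key=lambda c: overall_index(scores[c]), reverse=True)
for country in ranking:
    print(country, round(overall_index(scores[country]), 2), scores[country])
```

Keeping the sub-indices alongside the overall number matters: in this made-up data Country C ranks above Country B overall while scoring lower on impact, which is exactly the kind of difference a single index hides.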

In this talk I won’t go in-depth into the actual Barometer results, as you can find those at www.opendatabarometer.org but I’ll briefly mention a few findings:

  • Open data policy has rapidly spread across the globe – over 50% of our sample of countries had an open data policy, many with strong senior government backing.
  • However, open data availability is low – just 71 of the 1078 datasets we looked at were available as open data, and in general, publication of data that meets all the criteria for open data I set out above was concentrated in a small number of states. Politically contentious datasets such as government spending, land registries and company registries were least likely to be available.
  • Impacts right now are very low – the average score on our 10-point impact scale was below 2 for every category, and remained below 3 even when we took out countries with no open data or open data policy. In terms of the kinds of impacts researchers could locate cited – Transparency and Accountability impacts were most common, with impacts on environmental sustainability, and the inclusion of marginalised groups least likely to be cited.
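The availability figure can be checked with simple arithmetic. The sketch below assumes that each of the 77 countries was assessed on all 14 dataset categories, which is what gives the total of 1078; the variable names are illustrative.

```python
countries = 77            # states covered by the Barometer
dataset_categories = 14   # dataset categories assessed per country (assumed for all)
fully_open = 71           # datasets found to be available as open data

datasets_assessed = countries * dataset_categories
share_open = 100 * fully_open / datasets_assessed

print(datasets_assessed, round(share_open, 1))
```

The result is a little under 7% of the datasets assessed.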

In conclusion

Returning to the questions that frame this panel: Has the Internet helped citizens and policymakers improve governance? Are the effects measure-able? What new innovations might be helpful?

It seems fair to state as a basic assumption that information does change governance. On the flight here I was reading work by Elinor Ostrom on governance of the commons, which emphasises the central role of information in governance. Changing how information flows does impact governance – but assessing whether that impact is positive or negative involves normative questions. Right now – the impact of open data on effectively altering the flow of information across society is limited. We see highly-used apps in a limited number of settings like transport, but beyond that we see relatively few datasets that are truly available as open data. When we dig into many of the anecdotes shared about open data impacts, it often turns out that wider contextual variables are much more important in determining outcomes than the particular open properties of the data itself. And yet, policy seems to focus on a replication of a standard model of open data publication as the primary intervention.

Ultimately in taking measurement forward I’d like to suggest we look in two directions. Firstly, we need to drill down thematically, focussing on data in context in particular settings, looking to generate actionable knowledge for practitioners in these sectors that will help them to advocate bottom-up for open data – rather than focussing on over-generalised measurement that seeks to promise generalised open data impacts without understanding differences of subject matter and context. Secondly, we need to explore developing rigorous shared case study methodologies that can enable us to build measurement and research through controlled cross-case comparisons, informing macro-level assessments, but focussing on micro-level effects and theories of change around open data.

This is something we’re hoping to focus on more in the ODDC project in the coming months – so do join us on the network LinkedIn group if you would like to explore this more.

Open Data Barometer: 2013 Global Report launched

Last Thursday the study I’ve spent the last five months working on with the Web Foundation was formally launched in the Open Government Data Working Group session of the Open Government Partnership Summit. The Open Data Barometer takes a look at the context, implementation and emerging impacts of open government data in 77 countries around the world.

Last week’s launch included both an analytical report and quantitative datasets for the secondary indicator and expert survey data collected in the study. I’ll be writing more in the coming weeks here about the process of designing and carrying out the study, and reflecting on how it might evolve and be built upon in future. But for now, here’s a link to where you can download the report and data, findings from the exec summary, and a few charts pulled out from the overall report.

Executive Summary: 2013 Global Open Data Barometer Report

Open data is still in its infancy. Less than five years after the first major Open Government Data (OGD) portal went live, hundreds of national and local governments have established OGD portals, joined by international institutions, NGOs and businesses. All are exploring, in different ways, how opening data can unlock latent value, stimulate innovation and increase transparency and accountability. Against this backdrop of rapid growth of the open data field, this Open Data Barometer global report provides a snapshot of OGD practices at national level. It also outlines a country-by-country ranking. Covering a broad sample of 77 countries, it combines peer-reviewed expert survey data and secondary indicators to look at open data readiness, implementation and emerging impacts. Through this study we find that:

  • OGD policies have seen rapid diffusion over the last five years, reaching over 55% of the countries surveyed in the Barometer. The OGD initiatives launched have taken a range of different forms: from isolated open data portals launched within an e-government framework, through to ambitious government-wide OGD implementations.
  • But – there is still a long way to go: Although OGD policies have spread fast, the availability of truly open data remains low, with less than 7% of the datasets surveyed in the Barometer published both in bulk machine-readable forms, and under open licenses. This makes it unnecessarily difficult for users to access, process and work with government data, and potential entrepreneurs face significant legal uncertainty over their rights to build businesses on top of government datasets.
  • Leading countries in the ODB are investing in the creation of ‘National Data Infrastructures’ to provide a foundation for public and private innovation and efficiency. They have high-level and broad-based political backing for the OGD initiatives, and are investing in capacity building with entrepreneurs and intermediaries. They are also focussing on building communities around open data, convening government officials and outside stakeholders to understand more clearly how data can be harnessed for economic and social progress. However, no countries can yet claim to fully be ‘open by default’, and embedding OGD practices across government is a key future challenge.
  • Mid-ranking countries have put in place some of the components of an OGD initiative, such as an open data portal and competitions or events to catalyse re-use of data, but have often failed to make key datasets available, and are lacking in important foundations for effective open data re-use. Absence of strong Right to Information laws may prevent citizens from using open data to hold government to account, and weak or absent Data Protection Laws may undermine citizen confidence in OGD initiatives. In addition, limited training and support for intermediaries may mean data cannot be mobilised to generate economic and social benefits.
  • Low-ranking countries have not yet started to engage with Open Data, and many developing countries lack basic foundations such as well-managed and digitised government datasets. In these countries, interventions to support OGD may look radically different from the leading OGD initiatives surveyed in the Barometer – with opportunities for open data approaches to be used to generate, as well as use, public information.
  • The Barometer ranks the UK as the most advanced country for open data readiness, implementation and impact, scoring above the USA (2nd), Sweden (3rd), New Zealand (4th), Denmark and Norway (joint 5th). The leading developing country is Kenya (21st), ranking higher than rich countries such as Ireland (29th) and Belgium (31st). However, no country can yet claim to be fully ‘open by default’.

Furthermore, in offering the first global snapshot covering both OGD policy and practice, the Barometer highlights:

  • Different countries and regions face different challenges in pursuing OGD – including the need to build government data collection and management capacity; the need to support and equip innovators and intermediaries to use data; and the need to secure civil society freedoms that will enable the use of open data for effective transparency and accountability. There is no one-size-fits-all approach to OGD.

  • Key datasets such as Land Registries and Company Registries are least likely to be available as open data[1], suggesting that OGD initiatives are not yet securing the release of politically important datasets that can be vital to holding governments and companies accountable.

  • In most countries, key datasets for entrepreneurship and improving policy are not available as open data, and when published are in non-standard formats. For example, even in the case of public transport, where data standards are well established, just 25% of countries surveyed have machine-readable data available. Mapping data is also often unavailable in digital forms, or only available for a fee, suggesting that inefficient charging for public data continues to be an issue in many countries.

  • Categories of data managed by statistical authorities are the most likely to be accessible online, but are often only released in very aggregated forms and with unclear or restrictive licenses. Adding a focus on open data to statistical agency capacity building may assist in making key datasets available as bulk, machine-readable open data, contributing positively to the ‘data revolution’ (UN, 2013).

  • Strong evidence on the impacts of OGD is almost universally lacking. Few OGD programmes have yet been evaluated, and the majority of discussion of impacts remains based on anecdote. The Barometer asked about six kinds of OGD impact (government efficiency, transparency and accountability, environmental sustainability, inclusion of marginalised groups, economic growth, and supporting entrepreneurs). In countries with some form of OGD policy (n = 43), no examples of impact could be found for 45% of impact questions, and on average evidence of impact was scored at just 1.7 out of 10. Scores were particularly low for inclusion and environmental impacts of OGD, suggesting an area in need of further focus.

It remains very early days in the development of OGD practices. The World Wide Web has now been with us for almost 25 years, and, even so, many governments, businesses and civil society groups are still in the early stages of learning how to harness its potential. The open data vision is a bold one: but one that will take considerable work to make a reality. It cannot just be a case of ad-hoc dataset publication, but needs attention paid to legal, social, economic, technical, organisational and political dimensions of open data publication and re-use. This year’s Open Data Barometer provides a baseline for tracking how we collectively progress in the open data arena in years to come.

Web Observatories: The Governance Dimensions

Governance & Sustainability for a Web Observatory

I’m in a workshop at MIT today about plans to create a ‘Web Observatory’, collecting and curating vast quantities of data from across the web for research – in part to ensure that researchers can keep pace in their capacity to research the web with the companies and entrepreneurs who are already gathering terabytes of ‘traces’ of online behaviours in proprietary platforms. A lot of the discussion so far has looked at datasets for research gathered from platforms such as Twitter, curated data from platforms like Open Street Map, or collected in research projects focussed on sensor networks and ‘humans as sensors’. However, the vision of the Web Observatory is not just about providing a catalogue of data for secondary research, but also about providing methods and tools that enable researchers to “locate, analyse, compare and interpret useful information in a consistent and reliable way … rather than drowning in a sea of data”.

As Wendy Hall noted in opening remarks, whilst the Web Observatory work begins with emphasis on academic researchers as the users of data, in the long run, Observatories could (or should) be accessible to individuals also. The growing imbalance of power created between citizens and companies through the privileged access that corporations have to information on our collective social lives is set to become an increasingly pressing social and political issue.

Now, there are clearly big technical challenges ahead in building the Web Observatory project and the many federated Web Observatories that will result, but in this post I want to briefly explore one of the organisational ones: getting the governance and sustainability of Web Observatories right.

Lessons from linked data: sustainability

If you’ve ever spent time exploring Linked Data projects you will have likely stumbled across a lot of abandoned datasets: one-off conversions of open data, or data generated through now defunct research projects. The Web of Linked Data is far too often a web of broken links – as the funding for research projects runs out and links go dark.

The Linked Data Around the Clock programme (on a website that’s now offline) had a slide that captures the coordination dilemma at the heart of creating and sustaining good Linked Data: the value of (linked and/or open) data accrues to a range of parties, and involves input from a range of parties. When projects are sustained through short-term grant funding, which covers all the work to create, curate and make accessible a dataset, then that data is sustainable only so long as the funding continues. It could be argued that when data is open, this is not so big a problem – as someone can simply take a copy of the data and if the original source goes dark, can bring up an alternative host for the data. But in practice, with Web Observatory datasets we’re talking big data where simply storing the datasets can require hundreds of terabytes of storage; and datasets which cannot be entirely open due to privacy concerns or Terms of Service of the source data. The data also tends to be shaped primarily by the needs of the funding project that creates it, not by the needs of the projects that want to re-use the data. Although linked data promises distributed annotation and enhancement of data, in practice to query data it needs to be aggregated together in one place – and it’s more efficient to pool resources to enhance and maintain one data store, than to try and copy, convert and enhance multiple copies of big datasets.

So: if learning from Linked Data is anything to go by, the Web Observatory needs to be thinking critically from the start about how key datasets will be sustained, and how collaboration on enhancing data will be facilitated – recognising that there is a non-zero net cost (lots of near-zero marginal costs add up quickly in big data…) to enhancing and adding data to someone else’s data store.

Ethics issues: empowering access

Many of the datasets that might be contained in Web Observatories will raise significant privacy concerns. It might be tempting to manage these by simply deferring responsibility for judging what use can be made of the data to Institutional Review Boards and ethics committees at different participating academic institutions – but if the Web Observatory programme is to be open to partners beyond academia, then ethics processes need to be placed into the heart of the Observatory governance structures, rather than managed around the edges.

A proposal: exploring co-operative ownership and governance

There are, I think, three broad governance models open:

  • Observatories hosted and held in trust by institutions: institutions, primarily academic, use fixed-term project funds to set up Web Observatories. They let other people use these so long as their funding allows, and prioritise those requests to enhance, extract or work with the data that fit with their own research goals. At the end of the project funding, Observatories either die, or end up maintained through residual or other funds.
  • Independent foundations: the model used by large web public goods like Archive.org and Wikipedia – establishing independent legal entities that maintain an Observatory. This has the value of helping Observatories out-live the projects that start them – but makes Observatories dependent upon finding their own funding, and creates an extra organisational entity over and above the partners with an interest in the data, which either ends up with its own agenda and organisational imperatives, or which leaves a collective action problem with each of the partners waiting for the others to provide the funding to keep the lights on.
  • Data co-operatives: building on discussions convened last year in Manchester, there may be a new organisational structure the Web Observatories can build upon – that of the data cooperative. In a data cooperative, a light-weight separate entity is established, but which is constituted and jointly owned by the researchers and research institutions with a stake in the data. Cooperatives can establish rules about the resources that members should bring to the co-op, and what control they can expect over the design and maintenance of the Observatory, and can provide procedures for easy entry and exit from the co-op. In Manchester we discussed the potential for hybrid ‘workers/suppliers’ and ‘users/consumers’ co-operatives, that could give both the creators of data, and the researchers using the data, an appropriate stake in it. Co-operative membership to access data with privacy/ethical issues could also address ethics procedures.

Whilst the least developed, this third option I think holds most promise.

I don’t know yet if the Web Observatory programme will have an organisational research strand – but I hope so…

CfP: Open Data Track – 2014 Conference for E-Democracy and Open Government

I’m co-chairing a track on ‘Open Data, Transparency and Open Innovation’ at the next CeDEM Conference for E-Democracy and Open Government, taking place at Danube University Krems in May next year.

The full call for papers and submission details can be found here, and the details of the Open Data Track are below:

Open Data, Transparency and Open Innovation

Chairs: Johann Höchtl (Danube University Krems, AT), Ina Schieferdecker (Fraunhofer, DE), Tim Davies (University of Southampton, UK)

Open data can provide a platform for many forms of democratic engagement: from enabling citizen scrutiny of governments, to supporting co-production of public data and services, or the emergence of innovative solutions to shared problems. This track will explore the opportunities and challenges for open data production, quality assurance, supply and use across different levels of governance. Key themes include:

  • Open data policy and politics: opportunities and challenges for governments; the global spread of open data policy; transparency and accountability, economic innovation, drivers for open data; benefits and challenges for developing countries.
  • Licensing and legal issues: copyright vs. open licenses & creative commons; Freedom of Information and the ‘right to data’; information sharing and privacy.
  • Open data technologies: technical frameworks for data and meta-data; mash-ups; data formats, standards and APIs; integration into backend systems; data visualisation; data end-users and intermediaries;
  • Open innovation and co-production: open data enabled models of public service provision; government as a platform; making open data innovation sustainable; data and democracy; connecting open data and crowdsourcing; data and information literacy;
  • Evidence and impacts: costs and benefits of providing or using open data; emerging good practices; methods for open data research; empirical data measuring open data impacts

Submissions are due by 6th January 2014.

Reflections on developing a global sectoral open data initiative: agriculture and nutrition

At the 2012 G-8 Summit leaders committed to a ‘New Alliance for Food Security and Nutrition’, and as part of the follow up to the US G8 presidency in April this year the World Bank hosted the ‘G8 Conference on Open Data for Agriculture’, exploring opportunities to create a global platform for sharing agriculturally relevant information. Initially driven by the UK and US, this initiative has developed into the ‘Global Open Data Initiative for Agriculture and Nutrition’, currently preparing for a launch at the October 2013 Open Government Partnership summit in London. As the open data concept continues to gain traction at a policy level, such sectoral open data initiatives are increasingly common, and raise a wide range of questions. This post attempts to unpack some such questions for the proposed Agriculture and Nutrition Initiative.

Sector vs. supplier-centric open data

Early open data initiatives, such as the Open Government Data initiatives of the USA, UK and Kenya, have been supplier-centric. They are essentially based on the idea that a single data holder (or, in practice, an amalgamation of different departmental data holders, but all from the same overall organisation) supplies the data it holds online as open datasets. An open data portal often provides a focal point for this activity.

By contrast, sectoral open data initiatives draw on data from a wide range of suppliers. Some, such as the International Aid Transparency Initiative (IATI), are primarily interested in a single flow of data (in the IATI case, standardised datasets of aid funded activities), while others, such as the renewable energy focussed Reegle project, look to aggregate and integrate a range of different open datasets with a single sectoral focus*.

An Open Data Initiative for Agriculture and Nutrition may have both supplier-focussed and sectoral-focussed elements to it. Some of the high-profile holders of data in the agriculture sector, such as the World Bank and Food and Agriculture Organisation, already have their own open data initiatives. However, there are many more actors who might be suppliers of data when it comes to agriculture and nutrition. It is worth noting that existing sectoral open data initiatives such as IATI and the Reegle project are relatively limited in their scope and reach, and rely on a certain degree of centralisation (the IATI Registry and Standard in the former case, and a central data store for Reegle in the latter), and so an Open Data Initiative for Agriculture and Nutrition potentially represents a new level of ambition and complexity, requiring more decentralised approaches to securing a wider range of relevant open data.

(*I’ve not looked here at sectoral ‘open data’ initiatives from the sciences as I’m less familiar with these. However, emerging collaborations around genomics research, for example, which seek to pool data into a resource for answering a range of shared research questions may also be relevant points-of-reference in thinking about the shape of an agriculture and nutrition open data initiative).

Why open data?

A recognition of the need for better information and data sharing in agriculture and nutrition is nothing new. For over 100 years organisations like CABI have been producing abstract journals to more effectively transmit agriculture research to the locations on the ground where it is needed, and the agriculture and nutrition field has a well developed network of research institutions, agricultural extension services and initiatives to harmonise information, ranging from the long-established AGROVOC vocabulary, through to the more recent CIARD ‘Coherence in Information for Agricultural Research for Development’ movement, bringing together over 50 organisations to collaborate “to make agricultural research information and knowledge publicly accessible to all.”

However, an initiative for open data does have some significant differences of emphasis from one focussing on making information and knowledge publicly accessible. Open data initiatives place explicit emphasis on data over information; upon making that data machine-readable in standard formats; and requiring the use of open licenses that allow the data to be re-used by anyone. A number of arguments might be put forward to justify this specific emphasis:

  • Open data principles lead to lower transaction costs for finding and accessing data – and give re-users certainty that they can work with the data. For example, existing important datasets like AGROVOC do not use an open license, and can only be accessed in bulk behind a registration, meaning that users wanting to use AGROVOC classifications in a dataset that also contains commercial data would be prevented from doing so without negotiations with the FAO. Applying open data principles could increase use and coherence of data.

  • Machine-readable data supports efficient and innovative re-use of data. With access to data in standard formats, users can remix, re-interpret and re-present the data, offering alternative interpretations and generating new insights that may not have been contained in shared informational publications. Open data principles also aim to support the easy combination of multiple datasets to support the identification of new trends and patterns across different datasets.

  • Open data allows new actors to get involved in addressing agriculture and nutrition challenges. Unlike data-sharing initiatives, which often work to ensure an identifiable list of actors have access to data and information, open data is, to a degree, about allowing as-yet unknown parties to access and innovate with data. This has the potential to bring new researchers, entrepreneurs and policy actors into the process of providing solutions to key challenges, enabling more open forms of innovation.

  • Existing open data initiatives could do more for agriculture. With many governments already publishing open data, an agriculture and nutrition open data initiative can harness the momentum to secure new datasets that would not be provided through existing initiatives – and can work to make sure multi-purpose datasets, such as cadastral data, land ownership records, weather data and other resources are provided in ways that support agriculture and nutrition activities.

I won’t assess the validity of each of these arguments here: that is a matter for empirical research – but they do highlight the kinds of areas that classic open data initiatives may focus on, and allow an assessment of how far an open data initiative may be complementary to other existing activities, or how it might connect with these.

The scope of a global initiative

Agriculture and nutrition is a vast field, and the issues on the agenda vary wildly across the world – from securing crop production and nutritional standards in developing countries, to ensuring trustworthy supply chains in Europe, and from planning for food security, to giving consumers information to choose organic or fairly traded products. A Global Open Data Initiative on Agriculture and Nutrition emerging from the Africa-focussed New Alliance for Food Security and Nutrition could choose to look only at issues of basic food security, but this might be a missed opportunity to also consider how open data has a role to play across a wide range of agriculture and nutrition issues. For example, catalysing activity around open data on food supply chains could be driven by, and have benefits for, both food security and consumer confidence in food.

An initiative also needs to consider whether its scope is primarily around public sector data and data held by research organisations, or whether it will also look at the vast quantities of private sector held data on agriculture and nutrition. Many governments already use targeted transparency measures to require food producers to generate and publish nutritional information on their products, suggesting that further steps to require private sector publication of open data on various agriculture and nutrition issues might not be out of the question.

Self-selected commitments, or a shared agenda?

The International Aid Transparency Initiative sets out a clear standard for data that all signatories to the initiative should work towards publishing. As more signatories publish this data to the common standard, network effects kick in making the data more and more useful. By contrast, the Open Government Partnership invites countries to sign up to some broad principles and then to self-select what they will do, and (if choosing to focus on open data at all) the particular datasets they might release: with the areas of focus driven by domestic engagement and pressure. Somewhere between the two, the G8 Open Data Charter includes a list of core datasets that all signatories should work to publish, and then invites self-selected commitments with a long-list of suggested datasets to focus on.

A Global Open Data Initiative on Agriculture and Nutrition could identify a shared agenda based around a small number of datasets and issues, or could be driven by general principles, with members self-selecting their areas of focus. There are pros and cons to each approach, but they potentially lead to initiatives of very different characters, and consequences for the way in which different stakeholders might get involved.

What needs to go into an open data initiative?

I’ve written in the past about ten building blocks of an open data initiative highlighting that open data initiatives need more than datasets – also requiring explicit effort on outreach and engagement, and capacity building to enable wider use of the data that is made available. When it comes to Agriculture and Nutrition there are a wide range of actors who might need to be involved in these wider activities – from the infomediaries who translate research and data into actionable information for farmers and traders, through to the government planners or civil society activists seeking to improve the equitable and fair management of natural resources.

The Ten Building Blocks of open data listed below can take many different forms, and operate at different levels of scale – but an initiative that focusses only on one or two of these building elements to the exclusion of others is unlikely to be able to realise the potential impacts of open data.

  1. Leadership and bureaucratic support

  2. Datasets

  3. Licences

  4. Data standards

  5. Data portals

  6. Interpretations, interfaces and applications

  7. Outreach and engagement

  8. Capacity building

  9. Feedback loops

  10. Policy and legislative lock-in

Evidence and impact

There has been an interesting dialogue recently in the Open Data Innovations LinkedIn Group about “How to monitor progress of open data”. The discussion has highlighted that, as the use to which data will be put is generally left ‘open’, coming up with concrete evaluation frameworks for measuring whether open data has had the desired impact can be challenging. This is, of course, one of the big issues we’re grappling with in the Open Data in Developing Countries project – currently focussing on qualitative case studies to understand how open data interacts with existing processes of governance on the ground.

However, it is not inconsistent both to set a series of primary goals for the greater sharing of data against which an intervention can be measured, and to develop frameworks for monitoring secondary impacts resulting from leaving data open to re-use. Such frameworks must, however, also be able to capture unintended consequences of open data re-use – noting that not all results will inevitably be positive, particularly in contexts where so much is at stake as the agricultural domain, where the interests of communities, agri-business, governments, environmentalists and others are not always aligned.

The ODDC Conceptual Framework seeks to outline some possible directions for such a framework, as a foundation to be further revised as our 2013/14 case studies start to report later this year.

Many more questions…

The formation of larger scale sectoral open data initiatives is an emerging phenomenon, and something that will need continued practical and research attention. From a research perspective, it will be fascinating to see how plans for the Global Open Data Initiative on Agriculture and Nutrition evolve.

(Disclosure: In my role at the Web Foundation I’ve been involved in some discussions with the convening team for the Global Open Data Initiative on Agriculture and Nutrition, and this post as an open reflection is offered as an input to their ongoing dialogue, as well as a wider reflection on sectoral open data initiatives)

 

Open data and privacy

Cross-posted from the Open Data Research network site.

On 1st August two IDRC research networks came together for a web meeting to explore Open Data and Privacy. The Privacy in the Developing World network, and the Open Data in Developing Countries network set out to explore whether open data and protecting privacy are inherently in tension, or whether the two can be complementary, and to identify particular issues that might come up around privacy and open data in the developing world. This post shares and develops some of the themes discussed in the meeting.

Definitions

Open data is generally defined as data made accessible, in formats that can be manipulated by computers (allowing the creation of new interfaces, mash-ups and other data analysis), and without restrictions on how the data can be re-used. In essence, open data asks those who hold data (usually governments) to give up formal control over how it is used, with the idea that this allows greater scrutiny of governments, and unlocks potential for innovation with the data.

Privacy, by contrast, is concerned with control over information, who can access it, and how it is used. As Daniel Solove notes[1], this has many dimensions, from concerns about intrusive information collection, through to risks of exposure, increased insecurity or interference in their decisions that individuals or communities are subjected to when their ‘private’ information is widely known. Privacy is generally linked to individuals, families or community groups, and is a concept that is often used to demarcate a line between a ‘private’ and ‘public’ sphere. Article 12 of the Universal Declaration of Human Rights states “No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation”. It has been argued that privacy is a western concept, only relevant to industrialised societies – yet work by Privacy International has found privacy concerns to be widespread across developing countries, and legal systems across the world tend to recognise privacy as a concern, even if the depth of legal rights to privacy and their enforcement varies. It is worth noting though that few of the countries covered by the ODDC project have strong privacy protection laws in place.

Different kinds of data

One of the starting points of discussion around open data and privacy is to work out which kinds of ‘data’ might fall within the focus of each. In the context of open government data, we might think about three broad categories:

  • Infrastructural data – data held about the state of the world – for example, describing the land, transport networks, structures of government, weather measurements and so on. There are very few privacy concerns about this data (though in some states security concerns may restrict the extent to which it is shared, such as geographic border and water flow data in the rivers of Northern India)
  • Public service data – data about the activities of government – ranging from the locations of public services and their budgets through to public registers, and detailed performance statistics on schools, hospitals and other facilities. This last set can be in a grey area – as they are often built up from the aggregation of records about individual users of public services, and it is not always clear who they are about. For example, is the medical record about an operation and its outcome data about the patient, or about the doctor?
  • Personal data – data about individuals, and usually things that an individual would have a legitimate right to manage access to – such as information on their sexuality or their health.

In the Web Meeting, Sam Smith noted that the framers of the ‘Open Definition’, taken as a basis for much open data advocacy, were focussed specifically on non-personal data, and that open data advocates tend to make clear that they are not talking about data that could reveal private information about individuals. However, as the categories above show, the dividing line between public and personal data is not always clear.

Kinds of data - infrastructure; public service; personal

This classification does make clear however that there are some kinds of data (the infrastructural data) where applying open data should be, from the privacy perspective at least, uncontroversial. The relative importance of data in the middle category to the kinds of outcomes sought from open data policy interventions then becomes an important question to ask.

It is worth noting that because of the political popularity of open data policy, there has been a tendency for other policies relating to data to be presented under an open data banner in some countries. For example, policies on the restricted sharing of medical records with pharmaceutical companies (through secure data sharing rather than as open data) were included in UK open data measures in 2011. These policies clearly need to be considered distinctly from open data policies, and their implications also weighed carefully.

Opening data disrupts past privacy practice

Steve Song offered an input into the Web Meeting focussed on the online publication of a dataset and mash-up map showing the location of registered Gun Owners in the wake of a school shooting in Connecticut. The register of gun ownership had long been a public document, but it had been in the form of documents that could be inspected rather than as a dataset. The conversion of this public register into open data which could be easily mapped created a strong backlash: law enforcement officials worried that their addresses had been revealed online, and those with and without guns expressed concerns that the information could be used by burglars to target particular houses. The accuracy of the record was also questioned, and it was suggested that much of the information was misleading or wrong.

This case illustrated how turning existing ‘public records’ into open data might change some of the balances around privacy that have, until now, been struck by the practical difficulties of accessing those records. Previously the ‘data’ had been hidden in plain view: but no-one had been encouraged to use it in ways that might give rise to concerns. Thought may be needed then not only when things previously secret are made public, but also when public records are turned into more easily manipulated and processed open data. Steve noted that this may be particularly important in contexts of ethnic or communal tensions: imagine for example how voter registers might be used as data where ethnicity can be inferred from the voters’ names, and where an election is contested on ethnic lines.

In the United Kingdom, the recent Shakespeare Review of Public Sector Information[2] has proposed shifting the legal responsibility for misuse of data from the person who publishes the data onto the person who abuses the data – suggesting a model in which privacy laws would control (ab)use rather than access to data. However, such a model is tricky to envisage in a world where data can cross borders easily, there is little harmonisation of privacy laws, and harms from privacy violations can also cross borders.

Privacy as an excuse? Open as a general principle?

One of the key concerns raised in the meeting was that if arguments for open data are applied as a ‘general rule’ without sensitivity to the kinds of data in question, there are significant risks that privacy rights might be undermined. Yet, transparency and open data advocates are often concerned that ‘data protection’, or ‘protecting privacy’ might be used as excuses not to release data, or to only release data in aggregated forms that don’t permit detailed analysis of what government is doing. Neither can necessarily be used as a principle that trumps the other.

In his review of open data and privacy for the UK Government, Kieron O’Hara noted[3] that, even within open data advocacy, different groups have different requirements for what counts as good quality open data for their purposes (§2.1.4). For example, transparency campaigners may be happy with crime data covering the general geographic areas that a particular official is responsible for, whereas entrepreneurial re-users of data might want data down to the individual street and house-level to feed into risk models for insurers, or to use in route-planning applications.

In our web meeting Steve Song suggested that by developing a clearer picture of the kinds of impacts open data can have, and the ways in which it might be used (a central theme in ODDC explorations), we will be better able to have informed debate about the trade-offs between privacy and open data. This again moves away from the simple rhetorical message of ‘open everything’, and ‘raw data now’ that many open data advocates have pushed for – and suggests that deeper debate will be needed over the sharing of datasets that fall into the grey areas between public and personal. Such a debate will need to engage with questions of whether open data is being used to support public goods or private gains, and with nationally and culturally specific judgements about how to manage trade-offs between public good and personal or community privacy. For example, in some countries, personal tax records are considered public and are published, yet in others, these are judged to be private data.

The question of corporate confidentiality was also raised in the web meeting discussions. Although corporate confidentiality is conceptually distinct from privacy, it is another principle that might sometimes be found to be in tension with a drive towards open data, and can become the grounds of excuses for not releasing data. Distinguishing when privacy or corporate confidentiality are being used as excuses for not releasing data, or when they are based in serious and valid concerns, will be important for open data advocacy.

In practice, it wasn’t clear from web meeting participants’ experience whether privacy is actually being used as grounds for restricting access to data in developing countries, or if privacy is being adequately considered in decisions about opening data. This will be a key issue to track in future research to better understand how potential tensions between open data and privacy are playing out in practice.

Open data, privacy and power

At the Asia regional meeting of the ODDC project, one participant noted the curious overlap between participants in the Data Meet community (often involved in pushing for open data), and those organising ‘Crypto Parties’, teaching each other about privacy protection software. How have these individuals reconciled campaigning for both open data and privacy? If they are pushing for a balance between the two, how is such a balance to be struck? One possible way to understand the compatibility of pushing for both privacy and open data is through the lens of power and autonomy. Activists may be interested in seeking maximum autonomy from the state through protecting their privacy, and maximum control over the state, through the ability to see what the state is doing through open data, and to work with state-collected data. Such a political position might be associated with the libertarianism of some open source geek cultures, but may also have different roots and political slants around the world.

The power-based analysis might also help in determining which kinds of entrepreneurial uses of open data are desirable or not. Cases where entrepreneurs act as intermediaries in ways that enhance the autonomy of citizens (for example, providing public transport planning applications to help citizens move more freely through space, or informational applications that help citizens to collaborate and co-create or claim access to public services) may be seen as positive, whereas commercial open data re-use that leads to interference in individuals’ decisions through targeting of advertising, or that drives discriminatory pricing of services and insurance, might be seen as having a negative impact on individual autonomy (although the negative effect may only be felt by some segments of the population such as minority or marginalised groups). The question however would remain of how such potential negative uses of open data should be governed, particularly in developing world contexts where legal frameworks vary widely. Serious abuses of open data (whether to incite community tensions, or affect individuals through discriminatory pricing) could be outlawed, but if they have not been, what should those releasing data consider?

Conclusions

By the end of the web meeting we had opened up many more issues than we had resolved, but we had established that there can be a productive dialogue between privacy and open data, and that more work is needed to explore how the two concepts together are unfolding in the developed and developing world.

If you would like to join the debate over privacy and open data, there’s a thread over on the Open Data Research network LinkedIn group.

References

[1] Solove, D. J. (2005). A Taxonomy of Privacy. University of Pennsylvania Law Review, 154(3), 477.

[2] Shakespeare, S. (2013). Shakespeare Review: An independent review of public sector information. London.

[3] O’Hara, K. (2011). Transparent government, not transparent citizens: a report on privacy and transparency for the Cabinet Office.

GIS Watch 2012 article: Who is doing what when it comes to technology for transparency, accountability and anti-corruption

The 2012 issue of Global Information Society Watch was focussed on ‘The Internet and Corruption’, exploring how online technologies are being used in the fight against corruption across the world. GIS Watch is focussed on country level analysis of information society issues around the world, but also includes a number of wider articles. I was asked to put together the ‘institutional review’ on transparency and accountability work. You can find the full GIS Watch here, or read the institutional review article below (licensed under Creative Commons Attribution 3.0 license, please attribute to GIS Watch).

Who is doing what when it comes to technology for transparency, accountability and anti-corruption

Fighting corruption is a responsibility that all global institutions, funders and NGOs have to take seriously. Institutions are engaged with the fight against corruption on a number of fronts. Firstly, for those institutions such as the World Bank that distribute funds or loans there is a responsibility to address potential corruption in their own project portfolios through accessible and well-equipped review and inspection mechanisms. Secondly, institutions with a regulatory role need to ensure that the markets they regulate are free of corruption, and that regulations minimise the potential space for corrupt activity. And thirdly, recognising the potential of corruption to undermine development, institutions may choose to actively support local, national and international anti-corruption activities and initiatives. This report provides a critical survey of some of the areas where multilateral, intergovernmental, multi-stakeholder institutions, NGOs and community groups have engaged with the internet as a tool for driving transparency and accountability.

Although transparency is an often-cited element of the anti-corruption toolbox, technology-enabled transparency remains a relatively small part of the mainstream discourse around anti-corruption efforts in formal international institutional processes. The UN Convention Against Corruption (UNCAC),[1] adopted in 2003 and ratified by 160 countries, and the OECD Anti-Bribery Convention,[2] adopted in 1997, provide a backbone of international co-operation against corruption. Both focus heavily on legal harmonisation, improved law enforcement, criminalisation of cross-border bribery and better mechanisms for asset recovery, addressing many of the pre-conditions for being able to act on corruption when it is identified. Continued co-operation on UNCAC takes place through the UN Office on Drugs and Crime,[3] with input and advocacy from the UNCAC Civil Society Coalition. In international development, the outcomes document of the Fourth High Level Forum on Aid Effectiveness,[4] held in Korea in November 2011, notes these foundations and highlights “fiscal transparency” as a key element of the fight against corruption. However, the document more often discusses transparency as part of the aid-effectiveness agenda, rather than as part of anti-corruption. This illustrates an important point: transparency is just one element of the fight against corruption, and reduced corruption is just one of the outcomes that might be sought from transparency projects. Transparency might also be used as a tool to get policy making better aligned with the demands of citizens, or to support co-operation between different agencies.

The review below starts by looking at technology for transparency in this broader context, before briefly assessing how far efforts are contributing towards anti-corruption goals.

Transparency and open data

The last three years have seen significant interest in online open data initiatives as a tool for transparency, with over 100 now existing worldwide. Open data can be defined as the online publication of datasets in machine-readable, standardised formats that can be re-used without intellectual property or other legal restrictions.[5] A core justification put forward for opening up government or institutional data is that it leads to increased transparency as new data is being made available, and existing data on governments, institutions and companies becomes easier to search, visualise and explore.

Following high-profile Open Government Data (OGD) initiatives in the US (data.gov) and UK (data.gov.uk), in April 2010 the World Bank launched its own open data portal (data.worldbank.org), providing open access to hundreds of statistical indicators. Here is how the World Bank describe the data portal’s mission:

The World Bank recognizes that transparency and accountability are essential to the development process and central to achieving the Bank’s mission to alleviate poverty. The Bank’s commitment to openness is also driven by a desire to foster public ownership, partnership and participation in development from a wide range of stakeholders.[6]

The World Bank has also sponsored the development of Open Government Data initiatives in Kenya (opendata.go.ke) and Moldova (data.gov.md), as well as funding policy research and outreach to promote open data through the Open Development Technology Alliance (ODTA).[7]

Central to many narratives about open data is the idea that it can provide a platform on which a wide range of intermediaries can build tools and interfaces that take information closer to people who can use it. The focus is often on web and mobile application developers as the intermediaries. Many of the applications that have been built on open data are convenience tools, providing access to public transport times or weather information, but others have a transparency focus. For example, some apps visualise financial or political information from a government, seeking to give citizens the information they need to hold the state to account.

Apps alone may not be enough for transparency though. In an early case study of the Kenya open data initiative, Rahemtulla et al., writing for the ODTA, note that “the release of public sector information to promote transparency represents only the first step to a more informed citizenry…”, and that initiatives should also address digital inclusion and information literacy. This involves ensuring ICT access, and the presence of an ‘info-structure’ of intermediaries who can take data and turn it into useful information that actively supports transparency and accountability.[8] World Bank investments in Kenya linked to the open data project go some way to addressing this, seeking to stimulate and develop the skills of both journalists and technology developers to access and work with open data. However, much of the focus here is on e-government efficiency, or stimulating economic growth through creation of commercial apps with open data, rather than on transparency and accountability goals.

Open data was also a common theme in the first plenary meeting of the Open Government Partnership (OGP)[9] in Brazil in April 2012. The OGP is a new multilateral initiative run by a joint steering committee of governments and civil society. Launched in 2011 by eight governments, it now has over 55 member states. Members commit to create concrete National Action Plans that will “promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance”.[10] The OGP has the potential to play an influential role over the next few years in networking civil society technology-for-transparency groups with each other, and with governments, and placing the internet at the centre of the open government debate.

The rapid move of open data from the fringes of policy into the mainstream for many institutions has undoubtedly been influenced by the activities of a number of emerging online networks and organisations. The Open Knowledge Foundation (OKF)[11] has played a particularly notable role through their e-mail lists, working groups and conferences in connecting up different groups pushing for access to open data. OKF was founded in 2004 as a community-based non-profit organisation in the UK and now has 15 chapters across the world. OKF explain that they ‘build tools, projects and communities’ that support anyone to “create, use and share open knowledge”.[12] The OKF paid staff and volunteer team are behind the CKAN software used to power many open data portals, and the OpenSpending.org platform that has the ambition to “track every government financial transaction across the world and present it in useful and engaging forms for everyone from a school-child to a data geek”.[13] This sort of ‘infrastructure work’ – building online platforms that bring government data into the open and seek to make it accessible for a wide range of uses – is characteristic of a number of groups, both private firms and civil society, emerging in the open data space.

Another open data actor gaining attention on the global stage has been the small company OpenCorporates.com.[14] OpenCorporates founder, Chris Taggart, describes how their goal is to gather data on every registered company in the world, providing unique identifiers that can be used to tie together information on corporations, from financial reporting, to licensing and pollution reports. Although sometimes working with open data from company registrars, much of the OpenCorporates database of over 40 million company records has been created through “screen scraping” data off official government websites. In early 2012 OpenCorporates were invited to the advisory panel of the Financial Stability Board’s[15] Global Legal Entity Identifier (LEI) project, being conducted on behalf of the G20. The LEI project aims to give a unique identifier to all financial institutions and counterparties, supporting better tracking of information and transactions. Importantly, under the recommendations, which have been accepted by the G20, the system will operate “according to the principles of open access and the nature of the LEI system as a public good… without limit on use or redistribution”.[16]

International transparency initiatives and standards

A number of sector-specific international transparency initiatives have developed in recent years, with a greater or lesser reliance on the internet within their processes.

Online sharing of data is at the heart of the International Aid Transparency Initiative (IATI)[17] which was launched at the third High Level Forum on Aid Effectiveness in Accra, Ghana in 2008, and now has over 19 international aid donors as signatories. The initiative’s political secretariat is hosted by the UK Department for International Development (DfID),[18] and a technical secretariat, which maintains a data standard for publishing data on aid flows, is hosted by the AidInfo programme.[19] IATI sets out the sorts of information on each of their aid activities that donors should publish, and provides an XML standard for representing this as open data.[20] A catalogue of available data is then maintained at http://www.iatiregistry.org, and a number of tools have been developed to visualise and make this data more accessible. Through IATI, countries and institutions, from the Asian Development Bank (ADB), to the UN Office of Project Services (UNOPS), have made information on their aid spending or management more accessible.
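To make the idea of aid information published as XML open data a little more concrete, here is a minimal Python sketch of how an intermediary might total reported transactions per donor. The XML fragment is a made-up, heavily simplified example: its element names are only loosely modelled on IATI activity files and are not checked against any version of the standard, and the donor names, identifiers and amounts are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified fragment. Element names are loosely modelled on
# IATI activity files and are NOT validated against the actual standard.
SAMPLE_XML = """
<iati-activities>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-001</iati-identifier>
    <reporting-org>Example Donor A</reporting-org>
    <title>Rural water supply</title>
    <transaction><value currency="USD">120000</value></transaction>
    <transaction><value currency="USD">30000</value></transaction>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-002</iati-identifier>
    <reporting-org>Example Donor B</reporting-org>
    <title>Teacher training</title>
    <transaction><value currency="USD">50000</value></transaction>
  </iati-activity>
</iati-activities>
"""


def total_by_donor(xml_text):
    """Sum transaction values per reporting organisation.

    Assumes every value is in the same currency; real data would need
    currency conversion before amounts could be added together.
    """
    root = ET.fromstring(xml_text)
    totals = defaultdict(float)
    for activity in root.iter("iati-activity"):
        donor = activity.findtext("reporting-org", default="unknown").strip()
        for value in activity.iter("value"):
            totals[donor] += float(value.text)
    return dict(totals)


totals = total_by_donor(SAMPLE_XML)
for donor, amount in sorted(totals.items()):
    print(f"{donor}: {amount:,.0f}")
```

A shared structure of this kind is what allows the same short script to be pointed at files from many different publishers, which is the practical benefit of agreeing a standard rather than each donor publishing in its own format.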

The Open Aid Partnership,[21] working closely with IATI, and hosted by the World Bank Institute, is focusing specifically on geodata standards for aid information, using the ‘Mapping for Results’ methodology developed with AidData[22] to geocode the location of aid projects and make this information available online. Geocoded data is seen as important to “promote ICT-enabled citizen feedback loops for reporting on development assistance”.[23]

A number of other high-profile sector transparency initiatives, such as the Extractive Industries Transparency Initiative (EITI)[24] and the Construction Sector Transparency Initiative (CoST),[25] are less open data or ICT-centred, opting instead for processes based on disclosure and audit of documents through local multi-stakeholder processes. However, the Global Initiative on Fiscal Transparency (GIFT),[26] which aims to “advance and institutionalise global norms and continuous improvement on fiscal transparency, participation and accountability in countries around the world”, has a ‘Harnessing new technologies working group’ led by the OKF, which has outlined a number of ways technology can be used for transparent and accountable finance.[27] The ‘Lead Steward’ organisations for GIFT are the International Monetary Fund, World Bank Group, Brazil Ministry of Planning, Budget and Management, the Department of Budget and Management Philippines, and the Washington-based CSO project, the International Budget Partnership.[28]

Crowd-sourcing 

Transparency and accountability isn’t just about information and data from governments, companies or multi-lateral institutions. Input from citizens is crucial too. Crowd-sourcing projects such as Ushahidi,[29] first developed to monitor post-election violence in Kenya, have been deployed or replicated in a number of anti-corruption settings. Accepting submissions by SMS or online, these tools allow citizens to report problems with public services that might point to misappropriation of funds, or to directly report cases of corruption. Reports are generally geocoded and the resulting maps are presented publicly online. With UN Development Programme (UNDP)[30] support, a Ushahidi-based corruption monitoring platform was established in Kosovo.[31] In India, the IPaidABribe.com platform, which was launched in 2010 by Bangalore-based non-profit Janaagraha,[32] has collected over 20,000 reports of bribery requests or payments.

UNDP analysis suggests that the success of social media and use of crowd-sourcing in transparency and accountability projects relies upon transparent mechanisms for verifying reports, and the backing of institutions or systems that can convert information into action – such as ensuring corrupt tenders are cancelled.[33] In a global mapping of technology for transparency and accountability, The Transparency & Accountability Initiative[34] (a donor collaboration chaired by DfID and the Open Society Foundations),[35] found that many of the one hundred projects they reviewed were started by technology-savvy activists.[36] Where these were tailored to local context, and able to adopt a collaborative approach, involving governments and/or service providers, they were more likely to be sustainable and successful. Global Voices Online maintain a directory of over 60 case studies as part of their ‘technology for transparency network’.[37]

The internet is also being used actively by global advocacy networks such as the Land Matrix Partnership, who launched an online database of land deals at the World Bank Land and Poverty Conference in April 2012, seeking to highlight the growing issue of large scale land acquisitions across the world, particularly in Africa. This database, initially created through online collaboration of researchers, also accepts submissions through its website at http://landportal.info/landmatrix where reported data can also be visualised and explored.

Further activity and institutions

For reasons of space this report can only make passing mention of initiatives aimed at increasing parliamentary transparency through developing and implementing online tools for tracking legislative process and parliamentary debates. These have been established by civil society networks in a number of countries following models developed by the independent GovTrack in the US,[38] and the charity MySociety[39] with their TheyWorkForYou.com platform in the UK. MySociety, with support from Open Society Foundations and Omidyar Network,[40] have been focusing in 2012 on making their transparency and civic action tools easier to implement in other jurisdictions, opening up the Alaveteli code that powers the public right to information services WhatDoTheyKnow.com and AskTheEU.org, amongst others.

The funding for this work from the Omidyar Network, established by eBay founder Pierre Omidyar, draws attention to another set of important institutions and actors in the tech-for-transparency space: donors from the technology industry. Google, Omidyar Network, Cisco Foundation, and Mozilla Foundation amongst others have all been involved in sponsoring technology for transparency open source projects like Ushahidi, the work of MySociety, or data-journalism projects across the world. It is likely that without access to funding derived from internet industry profits, many of the current technology-for-transparency projects around would be far less advanced. 

This report has also not explored how institutions have responded to online leaking of information as part of transparency and accountability efforts. However, one project deserves a brief mention: the WCITLeaks website,[41] established to accept leaked documents relating to the revision of the International Telecommunication Regulations (ITRs) in response to the secrecy surrounding International Telecommunication Union (ITU) processes, and the lack of a civil society voice at the forthcoming World Conference on International Telecommunication (WCIT).

Exploring impact 

Technology for transparency is a rapidly growing field. The innovations may be emerging from civil society and internet experts (with much of the funding to scale up projects often coming ultimately from internet firms), but governments and international institutions are opting in to open-data-based transparency initiatives, and a number of institutions, from the World Bank to the newly formed OGP, are active in spreading the technology for transparency message to their clients and members. However, there is little hard evidence yet of the internet becoming an integrated and core part of the global anti-corruption architecture. Many tools and platforms remain experimental, hosting just tens or hundreds of reported issues, and offering only limited stories of where crowd-sourced SMS reports, or irregularities spotted in open data, have led to corruption being challenged, and offenders being held to account.

McGee and Gaventa, in a review of general transparency and accountability initiatives funded by DfID, explain that the evidence base on their impact is limited across the field.[42] Limited evidence of the anti-corruption impacts of technology for transparency should therefore be taken as a challenge to improve the evidence base and focus on impact, rather than to step back from developing new internet-based approaches for transparency and accountability. Working out the impact of those projects that provide online information infrastructures as foundations for accountability efforts, from general open government data projects, to targeted transparency initiatives, will need particular attention if these efforts are to continue to receive institutional backing, and if the new loose-knit networks that provide many of these platforms are to continue to thrive.

(All links accessed 7th July 2012)


[3] UN Office on Drugs and Crime: http://www.unodc.org/unodc/en/corruption/

[4] Busan 2011 High Level Forum on Aid Effectiveness: http://www.aideffectiveness.org/busanhlf4/

[5] The Open Definition: http://opendefinition.org/

[6] World Bank Open Data Portal: http://data.worldbank.org/about

[7] Open Development Technology Alliance: http://www.opendta.org

[8] Rahemtulla, H., Kaplan, J., Gigler, B.-S., Cluster, S., Kiess, J., & Brigham, C. (2011). Open Data Kenya: Case study of the Underlying Drivers, Principle Objectives and Evolution of one of the first Open Data Initiatives in Africa. http://www.scribd.com/doc/75642393/Open-Data-Kenya-Long-Version

[9] Open Government Partnership: http://www.opengovpartnership.org/

[11] Open Knowledge Foundation: http://okfn.org/

[12] http://www.okfn.org/about/faq

[13] OpenSpending (project): http://www.openspending.org

[14] OpenCorporates: http://opencorporates.com/

[15] Financial Stability Board: http://www.financialstabilityboard.org/

[17] International Aid Transparency Initiative: http://www.aidtransparency.net

[18] UK Department for International Development: http://www.dfid.gov.uk

[21] Open Aid Partnership: http://www.openaidmap.org/

[24] Extractive Industries Transparency Initiative: http://eiti.org/

[25] Construction Sector Transparency Initiative: http://www.constructiontransparency.org/

[26] Global Initiative for Fiscal Transparency: http://fiscaltransparency.net/

[28] International Budget Partnership: http://internationalbudget.org/

[29] Ushahidi: http://ushahidi.com/about-us

[30] UN Development Programme: http://www.undp.org/

[33] Tsegaye Lemma (2012), Corruption Prevention and ICT: UNDP’s Experience from the field. Presented at Joint Experts Group Meeting and Capacity Development Workshop on Preventing Corruption in Public Administration, UN DESA, New York, USA, 26 – 28 June. http://unpan1.un.org/intradoc/groups/public/documents/un-dpadm/unpan049778.pdf

[34] Transparency and Accountability Initiative: http://www.transparency-initiative.org/

[35] Open Society Foundations: http://www.soros.org/

[36] Avila, R., Feigenblatt, H., Heacock, R., & Heller, N. (2011). Global mapping of technology for transparency and accountability: New technologies. http://www.transparency-initiative.org/reports/global-mapping-of-technology-for-transparency-and-accountability

[37] Technology for Transparency Network: http://transparency.globalvoicesonline.org/

[40] Omidyar Network: http://www.omidyar.com/

[41] WCITLeaks (project): http://wcitleaks.org/

[42] McGee, R., & Gaventa, J. (2010). Review of the Impact and Effectiveness of Transparency and Accountability Initiatives: Synthesis Report. http://www.dfid.gov.uk/R4D/Output/187208/Default.aspx. See also http://www.dfid.gov.uk/R4D/Search/SearchResults.aspx?ProjectID=60827 for other outputs of the research programme this report is taken from.

Open Data, Land, Gender

[Summary: very rough and speculative notes in response to a land coalition online dialogue]

The Land Coalition are hosting an online dialogue until 20th Feb looking at “using online platforms to increase access to open data and share best practices of monitoring women’s land rights”. It’s an interesting topic for a dialogue, particularly given that one of the most widely cited cases used to highlight potential downsides of open data relates to the digitisation of land records and their exploitation to the detriment of poor landholders. However, as platforms like the LandMatrix (aggregating together land investment reports from research and advocacy groups across the world), and Open Development Cambodia demonstrate, open data is also being used by citizens to monitor land rights issues.
In this post I share a few quick thoughts on the broad theme of open data, land and gender.

Open data and land

The dialogue asks about how online platforms are contributing to the opening of land data. There are three broad sources of data I can see:

Official data – where governments have well-managed land ownership databases, citizens may be able to secure the ongoing publication of this data in open forms as part of national open government data programmes. In the United Kingdom we’ve recently seen the Land Registry place data online, detailing land sale transactions in CSV and linked data; and publicly owned land is a commonly featured dataset on local open data portals in the UK. However, this data itself may be tricky to use directly, and intermediaries are needed to make it accessible. In Kirklees, the Who Owns My Neighbourhood project presents an interesting approach to using official data, and combining it with social features for citizens to input local knowledge and news about publicly owned plots of land: making official land data more ‘social’.

Crowdsourced data – in many cases there may not be an official source for the data activists want, or there may be limited prospect of getting access to the official data. Here a range of ‘crowdsourcing’ approaches exist. The LandMatrix approach uses researchers, and works to verify reports before sharing them. There may be other approaches available that use tools like PyBossa to crowdsource extraction of structured information from semi-structured documents, or to split analysis of records into micro-tasks. The OpenStreetMap platform may also be able to act as a source of data, allowing tags to be applied to land. Tools like CrowdMap (based on the Ushahidi platform) make it possible to collate reports submitted on a range of platforms including phone, and to verify reports, although the challenge with any CrowdMap project is recruiting people to submit data.

Inferred data – at one of the RHOK Hack Days I took part in at Southampton I was interested to hear about a group’s project using satellite data to work out crop types on plots of land. I suspect there are ways this data could be used to detect changes in land use that might also indicate changes in ownership – such as the conversion of land from multiple crops to large agribusiness.
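As a very rough sketch of the kind of inference I have in mind, the Python below compares crop types detected on each plot in two different years and flags plots that have moved from several crops to a single one. Everything here is hypothetical: the plot identifiers, the crop lists and the threshold are invented for illustration, and a real project would first need a way of turning satellite imagery into per-plot crop classifications.

```python
def flag_consolidated_plots(before, after, min_crops_before=2):
    """Return ids of plots that went from several crop types to just one.

    `before` and `after` map a plot id to the list of crop types detected
    on that plot in an earlier and a later period (hypothetical inputs).
    """
    flagged = []
    for plot_id, crops_before in before.items():
        crops_after = after.get(plot_id)
        if crops_after is None:
            continue  # no later observation for this plot, so nothing to compare
        was_mixed = len(set(crops_before)) >= min_crops_before
        now_single = len(set(crops_after)) == 1
        if was_mixed and now_single:
            flagged.append(plot_id)
    return sorted(flagged)


# Invented example data: crop types per plot in two different years.
crops_earlier = {
    "plot-1": ["maize", "beans", "cassava"],
    "plot-2": ["rice"],
    "plot-3": ["maize", "beans"],
}
crops_later = {
    "plot-1": ["oil palm"],
    "plot-2": ["rice"],
    "plot-3": ["maize", "beans"],
}

flagged = flag_consolidated_plots(crops_earlier, crops_later)
print(flagged)
```

A flag like this would only ever be a prompt for closer investigation on the ground: a shift to a single crop might reflect a change of ownership, but it could equally reflect a farmer’s own decision or an error in the classification.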

Using land data

Having open data on land ownership and land rights is only one part of the story. As the Bhoomi case illustrates, the regulatory framework around the data matters: is a dataset taken as authoritative, or are documents or other customary practices able to override the descriptions held in data? Does the data model through which land ownership and rights are described capture the subtlety and nuance of land use practices (see Srinivasan’s field note for a discussion of the need to mash-up multiple schemas of data to get a view of complex land practices)? And what intermediaries are active to help citizens mobilise land records to secure their rights, rather than those records being only truly accessible to private actors with technical and financial capital?

In the ongoing Land Coalition dialogue I’m interested to learn more about the cases of how data on land rights is being mobilised to create change: whether at the level of global advocacy, where big numbers may matter most; or at the level of individual struggles over ownership, access and rights, where detailed, accurate and timely data on particular plots is likely to be most important.

Open data and women’s land rights

I will admit to knowing very little about the specific issues around women’s land rights. However, in making the connection between open data and women’s land rights I did want to briefly explore whether a focus on digital platforms and open data introduces any particular gender issues. For example, whilst statistics on mobile phone penetration in developing countries suggest widespread access to mobile devices, there is a significant gender gap in mobile ownership and access, with women much less likely to have control of a handset than men. Gender issues may also arise in relation to the culture and practices around open data.

In a recent First Monday article, Joseph Reagle suggests that the ‘free culture’ movement associated with open source software and open knowledge products like Wikipedia possesses a gender gap that is potentially even greater than that of the very gender-unequal general computing culture from which it arose. Reagle argues that the ideas of ‘openness’ current in these communities can be used to dismiss concerns about gender gaps, and paint them as an issue of choice, rather than highlighting the wider structural factors that lead to the massive underrepresentation of women in online free software and open knowledge construction. For example, Reagle points to the “double shift” of women’s time, and the ways in which the ‘free time’ used to contribute to creation of open culture, whether through evenings away from work, or hack-days and other events, is unequally distributed between women and men.

Does this critique carry across to open data? It is apparent that the open data field is far from gender equal – at least in terms of advocates for open data, and the creators of tools, platforms and analysis built upon data – although whether it is male dominated to the same extent as other fields, such as open source contribution, is yet to be measured. In part any gender imbalance may be attributed to the connections between the open data community and the open source and free culture communities, which already have a significant gender imbalance. However, we should also be open to deeper issues of epistemology: whether the very notion of resolving questions of ownership or fact through datasets, rather than through processes of dialogue, is itself gendered. How far advocacy to open up datasets moves into advocacy for the primacy of data over other ways of knowing, and how data is used and interpreted, has a bearing on whether gendered systems of power are being reinforced or challenged.

An ongoing discussion…

The above remarks are just some first thoughts on the topic. The Land Portal dialogue is running for another week, and I’m looking forward to spending time looking at what others are saying to better understand how open data and land can connect in constructive and positive ways.

I hope we might also develop some lines of the gender discussion more in upcoming work of the Open Data in Developing Countries project.

Open Data in Developing Countries


The focus of my work is currently on the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project with the Web Foundation.

MSc – Open Data & Democracy