Mapping open development data – a draft method & micro-study

As part of the Linked Open Data for Development working paper I’m writing for IKM, I’m hoping to include a small mapping study sampling some of the open datasets and linked datasets that might be relevant to actors in the development sector. The plan is to look at two collections of data: global data, and a sample of data relevant to a particular locality.

Often, in open data circles, talk of ‘development data’ becomes talk of ‘aid data’ fairly quickly. But the datasets relevant to pro-poor positive development are far broader than that. In fact, they can encompass much of the data eco-system of a given country, government or NGO sector, which makes a comprehensive 2 to 3 day mapping study of what data might be available an impossible task. But we can at least make a start. Below is the draft ‘terms of reference’ and method for a small bit of mapping, posted for comments. From the end of next week I’ll be looking for someone who could carry out some of the mapping research. So:

  • Comments on the method are welcome. I know other organizations are also interested in this sort of mapping, so if this process can contribute to other efforts, all the better.
  • Pointers to any existing mapping we could draw on (other than the current CKAN Development Data group) are welcome.
  • Expressions of interest in helping with the mapping work are welcome (a small bit of funding for a few days’ work is available).
  • Other comments and feedback are also invited.

Mapping Linked Open Data in Development

This short mapping study aims to identify a range of open and linked datasets with relevance to international development in two sets:

  • Global data: identifying datasets that could play a role in pro-poor international development, covering international interactions, multi-national activities, or a large number of countries including developing countries.
  • Local data: exploring a sample of datasets relevant to one particular locality, asking (a) which of the global datasets are relevant to the locality; and (b) what additional local datasets are available. We are yet to decide on the specific locality, but expect to look at either an Indian state, or a country/region in East Africa.

It further aims to identify current and potential linkages between these different datasets.

Defining Development:

International development encompasses foreign aid, governance, healthcare, education, poverty reduction, gender equality, disaster preparedness, infrastructure, economics, human rights, environment and issues associated with these. It focuses on long-term solutions. (http://en.wikipedia.org/wiki/International_development)

The Millennium Development Goals highlight key areas of development work.

We are interested in datasets that:

  • Are accessible digitally in some form;
  • Have significant coverage of one or more of the development issues listed above;
  • Could be of use to actors involved in development: from grassroots communities, to regional and national policy makers, to international institutions.

Development relevant datasets might also include geographical datasets (maps; land surveys etc.), and infrastructural datasets (e.g. code-lists, categories etc.).

We are interested in assessing which of the datasets meeting the criteria above are:

  • Legally, technically and practically open;
  • Available as linked data.

We are also interested in looking for linkages between these datasets.

Method

Discovering datasets

As comprehensive a list of datasets as possible should be constructed through desk research and/or a survey of stakeholders.

This should seek to identify:

  • The name of a dataset;
  • Where it is available;
  • Who is responsible for the data;
  • The format it is available in;
  • The topics it covers;
  • The geographical areas it covers;
  • When it was last updated and how often it is updated;
  • Whether or not the data is under an open license;
  • Whether or not the data is available as linked data.

This information should be recorded in a Google spreadsheet (to be set up) and will later be imported into the CKAN database.
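As a rough illustration of what one row of that spreadsheet might hold, here is a minimal Python sketch. The field names and the example values are my own assumptions for illustration, not an agreed schema; the columns simply follow the list above, and the output is plain CSV that could be pasted into a spreadsheet.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DatasetRecord:
    """One spreadsheet row; field names are illustrative assumptions."""
    name: str
    location: str             # where the dataset is available
    responsible: str          # who is responsible for the data
    data_format: str
    topics: str               # semicolon-separated topic labels
    geographic_coverage: str
    last_updated: str
    update_frequency: str
    open_license: str         # "yes", "no" or "unknown"
    linked_data: str          # "yes", "no" or "unknown"


def records_to_csv(records):
    """Serialize records to CSV text with a single header row."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(DatasetRecord)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# A made-up record, purely to show the shape of the data.
example = DatasetRecord(
    name="Example dataset (hypothetical)",
    location="http://example.org/data",
    responsible="Example organization",
    data_format="CSV",
    topics="education; healthcare",
    geographic_coverage="global",
    last_updated="unknown",
    update_frequency="unknown",
    open_license="unknown",
    linked_data="no",
)
csv_text = records_to_csv([example])
print(csv_text)
```

Recording "unknown" explicitly, rather than leaving cells blank, should make it easier to tell later which gaps are real and which were simply never checked.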

Two days each will be available for global and local dataset mapping. This will inevitably give us only a partial picture of the available data.

Mapping connections

Once a list of datasets is compiled we should:

  • Identify datasets that connect together in some way, for example:
    • Those that cover the same categories, but don’t share codes (e.g. organized by country, but using different sorts of identifiers);
    • Those that contain shared identifiers (e.g. country codes);
    • Those that are available as linked data with links between them.
  • Identify datasets which could, if available as open and/or linked data, support the development of a stronger linked open development data eco-system.

To carry out this second part of the mapping, we will explore using CKAN and/or converting the spreadsheet to a linked dataset in which connecting properties can be articulated, in order to demonstrate the connections.
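To make the three kinds of connection listed above a little more concrete, here is a minimal Python sketch that classifies pairs of datasets. Everything in it is an assumption for illustration: the dataset names are placeholders, and the `category`, `id_scheme` and `links_to` fields are hypothetical columns we might add to the spreadsheet, not part of CKAN or any existing schema.

```python
from itertools import combinations

# Hypothetical catalogue entries; names and fields are placeholders.
datasets = {
    "dataset_a": {"category": "country", "id_scheme": "ISO 3166-1 alpha-2",
                  "links_to": set()},
    "dataset_b": {"category": "country", "id_scheme": "ISO 3166-1 alpha-2",
                  "links_to": set()},
    "dataset_c": {"category": "country", "id_scheme": "free-text names",
                  "links_to": set()},
    "dataset_d": {"category": "country", "id_scheme": "URIs",
                  "links_to": {"dataset_a"}},
}


def classify_connection(first, second, catalogue):
    """Return a label for how two datasets connect, or None if they don't."""
    a, b = catalogue[first], catalogue[second]
    # Strongest case: explicit linked data links in either direction.
    if second in a["links_to"] or first in b["links_to"]:
        return "linked data links"
    # Datasets organized by different categories are not connected here.
    if a["category"] != b["category"]:
        return None
    if a["id_scheme"] == b["id_scheme"]:
        return "shared identifiers"
    return "same category, different codes"


connections = {
    pair: classify_connection(pair[0], pair[1], datasets)
    for pair in combinations(sorted(datasets), 2)
}
for pair, label in connections.items():
    print(pair, "->", label)
```

The pairs labelled "same category, different codes" are probably the most interesting output, since they point to where a shared code-list could create new linkages.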

Categorizing datasets

We will identify some useful categories to use for coverage and topics.

Countries/regions covered[1]

  • Use FAO for countries?
  • Use GeoNames for regions?

Topics

We will use the list of categories from the Wikipedia page (not authoritative, but the most comprehensive list I can find): foreign aid, governance, healthcare, education, poverty reduction, gender equality, disaster preparedness, infrastructure, economics, human rights, environment.

[1] The FAO ontology provides us with country development levels, and cross-references to DBpedia. We can potentially match between FAO and GeoNames using ISO codes.
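A minimal Python sketch of how that ISO-code matching might work is below. It assumes both sources can be exported as simple rows carrying a two-letter ISO country code; the row layout and the `fao_ref` and `geonames_ref` values are placeholders I have made up, not real identifiers from either source.

```python
# Placeholder rows; the reference values are NOT real FAO or GeoNames IDs.
fao_countries = [
    {"iso2": "KE", "label": "Kenya", "fao_ref": "fao-placeholder-ke"},
    {"iso2": "TZ", "label": "Tanzania", "fao_ref": "fao-placeholder-tz"},
    {"iso2": "IN", "label": "India", "fao_ref": "fao-placeholder-in"},
]
geonames_countries = [
    {"iso2": "ke", "geonames_ref": "geonames-placeholder-ke"},
    {"iso2": "TZ ", "geonames_ref": "geonames-placeholder-tz"},
]


def normalize(code):
    """Trim whitespace and upper-case a code so both sources compare equal."""
    return code.strip().upper()


def match_by_iso(fao_rows, geonames_rows):
    """Pair rows on ISO code; return (matched, unmatched FAO labels)."""
    geo_index = {normalize(row["iso2"]): row for row in geonames_rows}
    matched, unmatched = [], []
    for row in fao_rows:
        code = normalize(row["iso2"])
        geo = geo_index.get(code)
        if geo is None:
            unmatched.append(row["label"])
        else:
            matched.append((code, row["fao_ref"], geo["geonames_ref"]))
    return matched, unmatched


matched, unmatched = match_by_iso(fao_countries, geonames_countries)
print("matched:", matched)
print("unmatched:", unmatched)
```

Keeping the unmatched list, rather than silently dropping those rows, reflects the "potentially" above: we would want to see how complete the match is before relying on it.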

 
