A whole lot of development datasets

Sharing some work in progress, and a small collection of international development datasets

What happens when you set two researchers to work looking for openly available datasets with some connection to the very broad field of International Development?

Well, for one thing you get a very large spreadsheet of 91 different datasets.

Initially I thought it might be possible to record details of different datasets against a short list of themes and categories, but the researchers quickly broke out of the pre-defined list. My attempts to clean up the data from a top-down view in Google Refine, or to visualise the different categories by generating a graphviz file, quickly ran up against the complex and messy reality of international development data. So, after many attempts to get the big picture to fit on a small laptop screen, I gave up and turned to my favourite approach of making little paper playing cards.
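For anyone curious what generating a graphviz file from the spreadsheet could look like, here is a minimal sketch rather than the script actually used. It assumes hypothetical column names (`Name` and `Themes`, with themes separated by semicolons) and uses made-up sample rows; the real spreadsheet is laid out differently and is a good deal messier.

```python
import csv
import io

# Made-up sample rows; the column names are assumptions, not the real spreadsheet's.
SAMPLE_CSV = """Name,Themes
Dataset A,health; finance
Dataset B,finance
"""


def quote(text):
    """Wrap text in double quotes for DOT, escaping any embedded quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def themes_to_dot(rows, name_col="Name", theme_col="Themes", sep=";"):
    """Return a DOT graph with one edge per (dataset, theme) pair."""
    edges = set()
    for row in rows:
        dataset = (row.get(name_col) or "").strip()
        if not dataset:
            continue  # skip rows with no dataset name
        for theme in (row.get(theme_col) or "").split(sep):
            theme = theme.strip()
            if theme:
                edges.add((dataset, theme))

    lines = ["graph datasets {"]
    for dataset, theme in sorted(edges):
        lines.append(f"  {quote(dataset)} -- {quote(theme)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    print(themes_to_dot(rows))
```

The output can be saved to a `.dot` file and rendered with Graphviz. With free-text themes that were never reconciled, the graph soon becomes a tangle, which is much the problem described above.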

You can find the spreadsheet of datasets the researchers located here (CSV), and a PDF of these datasets ready to be cut out as little playing cards here.

The cards only display some of the information recorded on each dataset and, as I found the way to understand a dataset was to visit its website, each card has a QR code on it. Using a little QReader application I’ve been able to find a fairly good working process: sifting through the cards and, whenever I want to find out more about the dataset described on one, holding it up to my web-cam to quickly get the relevant website on screen. It is surprisingly a lot easier, and a more natural process, than having to turn to the keyboard, looking up web addresses and typing them in (note to self: explore different ways to use QR codes to organise future research work…).

Working through the cards I started to get a very different perspective on international development data from the one my preconceptions might have suggested. Whilst Keisha Taylor had the brief of looking globally at datasets which could broadly be said to be relevant to international development, Rui Correia was focussed on datasets relevant to South Africa, and starting with Rui’s datasets was instructive. Aside from the big datasets from the World Bank and from UN agencies and institutions, well known in open data circles, there are a myriad of national and local projects collecting and publishing data on all sorts of issues. A whole network of sites share research data on biodiversity; the Africa-wide FINSCOPE project (originally funded by DFID) holds detailed data on financial readiness in different contexts, primarily selling the data to finance institutions, but also providing data-rich PDF reports for free; universities run data archives along national boundaries (as is the case for most state-funded archives across the world); smaller sites contain lists of available data, but with an e-mail address or form to request it rather than a download option (on websites that are most probably still built with desktop apps rather than Content Management Systems).

Rui also included a number of sites in the mapping that I, on an initial read, was about to disregard as ‘not datasets’, consisting instead of news websites with loose directories and listings of organisations. Compared to sites like the World Association of NGOs directory, or Kabissa, that hold structured information on organisations and projects, these sites may not appear to belong in a collection of datasets, but if our concern is about development information, and how having this information as data helps it to be shared, then such sites should be within our mapping of the potential development data environment.

I’m only a third of the way through exploring all the cards and datasets for now, and I’m not sure how much analysis I’ll be able to complete for the first draft of the working paper this research will be feeding into. I’ll try and share some more reflections as I explore more datasets, but for now I just wanted to get a post up sharing, in the spirit of open data, the mapping spreadsheet so far…

I’ve been thinking from the start of this research that we should make sure the datasets we find get listed on CKAN.net. That’s still part of the plan – but if anyone wanted to get started on that they would be most welcome as it could take a while…


One Comment

  1. Rolf Kleef

    Hi Tim, interesting to have a look at these data sets. My first reaction also was: they should be listed on CKAN.net. Perhaps I’ll try that during the Hack Day at the Open Aid Conference in Berlin, end of this month, to check out the CKAN API and import library (http://docs.ckan.org/en/latest/loading_data.html).
