Measuring the quality of humanitarian data: An emerging framework

The goal of HDX is to make humanitarian data easy to find and use for analysis. If OCHA had decided to stop at just making data available, it wouldn’t need a statistician. But making data useful for analysis requires a lot of behind-the-scenes hard work.

As the Statistician in charge of the analytical aspects of this great project, my main objective is to ensure we provide humanitarians with a high-quality data service. The first step has been to define quality in the context of humanitarian action.

The most commonly accepted way of defining data quality is in terms of the broad notion of fitness for purpose. A quality assessment should therefore look at multiple dimensions: the relevance of the data, combined with five basic characteristics, namely accuracy, timeliness, accessibility, interpretability and comparability.

Here are more details on the dimensions of quality as they relate to humanitarian data:

Relevance is determined by whether the data meets the needs of its users. We approach relevance through the Common Humanitarian Dataset, a set of indicators that can provide an analytic lens into a specific crisis and across multiple crises. This data set can and should be adapted based on user feedback.

Accuracy is the degree to which the data correctly describes the phenomenon it was designed to measure. It is sometimes understood as the margin of error or bias in the estimates. The HDX platform will enable data providers to add caveats to their data so that known errors can be detected and quantified.

Timeliness is the delay between when the data is collected and when it becomes available. In humanitarian operations, timeliness is crucial and often the most important characteristic of the data.

Accessibility refers to the ease with which data can be obtained or shared. HDX is creating a neutral space where partners can share data—through manual upload or automated processes—for others to use and re-use under specific data licenses.

Interpretability is the availability of supplementary information that helps analysts understand and interpret the data effectively. We are working hard to provide users with comprehensive metadata for all data sets shared through HDX.

Comparability refers to the degree to which data can be combined with other data to undertake analysis. Our team is working with partners to develop an initial set of standards for combining second-mile data, i.e. data that is shared by data collectors with data aggregators. Find out more about the Humanitarian Exchange Language.
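Two of these dimensions, timeliness and accuracy, lend themselves to simple numeric measures. The sketch below is only an illustration of the definitions above, not how HDX computes anything: the `DatasetRecord` class, its field names and the three helper functions are hypothetical, and the margin-of-error formula assumes a proportion estimated from a simple random sample.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class DatasetRecord:
    """Hypothetical record for a shared data set (field names are illustrative)."""
    name: str
    collected_on: date   # when the data was collected
    published_on: date   # when the data became available to users


def timeliness_days(record: DatasetRecord) -> int:
    """Timeliness: delay in days between collection and availability."""
    return (record.published_on - record.collected_on).days


def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    """Accuracy as margin of error: approximate 95% half-width for a proportion.

    Assumes a simple random sample; z = 1.96 corresponds to 95% confidence.
    """
    return z * sqrt(proportion * (1 - proportion) / sample_size)


def bias(estimates: list[float], reference_values: list[float]) -> float:
    """Accuracy as bias: mean difference between estimates and reference values."""
    diffs = [e - r for e, r in zip(estimates, reference_values)]
    return sum(diffs) / len(diffs)


# Made-up example values, for illustration only.
record = DatasetRecord("example survey", date(2014, 3, 1), date(2014, 3, 15))
delay = timeliness_days(record)          # 14 days between collection and release
moe = margin_of_error(0.5, 400)          # about 0.049, i.e. roughly +/- 4.9 points
avg_bias = bias([10.0, 12.0], [9.0, 13.0])  # errors of +1 and -1 cancel out
```

Measures like these cover only part of the picture: relevance, accessibility, interpretability and comparability are harder to reduce to a single number and depend more on user feedback, licensing, metadata and shared standards.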

The following visual shows a five-axis radar chart with a potential mix of data-quality dimensions.

[Figure: radar chart of data-quality dimensions]

It is important to note that this quality assurance framework is situated in the context of a larger effort that would encompass best practices in data collection and statistical analysis. Data quality is easier to maintain when there is less human manipulation of the data: the more automated the data sharing (through APIs, data standards, etc.), the less room there is for errors and bias.

What do you think about this emerging framework? Please send your comments using the form below or by sending an email to hdx@un.org. You can also read more about the HDX quality assurance framework here.

Also, I am always looking for volunteers to support the HDX project. Please get in touch if you’d like to offer your analytical skills. A sincere thank-you to volunteer data scientists Andrew Rosenfeld, Dominik Kalisch and Kevin Lynch, who have already made important contributions to our work.