This document is intended as a collection of procedures and resources to guide the curation of Data Completeness instances (henceforth, Data Grids), which a sysadmin can activate for any location page on HDX. This document, and others linked from it, should evolve to capture best practices and any other useful information learned as the data grid curators do their work.

Once activated for a given location page, the Data Grid appears and uses a default recipe (based on tags) to populate itself. However, tags are seldom enough to gauge accurately whether a dataset meets the requirements of a given data grid. Curation, then, is the process of customizing a specific location's data grid so that the datasets it includes meet the defined requirements for each subcategory. That customization is done by editing the recipe YAML file (YAML is a format that is friendly to both humans and machines).

Resources

Procedure document (this document)
Data Completeness Definitions Document
Quality Checklist (below)
YAML editing examples (below)
Github Repository
YAML Validator
Data grids overview dashboard

Process Overview

The basic curation process is outlined below:

[Figure: Curation Process Overview]

Data Grid Instances to be Curated

There may be more on the feature server for testing purposes, but the ones listed below should be the only active ones on the production server.

Country	Production Data Grid	Feature Server Data Grid	Curator(s)	Last check date
Yemen	Production: yem	Feature: yem	Amadu	26 April 2019
Sudan	Production: sdn	Feature: sdn	Meti
Indonesia	Production: idn	Feature: idn	Faizal	26 April 2019
Somalia	Production: som	Feature: som	Meti	26 April 2019
Colombia	Production: col	Feature: col	Amadu
Philippines	Production: phl	Feature: phl	Amadu	26 April 2019
Afghanistan	Production: afg	Feature: afg	Meti
Bangladesh	Production: bgd	Feature: bgd	Faizal
Chad	Production: tcd	Feature: tcd	Nafi
Mozambique	Production: moz	Feature: moz	Obadah	26 April 2019
Venezuela	Production: ven	Feature: ven	Joseph
Democratic Republic of the Congo	Production: cod	Feature: cod	Joseph
Central African Republic	Production: caf	Feature: caf	Nafi
Myanmar	Production: mmr	Feature: mmr	Obadah

Quality Checks Process

Each dataset that is a candidate for a data grid has to be evaluated to determine whether it fully meets the requirements to be included, partially meets them, or does not meet them at all. The outcome determines what actions have to be taken in the YAML file to include or exclude the dataset, and what comments to record so that users understand where the dataset falls short. Below the process diagram, you will find more details on each quality check.

[Figure: Quality checks]

Details on each quality check

Is the dataset subnational?

This one is straightforward. National-level statistics are by definition excluded from the data grid. How subnational the dataset needs to be is handled further down this list.

How much of the required information specified in the definition does the resource contain?

Look at the subcategory definition and determine whether the definition is met fully, partially, or not at all. If a lack of clear field names and/or a data dictionary makes it hard to be sure, the dataset can be excluded or included as partially meeting the requirements.

Note: if complete coverage can be obtained by combining several datasets (for example, several different 3Ws, one for each cluster, with all clusters being covered), then all the datasets can be included but marked as "incomplete" with the same comment. The logic here is that someone should be combining these datasets.
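
The two checks described so far can be summarized as a small decision function. This is only an illustrative sketch in Python, not part of the HDX tooling: the names Coverage, Decision and evaluate are hypothetical, the "complete" label is an assumption ("incomplete" is the only label named above), and the function says nothing about the actual keys used in the recipe YAML file. It also covers only the two checks above, not any later ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Coverage(Enum):
    """How much of the subcategory definition the dataset meets (hypothetical names)."""
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass
class Decision:
    include: bool            # should the dataset appear in the data grid?
    status: Optional[str]    # "complete" (assumed label), "incomplete", or None if excluded
    comment: str = ""        # note on where the dataset falls short


def evaluate(is_subnational: bool, coverage: Coverage, comment: str = "") -> Decision:
    # Check 1: national-level statistics are excluded by definition.
    if not is_subnational:
        return Decision(False, None, "national-level statistics only")
    # Check 2: compare the dataset against the subcategory definition.
    if coverage is Coverage.FULL:
        return Decision(True, "complete")
    if coverage is Coverage.PARTIAL:
        # Included, but flagged so users can see what is missing.
        return Decision(True, "incomplete", comment)
    return Decision(False, None, "does not meet the subcategory definition")


# Several 3Ws that only cover everything when combined:
# each one is included as "incomplete" with the same comment.
shared = "one 3W per cluster; combine for full coverage"
decisions = [evaluate(True, Coverage.PARTIAL, shared) for _ in range(3)]
```

Where unclear field names or a missing data dictionary make the assessment uncertain, the curator still has to choose between Coverage.PARTIAL and exclusion by judgment; the sketch does not automate that choice.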