There are various enhancements to HDX that we can consider to improve the user experience, simplify the quality assurance work of Data Partnerships and support the development of dashboards and other visualisations. I want to document these enhancements here so we can think about if and how they might fit into our development plans for HDX.

Problems

Issues that were already identified 

The following issues have been identified:

How data is currently structured

In order to determine the best way to structure data going forwards, it is important to look at how data is currently structured. This is typically dependent upon how the organisation chooses to disaggregate its data.

  1. Dataset containing data in xlsx and csv formats as separate resources eg. https://data.humdata.org/dataset/afghan-voluntary-repatriation

  2. Dataset with rolling updates of resource (ie. dataset end date should be DATE) eg. https://data.humdata.org/dataset/inso-key-data-dashboard and https://data.humdata.org/dataset/indonesia-monthly-humanitarian-update

  3. Dataset with metadata in resource eg. https://data.humdata.org/dataset/global-airports and https://data.humdata.org/dataset/drc-health-data (jpeg has graphical metadata)

  4. Dataset with tiff in a zip: https://data.humdata.org/dataset/malawi_national_vulnerability_index_2015 (note the 2015 in the url is incorrect as it is current)

  5. Dataset with pdfs, zips (on OneDrive and filestore), mbtiles, tiff : https://data.humdata.org/dataset/iom-npm-cox-bazar-uav-imagery

  6. Dataset with JSON feed, HXLated JSON feed and xlsx (from automated output): https://data.humdata.org/dataset/migrant-deaths-by-month

  7. Disaggregate by country into datasets and by indicator into resources eg. https://data.humdata.org/dataset/who-data-for-barbados

  8. Disaggregate by date into datasets eg. https://data.humdata.org/dataset/syria-idp-flow-and-returnee-data-october-2018 and https://data.humdata.org/dataset/syria-idp-flow-and-returnee-data-september-2018

  9. Disaggregate by date into resources within one dataset eg. https://data.humdata.org/dataset/nigeria-humanitarian-needs-overview

  10. Disaggregate by indicator into datasets eg. https://data.humdata.org/dataset/gender-development-index-female-to-male-ratio-of-hdi and https://data.humdata.org/dataset/population-in-severe-poverty-headcount

  11. Disaggregate by country into datasets and by date and region into resources eg. https://data.humdata.org/dataset/drc-displacement-data-baseline-assessment-iom-dtm

  12. Disaggregate by country into datasets and by round into resources eg. https://data.humdata.org/dataset/nigeria-baseline-data-iom-dtm

  13. Disaggregate by country and emergency into datasets and by round into resources eg. https://data.humdata.org/dataset/indonesia-displacement-data-sulawesi-earthquake-site-assessment-iom-dtm

  14. Map data for a country at different admin levels for various dates eg. https://data.humdata.org/dataset/administrative-boundaries-of-bangladesh-as-of-2015 (note the 2015 in the url is incorrect as it is current)

  15. Map and population data for a country with varying file formats and metadata in a pdf eg. https://data.humdata.org/dataset/bhutan-administrative-level-0-1-population-statistics

Simon is looking at how to identify data series.

Are we approaching the stage where we need to break down data by admin 1 rather than country so that users of HDX can search for data in the UI at that level? How do we make data available in many forms eg. by country, by indicator, by admin 1?

Ideas for HDX

The ideas presented below were created with consideration for what is feasible given the restrictions of CKAN. The intention was to avoid overly complex ideas that might require forking CKAN to make fundamental changes to its architecture and instead to try to come up with something relatively simple to implement given limited development capacity.

Tags Metadata

Detecting Breaking Changes to Resources

Users Registering Interest in Datasets

Keeping a History of Data

Improving Search

Standardising and Categorising Resources

Handling Different Structures of Data

We want to move contributors away from a completely freeform experience of structuring data without discouraging them. I need to see how this fits with Simon’s ongoing work on data series:

Fixing URLs

Archiving Resources

Excessively Large Resources

Centre Curated Resources

The Centre may wish to create new curated version(s) of existing resource(s) in a dataset:

Discouraging Dated Datasets

We want to discourage dated datasets, as they lead to a new dataset being created for each update and to inconsistent URLs.

Blue Sky Ideas

The ideas below would require technological leaps and involve creating new systems outside of CKAN (that could still link back into CKAN). They require research and discussion to flesh out.

Meta Service/API that links to other APIs

The idea is to have a meta service/API that would link to other services/APIs and allow the easy download of data given a standard set of input parameters. On top of such a meta service/API would sit a user interface that lets non-technical users set those parameters and download the data. Using the "wheat price kandahar" example, I could imagine a meta service/API user choosing parameters like "service": "prices", "type": "commodities", "provider": "WFP", "country": "Afghanistan", "adm1": "Kandahar", "Commodity": "wheat", "date": "08/03/2022".
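To make the parameter idea more concrete, here is a minimal sketch of how a meta service might route a standard parameter set to a backend API. Everything in it is an assumption for illustration: the registry `PROVIDER_ENDPOINTS`, the function `build_request_url`, the choice of "service", "type" and "provider" as routing keys, and the `example.org` URL, which is a placeholder and not a real WFP endpoint.

```python
from urllib.parse import urlencode

# Hypothetical registry mapping (service, type, provider) to a backend base URL.
# The URL is a placeholder, not a real endpoint.
PROVIDER_ENDPOINTS = {
    ("prices", "commodities", "WFP"): "https://example.org/wfp/prices",
}

# Assumption: these three parameters select the backend; the rest are filters.
ROUTING_KEYS = ("service", "type", "provider")


def build_request_url(params):
    """Turn a standard parameter set into a backend request URL (illustrative)."""
    try:
        key = tuple(params[k] for k in ROUTING_KEYS)
    except KeyError as missing:
        raise ValueError(f"missing routing parameter: {missing}") from None
    base_url = PROVIDER_ENDPOINTS.get(key)
    if base_url is None:
        raise LookupError(f"no backend registered for {key}")
    # Non-routing parameters are passed through as query filters.
    # Keys are lower-cased so "Commodity" and "commodity" are treated alike.
    filters = {k.lower(): v for k, v in params.items() if k not in ROUTING_KEYS}
    return f"{base_url}?{urlencode(sorted(filters.items()))}"


params = {
    "service": "prices",
    "type": "commodities",
    "provider": "WFP",
    "country": "Afghanistan",
    "adm1": "Kandahar",
    "Commodity": "wheat",
    "date": "08/03/2022",
}
url = build_request_url(params)
print(url)
```

In a real service the registry would presumably also describe how each backend names its own parameters; the sketch passes the filters through unchanged to keep the routing step visible.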

Could this power the HDX UI allowing searches like "wheat price kandahar" to produce helpful results?

Add meta service/API resources as “queryable” resources in HDX - ones where you can add some parameters to filter or transform the returned file:

Data Cube

Typically we make data available in different forms, such as by indicator or by country, by creating separate datasets that each hold their own copy of the data. This makes accessing the data fast but complicates scrapers, which must do the breakdowns, so data on HDX is mostly broken down by country only. A data cube enables data to be modelled and viewed in multiple dimensions. Is there a way that data could be stored in some sort of data cube so that it can be viewed in different ways, such as by country, admin 1 or indicator, without keeping multiple copies of the data?
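As a rough illustration of the data cube idea, the sketch below keeps a single list of fact records and derives different views from it on demand. The record layout, the `view` function and all of the figures are invented for the example; a real cube would more likely live in a database or OLAP engine than in a Python list.

```python
from collections import defaultdict

# Illustrative fact records: one row per (country, adm1, indicator).
# All names and numbers are made up for the example.
FACTS = [
    {"country": "Afghanistan", "adm1": "Kandahar", "indicator": "idps", "value": 120},
    {"country": "Afghanistan", "adm1": "Kabul", "indicator": "idps", "value": 80},
    {"country": "Afghanistan", "adm1": "Kabul", "indicator": "returnees", "value": 30},
    {"country": "Nigeria", "adm1": "Borno", "indicator": "idps", "value": 200},
]


def view(facts, group_by, where=None):
    """Filter the facts (slice), then sum values per group (roll up)."""
    where = where or {}
    totals = defaultdict(int)
    for fact in facts:
        # Skip rows that do not match every filter condition.
        if any(fact[dim] != wanted for dim, wanted in where.items()):
            continue
        key = tuple(fact[dim] for dim in group_by)
        totals[key] += fact["value"]
    return dict(totals)


# Three different breakdowns, all derived from the same stored copy.
idps_by_country = view(FACTS, ["country"], {"indicator": "idps"})
idps_by_adm1 = view(FACTS, ["country", "adm1"], {"indicator": "idps"})
kabul_by_indicator = view(FACTS, ["indicator"], {"adm1": "Kabul"})
print(idps_by_country)
print(idps_by_adm1)
print(kabul_by_indicator)
```

The point of the sketch is that the by-country, by-admin-1 and by-indicator breakdowns are computed when requested rather than stored as separate datasets, which trades some access speed for not having to maintain multiple copies.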