Fixed Data URLs Idea

Rationale

If we want to get to QuickMaps and QuickDash, properly support QuickCharts, support curation of data and increase the usage of the API to retrieve data (as per the brief discussion in the team call), then I see the proposal set out below as a necessary first stage, and hence something we should consider for our roadmap.

Currently, users of data in our resources may find that the data stops being updated without warning. This happens when a dataset contributor wishes to publish their most current data and does so by creating a new resource or dataset rather than adding to the existing resource in the dataset. Sadly, this means that some organisations' data cannot be used in automated processes, not because the data is in some way bad, but simply because of the way they choose to update it.

If we want more people to use direct URLs to retrieve data for use in automated reports, visuals and systems (as opposed to clicking the Download button), then they need URLs that will not have to be changed each time the contributor adds new data. This is also true for any dashboards, maps and charts we wish to build and maintain. In fact, it is not just automated processes that would benefit: people manually compiling regular reports would no longer have to search each time for the most current URL.

I'll give a use case: An HDX user in the future starts making a Quick Dashboard (or creating a manual monthly report) based on multiple datasets in HDX. The dashboard (or user) pulls data from the URLs of resources in those datasets. A contributor comes along to add current data to their dataset, which happens to be one used in the dashboard (or report). Unaware that their dataset is being used by the dashboard (or user), the contributor decides to do one (or more) of these:

  1. add the most current data in a new dataset
  2. add the most current data in a new resource within the dataset
  3. delete the dataset
  4. change the format of the resource
  5. change the types of columns in the resource

Options 1 and 2 mean that the user's dashboard (or report) will show old data without warning them that this is the case. Options 3, 4 and 5 will cause some sort of failure in the dashboard (or errors the next time the user manually tries to create the report from the same URLs).
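
To make the failure modes above concrete, here is a minimal sketch of the checks a dashboard (or a careful report author) would otherwise have to run against every resource it depends on. It is an illustration only, not a proposed implementation: the function names, the column names and the 31-day threshold are assumptions made up for the example, and it assumes the resource is a CSV whose last-modified date is known from somewhere.

```python
import csv
import io
from datetime import datetime, timedelta


def check_csv(csv_text, expected):
    """Return a list of problems found in csv_text.

    `expected` maps a column name to a converter such as int, float or str.
    A missing column corresponds to a changed layout or format; a value the
    converter rejects corresponds to a changed column type.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames
    if not header:
        return ["resource is empty"]
    missing = [column for column in expected if column not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, convert in expected.items():
            try:
                convert(row[column])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {column}={row[column]!r} is not {convert.__name__}"
                )
    return problems


def is_stale(last_modified, max_age_days, now):
    """True if the resource was not updated within max_age_days (options 1 and 2)."""
    return now - last_modified > timedelta(days=max_age_days)


# Hypothetical column names and values, for illustration only.
expected = {"year": int, "funding": float}
print(check_csv("year,funding\n2016,100.5\n", expected))
print(check_csv("yr,funding\n2016,100.5\n", expected))
print(check_csv("year,funding\n2016,n/a\n", expected))
print(is_stale(datetime(2017, 1, 1), 31, datetime(2017, 3, 1)))
```

Note that the staleness check can only guess: it cannot tell a contributor who has moved to a new resource from one who simply has nothing new to publish. A deleted dataset (option 3) would show up earlier still, as a failed download.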

What Needs to be Investigated

Hence finding a way to have fixed data URLs is critical to a number of possible future goals. This was previously referred to as a stable API, as alluded to in the team chat, but fixed data URLs better reflects that URLs may just point to files, e.g. a CSV, rather than an API endpoint. Here is an example: https://data.humdata.org/dataset/6a60da4e-253f-474f-8683-7c9ed9a20bf9/resource/45dc4269-405a-433d-9011-d1ae23d624a5/download/fts_requirements_funding_cluster_afg.csv
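
As a rough illustration of why such URLs are fragile, the sketch below splits the example URL into its parts. The path layout it assumes (dataset/<id>/resource/<id>/download/<filename>) is inferred from this single example rather than from any specification, and the function name is made up for the illustration.

```python
from urllib.parse import urlparse


def parse_resource_url(url):
    """Split a download URL shaped like the example into its components."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed shape: dataset/<dataset id>/resource/<resource id>/download/<filename>
    if (
        len(parts) != 6
        or parts[0] != "dataset"
        or parts[2] != "resource"
        or parts[4] != "download"
    ):
        raise ValueError("URL does not match the assumed layout: " + url)
    return {
        "dataset_id": parts[1],
        "resource_id": parts[3],
        "filename": parts[5],
    }


example = (
    "https://data.humdata.org/dataset/6a60da4e-253f-474f-8683-7c9ed9a20bf9"
    "/resource/45dc4269-405a-433d-9011-d1ae23d624a5"
    "/download/fts_requirements_funding_cluster_afg.csv"
)
print(parse_resource_url(example))
```

If the layout holds in general, a new dataset changes the first identifier and a new resource changes the second, so in either case anything that stored the old URL keeps reading the old file.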