Hence finding a way to have fixed data URLs is critical to a number of possible future goals. This was previously referred to in the team chat as a "stable API", but "fixed data URLs" better reflects that the URLs may simply point to files, e.g. a CSV, rather than to an API endpoint. Here is an example of a data URL pointing to a file: https://data.humdata.org/dataset/6a60da4e-253f-474f-8683-7c9ed9a20bf9/resource/45dc4269-405a-433d-9011-d1ae23d624a5/download/fts_requirements_funding_cluster_afg.csv

Fixing data URLs may sound simple, but it requires considering, if not delving into, a number of issues. Fortunately, many of them have already appeared on the brainstorming ideas Trello board, and fixed data URLs now give us a common thread that ties them together.

  • we need to be able to distinguish data resources from auxiliary ones; the joint top brainstorming idea from the last meeting was to do exactly that and show the distinction in the UI
  • resources can't keep growing indefinitely, so we need a way to archive non-current data (this is different from dataset archiving, which is covered by freshness)
  • newly added data may contain errors, so it would help to be able to fall back to a previous version of the data, e.g. if a dashboard cannot load latest/xxx.csv, it could try 1/xxx.csv (the versioning brainstorming idea)
  • finding data via the API needs to be simpler. The current limitation is the capability of the CKAN API search, which could be mitigated by adding more metadata to the dataset, for example the list of fields and HXL tags in the data (another brainstorming idea)
  • a system through which automated users (and perhaps human users too) can register to receive important notices about a dataset they use, e.g. a breaking change to the format or the dataset no longer being updated
  • a workflow that tries to alert contributors when an update they are making to a resource has unexpected field names, data type changes, etc.
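The point about archiving non-current data could be approached with a simple retention rule. The following is a minimal sketch, assuming each stored version of a resource has a label and an upload date. `ResourceVersion`, `split_for_archiving`, `keep_latest` and `cutoff` are hypothetical names, and the policy itself (keep the N newest versions plus anything uploaded on or after a cutoff date) is only one possible choice, not a decided design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ResourceVersion:
    """One stored version of a data resource (hypothetical record)."""
    label: str
    uploaded: date


def split_for_archiving(
    versions: list[ResourceVersion],
    keep_latest: int = 3,
    cutoff: Optional[date] = None,
) -> tuple[list[ResourceVersion], list[ResourceVersion]]:
    """Return (kept, archived), both ordered newest first.

    A version stays live if it is among the `keep_latest` newest uploads,
    or if a cutoff is given and it was uploaded on or after that date.
    """
    ordered = sorted(versions, key=lambda v: v.uploaded, reverse=True)
    kept, archived = [], []
    for i, version in enumerate(ordered):
        recent_enough = cutoff is not None and version.uploaded >= cutoff
        if i < keep_latest or recent_enough:
            kept.append(version)
        else:
            archived.append(version)
    return kept, archived
```

Keeping a fixed number of recent versions also bounds how far back a consumer could fall back, so the retention rule and any versioning scheme would need to be decided together.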
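On the consumer side, the fallback idea from the versioning point might look something like this. It is a sketch only: it assumes versioned URLs of the form `<base>/<version>/<file>`, with `latest` tried first and `1` as the fallback, mirroring the latest/xxx.csv and 1/xxx.csv example. Such URLs are what this page proposes rather than something already available, and `load_with_fallback`, `looks_like_csv` and any base URL used with them are hypothetical.

```python
import csv
import io
import urllib.request
from typing import Callable, Sequence


def fetch_url(url: str) -> bytes:
    """Download a URL; urllib raises URLError/HTTPError (both OSError subclasses) on failure."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def looks_like_csv(content: bytes) -> None:
    """Minimal sanity check: UTF-8 text with a non-empty header row, else raise."""
    text = content.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
    header = next(csv.reader(io.StringIO(text)), [])
    if not any(cell.strip() for cell in header):
        raise ValueError("missing header row")


def load_with_fallback(
    base_url: str,
    filename: str,
    versions: Sequence[str] = ("latest", "1"),  # hypothetical version labels, tried in order
    fetch: Callable[[str], bytes] = fetch_url,
    validate: Callable[[bytes], None] = looks_like_csv,
) -> tuple[str, bytes]:
    """Return (url, content) for the first version that downloads and validates."""
    errors = []
    for version in versions:
        url = f"{base_url.rstrip('/')}/{version}/{filename}"
        try:
            content = fetch(url)
            validate(content)
            return url, content
        except (OSError, ValueError, csv.Error) as exc:
            errors.append(f"{url}: {exc}")  # record why, then try the next version
    raise RuntimeError("no version could be loaded:\n" + "\n".join(errors))
```

A dashboard would call something like `load_with_fallback("https://example.org/fixed/fts", "xxx.csv")` (hypothetical URL). Passing `fetch` and `validate` as parameters keeps the function testable offline and lets each consumer decide how strict "cannot load" should be.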
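Extracting the list of fields and HXL tags so that they could be added to the dataset metadata might be sketched as follows. It assumes a CSV whose HXL hashtag row appears near the top (commonly straight after the text headers) and uses a simplified detection heuristic rather than the full HXL specification; a data cell that happens to start with `#` could fool it. The function names and the `fields`/`hxl_tags` keys are hypothetical, and how the result would be stored in CKAN is left open.

```python
import csv
import io
import itertools


def _is_hashtag_row(row: list[str]) -> bool:
    """Simplified HXL check: some non-empty cells, and every non-empty cell starts with '#'."""
    cells = [c.strip() for c in row if c.strip()]
    return bool(cells) and all(c.startswith("#") for c in cells)


def extract_schema_metadata(csv_text: str, max_scan_rows: int = 25) -> dict:
    """Return {'fields': [...], 'hxl_tags': [...]} read from the top of a CSV."""
    rows = list(itertools.islice(csv.reader(io.StringIO(csv_text)), max_scan_rows))
    tag_index = next((i for i, row in enumerate(rows) if _is_hashtag_row(row)), None)
    if tag_index is None:
        # No hashtag row found: treat the first row as plain column headers.
        fields = [c.strip() for c in rows[0]] if rows else []
        tags = []
    else:
        # Text headers, if present, are taken to be the row just above the hashtags.
        fields = [c.strip() for c in rows[tag_index - 1]] if tag_index > 0 else []
        tags = [c.strip() for c in rows[tag_index] if c.strip()]
    return {"fields": fields, "hxl_tags": tags}
```

Running this when a resource is uploaded would give the search something more specific than titles and descriptions to match on, e.g. finding every dataset that has a `#country+name` column.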
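The contributor-alert workflow needs some way to compare an incoming resource with the version it replaces. A minimal sketch, assuming plain CSVs with a single header row (an HXL hashtag row would be read as data and skew the types) and crude int/float/str type inference. `infer_type`, `column_types` and `schema_warnings` are hypothetical names; a production check would more likely rely on a declared schema or an HXL-aware library.

```python
import csv
import io


def _parses(conv, value: str) -> bool:
    """True if conv(value) succeeds, e.g. int('12') or float('1.5')."""
    try:
        conv(value)
        return True
    except ValueError:
        return False


def infer_type(values: list[str]) -> str:
    """Crude inference over non-empty values: 'int', 'float', 'str' or 'empty'."""
    kind = "empty"
    for raw in values:
        v = raw.strip()
        if not v:
            continue
        if _parses(int, v):
            kind = "int" if kind == "empty" else kind  # never downgrade float to int
        elif _parses(float, v):
            kind = "float"
        else:
            return "str"
    return kind


def column_types(csv_text: str) -> dict[str, str]:
    """Map each header field to the type inferred from its values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    columns: dict[str, list[str]] = {f: [] for f in fields}
    for row in reader:
        for f in fields:
            columns[f].append(row.get(f) or "")  # short rows yield None; treat as empty
    return {f: infer_type(vals) for f, vals in columns.items()}


def schema_warnings(old_csv: str, new_csv: str) -> list[str]:
    """List removed/added fields and type changes between two versions."""
    old, new = column_types(old_csv), column_types(new_csv)
    warnings = [f"field removed: {f}" for f in sorted(old.keys() - new.keys())]
    warnings += [f"field added: {f}" for f in sorted(new.keys() - old.keys())]
    for f, old_type in old.items():
        new_type = new.get(f)
        # Skip columns missing from the new file or with no values to compare.
        if new_type and "empty" not in (old_type, new_type) and old_type != new_type:
            warnings.append(f"type changed for {f}: {old_type} -> {new_type}")
    return warnings
```

If the list is non-empty, the upload workflow could show it to the contributor before the new version replaces the current one.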