The Humanitarian Data Exchange (HDX) is adding a new mandatory metadata field called Expected Update Frequency. It replaces a previous optional field, Update Frequency, and its purpose is to tell us how often datasets shared through the site are likely to be updated.
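To illustrate how an expected update frequency could be used, here is a minimal sketch of an overdue check. The frequency labels, the day counts in `EXPECTED_UPDATE_DAYS`, and the function name `is_overdue` are assumptions made for this example; they are not HDX's actual field values or code.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from a frequency label to a number of days.
# These labels and values are illustrative, not HDX's real options.
EXPECTED_UPDATE_DAYS = {
    "daily": 1,
    "weekly": 7,
    "monthly": 30,
    "yearly": 365,
}


def is_overdue(last_modified: datetime, frequency: str, now: datetime) -> bool:
    """Return True if more time has passed since the last change
    than the expected update interval allows."""
    allowed = timedelta(days=EXPECTED_UPDATE_DAYS[frequency])
    return now - last_modified > allowed


# Example: a weekly dataset last touched nine days ago is overdue,
# while a monthly dataset with the same dates is not.
weekly_overdue = is_overdue(datetime(2017, 1, 1), "weekly", datetime(2017, 1, 10))
monthly_overdue = is_overdue(datetime(2017, 1, 1), "monthly", datetime(2017, 1, 10))
```

A check like this only works if every dataset carries the field, which is one reason a mandatory field is more useful here than an optional one.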

...

We are introducing a new set of features to HDX based on the concept of "Data Freshness". We are interested in assessing how current the data within each dataset is because we want the portal to become more useful for consumers of the data: one important metric we can give them is how up to date the datasets are. Imagine a large walk-in freezer in a restaurant. Delivery staff fill it with new products, much as contributors add new datasets to HDX. Cooks look inside for items they need and mix them in various tasty ways. Analogously, users find datasets in HDX and combine the data for analysis. Foodstuffs can be safely stored in the freezer for different periods of time. If no one checks, the caterers may use stale ingredients, so there needs to be a method to keep track of the contents and, if anything is too old, to order replacements. Given the choice, chefs would like to use the freshest produce, and similarly we want users to have access to the most up-to-date data in HDX, particularly since it holds over 4,000 datasets. We want to help data providers oversee their data, particularly where update processes are manual, and make it easy for people to find data that is actively maintained.

...

We are drawing on research being done on data freshness at Vienna University. Specifically, the researchers are looking at estimating the next change time for a resource based on previous update history and applying a Markov chain approach. The research is still ongoing but we hope to learn from their results to enhance HDX.
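As a rough illustration of estimating the next change time from previous update history, the sketch below averages the gaps between past updates and projects one gap forward from the latest update. This is a deliberately simple stand-in and not the Markov chain approach the researchers are studying; the function name `estimate_next_update` and the sample dates are assumptions made for the example.

```python
from datetime import datetime, timedelta


def estimate_next_update(update_times: list[datetime]) -> datetime:
    """Estimate the next update as the latest update plus the mean gap
    between consecutive past updates (a simple heuristic, not a Markov chain)."""
    if len(update_times) < 2:
        raise ValueError("need at least two past updates to measure a gap")
    ordered = sorted(update_times)
    # Gaps between each pair of consecutive updates.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    return ordered[-1] + mean_gap


# Hypothetical history: three updates exactly one week apart.
history = [datetime(2017, 1, 1), datetime(2017, 1, 8), datetime(2017, 1, 15)]
next_update = estimate_next_update(history)
```

With evenly spaced updates the estimate simply continues the pattern; irregular histories are where a more careful model would be expected to do better.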

...

Let us know what you think of this approach. Send feedback to hdx@un.org. Watch this space! There will be more coming on the subject of Data Freshness.