Important fields

Frequency of updates => indicates how often the data is expected to change.
Last_modified => indicates the last time the dataset (resource) was changed. It tracks not only new data but also minor updates.
Date of dataset => the date to which the data refers. It has to change when new data comes to HDX, but it does not have to change for minor updates.

Thoughts

There are two aspects of data freshness:
 
1. Date of update: The last time the data was looked at to confirm it is up to date, i.e. it must be examined according to the update frequency
2. Date of data: The actual date of the data. An update could consist of just confirming that the data has not changed
We should send an automated mail reminder to data contributors if the update frequency time window is missed by a certain amount. Perhaps we should let contributors reply directly to that mail to say that the data is unchanged, so they don't even need to log into HDX in that case; otherwise, the mail should provide the link to the dataset that needs updating.
The number of datasets hosted outside of HDX is growing. I think we should try to handle this situation now. The simple but perhaps annoying solution is to send a reminder to users according to the update frequency (irrespective of whether they have already updated, as we cannot tell).
Another approach is to provide guidance to users so that, as they consider how to upload resources, we steer them towards a particular technological solution that is helpful to us, e.g. a Google spreadsheet with an update trigger, document alerts in OneDrive for Business, or a macro in an Excel spreadsheet. I don't know if this is possible, but complete automation would mean they could click something in HDX that creates a resource pointing to a spreadsheet in Google Drive, with the trigger already set up, that opens automatically once they enter their Google credentials.
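The reminder rule described above (mail the contributor once the update frequency window is missed by a certain amount) could be sketched roughly as follows. This is only an illustration: the function name reminder_due and the parameters last_confirmed, update_frequency_days and grace_days are hypothetical and do not correspond to existing HDX or CKAN fields.

```python
from datetime import datetime, timedelta


def reminder_due(last_confirmed, update_frequency_days, grace_days, now):
    """Return True when a reminder mail should be sent.

    last_confirmed: when the data was last updated or confirmed unchanged
                    (hypothetical field, assumed for this sketch).
    update_frequency_days: expected number of days between updates.
    grace_days: how far the window may be missed before we send a mail.
    """
    deadline = last_confirmed + timedelta(days=update_frequency_days + grace_days)
    return now > deadline


# A dataset confirmed on 1 January with a 30-day frequency and 7 days of grace
# becomes overdue after 7 February.
last = datetime(2017, 1, 1)
print(reminder_due(last, 30, 7, datetime(2017, 2, 1)))
print(reminder_due(last, 30, 7, datetime(2017, 2, 10)))
```

A contributor replying "data unchanged" would simply reset last_confirmed to the reply date, without changing the date of the data itself.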

Number of Files Locally and Externally Hosted

Type            Number of Resources     Percentage
File Store                    2,102            22%
CPS                           2,459            26%
HXL Proxy                     2,584            27%
ScraperWiki                     162             2%
Others                        2,261            24%
Total                         9,568           100%

Actions


Update frequency needs to be mandatory.
Investigate the HTTP Last-Modified header returned for a GET request - 60% in HDX have this according to the University of Vienna (UofV).
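As a starting point for that investigation, a minimal sketch of reading the Last-Modified header of an externally hosted resource is shown below. It uses only the Python standard library; the function names parse_last_modified and fetch_last_modified are made up for this example, and using a HEAD request instead of a full GET is an assumption (some servers may only return the header for GET).

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # The server sent something that is not a valid HTTP date.
        return None


def fetch_last_modified(url, timeout=10):
    """Ask the server for headers only and extract Last-Modified (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)


# Parsing can be checked without any network access:
sample = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(parse_last_modified(sample))
```

Resources whose servers return no Last-Modified header come back as None, so they would still need the reminder-based approach.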

References

Using the Update Frequency Metadata Field and Last_update CKAN field to Manage Dataset Freshness on HDX:


https://docs.google.com/document/d/1g8hAwxZoqageggtJAdkTKwQIGHUDSajNfj85JkkTpEU/edit#


University of Vienna paper on methodologies for estimating next change time for a resource based on previous update history:
University of Vienna presentation of data freshness: