Deletion of Datasets on HDX

There are three ways to "delete" a dataset on HDX.

Method 1:  Delete a dataset via the HDX interface (recommended)

This method is available to any editor or administrator of the organization that contributed the dataset.  It purges the dataset's metadata from the HDX database and deletes any resources uploaded to HDX.  Users can no longer retrieve the dataset, but it will continue to exist in backups of the HDX database, which HDX maintains for XX months.  In the back end, this method uses the hdx_package_delete API call.

Method 2: Delete a dataset via the hdx_package_delete API call (recommended)

HDX has added an additional call to the CKAN API, hdx_package_delete, which purges both the dataset metadata and any uploaded files in the filestore.
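
As an illustration, the sketch below builds (but does not send) an HTTP request for this call. It assumes that hdx_package_delete follows the usual CKAN action API conventions: a POST to /api/3/action/<action name>, a JSON body carrying the dataset id, and the API token in the Authorization header. The site URL, the "id" parameter name and the helper name build_delete_request are assumptions made for this example and are not confirmed by this page.

```python
import json
import urllib.request

# Assumed base URL of the HDX site; adjust for a test or staging server.
HDX_SITE = "https://data.humdata.org"


def build_delete_request(dataset_id, api_token,
                         action="hdx_package_delete", site=HDX_SITE):
    """Build a CKAN-style action request that would delete a dataset.

    Hypothetical helper: it assumes the action accepts {"id": ...} like
    the standard CKAN package_delete call. Nothing is sent here.
    """
    url = f"{site}/api/3/action/{action}"
    body = json.dumps({"id": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": api_token,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_delete_request("my-dataset-name", "MY-API-TOKEN")
    print(request.get_method(), request.full_url)
    # To actually delete the dataset (irreversible), one would call:
    # urllib.request.urlopen(request)
```

The request is deliberately left unsent so that the example can be inspected safely; sending it against a live site would purge the dataset.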

Method 3: Delete a dataset via the default CKAN API calls (not recommended)

HDX is built on the CKAN framework, which has its own API.  Two calls in the CKAN API delete all or part of a dataset:

  1. The CKAN API call package_delete flags the dataset and the attached resources as deleted (state = deleted), but does not remove them from the HDX database.  This is the call the HDX Python Library currently uses when the delete_from_hdx method is called; the library will be changed to use the hdx_package_delete call instead.
  2. The CKAN API call package_purge deletes the dataset record from the database, but does not remove any uploaded resources (filestore).  They become orphaned files on the HDX server and might be removed at some point in the future as part of a cleanup process.
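
The difference between the three calls can be summarized in a small lookup table. The sketch below is only a local model of the behavior described on this page; it does not talk to HDX. The names DeleteEffect, EFFECTS and leaves_residue are made up for the example. The page does not say outright that package_delete leaves uploaded files in place, so that entry is an inference from "does not remove them".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteEffect:
    record_in_database: bool   # is the dataset record still in the database?
    files_in_filestore: bool   # are uploaded resource files still on the server?
    note: str


# Behavior as described in the text above; call names are taken from this page.
EFFECTS = {
    "package_delete": DeleteEffect(
        True, True, "record flagged state = deleted, nothing removed"),
    "package_purge": DeleteEffect(
        False, True, "record removed, uploaded files orphaned"),
    "hdx_package_delete": DeleteEffect(
        False, False, "record and uploaded files removed"),
}


def leaves_residue(action):
    """Return True if the call leaves a record or files behind on the live site."""
    effect = EFFECTS[action]
    return effect.record_in_database or effect.files_in_filestore


if __name__ == "__main__":
    for name, effect in EFFECTS.items():
        print(f"{name}: residue={leaves_residue(name)} ({effect.note})")
```

Only hdx_package_delete leaves nothing behind on the live site in this model, which is why the two default CKAN calls are not recommended. Backups are a separate matter and are not modeled here.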

A note about backups

Even fully deleted (purged) datasets may continue to live on in the periodic backups taken of the HDX database.  How long they live on in a backup depends on when, and for how long, the dataset was on HDX before deletion.  A dataset that was on HDX for less than a day before being deleted might never be captured in a backup at all, but if that day was January 1, it might well be captured in an annual backup that could be kept for many years.