Deletion of Datasets on HDX
There are 3 ways to "delete" a dataset on HDX.
Method 1: Delete a dataset via the HDX interface (recommended)
This method is available to any editor or administrator of the organization that contributed the dataset. It purges the dataset's metadata from the HDX database and deletes any resources uploaded to HDX. Users can no longer retrieve the dataset, but it will continue to exist in backups of the HDX database, which HDX maintains for XX months. In the back end, this method uses the hdx_package_delete API call.
Method 2: Delete a dataset via the hdx_package_delete API call (recommended)
HDX has added an additional API call to the CKAN API, hdx_package_delete, which purges both the dataset metadata and any uploaded files in the filestore.
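A minimal sketch of how such a call could be prepared from Python, using only the standard library. CKAN exposes its actions under /api/3/action/<action_name> and accepts a JSON body with an Authorization header. The site URL, the "id" parameter name (borrowed from CKAN's own package_delete), the dataset name and the token below are assumptions for illustration, not confirmed details of hdx_package_delete. The request is only built here, not sent, because the deletion cannot be undone.

```python
import json
import urllib.request

# Assumption: the public HDX site URL; substitute your own instance if different.
HDX_SITE = "https://data.humdata.org"


def build_delete_request(dataset_id, api_token, site=HDX_SITE):
    """Build (but do not send) a POST request for the hdx_package_delete action."""
    # CKAN exposes actions under /api/3/action/<action_name>.
    url = f"{site}/api/3/action/hdx_package_delete"
    # Assumption: the action takes the dataset name or id under the key "id",
    # as CKAN's own package_delete does.
    body = json.dumps({"id": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical dataset name and token, for illustration only.
request = build_delete_request("my-dataset-name", "YOUR-API-TOKEN")
print(request.get_method(), request.full_url)

# To actually delete the dataset (irreversible), send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["success"])
```

Building the request separately from sending it makes it easy to inspect the URL and payload before committing to a purge.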
Method 3: Delete a dataset via the default CKAN API calls (not recommended)
HDX is built on the CKAN framework, which has its own API. Two calls in the CKAN API delete all or part of a dataset:
- The CKAN API call package_delete flags the dataset and its attached resources as deleted (state = deleted), but does not remove them from the HDX database. This is the call the HDX Python Library currently uses when the delete_from_hdx method is called; this will be changed to use the hdx_package_delete call.
- The CKAN API call package_purge deletes the dataset record from the database, but does not remove any uploaded resources (filestore). They become orphaned files on the HDX server and might be removed at some point in the future as part of a cleanup process.
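The differences between the three calls can be summarized as a small lookup table. This is only a sketch that restates the descriptions above. The dictionary and the helper function are hypothetical names, not part of CKAN or the HDX Python Library, and database backups are deliberately left out.

```python
# What each call leaves behind on HDX, as described in this section.
# Backups are not considered here.
LEFTOVERS = {
    "hdx_package_delete": [],
    "package_delete": [
        "dataset record (state = deleted)",
        "resource records (state = deleted)",
    ],
    "package_purge": ["uploaded files in the filestore"],
}


def full_delete_actions(leftovers=LEFTOVERS):
    """Return the API calls that leave nothing behind."""
    return [action for action, left in leftovers.items() if not left]


print(full_delete_actions())  # → ['hdx_package_delete']
```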
A note about backups
Even fully deleted (purged) datasets may continue to live on in the periodic backups taken of the HDX database. How long they live on depends on how long the dataset was on HDX before deletion and on when it was there. A dataset that was on HDX for less than a day before being deleted might never be captured in a backup at all, but if that day was January 1, it might well be captured in an annual backup that could be kept for many years.