Maintaining Tag Cleanup

As of 16-July-2020, the tag cleanup script (attached) still needs to be run periodically to catch datasets that have been added to HDX with incorrect tags. This can only happen with datasets added via the API. There is an open ticket for fixing that, but in the meantime, the script should ideally be run about once a week.

The script attached is configured to be run locally on CJ’s machine, so there are some quirks, but there are a lot of comments in the code to help. If it has fallen to you, dear reader, to make this script run, apologies in advance; it was never meant to be used for so long or handed over to others.

As of this writing, the script runs without intervention (just open it in Jupyter and run all). It writes an entry to a log file for each dataset completed (or failed), and prints some summary reporting to the console. Many types of errors are handled (and reported), but if the script does error out, the progress in the run is not lost: point the script at the log file and it will skip all completed datasets.
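
For anyone who needs to rebuild or debug the resume behavior, here is a minimal sketch of the general pattern in Python. It is not the attached script. The log layout (a CSV with one `dataset_id, status` row per dataset), the function names `load_completed` and `run_cleanup`, and the `clean_tags` callback are all assumptions made for illustration; check the comments in the real script for how its log is actually structured.

```python
import csv
import os


def load_completed(log_path):
    """Return the set of dataset IDs already logged as completed."""
    completed = set()
    if not os.path.exists(log_path):
        return completed
    with open(log_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # Assumed row layout (hypothetical): dataset_id, status
            if len(row) >= 2 and row[1] == "completed":
                completed.add(row[0])
    return completed


def run_cleanup(dataset_ids, log_path, clean_tags):
    """Process each dataset, log the outcome, and skip datasets already completed.

    clean_tags is a hypothetical callable that fixes the tags on one dataset
    and raises an exception if it cannot.
    """
    completed = load_completed(log_path)
    summary = {"completed": 0, "failed": 0, "skipped": 0}
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for dataset_id in dataset_ids:
            if dataset_id in completed:
                summary["skipped"] += 1
                continue
            try:
                clean_tags(dataset_id)
                status = "completed"
            except Exception as exc:
                # Report the error and carry on with the next dataset.
                status = "failed"
                print(f"{dataset_id}: {exc}")
            writer.writerow([dataset_id, status])
            f.flush()  # write each entry immediately so a later crash does not lose it
            summary[status] += 1
    return summary
```

Because only rows marked as completed are skipped, a dataset that failed in one run is retried the next time the same log file is used.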