  • Use batch_id as a first cut at grouping data series. This should be tested first, to make sure that datasets on different themes produced by the same script are not grouped into the same data series.

  • Add more words to the “country name words” list that is stripped out of the dataset names. A quick browse of the results above shows that sub-national location identifiers like “center” or “north1” are generating additional data series (mainly for the HOT datasets).

  • Including both dataset name and dataset title in the analysis might reduce false positives marginally.
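The first two suggestions can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the notebook: the stop-word list, the function names, the dict layout (`name`, `batch_id`) and the example dataset names are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list: a few country-name words plus the sub-national
# identifiers mentioned above. The real list would be much longer.
LOCATION_WORDS = {"mozambique", "moz", "kenya", "ken", "center", "north1"}


def series_key(name, stop_words=LOCATION_WORDS):
    """Reduce a dataset name to a candidate data-series key."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    kept = [t for t in tokens if t and t not in stop_words]
    return "-".join(kept)


def group_by_series(datasets, stop_words=LOCATION_WORDS):
    """Group dataset names by their stripped-down series key."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[series_key(ds["name"], stop_words)].append(ds["name"])
    return dict(groups)


def mixed_batches(datasets, stop_words=LOCATION_WORDS):
    """Return batch_ids whose datasets map to more than one name-based key.

    These are the batches where grouping on batch_id alone would merge
    datasets that the name-based approach keeps apart.
    """
    keys_per_batch = defaultdict(set)
    for ds in datasets:
        keys_per_batch[ds["batch_id"]].add(series_key(ds["name"], stop_words))
    return {b: sorted(k) for b, k in keys_per_batch.items() if len(k) > 1}


# Made-up records, loosely shaped like the HOT dataset names.
datasets = [
    {"name": "hotosm_moz_center_buildings", "batch_id": "b1"},
    {"name": "hotosm_moz_north1_buildings", "batch_id": "b1"},
    {"name": "hotosm_ken_roads", "batch_id": "b1"},
]

print(group_by_series(datasets))
print(mixed_batches(datasets))
```

With “center” and “north1” in the stop-word list, the two buildings datasets collapse into one series. The `mixed_batches` check flags batch `b1` because the same script run produced both a buildings and a roads dataset, which is the case the batch_id suggestion needs to be tested against.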

Attached file: HDX - Data Series Counting.ipynb