...

Tabulator-py is already used in the HDX Utilities library and, through it, in the HDX Python API for uploading to the HDX datastore. It is also used in the Chatham House project.

HXL Proxy

Tabulator-py could also be used in the HXL Proxy to replace the stream reading code. The advantages include a reduction in the amount of code to be maintained and the fact that improvements made by others to this package would automatically become available to the HXL Proxy (for example, support for zipped CSV). The main disadvantages are the time needed to refactor the HXL Proxy to use it and to identify any missing features.

...

Import for Google Spreadsheets could be used to enable organisations to easily move from local Excel spreadsheets to Google Spreadsheets in which we can embed a trigger to determine if the data has changed for freshness purposes.

HDX UI

datapackage-js could be used to enable the export of HDX datasets as Frictionless data packages should the standard take off.

Data Check

goodtables-py, Data Curator and Stenci.la (which looks like a cross between Word and Pandas) could provide code and ideas for this tool. Of the areas presented here, this would be the most significant use of Frictionless in HDX, and it shapes how much effort should be put into further prototyping. The decision that needs to be made is whether to improve the HXL Proxy or to use and contribute to the Frictionless libraries, making them either at least minimally HXL-aware if used alongside the HXL Proxy or fully HXL-aware if used as a replacement.

Advantages to using HXL Proxy

  • Familiarity
  • In-house knowledge
  • Speed to get going

Advantages to using Frictionless

  • Less in-house code to maintain
  • More contributors to the codebase
  • Access to new tools/libraries

The HXL Proxy already has some validation capabilities built in, so the question is whether what it has is already sufficient and, if not, whether Frictionless offers significantly more, such that it is worth the effort to switch rather than just build on the HXL Proxy. A good solution may be to use a minimally HXL-aware Frictionless for validation that is not specific to HXL, such as checking types, alongside the HXL Proxy for validation that is specific to HXL, e.g. against vocabularies.
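The split described above can be illustrated with a minimal sketch in plain Python. It does not call goodtables-py or the HXL Proxy; the function names `check_types` and `check_vocabulary`, the sample rows and the vocabulary are all hypothetical stand-ins, meant only to show generic type checks being kept apart from HXL-specific vocabulary checks.

```python
def check_types(rows, column_types):
    """Generic check: report values that cannot be cast to the expected type."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for column, cast in column_types.items():
            try:
                cast(row[column])
            except (KeyError, TypeError, ValueError):
                errors.append((number, column, row.get(column)))
    return errors


def check_vocabulary(rows, column, allowed):
    """HXL-specific check: report values in a tagged column outside a vocabulary."""
    errors = []
    for number, row in enumerate(rows, start=1):
        if row.get(column) not in allowed:
            errors.append((number, column, row.get(column)))
    return errors


# Hypothetical rows keyed by HXL hashtag (assumed sample data).
rows = [
    {"#adm1+name": "Coast", "#affected": "1200"},
    {"#adm1+name": "Cost", "#affected": "n/a"},
]

# Generic part: the kind of check a Frictionless tool could own.
type_errors = check_types(rows, {"#affected": int})

# HXL part: the kind of check that could stay with the HXL Proxy.
vocab_errors = check_vocabulary(rows, "#adm1+name", {"Coast", "Central"})

print(type_errors)
print(vocab_errors)
```

Each function returns a list of (row number, column, value) tuples, so the two sets of results could be merged into a single report whichever tool produced them.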

HDX to Frictionless Prototype

A prototype has been developed that takes an HDX dataset name and produces a Frictionless data package from it. Initially it was designed to convert from HXL to Frictionless, but it became apparent that the data package can include metadata that is unavailable to HXLated resources, namely the metadata contained within the HDX dataset itself.
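As a rough illustration of the idea (not the prototype's actual code), the sketch below maps a CKAN-style dataset dictionary onto a Data Package descriptor using only the standard library. The function name `hdx_to_datapackage`, the sample dataset and the choice of which dataset fields feed which descriptor properties are assumptions made for this example.

```python
import json


def hdx_to_datapackage(dataset):
    """Build a Data Package descriptor dict from a CKAN-style dataset dict."""
    descriptor = {
        "name": dataset["name"],
        "title": dataset.get("title", ""),
        # Dataset-level metadata that an HXLated resource alone would not carry.
        "description": dataset.get("notes", ""),
        "resources": [],
    }
    if dataset.get("license_id"):
        descriptor["licenses"] = [{"name": dataset["license_id"]}]
    for resource in dataset.get("resources", []):
        descriptor["resources"].append({
            # Simplified slug: lowercase with spaces replaced by hyphens.
            "name": resource["name"].lower().replace(" ", "-"),
            "path": resource["url"],
            "format": resource.get("format", "").lower(),
        })
    return descriptor


# Hypothetical dataset metadata (assumed values, not a real HDX dataset).
example_dataset = {
    "name": "example-dataset",
    "title": "Example Dataset",
    "notes": "Population figures by administrative region.",
    "license_id": "cc-by",
    "resources": [
        {"name": "Admin 1 Population",
         "url": "https://example.org/data/adm1.csv",
         "format": "CSV"},
    ],
}

print(json.dumps(hdx_to_datapackage(example_dataset), indent=2))
```

The printed JSON could be saved as `datapackage.json`; a fuller version would presumably also derive a table schema for each resource, for instance from its HXL hashtags.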