The HDX Python Library is designed to help you develop code that interacts with the Humanitarian Data Exchange (HDX) platform, which is built on top of the CKAN open-source data management system. Its main goal is to make pushing data to HDX and pulling data from it as simple as possible for the end user. It achieves this in several ways. It provides a simple interface that communicates with HDX using the CKAN Python API, a thin wrapper around the CKAN JSON API. HDX objects, such as datasets and resources, are represented by Python classes. This should make the learning curve gentle and let users get started with HDX programmatically without much setup.
You can jump to the Getting Started page or continue reading below about the purpose and design philosophy of the library.
Datasets, resources and showcases support dictionary operations such as square-bracket access for handling metadata, which feels natural. (The HDXObject class extends UserDict.) For example:
dataset['name'] = 'My Dataset'
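Because the object extends UserDict, the bracket syntax comes from the Python standard library rather than from custom code. The following is a minimal sketch of that idea and not the library's implementation: `MetadataObject` is a hypothetical stand-in class and the field names are illustrative.

```python
from collections import UserDict


class MetadataObject(UserDict):
    """Hypothetical stand-in for an HDX object; not the library's HDXObject."""

    def __init__(self, initial_data=None):
        # UserDict keeps the underlying dictionary in self.data
        super().__init__(initial_data or {})


dataset = MetadataObject({'title': 'My Title'})
dataset['name'] = 'My Dataset'     # set metadata with square brackets
print(dataset['name'])             # read it back the same way
print('title' in dataset)          # membership tests work as on a dict
print(sorted(dataset.keys()))      # so do the usual dictionary methods
```

Subclassing UserDict rather than dict keeps the stored metadata in a plain `data` attribute, which makes it straightforward to add methods on top without overriding dictionary internals.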
Static metadata can be imported from a YAML file (recommended because it is very human-readable) or from a JSON file, e.g.
dataset.update_yaml([path])
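To illustrate what importing static metadata from a file amounts to, here is a hedged sketch that uses JSON so that it runs with the standard library alone (YAML would need a third-party parser). `FileBackedMetadata`, its `update_json` method and the metadata keys are assumptions made for this example and are not taken from the library.

```python
import json
import os
import tempfile
from collections import UserDict


class FileBackedMetadata(UserDict):
    """Hypothetical stand-in for an HDX object; not the library's class."""

    def update_json(self, path):
        # Load static metadata from a JSON file and merge it into this object.
        # Keys in the file overwrite keys that are already set.
        with open(path, encoding='utf-8') as json_file:
            self.data.update(json.load(json_file))


# Write a small static metadata file to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as temp_dir:
    static_path = os.path.join(temp_dir, 'static_metadata.json')
    with open(static_path, 'w', encoding='utf-8') as json_file:
        json.dump({'license_id': 'cc-by', 'methodology': 'Census'}, json_file)

    dataset = FileBackedMetadata({'name': 'my-dataset'})
    dataset.update_json(static_path)

print(dict(dataset))
```

Keeping values that rarely change in a file and setting only the dynamic values in code separates the two kinds of metadata cleanly.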
Static metadata can also be passed in as a dictionary when a dataset, resource or showcase is initialised, e.g.
dataset = Dataset({
    'name': slugified_name,
    'title': title,
})
Helper functions make it easier to add more complicated types such as dates, date ranges and locations, e.g.
dataset.set_date_of_dataset('START DATE', 'END DATE')
Separate country code and utility libraries provide functions for converting between country codes, merging dictionaries, loading multiple YAML or JSON files and a few other helpful tasks, e.g.
Country.get_iso3_country_code_fuzzy('Czech Rep.')
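To give a feel for what fuzzy country matching involves, the sketch below uses `difflib` from the standard library against a tiny hand-written table. It is not the algorithm the country code library uses; `fuzzy_iso3`, the table and the cutoff value are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Tiny illustrative lookup table; a real one would cover every country.
COUNTRY_NAME_TO_ISO3 = {
    'chad': 'TCD',
    'chile': 'CHL',
    'china': 'CHN',
    'czech republic': 'CZE',
}


def fuzzy_iso3(country_name, cutoff=0.6):
    """Return the ISO3 code of the closest known name, or None if no match."""
    candidates = get_close_matches(
        country_name.strip().lower(),
        list(COUNTRY_NAME_TO_ISO3),
        n=1,
        cutoff=cutoff,
    )
    if not candidates:
        return None
    return COUNTRY_NAME_TO_ISO3[candidates[0]]


print(fuzzy_iso3('Czech Rep.'))   # abbreviated name still resolves
print(fuzzy_iso3('Atlantis'))     # nothing similar enough in the table
```

Returning None for an unmatched name lets the caller decide whether a missing country is an error or something to skip.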
console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: color
    stream: ext://sys.stdout
error_file_handler:
    class: logging.FileHandler
    level: ERROR
    formatter: simple
    filename: errors.log
    encoding: utf8
    mode: w
If you use the default logging configuration, you can also add the default email (SMTP) handler:
error_mail_handler:
    class: logging.handlers.SMTPHandler
    level: CRITICAL
    formatter: simple
    mailhost: localhost
    fromaddr: noreply@localhost
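Handler definitions like the YAML fragments above follow the schema accepted by `logging.config.dictConfig` in the standard library. The sketch below builds an equivalent dictionary for the console and error file handlers. The `simple` format string, the use of the root logger and the temporary log location are assumptions, and the `color` formatter and the SMTP handler are left out so that the example runs without extra packages or a mail server.

```python
import logging
import logging.config
import os
import tempfile


def build_logging_config(error_log_path):
    """Return a dictConfig dictionary mirroring the console and file handlers."""
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            # Assumed format; the library's own "simple" formatter may differ.
            'simple': {
                'format': '%(levelname)s - %(asctime)s - %(name)s - %(message)s',
            },
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
                'formatter': 'simple',  # "color" formatter not reproduced here
                'stream': 'ext://sys.stdout',
            },
            'error_file_handler': {
                'class': 'logging.FileHandler',
                'level': 'ERROR',
                'formatter': 'simple',
                'filename': error_log_path,
                'encoding': 'utf8',
                'mode': 'w',
            },
        },
        'root': {
            'level': 'DEBUG',
            'handlers': ['console', 'error_file_handler'],
        },
    }


# Write the error log to a temporary directory instead of the working directory.
log_dir = tempfile.mkdtemp()
error_log_path = os.path.join(log_dir, 'errors.log')
logging.config.dictConfig(build_logging_config(error_log_path))

logger = logging.getLogger('example')
logger.info('Shown on the console only')
logger.error('Shown on the console and written to the error log')
```

Because each handler has its own level, the same logger call can reach the console while being filtered out of the error file.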
Configuration is made as simple as possible with a Configuration class that handles the HDX API key and the merging of configurations from multiple YAML or JSON files or dictionaries:
class Configuration(UserDict):
    """Configuration for HDX

    Args:
        **kwargs: See below
        hdx_key_file (Optional[str]): Path to HDX key file. Defaults to ~/.hdxkey.
        hdx_config_dict (dict): HDX configuration dictionary OR
        hdx_config_json (str): Path to JSON HDX configuration OR
        hdx_config_yaml (str): Path to YAML HDX configuration. Defaults to library's internal hdx_configuration.yml.
        project_config_dict (dict): Project configuration dictionary OR
        project_config_json (str): Path to JSON Project configuration OR
        project_config_yaml (str): Path to YAML Project configuration. Defaults to config/project_configuration.yml.
    """
The library itself logs at appropriate levels so that it is clear which operations are being performed, e.g.
WARNING - 2016-06-07 11:08:04 - hdx.data.dataset - Dataset exists. Updating acled-conflict-data-for-africa-realtime-2016
The library makes errors plain by raising exceptions rather than returning False or None (except where a return value is more appropriate), e.g.
hdx.configuration.ConfigurationError: More than one project configuration file given!
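The configuration merging and the exception shown above can be illustrated together. The following is a rough sketch under stated assumptions, not the library's code: `merge_config_dicts` and `build_configuration` are hypothetical helpers, the configuration keys are invented, and the local `ConfigurationError` class only borrows the name and message of the real exception.

```python
from copy import deepcopy


class ConfigurationError(Exception):
    """Raised when the configuration arguments are inconsistent."""


def merge_config_dicts(dicts):
    """Merge dictionaries in order; later values override earlier ones.

    Nested dictionaries are merged key by key instead of being replaced.
    """
    merged = {}
    for current in dicts:
        for key, value in current.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_config_dicts([merged[key], value])
            else:
                # Copy so that the inputs are never modified through the result.
                merged[key] = deepcopy(value)
    return merged


def build_configuration(hdx_config, *project_configs):
    """Combine one HDX configuration with at most one project configuration."""
    if len(project_configs) > 1:
        # Fail loudly instead of silently picking one of the sources.
        raise ConfigurationError('More than one project configuration file given!')
    return merge_config_dicts([hdx_config, *project_configs])


hdx_config = {'site': {'url': 'https://example.org', 'timeout': 30}}
project_config = {'site': {'timeout': 60}, 'scraper': {'name': 'demo'}}

configuration = build_configuration(hdx_config, project_config)
print(configuration)

try:
    build_configuration(hdx_config, project_config, {'scraper': {'name': 'other'}})
except ConfigurationError as error:
    print(error)
```

Raising an exception here means a caller cannot accidentally continue with an ambiguous configuration, which is the behaviour the text above describes.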
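from datetime import datetime
from hdx.facades.scraperwiki import facade

def main():
    # generate_dataset is assumed to be defined elsewhere in your project
    dataset = generate_dataset(datetime.now())
    ...

if __name__ == '__main__':
    facade(main)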
The code is documented throughout. Detailed API documentation (generated from Google-style docstrings using Sphinx) is available and mentioned in the Getting Started guide. For example:
def load_from_hdx(self, id_or_name: str) -> bool:
    """Loads the dataset given by either id or name from HDX

    Args:
        id_or_name (str): Either id or name of dataset

    Returns:
        bool: True if loaded, False if not
    """
IDEs can take advantage of the documentation, e.g.
def merge_dictionaries(dicts: List[dict]) -> dict:
gives:
def update_yaml(self, path: Optional[str] = join('config', 'hdx_dataset_static.yml')) -> None: