Summary: this article describes the per-country datasets on HDX derived from the UNHCR Population Statistics API, via the HXL Proxy.
Source data overview
On its population statistics site, UNHCR publishes six global datasets (mostly at the annual and country level):
- Persons of concern (movements of people from country to country)
- Time series (Persons of concern reformatted as wide data)
- Demographics (sex-and-age-disaggregated data about Persons of concern)
- Asylum seekers (status of asylum seekers by country of origin, location, and year)
- Asylum seekers (monthly) (raw number of asylum seekers by country of origin, location, and month)
- Resettlement (total number of refugees resettled, with or without UNHCR assistance, by country of origin, location, and year)
Each of these is downloadable as a (large) HXL-tagged dataset, including all available countries and years. Example: http://popstats.unhcr.org/en/persons_of_concern.hxl (103,000+ rows). UNHCR has approved this method of publishing their data.
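As a rough illustration of what "HXL-tagged" means in practice, the sketch below reads a small CSV laid out in the HXL style: a row of human-readable headers, then a row of hashtags, then the data. The sample rows, column names, and hashtags here are invented for the example and are not real UNHCR output, and the hashtag-row detection is a simplified heuristic rather than a full implementation of the HXL standard.

```python
import csv
import io

# Tiny made-up sample in the HXL layout: header row, hashtag row, then data.
# Column names, hashtags, and numbers are illustrative assumptions only.
SAMPLE = """Year,Country of residence,Origin,Refugees
#date+year,#country+name+residence,#country+name+origin,#affected+refugees
2015,Country A,Country B,1200
2015,Country A,Country C,300
2016,Country D,Country B,450
"""

def read_hxl(text):
    """Return (hashtags, rows), where each row is a dict keyed by HXL hashtag."""
    reader = csv.reader(io.StringIO(text))
    for line in reader:
        cells = [c.strip() for c in line]
        # Simplified heuristic: the hashtag row is the first row in which
        # every non-empty cell starts with '#'.
        if any(cells) and all(c.startswith("#") for c in cells if c):
            hashtags = cells
            break
    else:
        raise ValueError("no HXL hashtag row found")
    # The reader continues from the line after the hashtag row.
    rows = [dict(zip(hashtags, line)) for line in reader if line]
    return hashtags, rows

hashtags, rows = read_hxl(SAMPLE)

# Sum one numeric column for a single (hypothetical) country of origin.
total_from_b = sum(
    int(r["#affected+refugees"])
    for r in rows
    if r["#country+name+origin"] == "Country B"
)
print(total_from_b)
```

Keying rows by hashtag rather than by header text is the main point: the hashtags stay stable even if the human-readable column labels change.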
HDX shares these datasets as live data from the UNHCR popstats site, filtered through the HXL Proxy to produce two views for each country:
The HXL Proxy filters the UNHCR datasets on demand, once per download request, and caches the result for an hour. The download links on HDX are direct calls to the Proxy, such as the following:
(You can also view the data recipe on HDX.) Since the data comes directly from UNHCR on each request, there is rarely any need to update the dataset definitions in HDX.
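To make the idea of a "direct call to the Proxy" more concrete, here is a minimal sketch of how such a link could be assembled. It is not the recipe HDX actually uses: the endpoint, the `filter01` and `select-query01-01` parameter names, and the `#country+residence` tag pattern are assumptions based on the public HXL Proxy's query-string conventions, and `proxy_url` is a hypothetical helper. Only the source URL comes from this article.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed public HXL Proxy endpoint (not confirmed by this article).
PROXY_ENDPOINT = "https://proxy.hxlstandard.org/data.csv"
# Source dataset URL quoted in the article.
SOURCE_URL = "http://popstats.unhcr.org/en/persons_of_concern.hxl"

def proxy_url(source_url, tag_pattern, value):
    """Build a download link asking the Proxy to keep only matching rows.

    Hypothetical helper: the parameter names below are assumptions.
    """
    params = {
        "url": source_url,
        "filter01": "select",                            # assumed row-selection filter
        "select-query01-01": f"{tag_pattern}={value}",   # assumed query parameter
    }
    # urlencode percent-encodes '#', '+', '=' and ':' so the link stays valid.
    return f"{PROXY_ENDPOINT}?{urlencode(params)}"

link = proxy_url(SOURCE_URL, "#country+residence", "Country A")
print(link)

# Round-trip check: decoding the query string gives back the original values.
query = parse_qs(urlsplit(link).query)
print(query["select-query01-01"][0])
```

Because the filter lives entirely in the link, the same helper could produce one link per country without storing any data on the HDX side, which matches the "live data" approach described above.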
Generating the datasets
The Python 3 script used to generate or update the datasets is in HDX's GitHub repository:
Before running the script, you will need to copy the config.py.TEMPLATE file to config.py and fill in the appropriate values.
Note that the script also depends on a Google Sheet containing a mapping table between UNHCR country names and HDX (ISO3) country codes. The sheet is publicly readable and available at