Introduction

Topline numbers are overview statistics about an entity. They are a way of summarising a few select important facts and presenting them in a visually appealing way on HDX. They can only be set up to appear on organisation or location pages at this time.

HDX

Within HDX, an organisation or location may have only one set of topline figures, which appears on its front page eg.


https://data.humdata.org/organization/ocha-afghanistan

At the top of the above organisation page are various statistics like “Conflict induced IDPs” and “Disease outbreaks”.


https://data.humdata.org/group/bwa

On the top left of this location are some numbers such as “Population, total” and “GDP per capita, PPP”.


Topline numbers for organisations come from data held in the datastore, whereas for locations the data comes from CPS. They can be read through the HDX API eg. https://data.humdata.org/api/action/hdx_topline_num_for_group?id=yem
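
As a minimal sketch of reading topline numbers programmatically, the snippet below builds the API url shown above for a given location id and fetches the JSON response. The function names and the timeout are illustrative assumptions, and the shape of the returned JSON is not described on this page, so the result is returned undecoded beyond JSON parsing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base url of the HDX action API, taken from the example url above.
HDX_API = "https://data.humdata.org/api/action"


def topline_url(location_id: str) -> str:
    """Build the url of the hdx_topline_num_for_group action for one location."""
    return f"{HDX_API}/hdx_topline_num_for_group?{urlencode({'id': location_id})}"


def fetch_topline(location_id: str, timeout: float = 30.0) -> dict:
    """Fetch the response as parsed JSON (needs network access).

    The structure of the response is an unknown here; inspect it before use.
    """
    with urlopen(topline_url(location_id), timeout=timeout) as response:
        return json.load(response)
```

For example, topline_url("yem") reproduces the url given above, and fetch_topline("yem") retrieves its contents.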


Setting up a topline requires a dataset (by convention named something like “OCHA Afghanistan Topline Figures”) containing the appropriate data in the right format. An example of such a dataset can be found here: https://data.humdata.org/dataset/ocha-afghanistan-topline-figures

Specifically, the csv resource entitled topline_figures contains the fields that are used.


To add a topline to an existing organisation, you must go to the appropriate section in the organisation’s setup. To do that:

  1. Edit the organisation (Admin -> Edit in the GUI, or use the direct link eg. https://data.humdata.org/organization/edit/ocha-afghanistan)

  2. Select “Use custom organization page”

  3. Scroll down to “TOPLINE NUMBERS”

  4. Enter the “Resource ID”, which can be found by previewing the correct resource (eg. the csv) and taking the text after “...resource/” in the url eg. “ca6a0891-8395-4d58-9168-6c44e17e0193”
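
Taking the text after “...resource/” can be sketched in code as follows. The helper name and the full preview url in the usage note are hypothetical; the sketch only assumes that the Resource ID is a 36-character identifier of hex digits and hyphens, like the example above.

```python
import re
from typing import Optional

# A Resource ID such as ca6a0891-8395-4d58-9168-6c44e17e0193 is 36 characters
# of hex digits and hyphens following "/resource/" in the url.
_RESOURCE_ID = re.compile(r"/resource/([0-9a-fA-F-]{36})")


def resource_id_from_url(url: str) -> Optional[str]:
    """Return the Resource ID found after '/resource/' in url, or None."""
    match = _RESOURCE_ID.search(url)
    return match.group(1) if match else None
```

Called on a hypothetical preview url ending in “/resource/ca6a0891-8395-4d58-9168-6c44e17e0193”, this returns “ca6a0891-8395-4d58-9168-6c44e17e0193”; on a url without a resource part it returns None.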


The csv containing the topline data (eg. topline_figures) must then be pushed into the datastore:

  1. Go to this website: http://www.hdxdatateam.xyz/

  2. Log in with the usual username and password

  3. If you do not have an account, create one

  4. Create a datastore

    1. This tool only works with production HDX and requires a csv

    2. The csv can be created from an Excel input file for example

    3. This may mean reworking the spreadsheet eg. merging data from multiple sheets into one sheet

    4. Choose “Define Schema” and make all fields text except “value”, which must be a float

    5. The “Resource ID” can be found from previewing the correct resource (eg. the csv) and taking the text after “...resource/” in the url eg. “ca6a0891-8395-4d58-9168-6c44e17e0193”
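
The schema rule in step 4 (every field text except “value”, which is a float) can be written down as a field list of the kind accepted by CKAN's datastore_create action. This is only an illustration of the schema: whether the tool above calls datastore_create is an assumption, the helper names are hypothetical, and the column names are those listed under Topline Format.

```python
from typing import Dict, List

# Column names as listed in the Topline Format section of this page.
TOPLINE_COLUMNS = [
    "code", "title", "value", "latest_date", "source",
    "source_link", "notes", "explore", "units",
]


def topline_schema(columns: List[str] = TOPLINE_COLUMNS) -> List[Dict[str, str]]:
    """Every field is text except 'value', which must be a float."""
    return [
        {"id": name, "type": "float" if name == "value" else "text"}
        for name in columns
    ]


def datastore_create_payload(resource_id: str) -> dict:
    """Request body for CKAN's datastore_create (assumed, not confirmed, to be
    what the tool sends)."""
    return {"resource_id": resource_id, "fields": topline_schema()}
```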


Google Spreadsheet

Most topline figures use, rather than a csv, a Google Drive sheet in the same format, stored in “Secured Files” on the HDX Google Drive, eg. for Fiji’s data: https://docs.google.com/spreadsheets/d/1ObwjZNS8y_mdjNXjhPLSqi-YGtQxkCDIW3Vg2kpwjgs


To use a Google spreadsheet:

  1. Export it as a csv by using File -> Publish to the web

  2. Choose “Entire Document” and “Comma-separated values (.csv)” from the dropdowns

  3. Click Publish

  4. Copy the url which will be of the form: https://docs.google.com/spreadsheets/d/1ExKJOsFlZgVH-jvPc9DVwT3Gcusq7pLaxp7i2xnePDk/pub?output=csv

  5. This url can then be uploaded as a resource to the dataset as a csv (not Google spreadsheet) using import from url rather than from file
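
A common slip is to paste the spreadsheet's ordinary url rather than the published csv url. The check below is a small sketch, with a hypothetical function name, that tests whether a url has the form shown in step 4. It does not confirm that the sheet has actually been published, and Google may issue published urls in other forms that this check would reject.

```python
from urllib.parse import parse_qs, urlparse


def is_published_csv_url(url: str) -> bool:
    """True if url looks like .../spreadsheets/d/<id>/pub?output=csv."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    return (
        parts.netloc == "docs.google.com"
        and parts.path.startswith("/spreadsheets/d/")
        and parts.path.endswith("/pub")
        and query.get("output") == ["csv"]
    )
```

The url in step 4 passes this check, whereas the plain spreadsheet link for Fiji’s data given earlier does not.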

Topline Format

The sheet (Google spreadsheet or csv) must have either 4 or 8 rows. The data is displayed in row order.


The columns in the sheet are:



 

Column name | Description
code | A unique identifier of the datum
title | A descriptive name for the datum
value | The value of the datum
latest_date | The date when the datum was updated
source | Some words stating where the datum came from
source_link | A url to the data that can be reached by the “Data” link of the topline
notes | A description of the datum that appears when you hover over a number
explore | A link to a visualisation but generally not used
units | The units of the datum for display purposes eg. count, million, million_count

 


Things to Look Out For


  1. The code is not a generated unique identifier. It is formed in any logical way that is unlikely to conflict with other sheets eg. taking the acronym of the sheet’s title and adding a row number.

  2. The value must just be an unformatted number ie. no separators

  3. The units affect how the value is displayed, for example a value of 3500000 with units of million will show as 3.5m

  4. If there are too few rows (ie. not 4 or 8), one or more dummy rows must be added eg. by copying an existing row
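
The format rules above lend themselves to an automated check before a sheet is uploaded. The following is a rough sketch rather than anything HDX itself runs: the function names are hypothetical, and it assumes that “4 or 8 rows” counts data rows only, excluding the header row.

```python
import csv
import io
import re
from typing import List

EXPECTED_COLUMNS = [
    "code", "title", "value", "latest_date", "source",
    "source_link", "notes", "explore", "units",
]
ALLOWED_ROW_COUNTS = {4, 8}  # assumption: data rows, header excluded

# Digits with an optional sign and decimal part; no thousands separators.
_PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def check_topline_csv(text: str) -> List[str]:
    """Return a list of problems found in the csv text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
    rows = list(reader)
    if len(rows) not in ALLOWED_ROW_COUNTS:
        problems.append(f"expected 4 or 8 data rows, found {len(rows)}")
    codes = [row.get("code") or "" for row in rows]
    if len(set(codes)) != len(codes):
        problems.append("codes are not unique within the sheet")
    for number, row in enumerate(rows, start=1):
        value = (row.get("value") or "").strip()
        if not _PLAIN_NUMBER.fullmatch(value):
            problems.append(
                f"row {number}: value {value!r} is not an unformatted number"
            )
    return problems
```

An empty list means the sheet passed these checks; a value such as “3,500,000” or a sheet with 3 data rows is reported as a problem.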


Scrapers

The topline scrapers in ScraperWiki check for topline number updates and update the datastore ie. the organisation updates its csv (perhaps annually) and the topline scraper updates the datastore (which updates the display).


Currently each new topline needs a new sheet, and a new scraper must be written for it.


In future, we can consider having a single Google spreadsheet with many sets of topline numbers (although how it would be updated by multiple organisations is not clear).


We should also investigate whether we can dispense with scraping and have HDX read sheets directly rather than going via the datastore.

