Datavis Process (draft)
Outlined below is the process we seek to follow:
Scoping Phase
Initial Call with the Client (Data Contributor)
On this call we seek to understand the client's expectations, what data they have and what type of visualization they believe they require. Note that we avoid going into too much detail on what can actually be produced until we have seen the proposed data.
During or following the call, a DataVis Brief is produced, and a record and link are kept on the datavis process control sheet (Mike's suggestion: replace the sheet with JIRA). The brief should be kept up to date throughout the lifecycle of the project.
Get the Data
We ask contributors to create a dataset in HDX for the data they wish to visualize. To encourage them to do so, we emphasize that dynamic data visualizations are a key value-add of HDX and that, with either their direct or indirect support, we will load their data into a visualization that updates dynamically. We explain that they can then reuse this datavis on other digital properties, ideally for operational purposes, and that the onus is on them to keep updating the data so the visual remains of value.
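As a rough illustration of what "dynamically updates" means once the data lives on HDX, the sketch below (using a hypothetical dataset name) pulls a dataset's resource URLs through HDX's public CKAN API; a visualization that reads from those URLs picks up whatever the contributor uploads next.

```python
import requests

# Hypothetical dataset name, for illustration only.
DATASET_NAME = "example-3w-dataset"

# HDX exposes the standard CKAN action API; package_show returns the
# dataset's metadata, including its list of downloadable resources.
response = requests.get(
    "https://data.humdata.org/api/3/action/package_show",
    params={"id": DATASET_NAME},
    timeout=30,
)
response.raise_for_status()
dataset = response.json()["result"]

print(dataset["title"])
for resource in dataset["resources"]:
    # Each resource exposes a direct download URL; a datavis that reads
    # from it picks up new data when the contributor updates the resource.
    print(resource["format"], resource["url"])
```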
Review the Data
Once the data is on HDX, shared publicly or privately, we can assess its state. Someone from our Data Team reviews the data, determining what is possible in the visualization and how much data cleaning is required. We rate datasets according to quality. If a dataset is flat, or of high quality with all the required metadata, we can treat it as curated data. Unfortunately, many of the datasets that arrive on HDX do not initially meet this threshold and so require processing by someone from our Data Team.
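To give a flavour of what this review can look like as an automated first pass, here is a minimal sketch; the file name, required columns and thresholds are assumptions for illustration, not our actual rating criteria.

```python
import pandas as pd

# Assumed inputs: the column names and the 10% threshold are illustrative.
REQUIRED_COLUMNS = {"admin1_pcode", "date", "value"}

df = pd.read_csv("contributor_data.csv")  # hypothetical contributed file

issues = []

# Check that the columns the visualization needs are actually present.
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    issues.append(f"missing required columns: {sorted(missing)}")

# Flag columns with a large share of blank cells as cleaning candidates.
for column, blank_share in df.isna().mean().items():
    if blank_share > 0.10:
        issues.append(f"column '{column}' is {blank_share:.0%} empty")

if issues:
    print("Needs processing by the Data Team:")
    for issue in issues:
        print(" -", issue)
else:
    print("Meets the basic threshold; candidate for curated data.")
```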
Discuss what is possible
Once the data is in a state where it can be used in a visualization, for example overlaid on a map, graphed or made comparable with another dataset, the datavis possibilities become more apparent. At this point we hold a call with the data contributor to discuss what is realistic given the quality and content of the data. Ideally our experts in data visualization, data science and data management are on the call together. We combine our ideas with those of the data contributor to define an MVP (minimum viable product) focused on the core information to be conveyed. At this stage we sometimes find that providers have very detailed ideas about how they want their datavis to look, and we may need to manage their expectations depending on the type of data, what works from a dynamic datavis perspective and the scope of the project.
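As one concrete example of "made comparable" or "overlaid on a map", a common preparatory step is joining the contributor's table to an admin-boundary lookup on a shared place code. The sketch below uses invented p-codes and names purely to show the shape of that step.

```python
import pandas as pd

# Invented example data: p-codes and names do not refer to real places.
indicator = pd.DataFrame({
    "admin1_pcode": ["XX01", "XX02"],
    "people_in_need": [120000, 45000],
})
boundaries = pd.DataFrame({
    "admin1_pcode": ["XX01", "XX02", "XX03"],
    "admin1_name": ["Region A", "Region B", "Region C"],
})

# A left join from the boundary list keeps every admin area, so gaps in
# the contributor's data appear explicitly (as NaN) rather than silently
# disappearing from the map layer.
mappable = boundaries.merge(indicator, on="admin1_pcode", how="left")
print(mappable)
```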
Prototyping Phase
Following the call in which we examine and settle on what is possible, our datavis expert creates and presents a mockup of the proposed datavis. We use this approach to ensure we broadly share an understanding of what will be developed before coding begins. This is a crucial phase: despite our earlier efforts, clients can sometimes have significantly different expectations, and the mockup makes clear what the final datavis will look like. Note that if we are able to reuse an existing datavis, we should be able to generate the mockup quite easily (see examples like 3W).
Show and Tell - Mockups
We hold a call in which we present the first mockup and collect feedback. We document any proposed changes, as coding will start after this point. We also reinforce the need for the client to keep their data up to date or, better still, to ensure their systems automatically update the source from which the datavis is derived. (Link to IOM Missing Migrants data source as a case in point.)
Show and Tell – Version 1 DataVis
In this call, we demonstrate version 1 of the datavis. Given the process followed so far, our hope is that the client will request only small changes (such as labelling or positioning). We again document these changes and confirm them with the contributor.
Show and Tell – Version 2 DataVis
This should be the final show and tell. The datavis should now meet the client's expectations and only very minor changes should be expected. We should also be able to agree the release plan with the provider, including any tweets or blog posts with associated imagery (e.g. a GIF).
Ongoing Prioritization
Note that we also hold a weekly internal datavis call to decide the priority order of visualizations, allocate resources to each, and identify and highlight any blockers.
Weaknesses in our current process
- We don't have a system to manage this process. In addition, the process is somewhat ad hoc, as we still encounter edge cases that are not always apparent at the beginning of a project.
- It is difficult to find a mechanism that makes clear to the client that they need to take responsibility for updating the data. We seek to reinforce this message, but sometimes clients only want a datavis for a very specific occasion, event or funding objective, don't see the deeper benefit of plugging their operational data into these systems, or have limited resources, capacity or skills to carry out the necessary process and change management.
- We don't currently have a process for reviewing whether organizations with a datavis are maintaining their data. When we release the new set of features focused more heavily on "data freshness", we can address this concern and improve the process.