HDX user sandbox

Requirement: Users outside the HDX team need to be able to learn the site, experiment, and test new features without actually uploading data to the production site. Some of these users will not have HDX orgs, or even accounts.

Status quo

Currently, users can learn, experiment, and test new features on https://feature-data.humdata.org, but there are three issues:

  1. They must already have an account.
  2. They must already belong to an organisation and have sufficient permissions to create datasets.
  3. Any work they do will be deleted the next time we copy the prod database to feature-data.

For training, testing, and experimentation, #3 will not be a problem, as long as we warn people that their work will not be permanent, and avoid updating feature-data at times when people are likely to be using it. For the other two issues, if users do not already have accounts and organisation access on HDX production, we currently have to create those accounts manually.

Proposed changes

We should have an organisation always available on feature-data called Sandbox, and a series of pre-created accounts for people to use.

First, we create a Python script (using the CKAN API or Michael Rans's HDX API) that does the following:

  1. Create a CKAN organisation called Sandbox.
  2. Create 20 user accounts, from test01 to test20, each with the same password (e.g. "hdxtest99").
  3. Add those user accounts to the Sandbox organisation, and give each of them the "editor" permission.
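The three steps above could be sketched as follows, using only the Python standard library and the CKAN action API (organization_create, user_create, and organization_member_create). This is a minimal sketch, not a finished script. The following are assumptions for illustration: the lowercase organisation name "sandbox" (with "Sandbox" as its title), the placeholder email addresses at example.org, the function names, and the use of a sysadmin API key in the Authorization header.

```python
import json
import urllib.request


def build_sandbox_actions(org_name="sandbox", count=20, password="hdxtest99"):
    """Return the ordered (action, payload) pairs for the three steps."""
    usernames = [f"test{i:02d}" for i in range(1, count + 1)]  # test01 .. test20

    # Step 1: create the organisation (assumed name/title, see lead-in).
    actions = [("organization_create", {"name": org_name, "title": "Sandbox"})]

    # Step 2: create the user accounts, all sharing one password.
    for name in usernames:
        actions.append(("user_create", {
            "name": name,
            "email": f"{name}@example.org",  # placeholder address (assumption)
            "password": password,
        }))

    # Step 3: add each user to the organisation as an editor.
    for name in usernames:
        actions.append(("organization_member_create", {
            "id": org_name,
            "username": name,
            "role": "editor",
        }))
    return actions


def call_ckan(base_url, api_key, action, payload):
    """POST one action to the CKAN action API and return the decoded JSON."""
    request = urllib.request.Request(
        f"{base_url}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def recreate_sandbox(base_url, api_key, send=call_ckan):
    """Run every action in order; `send` can be swapped out for a dry run."""
    for action, payload in build_sandbox_actions():
        send(base_url, api_key, action, payload)
```

To run it against feature-data, call recreate_sandbox("https://feature-data.humdata.org", api_key) with a sysadmin key. Separating the list of actions from the HTTP call lets us inspect or dry-run the plan before touching the site.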

Next, every time we update the feature-data database from prod, we run this script immediately afterwards to recreate the org and test accounts.

Now, whenever we want to give a user the ability to experiment with HDX, we can provide these test accounts if the user does not already have an account and org on HDX prod.

Known risks and mitigations

Risk: We need to update feature-data at times when people may be testing or training.
Mitigation: We create a different instance, sandbox-data.humdata.org, and use it instead of feature-data for user testing and training.

Risk: Two users attempt to use the same test account at the same time.
Mitigation: For most activities, this should not be a problem (e.g. each one is uploading a different dataset). When it is an issue, we can strongly encourage each person to use a different numbered account, such as test01, test02, etc. Note that people who already have accounts and orgs on HDX production will not need to use the test accounts.

Risk: People find the site and mistakenly believe that it contains real production data.
Mitigation: Every page on feature-data and test-data (and sandbox-data, if we create it) should contain a warning banner and watermark, indicating that it is not the primary site. We should also use robots.txt (and maybe Google sitemaps) to discourage spiders from crawling the non-production sites.

Risk: People start spamming the site using the test accounts.
Mitigation: Change the common password periodically (e.g. every month) to reduce the risk of this happening.
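
For the robots.txt mitigation, the simplest policy is to disallow all paths for all crawlers. The sketch below holds such a file as a string and uses Python's standard urllib.robotparser to confirm that a well-behaved crawler would skip the site. The constant and function names are illustrative assumptions, and robots.txt is only advisory: it discourages compliant spiders but does not block access.

```python
from urllib.robotparser import RobotFileParser

# Contents to serve as /robots.txt on each non-production site.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def crawling_allowed(robots_txt, url, agent="*"):
    """Return True if a compliant crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

For example, crawling_allowed(ROBOTS_TXT, "https://feature-data.humdata.org/dataset") returns False, while the same check against an empty robots.txt returns True.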