*** Work in Progress ***

Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security and enterprise applications. Of the 90+ services, the most popular include Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). Most services are not exposed directly to end users, but instead offer functionality through APIs for developers to use in their applications. Amazon Web Services’ offerings are accessed over HTTP, using the REST architectural style and SOAP protocol.

What relevant services are there?

In the list below, taken mainly from Wikipedia, I use green to indicate services we might use to create new or enhanced functionality; orange to indicate services that might replace something we are already using or be added to a list of products we are evaluating; blue to indicate speculative future usage; and strikethrough to indicate that, as far as I can tell, the service is not relevant to us (or we would only use it indirectly).

Compute

Networking

(The services below could be used to replace BlackMesh.)

Content delivery

Contact Center

Storage and content delivery

Database

Mobile services

Deployment

Management

Application services

Analytics

Miscellaneous


I could imagine datasets being uploaded into AWS rather than the datastore, giving far more functionality and significantly faster querying of that data than we currently have with the datastore. The data uploaded into AWS could be harmonised and transformed on the fly for visualisation, much as the HXL Proxy does for HXLated data, e.g. for a future map explorer. The HXL Proxy itself could be refactored to use AWS for scalability and to take advantage of some of its harmonisation capabilities.

I am not sure how feasible it is, but using AWS scraping (crawling) functionality, it might be possible, for example, to pull an HTML table from a website on the fly, harmonise and transform the data, HXLate it, and expose it through HDX as a URL in a dataset's resource (question: can the crawler frequency be set to allow on-the-fly scraping?). Another example: the steps that make up the FTS daily scraper, which pulls from an API, transforms, HXLates and uploads to HDX, could be done on demand through AWS when the download button is clicked in a dataset.
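To make the scrape-and-HXLate idea concrete, here is a minimal sketch of the transform step such an on-the-fly pipeline would need: extract the first table from an HTML page and emit CSV with an HXL hashtag row inserted after the header. It uses only the Python standard library; the function names, the sample page, and the hashtags are all illustrative assumptions, not an existing API.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of the first <table> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None
        self._done = False  # stop after the first table

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._done = True


def html_table_to_hxl_csv(html, hashtags):
    """Scrape the first table from an HTML page and return CSV text
    with an HXL hashtag row inserted after the header row."""
    parser = TableExtractor()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(parser.rows[0])    # original header row
    writer.writerow(hashtags)          # HXL hashtag row
    writer.writerows(parser.rows[1:])  # data rows
    return out.getvalue()


# Hypothetical scraped page and hashtags, purely for illustration.
page = """<table>
  <tr><th>Region</th><th>People in need</th></tr>
  <tr><td>North</td><td>12000</td></tr>
</table>"""
print(html_table_to_hxl_csv(page, ["#adm1+name", "#inneed"]))
```

In an AWS deployment this function could sit behind a Lambda-style handler triggered by the download URL, with the crawl itself done by whichever scraping service we end up evaluating.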

One issue is how well these harmonisations and transformations would work for HXLated data, since the hashtag row can break type-inference mechanisms, as we have seen with Frictionless.
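One way around the inference problem is to separate the hashtag row from the data before any inference runs. The sketch below assumes the HXL convention that the hashtag row, when present, sits immediately after the header and that every one of its cells starts with "#"; the function name is hypothetical.

```python
def split_hxl(rows):
    """Separate an HXL hashtag row (if present) from the data rows so that
    downstream type inference only ever sees real values.

    Assumes the hashtag row, when present, is the row immediately after
    the header and that every cell in it starts with '#'."""
    if len(rows) > 1 and rows[1] and all(c.startswith("#") for c in rows[1]):
        return rows[0], rows[1], rows[2:]  # header, hashtags, data
    return rows[0], None, rows[1:]         # no hashtag row found


header, hashtags, data = split_hxl([
    ["Region", "People in need"],
    ["#adm1+name", "#inneed"],
    ["North", "12000"],
])
```

After splitting, inference runs on `data` alone and the hashtag row can be re-attached to the output, so tools like Frictionless never see hashtags as values.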


One of the issues raised about Connect (from Javier's feedback) is that organisations want to store their sensitive data on HDX. We had been worried about how to handle the security well enough. If the data were stored on AWS, could that be a solution? From the AWS website: "as an AWS customer, you will benefit from a data center and network architecture built to meet the requirements of the most security-sensitive organizations."