Using Amazon Web Services in HDX

*** Work in Progress ***

Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security and enterprise applications. Of the 90+ services, the most popular include Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). Most services are not exposed directly to end users, but instead offer functionality through APIs for developers to use in their applications. Amazon Web Services’ offerings are accessed over HTTP, using the REST architectural style and SOAP protocol.

What relevant services are there?

In the list below, taken mainly from Wikipedia, I use green to indicate where we might use the service to create new or enhanced functionality. I use orange to indicate that we might replace something we are already using, or add the service to a list of products we are evaluating. I use blue to indicate a speculative future usage. I use strikethrough to indicate that the service is, as far as I can tell, not relevant to us (or that we would only use it indirectly).

Compute

Networking

(The below could be used to replace BlackMesh)

Content delivery

Contact Center

  • Amazon Connect is a self-service, cloud-based contact center service available to businesses. Amazon Connect is based on the same contact center technology used extensively by Amazon customer service associates around the world. (Replacing our chat solution, e.g. Zoho)

Storage and content delivery

  • Amazon Simple Storage Service (S3) provides scalable object storage accessible from a Web Service interface. Applicable use cases include backup/archiving, file (including media) storage and hosting, static website hosting, application data hosting, and more.
  • Amazon Glacier provides long-term storage as a lower-cost alternative to S3. It offers high redundancy and availability, but retrieval is slow, so it is intended for archiving data that is accessed infrequently.
  • AWS Storage Gateway is an iSCSI block storage virtual appliance with cloud-based backup.
  • Amazon Elastic Block Store (EBS) provides persistent block-level storage volumes for EC2.
  • AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport.
  • Amazon Elastic File System (EFS) is a file storage service for Amazon Elastic Compute Cloud (Amazon EC2) instances.

Database

  • Amazon DynamoDB provides a scalable, low-latency NoSQL online Database Service backed by SSDs.
  • Amazon ElastiCache provides in-memory caching for web applications.[42] This is Amazon's implementation of Memcached and Redis.[43]
  • Amazon Relational Database Service (RDS) provides scalable database servers with MySQL, Oracle, SQL Server, and PostgreSQL support.[44]
  • Amazon Redshift provides petabyte-scale data warehousing with column-based storage and multi-node compute.
  • Amazon SimpleDB allows developers to run queries on structured data. It operates in concert with EC2 and S3.
  • AWS Data Pipeline provides a reliable service for data transfer between different AWS compute and storage services (e.g., Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon EMR). In other words, it is a data-driven workload management system with an API for managing and monitoring data-driven workloads in cloud applications.[45]
  • Amazon Aurora provides a MySQL-compatible relational database engine created specifically for the AWS infrastructure. Amazon claims faster speeds and lower costs, which are realized mainly in larger databases.

Mobile services

  • AWS Mobile Hub lets you easily add and configure features for your mobile apps, including user authentication, data storage, backend logic, push notifications, content delivery, and analytics.
  • Amazon Cognito lets you easily add user sign-up and sign-in to your mobile and web apps.
  • AWS Device Farm is an app testing service that lets you test and interact with your Android, iOS, and web apps on many devices at once, or reproduce issues on a device in real time.
  • Amazon Pinpoint makes it easy to engage your customers via email, SMS and Mobile Push messages, tracking overall customer and engagement activity.

Deployment

Management

Application services

  • Amazon API Gateway is a service for publishing, maintaining and securing web service APIs.
  • Amazon CloudSearch provides basic full-text search and indexing of textual content.
  • Amazon DevPay, currently in limited beta, is a billing and account management system for applications that developers have built atop Amazon Web Services.
  • Amazon Elastic Transcoder (ETS) provides video transcoding of S3 hosted videos, marketed primarily as a way to convert source files into mobile-ready versions.
  • Amazon Simple Email Service (SES) provides bulk and transactional email sending. (replacing our bulk emailer)
  • Amazon Simple Queue Service (SQS) provides a hosted message queue for web applications.
  • Amazon Simple Notification Service (SNS) provides hosted multi-protocol "push" messaging for applications.
  • Amazon Simple Workflow (SWF) is a workflow service for building scalable, resilient applications.
  • Amazon Cognito is a user identity and data synchronization service that securely manages and synchronizes app data for users across their mobile devices.[47]
  • Amazon AppStream 2.0 is a low-latency service that streams resource-intensive applications and games from the cloud using NICE DCV technology.[48]

Analytics

  • Amazon Athena is an interactive query service launched in November 2016. It allows serverless querying of S3 content using standard SQL.[49]
  • AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
  • Amazon Elastic MapReduce (EMR) provides a PaaS service delivering a Hadoop framework for running MapReduce jobs on the web-scale infrastructure of EC2 and Amazon S3.
  • Amazon Machine Learning is a service that assists developers of all skill levels to use machine learning technology.
  • Amazon Kinesis is a cloud-based service for real-time data processing over large, distributed data streams. It streams data in real time with the ability to process thousands of data streams on a per-second basis. The service, designed for real-time apps, allows developers to pull any amount of data, from any number of sources, scaling up or down as needed. It has some similarities in functionality to Apache Kafka.[50]
  • Amazon Elasticsearch Service provides fully managed Elasticsearch and Kibana services.[51]
  • Amazon QuickSight is a business intelligence, analytics, and visualization tool launched in November 2016.[52] It provides ad-hoc services by connecting to AWS or non-AWS data sources.
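
As a rough illustration of the Athena entry above, the sketch below shows how querying S3 content with standard SQL might look from Python. It assumes a client object that behaves like boto3.client("athena"); the function name run_athena_query and the database, table and bucket names in the usage note are hypothetical and not part of any existing HDX setup.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a SQL query to Athena and wait for it to finish.

    `client` is assumed to behave like boto3.client("athena").
    Returns the query execution id once the query has succeeded.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
```

With real credentials this would be called along the lines of run_athena_query(boto3.client("athena"), "SELECT * FROM some_table LIMIT 10", "some_database", "s3://some-bucket/athena-results/"), where every name is a placeholder. The results land as a CSV file under the output location and can also be read back with the client's get_query_results call.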

Miscellaneous

  • Amazon Marketplace Web Service (MWS) allows users to manage the complete shipment process through an API, from creating a listing to downloading a shipment label.
  • Amazon Fulfillment Web Service provided a programmatic web service for sellers to ship items to and from Amazon using Fulfillment by Amazon. It was later replaced by Amazon Marketplace Web Service.
  • Amazon Historical Pricing provides access to Amazon's historical sales data from its affiliates. (It appears that this service has been discontinued.)
  • Amazon Mechanical Turk (Mturk) manages small units of work distributed among many persons.
  • Amazon Product Advertising API, formerly known as Amazon Associates Web Service (A2S) and Amazon E-Commerce Service (ECS), provides access to Amazon's product data and electronic commerce functionality.
  • Amazon Gift Code On Demand (AGCOD) for Corporate Customers[53] enables companies to distribute Amazon gift codes instantly in any denomination.
  • AWS Partner Network (APN) provides technical information and sales and marketing support. Launched in April 2012, the APN is made up of Technology Partners including Independent Software Vendors (ISVs), tool providers, platform providers, and others.[54][55][56]
  • Amazon Lumberyard is a freeware triple-A game engine integrated with AWS.[57]
  • Amazon Chime is a collaboration service for voice, video conference, and instant messaging.[58]


I could imagine datasets being uploaded into AWS rather than the datastore, giving us far more functionality and significantly faster querying of that data than we currently have for the datastore. The data uploaded into AWS could be harmonised and transformed on the fly for visualisation, much like the HXL Proxy does for HXLated data, e.g. for a future map explorer. The HXL Proxy could be refactored to use AWS for scalability and to use some of its harmonisation capabilities.

I am not sure how feasible it is, but using AWS scraping (crawling) functionality, it might be possible on the fly to, for example, pull out an HTML table from a website, harmonise and transform the data, HXLate it, and have it exposed through HDX as a URL in a dataset's resource. (Question: can the crawler frequency be set to allow on-the-fly scraping?)

Another example could be the steps that make up the FTS daily scraper, which pulls from an API, transforms, HXLates and uploads to HDX. These could be done through AWS on demand, on the fly, when the download button is clicked in a dataset.
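
To make the "pull out an HTML table, transform, HXLate" idea more concrete, here is a minimal sketch of those steps in plain Python, independent of any AWS service. The class and function names and the header-to-hashtag mapping are hypothetical; a real pipeline would need proper harmonisation rules, and this sketch only reads the first table in the page and does not handle nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of the first <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None
        self._tables_seen = 0
        self._in_first_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._tables_seen += 1
            self._in_first_table = self._tables_seen == 1
        elif self._in_first_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_first_table = False


# Hypothetical mapping from (lower-cased) column headers to HXL hashtags.
HXL_TAGS = {"country": "#country+name", "people affected": "#affected"}


def hxlate(rows, tag_map):
    """Insert an HXL hashtag row between the header row and the data rows."""
    header, *data = rows
    # Columns without a known hashtag are left untagged (empty cell).
    tags = [tag_map.get(h.strip().lower(), "") for h in header]
    return [header, tags, *data]


def to_csv(rows):
    """Serialise rows as CSV text, ready to expose as a resource."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Feeding a page into TableExtractor with feed(), passing its rows to hxlate() and then to to_csv() gives HXLated CSV text. In an on-demand setup, that chain is the part that would run when a resource URL is requested; where it would run inside AWS is left open here.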

One issue is how well these harmonisations and transformations would work for HXLated data, since HXLated data can break the inferring mechanisms, as happens with Frictionless.


One of the issues raised about Connect (from Javier's feedback) is that organisations want to store the sensitive data on HDX. We had been worried about how to do the security well enough. If the data were stored on AWS, could that be a solution? From the AWS website: "as an AWS customer, you will benefit from a data center and network architecture built to meet the requirements of the most security-sensitive organizations."