Mapping AWS, Google Cloud, Azure Services to Big Data Warehouse Architecture – Sonra

Below is a representation of the big data warehouse architecture. I won’t go into the details of the features and components. If you want to find out more about the gory details, I recommend my excellent training course Big Data for Data Warehouse and BI Professionals.


So how do the components of the data warehouse map to the various services and products offered by the three most popular cloud platforms: Microsoft Azure, Google Cloud Platform, and Amazon AWS? A new product or service is launched almost every week, and it can get quite daunting to keep track of what is going on. The table below maps the various cloud services against the big data warehouse architecture. I have also included a column that lists some open source components to make comparison easier. Please note that this list is by no means exhaustive, as there are literally hundreds of open source tools that do similar things. I have just listed those that I have had some exposure to.

Update: Due to popular demand I have added Oracle’s cloud products to the mix. As a reference point I have also added Hortonworks and MapR to the matrix. For Oracle I am only covering what is on offer in the Oracle cloud. Oracle has various on-premise solutions, such as Oracle Stream Analytics, Oracle Enterprise Metadata Management (catalog and lineage), and Oracle EDQ for data quality, that are not (yet?) offered in the cloud. Oracle also has some unique products that none of the other vendors can offer: Oracle GoldenGate and, to some extent, Oracle Data Integrator. There is also Big Data Discovery.

Download the full matrix that maps Oracle, Hortonworks, MapR, AWS, Azure, Google Cloud, Open Source to the Big Data Architecture (e-mail required).


| Component | Open Source | Amazon AWS | Microsoft Azure | Google Cloud |
| --- | --- | --- | --- | --- |
| Batch Ingest | Sqoop; File Transfer; Flume; StreamSets | AWS Data Transfer Services (various options) | Import/Export Service; Data Factory | Cloud Dataflow |
| Streaming Ingest | Flume; StreamSets | Amazon Kinesis Firehose | Event Hubs; IoT Hub | Cloud Dataflow |
| Transient Storage | Kafka | Kinesis | Event Hubs; IoT Hub; HDInsight (Kafka) | Cloud Pub/Sub; Cloud IoT Core |
| Batch Processing | Hive; Flink, Spark; MapReduce; PostgreSQL | EMR Spark; EMR Hadoop; EMR Presto; AWS Batch; Redshift | Azure Batch; HDInsight (Spark/MapReduce); SQL Data Warehouse; Data Lake Analytics; Azure Functions | Cloud Dataflow (open source Apache Beam); Cloud Dataproc (Spark, Hadoop) |
| Stream Processing | | Amazon Kinesis Streams; Amazon Kinesis Analytics; EMR Spark | Stream Analytics; HDInsight (Storm, Spark) | Cloud Dataflow (open source Apache Beam); Dataproc (Spark, Hadoop) |
| Machine Learning | scikit-learn; TensorFlow; Spark MLlib; a huge number of other libraries | Lex; Polly; Rekognition; Amazon Machine Learning | Azure ML; Cognitive Services | Natural Language; Speech; Translation; Vision; Video; ML Engine |
| Serving Storage BI/EDW | Impala + Kudu | Redshift; Athena | SQL Data Warehouse; Analysis Services (OLAP Cubes) | BigQuery |
| Serving Storage Search (keywords + facets) | Solr | Amazon CloudSearch; Amazon Elasticsearch | Azure Search | N/A (Marketplace, e.g. Solr) |
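
If you want to query the matrix programmatically rather than read it off the page, one option is to hold it as a nested dictionary keyed by architecture component and then by platform. The sketch below is only an illustration: the names `SERVICE_MATRIX`, `services_for` and `compare` are my own assumptions, and only three rows of the matrix above are copied in to keep it short.

```python
# Illustrative subset of the matrix above; the entries are copied from the table.
# SERVICE_MATRIX, services_for and compare are hypothetical names for this sketch.
SERVICE_MATRIX = {
    "Transient Storage": {
        "Open Source": ["Kafka"],
        "Amazon AWS": ["Kinesis"],
        "Microsoft Azure": ["Event Hubs", "IoT Hub", "HDInsight (Kafka)"],
        "Google Cloud": ["Cloud Pub/Sub", "Cloud IoT Core"],
    },
    "Serving Storage BI/EDW": {
        "Open Source": ["Impala + Kudu"],
        "Amazon AWS": ["Redshift", "Athena"],
        "Microsoft Azure": ["SQL Data Warehouse", "Analysis Services (OLAP Cubes)"],
        "Google Cloud": ["BigQuery"],
    },
    "Serving Storage Search": {
        "Open Source": ["Solr"],
        "Amazon AWS": ["Amazon CloudSearch", "Amazon Elasticsearch"],
        "Microsoft Azure": ["Azure Search"],
        "Google Cloud": [],  # N/A in the matrix (Marketplace, e.g. Solr)
    },
}


def services_for(component, platform):
    """Return the services listed for a component on a platform (empty list if none)."""
    return list(SERVICE_MATRIX.get(component, {}).get(platform, []))


def compare(component):
    """Return a {platform: services} mapping for one architecture component."""
    return {platform: list(names) for platform, names in SERVICE_MATRIX.get(component, {}).items()}


if __name__ == "__main__":
    # Print one row of the matrix, one platform per line.
    for platform, names in compare("Transient Storage").items():
        print(f"{platform}: {', '.join(names) or 'N/A'}")
```

Unknown components or platforms return an empty list rather than raising an error, which keeps side-by-side comparisons simple when a vendor has no offering for a given component.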
