How to scope an AWS Data Pipeline

Investigative Questions

Structured | Unstructured
Relational | Non-Relational
Schema stays the same | Schema changes often

Snowflake | Redshift
Scale up, scale down | Manually add nodes
 | Real-time analytics

AWS Data Warehouse comparisons

Redshift Cons

Snowflake Cons

When to use Redshift

Redshift is best suited when your organization already uses AWS services and its applications have heavy query loads that need analytics on structured data in real time.

When to use Snowflake

Snowflake is the best option for organizations with lighter query loads that need to scale frequently. It is also highly automated, with little operational overhead.
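
The criteria above can be sketched as a small helper. This is only an illustration: the `Workload` fields, the `suggest_warehouse` function, and the equal weighting of each signal are assumptions made for this example, not an official sizing method from either vendor.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical yes/no answers gathered during scoping.
    already_on_aws: bool
    heavy_query_load: bool
    needs_real_time_analytics: bool
    needs_frequent_scaling: bool
    wants_low_ops_overhead: bool


def suggest_warehouse(w: Workload) -> str:
    """Count the signals for each warehouse and return the stronger match."""
    # Signals taken from the Redshift guidance above.
    redshift = sum([w.already_on_aws, w.heavy_query_load, w.needs_real_time_analytics])
    # Signals taken from the Snowflake guidance above.
    snowflake = sum([not w.heavy_query_load, w.needs_frequent_scaling, w.wants_low_ops_overhead])
    if redshift > snowflake:
        return "Redshift"
    if snowflake > redshift:
        return "Snowflake"
    # Equal counts: the criteria alone do not decide (assumed tie handling).
    return "Either"


if __name__ == "__main__":
    aws_heavy = Workload(True, True, True, False, False)
    print(suggest_warehouse(aws_heavy))
```

A tie means the questions above do not separate the two options, so cost and the "Cons" of each product would have to decide.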

AWS Data Lake comparisons

General comparisons

Data Lake | Data Warehouse
Large volume of data in multiple formats | Visualize data and extract insights
Store IoT data for real-time analysis | Decision making, not just collecting data for analysis
Raw unstructured data to generate output, e.g. machine learning | Original data source is not suitable for querying, and you need to separate analytical data from your transactional data
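
As a rough sketch of how the comparison above could be used during scoping, the snippet below maps each use case to the storage type listed for it. The `Need` enum, its member names, and `suggest_store` are hypothetical names introduced for this example.

```python
from enum import Enum
from typing import Iterable


class Need(Enum):
    # Data lake use cases from the comparison above.
    MULTI_FORMAT_VOLUME = "large volume of data in multiple formats"
    IOT_REAL_TIME = "store IoT data for real-time analysis"
    RAW_FOR_ML = "raw unstructured data to generate output, e.g. machine learning"
    # Data warehouse use cases from the comparison above.
    VISUALIZE_INSIGHTS = "visualize data and extract insights"
    DECISION_MAKING = "decision making, not just collecting data"
    SEPARATE_ANALYTICS = "separate analytical data from transactional data"


LAKE_NEEDS = {Need.MULTI_FORMAT_VOLUME, Need.IOT_REAL_TIME, Need.RAW_FOR_ML}
WAREHOUSE_NEEDS = {Need.VISUALIZE_INSIGHTS, Need.DECISION_MAKING, Need.SEPARATE_ANALYTICS}


def suggest_store(needs: Iterable[Need]) -> str:
    """Return which storage type(s) the stated needs point to."""
    needs = set(needs)
    wants_lake = bool(needs & LAKE_NEEDS)
    wants_warehouse = bool(needs & WAREHOUSE_NEEDS)
    if wants_lake and wants_warehouse:
        return "Both"
    if wants_lake:
        return "Data Lake"
    if wants_warehouse:
        return "Data Warehouse"
    return "Undetermined"


if __name__ == "__main__":
    print(suggest_store([Need.RAW_FOR_ML, Need.IOT_REAL_TIME]))
```

Returning "Both" reflects that a pipeline can land raw data in a lake and still load curated data into a warehouse for analysis.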

Data lake and data warehouse comparisons

Data warehouse: