🧱 Modern Data Stack Data Engineer Roadmap

    Master the core tools used in modern data teams — from containerization to dbt, BigQuery, and Kafka. Build real projects and get job-ready.

    ✓ Expert-Designed Learning Path • Industry-Validated Curriculum • Real-World Application Focus

    This roadmap was created by data engineering professionals and includes 34 hands-on tasks covering production-ready skills used at companies like Netflix, Airbnb, and Spotify. Master Docker, Terraform, Airflow, and 7 more technologies.

    Intermediate
    9 sections • 34 tasks

    Skills You'll Learn

    • Cloud infrastructure
    • SQL & analytics engineering
    • ETL & orchestration
    • Batch & stream processing
    • Data modeling

    Tools You'll Use

    • Docker
    • Terraform
    • Airflow
    • dlt
    • BigQuery
    • dbt
    • Metabase
    • Spark
    • Kafka
    • GitHub

    Projects to Build

    Step 0: Pre-requisites and fundamentals

    - Learn the fundamentals of data engineering
    - Know basic SQL and Python

    Step 1: Containerization & Infrastructure

    - Install Docker & Docker Compose
    - Run PostgreSQL locally with Docker (see the connectivity check after this list)
    - Install the Terraform CLI
    - Provision GCP infrastructure with Terraform (a BigQuery dataset and a GCS bucket)
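
    Below is a minimal sketch to confirm the containerized Postgres from this step is reachable from Python. It assumes the stock `postgres` image published on `localhost:5432` and the `psycopg2` client installed; the credentials are placeholders.

```python
# Quick connectivity check for the Postgres container started with Docker.
# Assumes the stock `postgres` image on localhost:5432 with default
# user/database names; adjust to match your `docker run` / compose settings.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="postgres",  # whatever you set via POSTGRES_PASSWORD
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # e.g. "PostgreSQL 16.x on ..."
conn.close()
```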

    Step 2: Workflow Orchestration with Airflow

    - Set up Airflow locally with Docker (or use Astronomer's free tier at https://www.astronomer.io/)
    - Build a basic DAG that loads a CSV file into BigQuery (see the sketch after this list)
    - Schedule the DAG to run daily
    - Add logging and failure notifications
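
    A minimal DAG sketch for this step, assuming Airflow 2.4+ (the `schedule` argument), pandas with pandas-gbq installed in the Airflow environment, and placeholder file path, project, and table names:

```python
# Daily DAG sketch: read a local CSV and load it into BigQuery.
# Assumes Airflow 2.4+ (use `schedule_interval` on older 2.x releases)
# and that pandas + pandas-gbq are available to the workers.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["demo"])
def csv_to_bigquery():
    @task
    def extract() -> str:
        # Placeholder path mounted into the Airflow containers
        return "/opt/airflow/data/events.csv"

    @task
    def load(path: str) -> None:
        df = pd.read_csv(path)
        # pandas-gbq picks up credentials from GOOGLE_APPLICATION_CREDENTIALS
        df.to_gbq("raw.events", project_id="my-gcp-project", if_exists="replace")

    load(extract())


csv_to_bigquery()
```

    Airflow's task logs cover the logging item; for notifications, `email_on_failure` in `default_args` or an `on_failure_callback` is the usual starting point.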

    Step 3: Data Ingestion & Loading (Airflow and dlt)

    - Create an API ingestion task with dlt (e.g., the GitHub or OpenWeather API); see the sketch after this list
    - Normalize nested JSON into flat tables
    - Run it on a schedule and incrementally with Airflow
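
    A dlt sketch along these lines, with the repository URL, cursor field, and dataset names chosen only for illustration; in Airflow this code would typically run inside a task:

```python
# Incremental ingestion sketch with dlt: pull GitHub issues updated since the
# last run and merge them into BigQuery. dlt normalizes nested JSON into flat
# child tables automatically. Endpoint, fields, and names are placeholders.
import dlt
import requests


@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")
):
    resp = requests.get(
        "https://api.github.com/repos/dlt-hub/dlt/issues",  # example endpoint
        params={"since": updated_at.start_value, "state": "all", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    yield resp.json()


pipeline = dlt.pipeline(
    pipeline_name="github_issues",
    destination="bigquery",
    dataset_name="raw",
)
print(pipeline.run(github_issues()))
```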

    Step 4: Data Warehousing in BigQuery

    - Load sample data into BigQuery (see the sketch after this list)
    - Apply partitioning and clustering
    - Run SQL queries and optimize costs
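
    A sketch using the official `google-cloud-bigquery` client; the project, bucket, table, and column names are placeholders, and you should partition and cluster on the columns you actually filter by:

```python
# Load CSVs from GCS into a date-partitioned, clustered BigQuery table.
# Project, bucket, table, and column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    time_partitioning=bigquery.TimePartitioning(field="event_date"),  # daily partitions
    clustering_fields=["user_id", "event_type"],
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/events/*.csv",
    "my-gcp-project.raw.events",
    job_config=job_config,
)
load_job.result()  # block until the load finishes

table = client.get_table("my-gcp-project.raw.events")
print(f"Loaded {table.num_rows} rows into {table.full_table_id}")
```

    Partition pruning plus clustering is what keeps query costs down: filtering on `event_date` scans only the matching partitions instead of the whole table.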

    Step 5: Analytics Engineering with dbt

    - Install dbt and initialize a project with the BigQuery adapter
    - Build staging models
    - Add documentation and tests
    - Deploy with GitHub Actions or dbt Cloud (see the sketch after this list)
    - Visualize the output in Metabase
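
    For the deployment item, dbt-core 1.5+ exposes a programmatic runner, so a CI entry point that a GitHub Actions job calls could look roughly like this, assuming the dbt profile and staging models already exist:

```python
# Sketch of a CI entry point: run and test the staging models programmatically.
# Equivalent to `dbt build --select staging` on the command line (dbt-core 1.5+).
from dbt.cli.main import dbtRunner, dbtRunnerResult

runner = dbtRunner()
res: dbtRunnerResult = runner.invoke(["build", "--select", "staging"])

if not res.success:
    raise SystemExit("dbt build failed")
```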

    Step 6: Batch Processing with Spark

    - Install Spark locally or use Google Colab
    - Load and transform a CSV with PySpark (see the sketch after this list)
    - Run groupBy aggregations and joins on large datasets
    - Explore partitioning and performance tuning
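
    A PySpark sketch for this step; the file paths and column names are made up for illustration:

```python
# Basic PySpark batch job: read CSVs, aggregate, join, and write partitioned
# Parquet. File paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

trips = spark.read.csv("data/trips.csv", header=True, inferSchema=True)
zones = spark.read.csv("data/zones.csv", header=True, inferSchema=True)

# Aggregate: daily trip count and revenue per pickup zone
daily = trips.groupBy(
    "pickup_zone_id", F.to_date("pickup_datetime").alias("day")
).agg(
    F.count("*").alias("trips"),
    F.sum("total_amount").alias("revenue"),
)

# Join the zone lookup to attach human-readable zone names
report = daily.join(zones, daily.pickup_zone_id == zones.zone_id, "left")

# Partitioned output; repartition controls how many files land per partition
report.repartition("day").write.mode("overwrite").partitionBy("day").parquet(
    "output/daily_report"
)

spark.stop()
```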

    Step 7: Streaming with Kafka

    - Install Kafka via Docker or use Confluent Cloud
    - Create a simple producer and consumer (see the sketch after this list)
    - Process events with Kafka Streams or ksqlDB
    - Use Schema Registry with Avro or Protobuf
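
    A producer/consumer sketch with the `confluent-kafka` Python client, assuming a local broker on `localhost:9092` and a placeholder topic; point `bootstrap.servers` (plus auth settings) at Confluent Cloud if you go that route:

```python
# Minimal Kafka producer/consumer pair against a local broker.
# Broker address, topic, and group id are placeholders.
import json

from confluent_kafka import Consumer, Producer

TOPIC = "demo-events"

# Producer: send a few JSON-encoded events
producer = Producer({"bootstrap.servers": "localhost:9092"})
for i in range(3):
    producer.produce(TOPIC, key=str(i), value=json.dumps({"event_id": i, "kind": "click"}))
producer.flush()

# Consumer: read them back from the beginning of the topic
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "demo-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

for _ in range(10):
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    print(msg.key(), json.loads(msg.value()))

consumer.close()
```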

    Final Project: Build a Real Data Pipeline

    - Choose a dataset and domain (e.g., finance, sports, e-commerce)
    - Ingest the data using Airflow and dlt (batch) or Kafka (streaming)
    - Model and test it with dbt
    - Load it into BigQuery and visualize KPIs
    - Publish the project on GitHub and write a short case study

    Sign up for free courses and get early access to AI-powered grading, quizzes, and curated learning resources for each roadmap step.