🧱 Modern Data Stack Data Engineer Roadmap
Master the core tools used in modern data teams — from containerization to dbt, BigQuery, and Kafka. Build real projects and get job-ready.
This roadmap was created by data engineering professionals, with 34 hands-on tasks covering production-ready skills used by companies like Netflix, Airbnb, and Spotify. Master Docker, Terraform, Airflow, and 7 more technologies.
Skills You'll Learn
- Cloud infrastructure
- SQL & analytics engineering
- ETL & orchestration
- Batch & stream processing
- Data modeling
Tools You'll Use
- Docker
- Terraform
- Airflow
- dlt
- BigQuery
- dbt
- Metabase
- Spark
- Kafka
- GitHub
Projects to Build
- Infrastructure-as-Code Setup on GCP
Provision a GCP environment using Terraform with BigQuery and Cloud Storage, staying within free-tier limits.
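As a rough sketch, the Terraform config for this project might look like the following (the bucket and dataset names, region, and `project_id` variable are placeholders, not part of the roadmap):

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

variable "project_id" {
  type = string
}

provider "google" {
  project = var.project_id # placeholder: your GCP project ID
  region  = "us-central1"  # assumption: any free-tier-friendly region works
}

# Data lake bucket (Cloud Storage)
resource "google_storage_bucket" "data_lake" {
  name          = "${var.project_id}-data-lake" # bucket names are globally unique
  location      = "US"
  force_destroy = true
}

# Warehouse dataset (BigQuery)
resource "google_bigquery_dataset" "warehouse" {
  dataset_id = "warehouse"
  location   = "US"
}
```

With a config like this, `terraform plan` and `terraform apply` stand up (and `terraform destroy` tears down) the whole environment reproducibly.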
- ETL Pipeline Orchestration with Apache Airflow
Design and implement an orchestrated ETL pipeline using Apache Airflow to extract, transform, and load weather data from a public API into a data warehouse.
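A hedged sketch of the transform step such a pipeline might contain. The API payload shape and field names below are assumptions (loosely Open-Meteo-style); in the real project each function would be wrapped in an Airflow task:

```python
from datetime import datetime, timezone


def transform_weather(raw: dict) -> dict:
    """Flatten one raw weather API record into a warehouse-ready row.

    The input keys are assumptions for illustration; adapt them to
    whichever public weather API you extract from.
    """
    return {
        "city": raw["city"],
        "temp_c": round(float(raw["current"]["temperature"]), 1),
        "wind_kmh": float(raw["current"]["windspeed"]),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }


# In Airflow this would sit between an extract task hitting the API and a
# load task writing to the warehouse (e.g. as a @task or PythonOperator).
row = transform_weather(
    {"city": "Berlin", "current": {"temperature": 21.37, "windspeed": 12.0}}
)
```

Keeping transforms as plain functions like this also makes them unit-testable outside the scheduler.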
- Analytics Engineering Workflow with dbt + Metabase
Build a production-grade analytics workflow: model, test, and document data with dbt, then visualize insights in Metabase.
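As a rough illustration, a dbt model in this workflow might look like the following (the model, source, and column names are invented for the example):

```sql
-- models/marts/daily_orders.sql
select
    order_date,
    count(*)        as order_count,
    sum(amount_usd) as revenue_usd
from {{ ref('stg_orders') }}
group by order_date
```

Tests (e.g. `not_null`, `unique`) and column documentation would live alongside it in a `schema.yml`, and Metabase would read the resulting mart table directly.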
- GitHub Events Analytics with PySpark
Build a production-style batch data pipeline using Apache Spark to process GitHub event logs.
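The heart of such a batch job is a group-by over event records. Here is that logic in plain Python for illustration; the project itself would use PySpark's DataFrame API, and the event dicts below are a simplified stand-in for the GH Archive format:

```python
from collections import Counter


def count_event_types(events: list[dict]) -> dict[str, int]:
    """Count GitHub events per type -- the plain-Python analogue of
    df.groupBy("type").count() in PySpark."""
    return dict(Counter(e["type"] for e in events))


# Simplified stand-in records; real GH Archive events carry many more fields.
sample = [
    {"type": "PushEvent", "repo": "a/b"},
    {"type": "PushEvent", "repo": "c/d"},
    {"type": "WatchEvent", "repo": "a/b"},
]
counts = count_event_types(sample)
```

In Spark the same shape of aggregation scales to the full event-log volume by partitioning the work across executors.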
- Real-Time Data Streaming with Apache Kafka
Build a real-time data pipeline using Kafka (Confluent Cloud), JSON, Python, and Polars. Simulate NYC Taxi data, process in real time, and visualize with Metabase.
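To give a flavor of the consumer-side streaming logic, here is a simplified stateful aggregation in plain Python; the project itself consumes JSON messages from Kafka and aggregates with Polars, and the field names below are placeholders for the NYC Taxi schema:

```python
from collections import defaultdict


def running_totals(messages):
    """Fold a stream of taxi-trip dicts into per-zone fare totals,
    the way a Kafka consumer loop would update state message by message."""
    totals = defaultdict(float)
    for msg in messages:  # in the real pipeline: for msg in consumer.poll(...)
        totals[msg["pickup_zone"]] += msg["fare_amount"]
    return dict(totals)


# Simulated messages, standing in for deserialized Kafka JSON payloads.
stream = [
    {"pickup_zone": "JFK", "fare_amount": 52.0},
    {"pickup_zone": "Midtown", "fare_amount": 11.5},
    {"pickup_zone": "JFK", "fare_amount": 48.0},
]
totals = running_totals(stream)
```

The key idea is that state is updated incrementally per message rather than recomputed over a full batch, which is what makes the dashboard in Metabase near-real-time.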