🧱 Modern Data Stack Data Engineer Roadmap

    Master the core tools used in modern data teams — from containerization to dbt, BigQuery, and Kafka. Build real projects and get job-ready.

    ✓ Expert-Designed Learning Path • Industry-Validated Curriculum • Real-World Application Focus

    This roadmap was created by data engineering professionals and includes 34 hands-on tasks covering production-ready skills used at companies like Netflix, Airbnb, and Spotify. Master Docker, Terraform, Airflow, and 7 more technologies.

    Intermediate
    9 sections • 34 tasks

    Skills You'll Learn

    • Cloud infrastructure
    • SQL & analytics engineering
    • ETL & orchestration
    • Batch & stream processing
    • Data modeling

    Tools You'll Use

    • Docker
    • Terraform
    • Airflow
    • dlt
    • BigQuery
    • dbt
    • Metabase
    • Spark
    • Kafka
    • GitHub

    Projects to Build

    Step 0: Pre-requisites and fundamentals

    - Learn the fundamentals of data engineering
    - Know basic SQL and Python

    Step 1: Containerization & Infrastructure

    - Install Docker & Docker Compose
    - Run PostgreSQL locally with Docker (see the connectivity check after this list)
    - Install the Terraform CLI
    - Provision GCP infrastructure with Terraform (a BigQuery dataset and a GCS bucket)
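
    Below is a minimal sketch to confirm the containerized Postgres from this step is reachable from Python. It assumes the stock `postgres` image published on `localhost:5432` and the `psycopg2` client installed; the credentials are placeholders.

```python
# Quick connectivity check for the Postgres container started with Docker.
# Assumes the stock `postgres` image on localhost:5432 with default
# user/database names; adjust to match your `docker run` / compose settings.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="postgres",  # whatever you set via POSTGRES_PASSWORD
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # e.g. "PostgreSQL 16.x on ..."
conn.close()
```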

    Step 2: Workflow Orchestration with Airflow

    - Set up Airflow locally with Docker (or use Astronomer's free tier at https://www.astronomer.io/)
    - Build a basic DAG that loads a CSV file into BigQuery (see the sketch after this list)
    - Schedule the DAG to run daily
    - Add logging and failure notifications
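
    A minimal DAG sketch for this step, assuming Airflow 2.4+ (the `schedule` argument), pandas with pandas-gbq installed in the Airflow environment, and placeholder file path, project, and table names:

```python
# Daily DAG sketch: read a local CSV and load it into BigQuery.
# Assumes Airflow 2.4+ (use `schedule_interval` on older 2.x releases)
# and that pandas + pandas-gbq are available to the workers.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["demo"])
def csv_to_bigquery():
    @task
    def extract() -> str:
        # Placeholder path mounted into the Airflow containers
        return "/opt/airflow/data/events.csv"

    @task
    def load(path: str) -> None:
        df = pd.read_csv(path)
        # pandas-gbq picks up credentials from GOOGLE_APPLICATION_CREDENTIALS
        df.to_gbq("raw.events", project_id="my-gcp-project", if_exists="replace")

    load(extract())


csv_to_bigquery()
```

    Airflow's task logs cover the logging item; for notifications, `email_on_failure` in `default_args` or an `on_failure_callback` is the usual starting point.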

    Step 3: Data Ingestion & Loading (Airflow and dlt)

    - Create an API ingestion task with dlt (e.g., the GitHub or OpenWeather API); see the sketch after this list
    - Normalize nested JSON into flat tables
    - Run it on a schedule and incrementally with Airflow
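
    A dlt sketch along these lines, with the repository URL, cursor field, and dataset names chosen only for illustration; in Airflow this code would typically run inside a task:

```python
# Incremental ingestion sketch with dlt: pull GitHub issues updated since the
# last run and merge them into BigQuery. dlt normalizes nested JSON into flat
# child tables automatically. Endpoint, fields, and names are placeholders.
import dlt
import requests


@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")
):
    resp = requests.get(
        "https://api.github.com/repos/dlt-hub/dlt/issues",  # example endpoint
        params={"since": updated_at.start_value, "state": "all", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    yield resp.json()


pipeline = dlt.pipeline(
    pipeline_name="github_issues",
    destination="bigquery",
    dataset_name="raw",
)
print(pipeline.run(github_issues()))
```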

    Step 4: Data Warehousing in BigQuery

    - Load sample data into BigQuery (see the sketch after this list)
    - Apply partitioning and clustering
    - Run SQL queries and optimize costs
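
    A sketch using the official `google-cloud-bigquery` client; the project, bucket, table, and column names are placeholders, and you should partition and cluster on the columns you actually filter by:

```python
# Load CSVs from GCS into a date-partitioned, clustered BigQuery table.
# Project, bucket, table, and column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    time_partitioning=bigquery.TimePartitioning(field="event_date"),  # daily partitions
    clustering_fields=["user_id", "event_type"],
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/events/*.csv",
    "my-gcp-project.raw.events",
    job_config=job_config,
)
load_job.result()  # block until the load finishes

table = client.get_table("my-gcp-project.raw.events")
print(f"Loaded {table.num_rows} rows into {table.full_table_id}")
```

    Partition pruning plus clustering is what keeps query costs down: filtering on `event_date` scans only the matching partitions instead of the whole table.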

    Step 5: Analytics Engineering with dbt

    - Install dbt and initialize a project with the BigQuery adapter
    - Build staging models
    - Add documentation and tests
    - Deploy with GitHub Actions or dbt Cloud (see the sketch after this list)
    - Visualize the output in Metabase
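
    For the deployment item, dbt-core 1.5+ exposes a programmatic runner, so a CI entry point that a GitHub Actions job calls could look roughly like this, assuming the dbt profile and staging models already exist:

```python
# Sketch of a CI entry point: run and test the staging models programmatically.
# Equivalent to `dbt build --select staging` on the command line (dbt-core 1.5+).
from dbt.cli.main import dbtRunner, dbtRunnerResult

runner = dbtRunner()
res: dbtRunnerResult = runner.invoke(["build", "--select", "staging"])

if not res.success:
    raise SystemExit("dbt build failed")
```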

    Step 6: Batch Processing with Spark

    - Install Spark locally or use Google Colab
    - Load and transform a CSV with PySpark (see the sketch after this list)
    - Run groupBy aggregations and joins on large datasets
    - Explore partitioning and performance tuning
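
    A PySpark sketch for this step; the file paths and column names are made up for illustration:

```python
# Basic PySpark batch job: read CSVs, aggregate, join, and write partitioned
# Parquet. File paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

trips = spark.read.csv("data/trips.csv", header=True, inferSchema=True)
zones = spark.read.csv("data/zones.csv", header=True, inferSchema=True)

# Aggregate: daily trip count and revenue per pickup zone
daily = trips.groupBy(
    "pickup_zone_id", F.to_date("pickup_datetime").alias("day")
).agg(
    F.count("*").alias("trips"),
    F.sum("total_amount").alias("revenue"),
)

# Join the zone lookup to attach human-readable zone names
report = daily.join(zones, daily.pickup_zone_id == zones.zone_id, "left")

# Partitioned output; repartition controls how many files land per partition
report.repartition("day").write.mode("overwrite").partitionBy("day").parquet(
    "output/daily_report"
)

spark.stop()
```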

    Step 7: Streaming with Kafka

    - Install Kafka via Docker or use Confluent Cloud
    - Create a simple producer and consumer (see the sketch after this list)
    - Process events with Kafka Streams or ksqlDB
    - Use Schema Registry with Avro or Protobuf
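
    A producer/consumer sketch with the `confluent-kafka` Python client, assuming a local broker on `localhost:9092` and a placeholder topic; point `bootstrap.servers` (plus auth settings) at Confluent Cloud if you go that route:

```python
# Minimal Kafka producer/consumer pair against a local broker.
# Broker address, topic, and group id are placeholders.
import json

from confluent_kafka import Consumer, Producer

TOPIC = "demo-events"

# Producer: send a few JSON-encoded events
producer = Producer({"bootstrap.servers": "localhost:9092"})
for i in range(3):
    producer.produce(TOPIC, key=str(i), value=json.dumps({"event_id": i, "kind": "click"}))
producer.flush()

# Consumer: read them back from the beginning of the topic
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "demo-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

for _ in range(10):
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    print(msg.key(), json.loads(msg.value()))

consumer.close()
```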

    Final Project: Build a Real Data Pipeline

    - Choose a dataset and domain (e.g., finance, sports, e-commerce)
    - Ingest the data using Airflow and dlt (batch) or Kafka (streaming)
    - Model and test it with dbt
    - Load it into BigQuery and visualize KPIs
    - Publish the project on GitHub and write a short case study

    Sign up for free courses and get early access to AI-powered grading, quizzes, and curated learning resources for each roadmap step.