Startup Data Stack Roadmap
Build a scalable, cost-effective data stack using modern open-source tools and serverless architecture.
Expert-Designed Learning Path • Industry-Validated Curriculum • Real-World Application Focus
This roadmap was created by data engineering professionals. Its 31 hands-on tasks cover production-ready skills used at companies like Netflix, Airbnb, and Spotify. Master DuckDB, Polars, Metabase, and three more technologies.
Beginner to Intermediate
8 sections • 31 tasks
Skills You'll Learn
- SQL
- Data modeling
- Python
- ETL/ELT
- Serverless
- Cloud
Tools You'll Use
- DuckDB
- Polars
- Metabase
- AWS Lambda/GCP Cloud Functions
- GitHub Actions/AWS EventBridge
- GitHub
Projects to Build
- Local Data Engineering Environment with dlt, DuckDB & Jupyter
Set up a local development environment for data processing and analytics using Jupyter notebooks, dlt, and DuckDB. All tools are open-source and run locally.
- Scheduled GitHub ETL with Polars, dlt & DuckDB
Build a scheduled ETL pipeline that extracts GitHub repository data, transforms it with Polars, and stores the results in DuckDB.
- End-to-End Analytics Platform with DuckDB + Metabase
Build a modern, low-cost analytics stack using DuckDB, Metabase, and GitHub Actions for automated data updates and business-ready dashboards.
Learning Resources
Step 0: Prerequisites and fundamentals
Step 1: Local Development Environment
- Set up Python virtual environment
- Install Jupyter Notebooks
- Configure DuckDB and Polars
- Create your first data processing notebook
Step 2: Data Processing with Polars
- Learn Polars DataFrame operations
- Practice data transformations in notebooks
- Implement data quality checks
- Optimize performance with Polars
Step 3: Analytics with DuckDB
- Learn DuckDB SQL syntax
- Query public datasets
- Create analytical views
- Optimize query performance
Step 4: Version control and CI/CD
- Learn Git basics
- Create a GitHub repository for your project
- Set up GitHub Actions for data pipeline orchestration
- Implement CI/CD for data quality checks
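A workflow along these lines schedules the pipeline and gates it on quality checks. The file name, cron schedule, and script names (`etl.py`, `checks.py`) are placeholders for your own project layout:

```yaml
# .github/workflows/pipeline.yml
name: data-pipeline
on:
  schedule:
    - cron: "0 6 * * *"    # daily at 06:00 UTC
  workflow_dispatch: {}     # allow manual runs

jobs:
  run-etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install polars duckdb dlt
      - run: python etl.py      # pipeline entry point
      - run: python checks.py   # data quality checks; a failure fails the run
```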
Step 5: Serverless data processing
- Set up AWS Lambda or GCP Cloud Functions
- Create serverless data processing functions
- Implement error handling and retries
- Set up monitoring and logging
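The handler pattern these tasks build toward can be sketched as below. The event shape, `fetch_data` stub, and retry parameters are assumptions for illustration; the same function deploys to AWS Lambda as-is and adapts easily to GCP Cloud Functions:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def fetch_data(source: str) -> list[dict]:
    # Stand-in for a real extraction step (API call, object-store read, ...).
    return [{"source": source, "value": 42}]

def with_retries(fn, attempts: int = 3, backoff: float = 1.0):
    """Retry fn with exponential backoff, re-raising on final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            logger.exception("attempt %d/%d failed", attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(backoff * 2 ** (attempt - 1))

def handler(event, context):
    source = event.get("source", "default")
    records = with_retries(lambda: fetch_data(source))
    logger.info("processed %d records", len(records))
    return {"statusCode": 200, "body": json.dumps({"count": len(records)})}

# Local invocation for testing:
result = handler({"source": "github"}, None)
print(result)
```

Logging through the standard `logging` module means records land in CloudWatch (or Cloud Logging) automatically once deployed, which covers the monitoring task with no extra code.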
Step 6: Data visualization with Metabase
- Install and configure Metabase
- Connect Metabase to DuckDB
- Create dashboards and visualizations
- Set up automated reporting
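One way to run Metabase locally is via Docker Compose, roughly as below. The paths are illustrative, and note that connecting Metabase to DuckDB relies on a community DuckDB driver plugin mounted into Metabase's plugins directory:

```yaml
# docker-compose.yml -- minimal Metabase service
services:
  metabase:
    image: metabase/metabase:latest
    ports:
      - "3000:3000"
    volumes:
      - ./plugins:/plugins   # place the community DuckDB driver JAR here
      - ./data:/data         # DuckDB file, e.g. /data/analytics.duckdb
```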
Step 7: Production orchestration
- Set up AWS EventBridge or GCP Cloud Scheduler
- Create orchestration workflows
- Implement monitoring and alerting
- Set up data pipeline observability
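Scheduling the pipeline with Amazon EventBridge can be sketched as follows. The rule name, rate, and target ARN are placeholders; the boto3 calls are shown but commented out so the sketch runs without AWS credentials:

```python
def build_schedule_rule(name: str, rate_minutes: int) -> dict:
    """Build kwargs for events.put_rule with a rate-based schedule."""
    return {
        "Name": name,
        "ScheduleExpression": f"rate({rate_minutes} minutes)",
        "State": "ENABLED",
    }

rule = build_schedule_rule("daily-etl", 60)
print(rule["ScheduleExpression"])  # rate(60 minutes)

# import boto3
# events = boto3.client("events")
# events.put_rule(**rule)
# events.put_targets(
#     Rule=rule["Name"],
#     Targets=[{"Id": "etl-lambda", "Arn": "<your-lambda-arn>"}],
# )
```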