Data Engineering Projects

    Build real-world data engineering experience with hands-on projects. From simple ETL pipelines to complex streaming architectures, master the skills employers are looking for.

    Build Your Data Engineering Portfolio

    Our project-based learning approach gives you practical experience with real-world data engineering challenges. Each project includes detailed instructions, starter code, and comprehensive solutions to help you learn effectively.

    Project Categories:

    • ETL/ELT Data Pipelines
    • Real-time Stream Processing
    • Data Warehouse & Lake Architecture
    • Cloud-Native Data Solutions
    • Microservices Data Architecture
    • Analytics & Monitoring Dashboards

    Technologies You'll Master:

    Jupyter, dlt, DuckDB, Python, Polars, GitHub Actions, Terraform/OpenTofu/Pulumi, Metabase, Docker, GCP, +14 more

    8 projects available across 3 difficulty levels. Perfect for building a portfolio that demonstrates your data engineering expertise to employers.


    Local Data Engineering Environment with dlt, DuckDB & Jupyter

    Set up a local development environment for data processing and analytics using Jupyter notebooks, dlt, and DuckDB. All tools are open-source and run locally.

    Beginner
    2-4 hours

    Tools & Technologies:

    Jupyter
    dlt
    DuckDB
    Python
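
    To give a feel for the workflow, here is a minimal sketch of a dlt pipeline loading a toy table into a local DuckDB file and querying it back; the pipeline, dataset, and table names are illustrative:

```python
import dlt
import duckdb


# A toy resource; in the project the data comes from real sources.
@dlt.resource(name="events", write_disposition="append")
def events():
    yield from [
        {"id": 1, "type": "signup"},
        {"id": 2, "type": "login"},
    ]


# dlt creates <pipeline_name>.duckdb in the working directory by default.
pipeline = dlt.pipeline(
    pipeline_name="local_demo",
    destination="duckdb",
    dataset_name="raw",
)
print(pipeline.run(events()))

# Query the loaded table directly with DuckDB.
con = duckdb.connect("local_demo.duckdb")
print(con.sql("SELECT type, count(*) AS n FROM raw.events GROUP BY type"))
```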

    Scheduled GitHub ETL with Polars, dlt & DuckDB

    Build a scheduled ETL pipeline that extracts GitHub repository data, transforms it with Polars, and stores the results in DuckDB.

    Intermediate
    4-6 hours

    Tools & Technologies:

    Polars
    dlt
    DuckDB
    GitHub Actions
    +2 more
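
    A minimal sketch of the extract-transform-load loop, assuming the public GitHub API and a hypothetical target org; in the full project a GitHub Actions cron job runs a script like this on a schedule:

```python
import duckdb
import polars as pl
import requests

# Extract: list public repos (the org name here is just an example).
resp = requests.get(
    "https://api.github.com/orgs/duckdb/repos",
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
)
resp.raise_for_status()

# Transform: keep a few fields and rank repos by stars with Polars.
rows = [
    {"name": r["name"], "stars": r["stargazers_count"], "forks": r["forks_count"]}
    for r in resp.json()
]
top = pl.DataFrame(rows).sort("stars", descending=True)

# Load: DuckDB can scan the Polars frame in-place and persist it.
con = duckdb.connect("github.duckdb")
con.execute("CREATE OR REPLACE TABLE repo_stats AS SELECT * FROM top")
con.close()
```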

    End-to-End Analytics Platform with DuckDB + Metabase

    Build a modern, low-cost analytics stack using DuckDB, Metabase, and GitHub Actions for automated data updates and business-ready dashboards.

    Intermediate
    6-10 hours

    Tools & Technologies:

    DuckDB
    Metabase
    Python
    GitHub Actions
    +1 more
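
    The heart of the stack is a refresh script that rebuilds a DuckDB file from source data; a sketch, assuming a hypothetical orders.csv, with GitHub Actions running it on a schedule and Metabase pointing at the resulting database (typically via the community DuckDB driver):

```python
import duckdb

# Rebuild the analytics tables from raw files; paths and columns are placeholders.
con = duckdb.connect("analytics.duckdb")
con.execute("""
    CREATE OR REPLACE TABLE daily_sales AS
    SELECT order_date, SUM(amount) AS revenue, COUNT(*) AS orders
    FROM read_csv_auto('data/orders.csv')
    GROUP BY order_date
    ORDER BY order_date
""")
con.close()
```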

    Infrastructure-as-Code Setup on GCP

    Provision a GCP environment using Terraform with BigQuery & Cloud Storage, staying within free-tier limits.

    Intermediate
    4-6 hours

    Tools & Technologies:

    Terraform
    GCP
    BigQuery
    Cloud Storage
    +1 more
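
    The project itself uses Terraform; to keep the samples on this page in one language, here is an equivalent provisioning sketch in Pulumi's Python SDK (Pulumi also appears in the technology list above). Resource names and locations are placeholders:

```python
import pulumi
import pulumi_gcp as gcp

# Cloud Storage bucket for raw files; the "US" multi-region is a placeholder.
bucket = gcp.storage.Bucket(
    "raw-data",
    location="US",
    uniform_bucket_level_access=True,
)

# BigQuery dataset for analytics tables.
dataset = gcp.bigquery.Dataset(
    "analytics",
    dataset_id="analytics",
    location="US",
)

pulumi.export("bucket_name", bucket.name)
pulumi.export("dataset_id", dataset.dataset_id)
```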

    ETL Pipeline Orchestration with Apache Airflow

    Design and implement an orchestrated ETL pipeline using Apache Airflow to extract, transform, and load weather data from a public API into a data warehouse.

    Intermediate
    8-12 hours

    Tools & Technologies:

    Airflow
    Docker
    Python
    APIs
    +2 more
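
    A compact sketch of such a DAG using Airflow's TaskFlow API (Airflow 2.4+), with the keyless Open-Meteo API standing in as the weather source; the warehouse load is reduced to a print here:

```python
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def weather_etl():
    @task
    def extract() -> dict:
        # Free, keyless weather API; the coordinates are placeholders.
        resp = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={"latitude": 52.52, "longitude": 13.41, "current_weather": True},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    @task
    def transform(payload: dict) -> dict:
        current = payload["current_weather"]
        return {"ts": current["time"], "temp_c": current["temperature"]}

    @task
    def load(row: dict) -> None:
        # The project loads into a warehouse; printing stands in for that here.
        print(f"would insert: {row}")

    load(transform(extract()))


weather_etl()
```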

    Analytics Engineering Workflow with dbt + Metabase

    Build a production-grade analytics workflow: model, test, and document data with dbt, then visualize insights in Metabase.

    Intermediate
    6-10 hours

    Tools & Technologies:

    dbt
    BigQuery
    SQL
    Metabase

    GitHub Events Analytics with PySpark

    Build a production-style batch data pipeline using Apache Spark to process GitHub event logs.

    Advanced
    10-12 hours

    Tools & Technologies:

    Apache Spark
    Python
    PySpark
    Docker
    +2 more
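
    A minimal sketch of the batch job, assuming GH Archive-style newline-delimited JSON event files downloaded locally (the path is a placeholder):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gh-events").getOrCreate()

# GH Archive publishes hourly gzipped JSON; Spark reads the glob in one pass.
events = spark.read.json("data/2024-01-01-*.json.gz")

# Batch aggregation: event counts per repository and event type.
top_repos = (
    events.groupBy("repo.name", "type")
    .agg(F.count("*").alias("n_events"))
    .orderBy(F.desc("n_events"))
)
top_repos.show(10, truncate=False)
spark.stop()
```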

    ⚡ Real-Time Data Streaming with Apache Kafka

    Build a real-time data pipeline using Kafka (Confluent Cloud), JSON, Python, and Polars. Simulate NYC Taxi data, process in real time, and visualize with Metabase.

    Advanced
    8-12 hours

    Tools & Technologies:

    Kafka
    Confluent Cloud
    Python
    Polars
    +4 more
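
    A minimal producer sketch with the confluent-kafka client, simulating taxi-trip events as JSON; the broker address is a placeholder, and Confluent Cloud additionally requires SASL credentials:

```python
import json
import random
import time

from confluent_kafka import Producer

# Placeholder broker; Confluent Cloud also needs security.protocol,
# sasl.mechanisms, sasl.username, and sasl.password in this config.
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Simulate NYC-taxi-style trip events as JSON messages.
for trip_id in range(100):
    event = {
        "trip_id": trip_id,
        "passenger_count": random.randint(1, 4),
        "fare": round(random.uniform(5, 60), 2),
        "ts": time.time(),
    }
    producer.produce("taxi-trips", value=json.dumps(event).encode())
    producer.poll(0)  # serve delivery callbacks

producer.flush()
```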

    Why Choose Project-Based Learning?

    Real-World Application

    Work with actual datasets and scenarios that mirror production environments. Build solutions that demonstrate your ability to handle complex data challenges.

    Portfolio Development

    Create a compelling portfolio that showcases your technical skills to potential employers. Each project includes documentation and deployment instructions.

    Industry-Relevant Skills

    Focus on the tools and technologies that are in high demand in the data engineering job market. Stay current with modern data stack practices.