End-to-End Analytics Platform with DuckDB + Metabase
Build a modern, low-cost analytics stack using DuckDB, Metabase, and GitHub Actions for automated data updates and business-ready dashboards.
📊 Project: End-to-End Analytics Platform with DuckDB + Metabase
📌 Project Overview
In this project, you'll build a complete analytics solution using DuckDB as your analytical database and Metabase as your business intelligence (BI) layer. You'll ingest and model data in DuckDB, automate updates using GitHub Actions, and create interactive dashboards in Metabase for business users.
This project simulates how a small or early-stage company could deploy a modern, low-cost analytics stack that doesn't rely on cloud infrastructure but still supports business visibility, transparency, and repeatable processes.
🎯 Learning Objectives
- Install and configure Metabase for BI exploration
- Build and manage analytical models using DuckDB
- Create reusable and interactive dashboards for stakeholders
- Automate data refreshes and workflows using GitHub Actions
- Monitor data quality and ensure system reliability
🧰 Prerequisites
- Python 3.8+ and `pip` installed
- Git and a GitHub account
- Docker (for Metabase deployment)
- Basic SQL knowledge
- Optional: some familiarity with CI tools or GitHub Actions
📊 Dataset Recommendation
🔹 Dataset: NYC 311 Service Requests
- Why this dataset?
- Open, real-world dataset with location, timestamps, categories
- Great for creating operational and service-based dashboards
- It's large enough to demonstrate performance with DuckDB, but manageable for local development
📥 How to Use It
- Download a CSV sample (e.g. last 3 months or 100k rows)
- Save it to `data/raw/nyc_311.csv`
- Ingest into DuckDB via a Python or SQL script
- Create an initial staging table for modeling
⌛ Estimated Time
Duration: 6–10 hours
Difficulty: Intermediate
📂 Suggested Project Structure
duckdb-metabase-analytics/
├── data/
│   └── raw/
│       └── nyc_311.csv
├── sql/
│   ├── staging.sql
│   └── models/
│       ├── dim_complaint_type.sql
│       └── fct_requests.sql
├── dashboard/
│   └── screenshots/
├── .github/
│   └── workflows/
│       └── update_data.yml
├── metabase/
│   └── metadata.json
├── README.md
└── .gitignore
🔄 Step-by-Step Guide
1. 🧰 Set Up Metabase
- Install Metabase using Docker: `docker run -d -p 3000:3000 metabase/metabase`
- Connect Metabase to your local DuckDB file (`analytics.duckdb`). Note that DuckDB is not bundled with Metabase, so you'll need the community DuckDB driver plugin; if Metabase runs in Docker, mount the directory containing the file into the container (e.g., with `-v`)
- Set up basic users and permissions
- (Optional) Configure email for dashboard subscriptions
2. 🧱 Create Analytical Models in DuckDB
- Load the raw NYC 311 dataset using DuckDB SQL or Python
- Create staging tables:
- Clean column names
- Standardize timestamp formats
- Filter out bad data (e.g., missing locations or types)
- Create views or tables for:
- Complaint counts by day, borough, complaint type
- Time to close (service level)
- Top complaint types over time
Example view:
CREATE VIEW fct_requests AS
SELECT
complaint_type,
borough,
created_date,
closed_date,
date_diff('day', created_date, closed_date) AS resolution_days
FROM staging_311
WHERE created_date IS NOT NULL AND closed_date IS NOT NULL;
- Document model logic in the SQL files or README
3. 📊 Build Dashboards in Metabase
- Connect your analytical views to Metabase
- Build key visualizations:
  - Complaint Volume Over Time: line chart grouped by week/month
  - Top Complaint Types by Borough: stacked bar chart with a borough filter
  - Average Resolution Time: KPI card or trend line
  - Complaint Heatmap by Time of Day: heatmap or pivot table
- Add dashboard filters (date range, borough)
- Configure dashboard subscriptions or exports
4. 🔁 Automate Data Updates
- Create a Python or SQL script that re-downloads the data and updates the DuckDB database
- Use GitHub Actions to automate the update weekly or daily
Example GitHub Actions snippet:
on:
  schedule:
    - cron: "0 4 * * 0" # Every Sunday at 4am UTC
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install duckdb pandas requests
      - name: Run update script
        run: python scripts/update_duckdb.py
- Add logs and basic error handling to scripts
5. 🚀 Production Deployment
- Add a `Makefile` or script to rehydrate the database locally
- Use `.env` to parameterize file paths or credentials
- Back up the DuckDB file on each update (e.g., via `cp`)
- Create a short system architecture diagram and store it in `/docs`
- Create a user guide with:
- Dashboard use cases
- Filter definitions
- Interpretation notes
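The backup step above can be as simple as a timestamped copy before each update. A sketch in Python, assuming the `analytics.duckdb` filename used throughout this guide and a hypothetical `backups/` directory:

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_db(db_path: str = "analytics.duckdb",
              backup_dir: str = "backups") -> Path:
    """Copy the DuckDB file into backup_dir with a timestamp suffix."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 preserves file metadata
    return dest


# Demo: back up a placeholder database file
Path("analytics.duckdb").write_bytes(b"demo")
print(backup_db())
```

Running this at the start of the update script means a failed refresh can always be rolled back by restoring the most recent file in `backups/`.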
✅ Deliverables
- DuckDB SQL models (views or materialized tables)
- GitHub Actions workflow for automation
- Metabase dashboard with 3–5 business-ready visualizations
- README with:
- Dataset description
- Setup instructions
- Metrics logic
- Screenshots of dashboards
🚀 Optional Extensions
- Add dbt-duckdb for model orchestration
- Enable API refresh triggers (Metabase API or Airflow/Dagster)
- Add Great Expectations for data quality validation
- Deploy Metabase to a lightweight cloud instance