End-to-End Analytics Platform with DuckDB + Metabase

    Build a modern, low-cost analytics stack using DuckDB, Metabase, and GitHub Actions for automated data updates and business-ready dashboards.

    ✓ Expert-Designed Project• Industry-Validated Implementation• Production-Ready Architecture

    This project was designed by data engineering professionals to simulate real-world scenarios used at companies like Netflix, Airbnb, and Spotify. Master DuckDB, Metabase, Python and 2 more technologies through hands-on implementation. Rated intermediate level with comprehensive documentation and starter code.

    Intermediate
    6-10 hours

    📊 Project: End-to-End Analytics Platform with DuckDB + Metabase

    📌 Project Overview

    In this project, you'll build a complete analytics solution using DuckDB as your analytical database and Metabase as your business intelligence (BI) layer. You'll ingest and model data in DuckDB, automate updates using GitHub Actions, and create interactive dashboards in Metabase for business users.

    This project simulates how a small or early-stage company could deploy a modern, low-cost analytics stack that doesn't rely on cloud infrastructure but still supports business visibility, transparency, and repeatable processes.


    🎯 Learning Objectives

    • Install and configure Metabase for BI exploration
    • Build and manage analytical models using DuckDB
    • Create reusable and interactive dashboards for stakeholders
    • Automate data refreshes and workflows using GitHub Actions
    • Monitor data quality and ensure system reliability

    🧰 Prerequisites

    • Python 3.8+ and pip installed
    • Git and GitHub account
    • Docker (for Metabase deployment)
    • Basic SQL knowledge
    • Optional: some familiarity with CI tools or GitHub Actions

    📊 Dataset Recommendation

    🔹 Dataset: NYC 311 Service Requests

    • Why this dataset?
      • Open, real-world dataset with location, timestamps, categories
      • Great for creating operational and service-based dashboards
      • It's large enough to demonstrate performance with DuckDB, but manageable for local development

    📥 How to Use It

    • Download a CSV sample (e.g. last 3 months or 100k rows)
    • Save to data/raw/nyc_311.csv
    • Ingest into DuckDB via a Python or SQL script
    • Create an initial staging table for modeling

    ⌛ Estimated Time

    Duration: 6–10 hours
    Difficulty: Intermediate


    📂 Suggested Project Structure

    duckdb-metabase-analytics/
    ├── data/
    │   └── raw/
    │       └── nyc_311.csv
    ├── sql/
    │   ├── staging.sql
    │   ├── models/
    │   │   ├── dim_complaint_type.sql
    │   │   └── fct_requests.sql
    ├── dashboard/
    │   └── screenshots/
    ├── workflows/
    │   └── update_data.yml
    ├── metabase/
    │   └── metadata.json
    ├── README.md
    └── .gitignore
    

    🔄 Step-by-Step Guide

    1. 🧰 Set Up Metabase

    • Install Metabase using Docker:
      docker run -d -p 3000:3000 metabase/metabase
      
    • Connect Metabase to your local DuckDB file (analytics.duckdb)
    • Set up basic users and permissions
    • (Optional) Configure email for dashboard subscriptions

    2. 🧱 Create Analytical Models in DuckDB

    • Load the raw NYC 311 dataset using DuckDB SQL or Python
    • Create staging tables:
      • Clean column names
      • Standardize timestamp formats
      • Filter out bad data (e.g., missing locations or types)
    • Create views or tables for:
      • Complaint counts by day, borough, complaint type
      • Time to close (service level)
      • Top complaint types over time

    Example view:

    CREATE VIEW fct_requests AS
    SELECT
      complaint_type,
      borough,
      created_date,
      closed_date,
      julianday(closed_date) - julianday(created_date) AS resolution_days
    FROM staging_311
    WHERE created_date IS NOT NULL AND closed_date IS NOT NULL;
    
    • Document model logic in the SQL files or README

    3. 📊 Build Dashboards in Metabase

    • Connect your analytical views to Metabase
    • Build key visualizations:
      1. Complaint Volume Over Time
        • Line chart grouped by week/month
      2. Top Complaint Types by Borough
        • Stacked bar chart with filter
      3. Average Resolution Time
        • KPI or trend line
      4. Complaint Heatmap by Time of Day
        • Heatmap or table pivot
    • Add dashboard filters (date range, borough)
    • Configure dashboard subscriptions or exports

    4. 🔁 Automate Data Updates

    • Create a Python or SQL script that re-downloads and updates the DuckDB database
    • Use GitHub Actions to automate the update weekly or daily

    Example GitHub Actions snippet:

    schedule:
      - cron: "0 4 * * 0" # Every Sunday at 4am
    jobs:
      update:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Install dependencies
            run: pip install duckdb pandas requests
          - name: Run update script
            run: python scripts/update_duckdb.py
    
    • Add logs and basic error handling to scripts

    5. 🚀 Production Deployment

    • Add a Makefile or script to rehydrate the database locally
    • Use .env to parameterize file paths or credentials
    • Back up the DuckDB file on update (e.g., via cp)
    • Write a short system architecture diagram and store in /docs
    • Create a user guide with:
      • Dashboard use cases
      • Filter definitions
      • Interpretation notes

    ✅ Deliverables

    • DuckDB SQL models (views or materialized tables)
    • GitHub Actions workflow for automation
    • Metabase dashboard with 3–5 business-ready visualizations
    • README with:
      • Dataset description
      • Setup instructions
      • Metrics logic
      • Screenshots of dashboards

    🚀 Optional Extensions

    • Add dbt-duckdb for model orchestration
    • Enable API refresh triggers (Metabase API or Airflow/Dagster)
    • Add Great Expectations for data quality validation
    • Deploy Metabase to a lightweight cloud instance

    Project Details

    Tools & Technologies

    DuckDB
    Metabase
    Python
    GitHub Actions
    Docker

    Difficulty Level

    Intermediate

    Estimated Duration

    6-10 hours

    Sign in to submit projects and track your progress

    More Projects