End-to-End Analytics Platform with DuckDB + Metabase
Build a modern, low-cost analytics stack using DuckDB, Metabase, and GitHub Actions for automated data updates and business-ready dashboards.
📊 Project: End-to-End Analytics Platform with DuckDB + Metabase
📌 Project Overview
In this project, you'll build a complete analytics solution using DuckDB as your analytical database and Metabase as your business intelligence (BI) layer. You'll ingest and model data in DuckDB, automate updates using GitHub Actions, and create interactive dashboards in Metabase for business users.
This project simulates how a small or early-stage company could deploy a modern, low-cost analytics stack that doesn't rely on cloud infrastructure but still supports business visibility, transparency, and repeatable processes.
🎯 Learning Objectives
- Install and configure Metabase for BI exploration
- Build and manage analytical models using DuckDB
- Create reusable and interactive dashboards for stakeholders
- Automate data refreshes and workflows using GitHub Actions
- Monitor data quality and ensure system reliability
🧰 Prerequisites
- Python 3.8+ and `pip` installed
- Git and a GitHub account
- Docker (for Metabase deployment)
- Basic SQL knowledge
- Optional: some familiarity with CI tools or GitHub Actions
📊 Dataset Recommendation
🔹 Dataset: NYC 311 Service Requests
- Why this dataset?
- Open, real-world dataset with location, timestamps, categories
- Great for creating operational and service-based dashboards
- It's large enough to demonstrate performance with DuckDB, but manageable for local development
📥 How to Use It
- Download a CSV sample (e.g. last 3 months or 100k rows)
- Save it to `data/raw/nyc_311.csv`
- Ingest into DuckDB via a Python or SQL script
- Create an initial staging table for modeling
⌛ Estimated Time
Duration: 6–10 hours
Difficulty: Intermediate
📂 Suggested Project Structure
duckdb-metabase-analytics/
├── data/
│   └── raw/
│       └── nyc_311.csv
├── sql/
│   ├── staging.sql
│   └── models/
│       ├── dim_complaint_type.sql
│       └── fct_requests.sql
├── dashboard/
│   └── screenshots/
├── .github/
│   └── workflows/
│       └── update_data.yml
├── metabase/
│   └── metadata.json
├── README.md
└── .gitignore
🔄 Step-by-Step Guide
1. 🧰 Set Up Metabase
- Install Metabase using Docker: `docker run -d -p 3000:3000 metabase/metabase`
- Connect Metabase to your local DuckDB file (`analytics.duckdb`). Note that DuckDB is not bundled with Metabase, so you'll need the community DuckDB driver plugin; if Metabase runs in Docker, mount the directory containing the file into the container (e.g., with `-v`)
- Set up basic users and permissions
- (Optional) Configure email for dashboard subscriptions
2. 🧱 Create Analytical Models in DuckDB
- Load the raw NYC 311 dataset using DuckDB SQL or Python
- Create staging tables:
- Clean column names
- Standardize timestamp formats
- Filter out bad data (e.g., missing locations or types)
- Create views or tables for:
- Complaint counts by day, borough, complaint type
- Time to close (service level)
- Top complaint types over time
Example view:
CREATE VIEW fct_requests AS
SELECT
complaint_type,
borough,
created_date,
closed_date,
date_diff('day', created_date, closed_date) AS resolution_days
FROM staging_311
WHERE created_date IS NOT NULL AND closed_date IS NOT NULL;
- Document model logic in the SQL files or README
3. 📊 Build Dashboards in Metabase
- Connect your analytical views to Metabase
- Build key visualizations:
  - Complaint Volume Over Time: line chart grouped by week/month
  - Top Complaint Types by Borough: stacked bar chart with a borough filter
  - Average Resolution Time: KPI card or trend line
  - Complaint Heatmap by Time of Day: heatmap or pivot table
- Add dashboard filters (date range, borough)
- Configure dashboard subscriptions or exports
4. 🔁 Automate Data Updates
- Create a Python or SQL script that re-downloads the data and updates the DuckDB database
- Use GitHub Actions to automate the update weekly or daily
Example GitHub Actions snippet:
on:
  schedule:
    - cron: "0 4 * * 0" # Every Sunday at 4am UTC
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install duckdb pandas requests
      - name: Run update script
        run: python scripts/update_duckdb.py
- Add logs and basic error handling to scripts
5. 🚀 Production Deployment
- Add a `Makefile` or script to rehydrate the database locally
- Use `.env` to parameterize file paths or credentials
- Back up the DuckDB file on each update (e.g., via `cp`)
- Create a short system architecture diagram and store it in `/docs`
- Create a user guide with:
- Dashboard use cases
- Filter definitions
- Interpretation notes
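The backup step above can be as simple as a timestamped copy before each update. A sketch in Python, assuming the `analytics.duckdb` filename used throughout this guide and a hypothetical `backups/` directory:

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_db(db_path: str = "analytics.duckdb",
              backup_dir: str = "backups") -> Path:
    """Copy the DuckDB file into backup_dir with a timestamp suffix."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 preserves file metadata
    return dest


# Demo: back up a placeholder database file
Path("analytics.duckdb").write_bytes(b"demo")
print(backup_db())
```

Running this at the start of the update script means a failed refresh can always be rolled back by restoring the most recent file in `backups/`.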
✅ Deliverables
- DuckDB SQL models (views or materialized tables)
- GitHub Actions workflow for automation
- Metabase dashboard with 3–5 business-ready visualizations
- README with:
- Dataset description
- Setup instructions
- Metrics logic
- Screenshots of dashboards
🚀 Optional Extensions
- Add dbt-duckdb for model orchestration
- Enable API refresh triggers (Metabase API or Airflow/Dagster)
- Add Great Expectations for data quality validation
- Deploy Metabase to a lightweight cloud instance