Analytics Engineering Workflow with dbt + Metabase

    Build a production-grade analytics workflow: model, test, and document data with dbt, then visualize insights in Metabase.


    This project was designed by data engineering professionals to simulate real-world scenarios used at companies like Netflix, Airbnb, and Spotify. Master dbt, BigQuery, SQL, and Metabase through hands-on implementation. Rated intermediate, with comprehensive documentation and starter code.


    📊 Project: Analytics Engineering Workflow with dbt + Metabase

    📌 Project Overview

    In this project, you'll build an analytics engineering workflow using dbt Core to transform raw data into a clean, documented, and tested semantic layer. You'll model raw inputs into staging and dimensional layers following best practices, and then visualize key insights in Metabase.

    This hands-on project simulates what analytics engineers do in production: building reliable, testable data models and enabling business users to explore and answer their own questions through a BI layer.


    🎯 Learning Objectives

    • Understand analytics engineering roles and responsibilities
    • Learn core concepts of dbt (models, sources, tests, macros, docs)
    • Apply dimensional modeling techniques (facts and dimensions)
    • Implement dbt testing and documentation workflows
    • Visualize model outputs in Metabase and share with stakeholders

    🧰 Prerequisites

    • A Google Cloud account with BigQuery enabled (free tier is enough)
    • Python 3.8+ installed
    • Basic knowledge of SQL and YAML
    • Familiarity with Git and version control
    • Metabase running locally or via Docker

    ⌛ Estimated Time

    Duration: 6–10 hours
    Difficulty: Intermediate


    📊 Dataset Recommendation

    🔹 Dataset: jaffle_shop sample data

    • Description: A simple e-commerce dataset with customers, orders, and payments tables.

    📥 How to Use It

    • Clone the repo:
      git clone https://github.com/dbt-labs/jaffle_shop
      cd jaffle_shop
      
    • Load the CSVs into BigQuery (via the UI or bq load), or copy them into the seeds/ folder and run dbt seed to upload them directly.
    • Define them as sources in sources.yml.

    📦 Suggested Project Structure

    analytics-workflow/
    ├── dbt_project/
    │   ├── dbt_project.yml
    │   ├── models/
    │   │   ├── staging/
    │   │   └── marts/
    │   ├── macros/
    │   └── seeds/
    ├── metabase/
    │   └── dashboard_screenshot.png
    ├── README.md
    └── .gitignore

    (profiles.yml lives outside the repo, in ~/.dbt/; macros/ sits at the project root, where dbt looks for macros by default)


    🔄 Step-by-Step Guide

    1. 🧱 Set Up dbt

    • Install dbt Core:
      pip install dbt-bigquery
      
    • Configure your dbt profile:
      • Location: ~/.dbt/profiles.yml
      • Provide BigQuery project ID, dataset, location, and authentication
    • Initialize a new dbt project:
      dbt init analytics_project
      cd analytics_project
      
    • Connect to BigQuery and test the connection:
      dbt debug
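
    A minimal sketch of ~/.dbt/profiles.yml, assuming a hypothetical GCP project my-gcp-project and a dev dataset analytics_dev (swap in your own IDs; the top-level key must match the profile name set in dbt_project.yml):

    ```yaml
    # ~/.dbt/profiles.yml -- connection details, kept out of version control
    analytics_project:
      target: dev
      outputs:
        dev:
          type: bigquery
          method: oauth            # or service-account with a keyfile path
          project: my-gcp-project  # hypothetical GCP project ID
          dataset: analytics_dev   # hypothetical dataset for dev builds
          location: US
          threads: 4
    ```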
      

    2. 📚 Create Source Definitions

    • Define your raw tables with a sources: block in a .yml file (e.g., models/staging/sources.yml)
    • Add column descriptions, unique or not_null tests
    • Set freshness checks with loaded_at_field and freshness block
    • Run initial dbt commands:
      dbt run
      dbt test
      dbt docs generate && dbt docs serve
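
    As a sketch, models/staging/sources.yml might look like this (the raw dataset name jaffle_shop and the _loaded_at timestamp column are assumptions; use whatever your BigQuery tables actually contain):

    ```yaml
    version: 2

    sources:
      - name: jaffle_shop
        schema: jaffle_shop          # BigQuery dataset holding the raw tables
        loaded_at_field: _loaded_at  # assumed ingestion-timestamp column
        freshness:
          warn_after: {count: 24, period: hour}
        tables:
          - name: customers
            columns:
              - name: id
                description: Primary key for a customer
                tests: [unique, not_null]
          - name: orders
          - name: payments
    ```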
      

    3. 🧹 Data Cleaning & Staging Models

    Cleaning Tasks by Table:

    Table        Cleaning Actions
    ---------    --------------------------------------------------------------
    customers    Standardize names, fix null emails, add is_new_customer flag
    orders       Convert date fields, compute order_age, normalize status
    payments     Ensure numeric types, join with orders for completeness checks

    • Create 1:1 staging models in models/staging/
    • Rename columns from camelCase to snake_case
    • Cast all numeric and timestamp fields appropriately
    • Add not_null, unique, and data quality tests
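
    A staging model along these lines, e.g. models/staging/stg_orders.sql (column names follow the jaffle_shop CSVs; order_age here is simply days since the order was placed, one possible interpretation):

    ```sql
    -- models/staging/stg_orders.sql: 1:1 cleanup of the raw orders table
    with source as (
        select * from {{ source('jaffle_shop', 'orders') }}
    )

    select
        id                       as order_id,
        user_id                  as customer_id,
        cast(order_date as date) as order_date,
        date_diff(current_date(), cast(order_date as date), day) as order_age,
        lower(status)            as status
    from source
    ```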

    4. 🧠 Develop Core Models (Marts)

    Suggested Dimensional Models:

    • dim_customers

      • First order date
      • Total orders
      • Lifetime value
    • dim_dates (optional)

      • Full calendar with weekday, holiday, etc.

    Suggested Fact Models:

    • fct_orders

      • Total order value
      • Average basket size
      • Order count per day
    • fct_payments

      • Total paid/unpaid amounts
      • % payments by method
    • Use star schema principles and ref() for dependencies

    • Add referential integrity tests (relationships)

    • Build custom macros if needed (e.g., for currency formatting)
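
    For example, models/marts/dim_customers.sql could roll the staging models up with ref() (the stg_payments columns and the metric definitions here are illustrative):

    ```sql
    -- models/marts/dim_customers.sql: one row per customer with order metrics
    with orders as (
        select * from {{ ref('stg_orders') }}
    ),

    payments as (
        select * from {{ ref('stg_payments') }}
    )

    select
        orders.customer_id,
        min(orders.order_date)          as first_order_date,
        count(distinct orders.order_id) as total_orders,
        sum(payments.amount)            as lifetime_value
    from orders
    left join payments on payments.order_id = orders.order_id
    group by orders.customer_id
    ```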


    5. 📈 Visualization in Metabase

    • Install Metabase using Docker:
      docker run -d -p 3000:3000 metabase/metabase
      
    • Connect to your BigQuery dataset with marts tables
    • Sync schema and begin exploring data

    Suggested Visualizations:

    1. Daily Revenue Trend
      Line chart: order_date vs total_revenue

    2. Top Customers by LTV
      Bar chart: from dim_customers

    3. Payment Method Breakdown
      Pie chart: % split by method

    4. Customer Growth Over Time
      Area chart: cumulative count of customers

    5. Order Heatmap
      Heatmap: by hour of day and weekday

    • Set up scheduled refresh in Metabase, or schedule dbt build (which runs models and tests together) with cron or GitHub Actions

    ✅ Deliverables

    • Complete dbt project with:
      • Staging and dimensional models
      • Tests and documentation
    • Metabase dashboard with:
      • 3โ€“5 charts showing core business metrics
    • Screenshots of:
      • Metabase dashboards
      • dbt docs lineage graph
    • A README.md explaining:
      • Dataset
      • Architecture
      • Setup instructions
      • Key metrics

    🚀 Optional Extensions

    • Add exposures.yml to document dashboards as dbt exposures
    • Use dbt seed to include lookup tables
    • Create a GitHub repo and integrate Git-based workflow
    • Add CI/CD: GitHub Actions to run dbt build + test + docs on push
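
    A minimal CI sketch for the last point, assuming a service-account key stored in a repository secret named BIGQUERY_KEYFILE and a CI-specific profiles directory (both names are placeholders for your own setup):

    ```yaml
    # .github/workflows/dbt.yml -- build, test, and document on every push
    name: dbt-ci
    on: [push]

    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - run: pip install dbt-bigquery
          - name: Write keyfile from secret
            run: echo "$BIGQUERY_KEYFILE" > keyfile.json
            env:
              BIGQUERY_KEYFILE: ${{ secrets.BIGQUERY_KEYFILE }}
          - run: dbt build --profiles-dir ci_profiles   # dbt build = run + test
          - run: dbt docs generate --profiles-dir ci_profiles
    ```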

