# Analytics Engineering Workflow with dbt + Metabase
Build a production-grade analytics workflow: model, test, and document data with dbt, then visualize insights in Metabase.
## Project Overview
In this project, you'll build an analytics engineering workflow using dbt Core to transform raw data into a clean, documented, and tested semantic layer. You'll model raw inputs into staging and dimensional layers following best practices, and then visualize key insights in Metabase.
This hands-on project simulates what analytics engineers do in production: building reliable, testable data models and enabling business users to explore and answer their own questions through a BI layer.
## Learning Objectives
- Understand analytics engineering roles and responsibilities
- Learn core concepts of dbt (models, sources, tests, macros, docs)
- Apply dimensional modeling techniques (facts and dimensions)
- Implement dbt testing and documentation workflows
- Visualize model outputs in Metabase and share with stakeholders
## Prerequisites
- A Google Cloud account with BigQuery enabled (free tier is enough)
- Python 3.8+ installed
- Basic knowledge of SQL and YAML
- Familiarity with Git and version control
- Metabase running locally or via Docker
## Estimated Time
- Duration: 6–10 hours
- Difficulty: Intermediate
## Dataset Recommendation
- Dataset: jaffle_shop sample data
- Description: A simple e-commerce dataset with `customers`, `orders`, and `payments` tables.
## How to Use It
- Clone the repo:
  ```bash
  git clone https://github.com/dbt-labs/jaffle_shop
  cd jaffle_shop
  ```
- Load the CSVs into BigQuery (via the UI or `bq load`), or use `dbt seed` to upload them directly if running locally.
- Define them as sources in `sources.yml`.
## Suggested Project Structure
```
analytics-workflow/
├── dbt_project/
│   ├── dbt_project.yml
│   ├── models/
│   │   ├── staging/
│   │   ├── marts/
│   │   └── macros/
│   └── seeds/
├── profiles.yml (lives in ~/.dbt/)
├── metabase/
│   └── dashboard_screenshot.png
├── README.md
└── .gitignore
```
## Step-by-Step Guide
### 1. Set Up dbt
- Install dbt Core with the BigQuery adapter:
  ```bash
  pip install dbt-bigquery
  ```
- Configure your dbt profile at `~/.dbt/profiles.yml` with your BigQuery project ID, dataset, location, and authentication method.
- Initialize a new dbt project:
  ```bash
  dbt init analytics_project
  cd analytics_project
  ```
- Test the connection to BigQuery:
  ```bash
  dbt debug
  ```
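As a sketch, a minimal `profiles.yml` for this setup might look like the following; the project ID, dataset name, and keyfile path are placeholders to replace with your own values:

```yaml
# ~/.dbt/profiles.yml
analytics_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account        # or `oauth` for local development
      project: my-gcp-project-id     # placeholder: your GCP project ID
      dataset: analytics_dev         # placeholder: dataset dbt builds into
      location: US
      keyfile: /path/to/keyfile.json # placeholder: service account key
      threads: 4
```

If `dbt debug` reports a connection failure, the profile name here must match the `profile:` entry in `dbt_project.yml`.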
### 2. Create Source Definitions
- Define your raw tables in `models/src/` using a `sources:` block in a `.yml` file
- Add column descriptions and `unique` or `not_null` tests
- Set freshness checks with a `loaded_at_field` and a `freshness` block
- Run the initial dbt commands:
  ```bash
  dbt run
  dbt test
  dbt docs generate && dbt docs serve
  ```
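A source definition for the jaffle_shop tables could be sketched like this; the `raw_jaffle_shop` schema name and the `_loaded_at` column are assumptions you should adjust to match how you loaded the CSVs:

```yaml
# models/src/sources.yml
version: 2

sources:
  - name: jaffle_shop
    schema: raw_jaffle_shop      # assumption: schema where the CSVs were loaded
    loaded_at_field: _loaded_at  # assumption: an ingestion timestamp column
    freshness:
      warn_after: {count: 24, period: hour}
    tables:
      - name: customers
        columns:
          - name: id
            description: Primary key for customers
            tests: [unique, not_null]
      - name: orders
      - name: payments
```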
### 3. Data Cleaning & Staging Models
Cleaning Tasks by Table:

| Table | Cleaning Actions |
|---|---|
| `customers` | Standardize names, fix null emails, add `is_new_customer` flag |
| `orders` | Convert date fields, compute `order_age`, normalize status |
| `payments` | Ensure numeric types, join with orders for completeness checks |

- Create 1:1 staging models in `models/staging/`
- Rename columns from `camelCase` to `snake_case`
- Cast all numeric and timestamp fields appropriately
- Add `not_null`, `unique`, and data quality tests
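A staging model for orders might look like the sketch below; the column names follow the jaffle_shop sample data, and the `order_age` computation (BigQuery SQL) is illustrative:

```sql
-- models/staging/stg_orders.sql
with source as (
    select * from {{ source('jaffle_shop', 'orders') }}
)

select
    id as order_id,
    user_id as customer_id,
    cast(order_date as date) as order_date,
    -- days since the order was placed
    date_diff(current_date(), cast(order_date as date), day) as order_age,
    lower(status) as status  -- normalize status values
from source
```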
### 4. Develop Core Models (Marts)
Suggested Dimensional Models:
- `dim_customers`: first order date, total orders, lifetime value
- `dim_dates` (optional): full calendar with weekday, holiday flags, etc.

Suggested Fact Models:
- `fct_orders`: total order value, average basket size, order count per day
- `fct_payments`: total paid/unpaid amounts, % of payments by method

- Use star schema principles and `ref()` for dependencies
- Add referential integrity tests (`relationships`)
- Build custom macros if needed (e.g., for currency formatting)
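One way to sketch `dim_customers`, assuming staging models named `stg_customers` and `stg_orders` exist, and that an `order_total` column is available (in the real jaffle_shop data, order value lives in `payments`, so you may need to join through it instead):

```sql
-- models/marts/dim_customers.sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
)

select
    customers.customer_id,
    min(orders.order_date)  as first_order_date,
    count(orders.order_id)  as total_orders,
    sum(orders.order_total) as lifetime_value  -- assumption: an order_total column
from customers
left join orders using (customer_id)
group by customers.customer_id
```

A `relationships` test on `stg_orders.customer_id` against this model then enforces referential integrity between the fact and dimension layers.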
### 5. Visualization in Metabase
- Install Metabase using Docker:
  ```bash
  docker run -d -p 3000:3000 metabase/metabase
  ```
- Connect Metabase to the BigQuery dataset containing your `marts` tables
- Sync the schema and begin exploring the data

Suggested Visualizations:
- **Daily Revenue Trend**: line chart of `order_date` vs `total_revenue`
- **Top Customers by LTV**: bar chart from `dim_customers`
- **Payment Method Breakdown**: pie chart of the % split by method
- **Customer Growth Over Time**: area chart of the cumulative customer count
- **Order Heatmap**: by hour of day and weekday

- Set up a scheduled refresh in Metabase, or schedule `dbt run` and `dbt build` with cron or GitHub Actions
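For the GitHub Actions route, a minimal scheduled workflow might look like this; the secret name `DBT_BIGQUERY_KEYFILE`, the cron schedule, and the `dbt_project` directory are assumptions matching the suggested project structure:

```yaml
# .github/workflows/dbt_refresh.yml
name: Scheduled dbt refresh
on:
  schedule:
    - cron: "0 6 * * *"  # assumption: run daily at 06:00 UTC
jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-bigquery
      - name: Write BigQuery keyfile
        run: echo '${{ secrets.DBT_BIGQUERY_KEYFILE }}' > keyfile.json
      - run: dbt build --project-dir dbt_project
```

Metabase then only needs its own sync schedule; the data underneath is refreshed by `dbt build`.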
## Deliverables
- Complete dbt project with:
  - Staging and dimensional models
  - Tests and documentation
- Metabase dashboard with:
  - 3–5 charts showing core business metrics
- Screenshots of:
  - Metabase dashboards
  - dbt docs lineage graph
- A `README.md` explaining:
  - Dataset
  - Architecture
  - Setup instructions
  - Key metrics
## Optional Extensions
- Add an `exposures.yml` to document dashboards as dbt exposures
- Use `dbt seed` to include lookup tables
- Create a GitHub repo and adopt a Git-based workflow
- Add CI/CD: a GitHub Actions workflow that runs `dbt build`, `dbt test`, and `dbt docs generate` on push
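An exposure tying the Metabase dashboard back into the dbt lineage graph could be sketched as follows; the dashboard URL and owner details are placeholders:

```yaml
# models/exposures.yml
version: 2

exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: medium
    url: http://localhost:3000/dashboard/1  # placeholder: your Metabase dashboard URL
    description: Core revenue and customer metrics in Metabase
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
    owner:
      name: Analytics Team          # placeholder
      email: analytics@example.com  # placeholder
```

Once defined, the dashboard appears as a terminal node in `dbt docs`, so stakeholders can see which models feed it.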