TL;DR: Star schemas use denormalized dimension tables for simpler queries and faster performance — they are the default choice for analytics data warehouses. Snowflake schemas normalize dimensions into hierarchies, reducing redundancy but adding query complexity. For most modern cloud warehouses, star schemas win because storage is cheap and query simplicity directly impacts team productivity.
Star Schema vs Snowflake Schema: Data Modeling for Analytics
Dimensional modeling is the foundation of every analytics data warehouse. Whether you're building dashboards in Looker, running ad-hoc queries in BigQuery, or designing a dbt project, the schema you choose directly impacts query performance, data quality, and how easily your team can work with the data. This guide covers star schemas, snowflake schemas, and everything you need to model data for analytics effectively.
Why Dimensional Modeling Matters
Operational databases (OLTP) use normalized schemas — third normal form (3NF) — to minimize data redundancy and ensure consistency for transactional workloads. But normalized schemas are terrible for analytics. A simple business question like "What was total revenue by product category last quarter?" might require joining 8-10 tables in a normalized schema.
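To see why, here is that question against a hypothetical normalized schema — every table and column name below is illustrative, not from any real system. Even this simplified version needs four joins, and real operational schemas add tables for addresses, currencies, order statuses, and more:

```sql
-- Hypothetical 3NF layout: revenue by category already needs a long join chain
SELECT
    c.category_name,
    SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN orders o        ON oi.order_id = o.order_id
JOIN products p      ON oi.product_id = p.product_id
JOIN subcategories s ON p.subcategory_id = s.subcategory_id
JOIN categories c    ON s.category_id = c.category_id
WHERE o.order_date >= DATE '2025-07-01'
  AND o.order_date <  DATE '2025-10-01'
GROUP BY c.category_name;
```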
Dimensional modeling reorganizes data into a structure optimized for analytical queries: few joins, fast aggregations, and intuitive navigation. The two primary dimensional models are the star schema and the snowflake schema.
The Kimball vs Inmon Debate
Before diving into schemas, it's worth understanding the two major philosophies of data warehouse design.
Ralph Kimball's Approach (Bottom-Up)
Kimball advocates building dimensional data marts first — focused on individual business processes (sales, inventory, customer service). These are then integrated through conformed dimensions (shared dimension tables used across marts).
- Build data marts around business processes
- Use star schemas with denormalized dimensions
- Faster time to value — deliver a working mart in weeks
- The data warehouse is the union of all data marts
Bill Inmon's Approach (Top-Down)
Inmon advocates building a centralized, normalized data warehouse first (the enterprise data warehouse or EDW), then deriving dimensional data marts from it.
- Build the enterprise model first (3NF)
- Derive dimensional marts from the central warehouse
- More upfront effort, but a single source of truth
- Better consistency across the organization
In practice, most modern teams lean toward Kimball's approach because it's more pragmatic and delivers value incrementally. With tools like dbt, you can build dimensional models iteratively and refactor as the organization's needs evolve.
Fact Tables and Dimension Tables
Every dimensional model consists of two types of tables.
Fact Tables
Fact tables store measurements — the quantitative data that the business wants to analyze. Each row represents a business event or transaction.
Characteristics:
- Contain numeric, additive measures (revenue, quantity, cost)
- Include foreign keys to dimension tables
- Typically the largest tables in the warehouse (millions to billions of rows)
- Narrow (relatively few columns, in contrast to wide dimension tables)
-- Example: Sales fact table
CREATE TABLE fct_sales (
    sale_id         BIGINT PRIMARY KEY,
    date_key        INTEGER REFERENCES dim_date(date_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    store_key       INTEGER REFERENCES dim_store(store_key),
    promotion_key   INTEGER REFERENCES dim_promotion(promotion_key),
    quantity_sold   INTEGER,
    unit_price      DECIMAL(10, 2),
    discount_amount DECIMAL(10, 2),
    sales_amount    DECIMAL(12, 2),
    cost_amount     DECIMAL(12, 2),
    profit_amount   DECIMAL(12, 2)
);
Types of fact tables:
- Transaction facts: One row per event (each sale, each click). Most common.
- Periodic snapshot facts: One row per entity per time period (daily inventory levels).
- Accumulating snapshot facts: One row per entity lifecycle (order lifecycle from placement to delivery).
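For contrast with the transaction fact above, a periodic snapshot fact might look like this (a sketch; the table and column names are assumed). Note that inventory levels are semi-additive — they sum meaningfully across products or stores, but not across days:

```sql
-- Periodic snapshot: one row per product per store per day
CREATE TABLE fct_inventory_daily (
    date_key          INTEGER REFERENCES dim_date(date_key),
    product_key       INTEGER REFERENCES dim_product(product_key),
    store_key         INTEGER REFERENCES dim_store(store_key),
    quantity_on_hand  INTEGER,  -- semi-additive: sum across stores, not across days
    quantity_on_order INTEGER,
    PRIMARY KEY (date_key, product_key, store_key)
);
```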
Dimension Tables
Dimension tables store descriptive attributes — the "who, what, where, when, why, how" context for the facts.
Characteristics:
- Contain textual, descriptive attributes
- Relatively small (thousands to millions of rows)
- Wide (many columns describing the entity)
- Change slowly over time (hence "slowly changing dimensions")
-- Example: Product dimension table
CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,  -- Surrogate key
    product_id      VARCHAR(20),          -- Natural/business key
    product_name    VARCHAR(200),
    brand           VARCHAR(100),
    category        VARCHAR(100),
    subcategory     VARCHAR(100),
    size            VARCHAR(50),
    color           VARCHAR(50),
    unit_cost       DECIMAL(10, 2),
    is_active       BOOLEAN,
    effective_date  DATE,
    expiration_date DATE
);
The date dimension is special — almost every fact table references it, and it's pre-populated with calendar attributes:
-- Example: Date dimension table
CREATE TABLE dim_date (
    date_key       INTEGER PRIMARY KEY,  -- YYYYMMDD format
    full_date      DATE,
    day_of_week    VARCHAR(10),
    day_of_month   INTEGER,
    day_of_year    INTEGER,
    week_of_year   INTEGER,
    month_number   INTEGER,
    month_name     VARCHAR(10),
    quarter        INTEGER,
    year           INTEGER,
    is_weekend     BOOLEAN,
    is_holiday     BOOLEAN,
    fiscal_quarter INTEGER,
    fiscal_year    INTEGER
);
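One way to pre-populate it is a recursive date spine. This sketch uses PostgreSQL-flavored SQL (TO_CHAR, EXTRACT, and date arithmetic vary by warehouse) and fills only a subset of the columns above:

```sql
-- Generate one row per calendar day for 2020-2030 (adapt syntax to your warehouse)
WITH RECURSIVE dates(d) AS (
    SELECT DATE '2020-01-01'
    UNION ALL
    SELECT d + 1 FROM dates WHERE d < DATE '2030-12-31'
)
INSERT INTO dim_date
    (date_key, full_date, day_of_week, day_of_month, month_number, month_name, quarter, year, is_weekend)
SELECT
    CAST(TO_CHAR(d, 'YYYYMMDD') AS INTEGER),  -- YYYYMMDD surrogate key
    d,
    TRIM(TO_CHAR(d, 'Day')),
    EXTRACT(DAY FROM d),
    EXTRACT(MONTH FROM d),
    TRIM(TO_CHAR(d, 'Month')),
    EXTRACT(QUARTER FROM d),
    EXTRACT(YEAR FROM d),
    EXTRACT(ISODOW FROM d) IN (6, 7)          -- Saturday or Sunday
FROM dates;
```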
The Star Schema
A star schema arranges data with a central fact table surrounded by denormalized dimension tables. When diagrammed, it looks like a star — the fact table at the center with dimensions radiating outward.
Structure
                dim_date
                    |
dim_product --- fct_sales --- dim_customer
                    |
                dim_store
Each dimension table connects directly to the fact table with a single join. Dimensions are denormalized — all descriptive attributes are flattened into a single table, even if that means some redundancy.
Example Query
The beauty of the star schema is query simplicity. A complex business question requires minimal joins:
-- Total revenue by product category and quarter for 2025
SELECT
    d.quarter,
    d.year,
    p.category,
    p.brand,
    SUM(f.sales_amount) AS total_revenue,
    SUM(f.profit_amount) AS total_profit,
    COUNT(DISTINCT f.customer_key) AS unique_customers,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2025
GROUP BY d.quarter, d.year, p.category, p.brand
ORDER BY d.quarter, total_revenue DESC;
Only two joins to answer a meaningful business question. That's the power of dimensional modeling.
Pros of Star Schema
- Simple queries: Fewer joins make SQL easier to write and understand
- Fast query performance: Query optimizers handle star joins extremely well
- BI tool friendly: Tools like Looker, Tableau, and Power BI are designed for star schemas
- Intuitive: Business users can understand the model without training
Cons of Star Schema
- Data redundancy: Denormalized dimensions store repeated data (e.g., "Electronics" category repeated for every product in that category)
- Larger dimension tables: More storage for redundant attributes
- Update complexity: Changing a category name requires updating many rows
The Snowflake Schema
A snowflake schema normalizes dimension tables into multiple related tables. Instead of a flat dim_product with category and subcategory columns, you'd have separate dim_category and dim_subcategory tables.
Structure
dim_category                       dim_date
       |                               |
dim_subcategory                        |
       |                               |
dim_product --- fct_sales ---+--- dim_customer --- dim_geography
                             |                           |
                         dim_store                   dim_region
The dimensions "snowflake out" into normalized sub-tables, resembling a snowflake shape.
Example: Normalized Product Dimension
-- Snowflake schema: Normalized product dimension
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name VARCHAR(100)
);

CREATE TABLE dim_subcategory (
    subcategory_key  INTEGER PRIMARY KEY,
    subcategory_name VARCHAR(100),
    category_key     INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,
    product_id      VARCHAR(20),
    product_name    VARCHAR(200),
    brand           VARCHAR(100),
    size            VARCHAR(50),
    color           VARCHAR(50),
    unit_cost       DECIMAL(10, 2),
    subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
);
Example Query
The same business question now requires more joins:
-- Same query as before, but with snowflake schema
SELECT
    d.quarter,
    d.year,
    c.category_name,
    p.brand,
    SUM(f.sales_amount) AS total_revenue,
    SUM(f.profit_amount) AS total_profit,
    COUNT(DISTINCT f.customer_key) AS unique_customers,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_subcategory sc ON p.subcategory_key = sc.subcategory_key
JOIN dim_category c ON sc.category_key = c.category_key
WHERE d.year = 2025
GROUP BY d.quarter, d.year, c.category_name, p.brand
ORDER BY d.quarter, total_revenue DESC;
Four joins instead of two. For this single query, the difference is minor. But across hundreds of queries and dashboards, the additional complexity adds up.
Pros of Snowflake Schema
- Less data redundancy: Normalized dimensions eliminate repeated data
- Smaller storage: Less disk space for dimension tables
- Easier dimension updates: Change a category name in one row, one table
- Enforced referential integrity: Normalization prevents inconsistencies
Cons of Snowflake Schema
- More complex queries: Additional joins make SQL harder to write and debug
- Slower queries: More joins can impact performance (though modern optimizers mitigate this)
- BI tool friction: Some tools struggle with multi-level dimension hierarchies
- Harder to understand: Business users find normalized structures confusing
Star vs Snowflake: Side-by-Side
| Criteria | Star Schema | Snowflake Schema |
|---|---|---|
| Dimension structure | Denormalized (flat) | Normalized (hierarchical) |
| Number of joins | Fewer | More |
| Query complexity | Simple | Complex |
| Query performance | Generally faster | Generally slower |
| Storage efficiency | Less efficient (redundancy) | More efficient |
| BI tool compatibility | Excellent | Good but more complex |
| Maintenance | Harder to update dimensions | Easier to update dimensions |
| Data integrity | Relies on ETL process | Enforced by normalization |
The Verdict
Star schemas are the standard choice for analytics data warehouses. The minor storage savings of snowflake schemas are not worth the query complexity, especially with modern columnar warehouses where storage is cheap and query simplicity directly impacts productivity.
Use a snowflake schema only when:
- Dimension tables are extremely large (millions of rows) and redundancy causes real storage concerns
- You need strict referential integrity that normalization enforces
- Your primary consumers are data engineers writing complex SQL, not analysts using BI tools
Slowly Changing Dimensions (SCD)
Dimensions change over time. A customer moves to a new city. A product's price changes. A store gets reclassified to a different region. Slowly Changing Dimensions (SCDs) define how your model handles these changes.
SCD Type 0: Fixed
The dimension value never changes. Once set, it stays forever.
Use case: Original signup date, birth date, natural keys.
SCD Type 1: Overwrite
The old value is overwritten with the new value. History is lost.
-- SCD Type 1: Simply update the record
UPDATE dim_customer
SET city = 'San Francisco', state = 'CA'
WHERE customer_id = 'C-1234';
Use case: Correcting data entry errors, attributes where history doesn't matter.
Trade-off: Simple, but you lose all historical context. If a customer was in New York when they made purchases in Q1, but moved to San Francisco in Q2, all their historical purchases now show San Francisco.
SCD Type 2: Add New Row
A new row is inserted with effective dates. The old row is marked as expired. This preserves full history.
-- Before the change:
-- customer_key | customer_id | city     | effective_date | expiration_date | is_current
-- 1001         | C-1234      | New York | 2024-01-15     | 9999-12-31      | TRUE

-- After the change:
UPDATE dim_customer
SET expiration_date = '2026-03-07', is_current = FALSE
WHERE customer_id = 'C-1234' AND is_current = TRUE;

INSERT INTO dim_customer
    (customer_key, customer_id, city, effective_date, expiration_date, is_current)
VALUES
    (1002, 'C-1234', 'San Francisco', '2026-03-08', '9999-12-31', TRUE);
Now the fact table can join to either version:
- Q1 sales join to customer_key = 1001 (New York)
- Q2 sales join to customer_key = 1002 (San Francisco)
Use case: When you need to analyze facts in the context of the dimension as it was at the time of the event.
Trade-off: More complex to implement and query. Dimension tables grow larger. Surrogate keys are required.
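Using the example tables from this guide, the two analysis modes look like this (a sketch). Because the fact row stores the surrogate key of the version that was current at the time of sale, "as-was" analysis needs no date logic at all; "as-is" analysis joins back through the natural key to the current version:

```sql
-- "As-was": attribute each sale to the customer's city at the time of sale.
-- The surrogate key in the fact row already points at the right version.
SELECT c.city, SUM(f.sales_amount) AS revenue
FROM fct_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY c.city;

-- "As-is": attribute each sale to the customer's *current* city,
-- by hopping from the historical version to the current one via the natural key.
SELECT cur.city, SUM(f.sales_amount) AS revenue
FROM fct_sales f
JOIN dim_customer hist ON f.customer_key = hist.customer_key
JOIN dim_customer cur  ON hist.customer_id = cur.customer_id
                      AND cur.is_current = TRUE
GROUP BY cur.city;
```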
SCD Type 3: Add New Column
Instead of a new row, add a column to store the previous value.
ALTER TABLE dim_customer ADD COLUMN previous_city VARCHAR(100);
UPDATE dim_customer
SET previous_city = city, city = 'San Francisco'
WHERE customer_id = 'C-1234';
Use case: When you only need to track one level of history (current vs previous).
Trade-off: Only stores one prior version. Not suitable when values change frequently.
Choosing an SCD Type
| SCD Type | History Preserved | Complexity | Storage Impact | Best For |
|---|---|---|---|---|
| Type 0 | None (fixed) | Minimal | None | Immutable attributes |
| Type 1 | None (overwrite) | Low | None | Error corrections |
| Type 2 | Full history | High | Rows grow | Time-sensitive analysis |
| Type 3 | One prior value | Medium | Columns grow | Before/after comparisons |
In practice, most warehouses use a combination: Type 2 for business-critical dimensions (customer location, product category) and Type 1 for non-critical attributes (name corrections, formatting fixes).
Practical Example: Building a Star Schema with dbt
Here's how you might implement a star schema using dbt in a modern ELT pipeline.
Staging Layer: Clean Raw Data
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}
)

SELECT
    id AS order_id,
    customer_id,
    product_id,
    store_id,
    CAST(order_date AS DATE) AS order_date,
    CAST(quantity AS INTEGER) AS quantity,
    CAST(unit_price AS DECIMAL(10,2)) AS unit_price,
    CAST(discount AS DECIMAL(10,2)) AS discount,
    quantity * unit_price AS gross_amount,
    quantity * unit_price - discount AS net_amount
FROM source
WHERE id IS NOT NULL
Dimension Models
-- models/marts/dim_customer.sql
WITH customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),

addresses AS (
    SELECT * FROM {{ ref('stg_addresses') }}
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['c.customer_id']) }} AS customer_key,
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    c.signup_date,
    a.city,
    a.state,
    a.country,
    a.postal_code,
    CASE
        WHEN c.signup_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'New'
        WHEN c.total_orders >= 10 THEN 'Loyal'
        WHEN c.last_order_date < CURRENT_DATE - INTERVAL '180 days' THEN 'Churned'
        ELSE 'Active'
    END AS customer_segment
FROM customers c
LEFT JOIN addresses a ON c.address_id = a.address_id
Fact Model
-- models/marts/fct_orders.sql
SELECT
    o.order_id,
    dc.customer_key,
    dp.product_key,
    ds.store_key,
    dd.date_key,
    o.quantity,
    o.unit_price,
    o.discount,
    o.gross_amount,
    o.net_amount,
    o.net_amount - (o.quantity * dp.unit_cost) AS profit
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_customer') }} dc ON o.customer_id = dc.customer_id
JOIN {{ ref('dim_product') }} dp ON o.product_id = dp.product_id
JOIN {{ ref('dim_store') }} ds ON o.store_id = ds.store_id
JOIN {{ ref('dim_date') }} dd ON o.order_date = dd.full_date
For hands-on practice building dimensional models, try our Analytics Engineering with dbt project. For foundational modeling concepts, see our Data Modeling and SQL Fundamentals guides.
Performance Optimization Tips
1. Partition Fact Tables
Partition your fact table by date to enable partition pruning — the query engine skips irrelevant partitions entirely.
-- BigQuery: Partition by order date
CREATE TABLE fct_sales
PARTITION BY DATE(order_date)
CLUSTER BY product_key, store_key
AS SELECT * FROM staging.sales;
2. Use Surrogate Keys
Surrogate keys (auto-incrementing integers or hashes) are smaller and faster to join on than natural keys (strings like email addresses or product codes).
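When no sequence generator is available, a deterministic hash of the natural key is a common substitute — essentially what dbt_utils.generate_surrogate_key produces. A sketch (MD5 shown; hash function names vary by warehouse, and note that a hash key is a string, unlike the integer surrogate keys in the DDL examples above):

```sql
-- Hash-based surrogate key: deterministic, no sequence required.
-- COALESCE guards against NULL natural keys collapsing to NULL hashes.
SELECT
    MD5(COALESCE(CAST(customer_id AS VARCHAR), '')) AS customer_key,
    customer_id,
    first_name,
    last_name
FROM stg_customers;
```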
3. Pre-Aggregate Common Queries
If your dashboards always query daily revenue, create a pre-aggregated daily summary table instead of scanning the transaction-level fact table every time.
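Using the example tables from this guide, such a summary might be built as follows (a sketch; how and when it is refreshed is up to your orchestrator):

```sql
-- Daily revenue summary: dashboards scan this instead of the raw fact table
CREATE TABLE agg_daily_revenue AS
SELECT
    d.full_date,
    p.category,
    SUM(f.sales_amount)  AS total_revenue,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.full_date, p.category;
```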
4. Materialize Strategically
In dbt, use table materialization for large fact tables and frequently queried dimensions. Use view for staging models that are only read by downstream models.
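In a model file this is a one-line config (folder-level defaults can also be set with +materialized in dbt_project.yml):

```sql
-- models/marts/fct_orders.sql: a large fact, so persist it as a physical table
{{ config(materialized='table') }}

SELECT * FROM {{ ref('stg_orders') }}
```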
Key Takeaways
- Dimensional modeling organizes data into fact tables (measurements) and dimension tables (descriptive context) for efficient analytics.
- Star schemas denormalize dimensions into flat tables — simpler queries, faster performance, better BI tool compatibility.
- Snowflake schemas normalize dimensions into hierarchies — less redundancy but more complex queries.
- Star schemas are the default choice for modern analytics warehouses. Use snowflake schemas only when storage is a real concern.
- SCD Type 2 preserves full history and is essential for time-sensitive analysis. Use Type 1 for non-critical attributes.
- Kimball's approach (build dimensional marts iteratively) aligns well with modern dbt-based workflows.
- Performance depends on partitioning, clustering, and materialization — not just the schema design.
Good dimensional modeling is a skill that compounds over time. Every warehouse you build on a solid star schema foundation will be easier to query, maintain, and extend. Invest in getting the model right, and the rest of your analytics platform becomes dramatically simpler.
Frequently Asked Questions
What is a star schema?
A star schema is a dimensional data model where a central fact table (containing measurements like revenue and quantity) is surrounded by denormalized dimension tables (containing descriptive attributes like customer name, product category, and date). It is called a "star" because the diagram resembles a star shape. Star schemas are the standard choice for analytics data warehouses due to their query simplicity and BI tool compatibility.
What is a snowflake schema?
A snowflake schema is a dimensional data model where dimension tables are normalized into multiple related sub-tables. For example, instead of a flat product dimension with category and subcategory columns, you would have separate dim_category and dim_subcategory tables. This reduces data redundancy but requires more joins for queries, making SQL more complex.
What is a slowly changing dimension?
A Slowly Changing Dimension (SCD) defines how a dimensional model handles changes to dimension attributes over time. The most common types are: Type 1 (overwrite the old value, losing history), Type 2 (add a new row with effective dates, preserving full history), and Type 3 (add a column for the previous value, storing limited history). SCD Type 2 is the most important for time-sensitive analytics.
When should I use star schema vs snowflake schema?
Use a star schema in most cases — it provides simpler queries, faster performance, and better BI tool compatibility. Use a snowflake schema only when dimension tables are extremely large and redundancy causes real storage concerns, when you need strict referential integrity enforced by normalization, or when your primary consumers are data engineers writing complex SQL rather than analysts using BI tools.