TL;DR: Star schemas use denormalized dimension tables for simpler queries and faster performance — they are the default choice for analytics data warehouses. Snowflake schemas normalize dimensions into hierarchies, reducing redundancy but adding query complexity. For most modern cloud warehouses, star schemas win because storage is cheap and query simplicity directly impacts team productivity.
Star Schema vs Snowflake Schema: Data Modeling for Analytics
Dimensional modeling is the foundation of every analytics data warehouse. Whether you're building dashboards in Looker, running ad-hoc queries in BigQuery, or designing a dbt project, the schema you choose directly impacts query performance, data quality, and how easily your team can work with the data. This guide covers star schemas, snowflake schemas, and everything you need to model data for analytics effectively.
Why Dimensional Modeling Matters
Operational databases (OLTP) use normalized schemas — third normal form (3NF) — to minimize data redundancy and ensure consistency for transactional workloads. But normalized schemas are terrible for analytics. A simple business question like "What was total revenue by product category last quarter?" might require joining 8-10 tables in a normalized schema.
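To see why, here is that question against a hypothetical normalized schema — every table and column name below is illustrative, not from any real system. Even this simplified version needs four joins, and real operational schemas add tables for addresses, currencies, order statuses, and more:

```sql
-- Hypothetical 3NF layout: revenue by category already needs a long join chain
SELECT
    c.category_name,
    SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN orders o        ON oi.order_id = o.order_id
JOIN products p      ON oi.product_id = p.product_id
JOIN subcategories s ON p.subcategory_id = s.subcategory_id
JOIN categories c    ON s.category_id = c.category_id
WHERE o.order_date >= DATE '2025-07-01'
  AND o.order_date <  DATE '2025-10-01'
GROUP BY c.category_name;
```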
Dimensional modeling reorganizes data into a structure optimized for analytical queries: few joins, fast aggregations, and intuitive navigation. The two primary dimensional models are the star schema and the snowflake schema.
The Kimball vs Inmon Debate
Before diving into schemas, it's worth understanding the two major philosophies of data warehouse design.
Ralph Kimball's Approach (Bottom-Up)
Kimball advocates building dimensional data marts first — focused on individual business processes (sales, inventory, customer service). These are then integrated through conformed dimensions (shared dimension tables used across marts).
- Build data marts around business processes
- Use star schemas with denormalized dimensions
- Faster time to value — deliver a working mart in weeks
- The data warehouse is the union of all data marts
Bill Inmon's Approach (Top-Down)
Inmon advocates building a centralized, normalized data warehouse first (the enterprise data warehouse or EDW), then deriving dimensional data marts from it.
- Build the enterprise model first (3NF)
- Derive dimensional marts from the central warehouse
- More upfront effort, but a single source of truth
- Better consistency across the organization
In practice, most modern teams lean toward Kimball's approach because it's more pragmatic and delivers value incrementally. With tools like dbt, you can build dimensional models iteratively and refactor as the organization's needs evolve.
Fact Tables and Dimension Tables
Every dimensional model consists of two types of tables.
Fact Tables
Fact tables store measurements — the quantitative data that the business wants to analyze. Each row represents a business event or transaction.
Characteristics:
- Contain numeric, additive measures (revenue, quantity, cost)
- Include foreign keys to dimension tables
- Typically the largest tables in the warehouse (millions to billions of rows)
- Narrow (relatively few columns, in contrast to wide dimension tables)
-- Example: Sales fact table
CREATE TABLE fct_sales (
    sale_id         BIGINT PRIMARY KEY,
    date_key        INTEGER REFERENCES dim_date(date_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    store_key       INTEGER REFERENCES dim_store(store_key),
    promotion_key   INTEGER REFERENCES dim_promotion(promotion_key),
    quantity_sold   INTEGER,
    unit_price      DECIMAL(10, 2),
    discount_amount DECIMAL(10, 2),
    sales_amount    DECIMAL(12, 2),
    cost_amount     DECIMAL(12, 2),
    profit_amount   DECIMAL(12, 2)
);
Types of fact tables:
- Transaction facts: One row per event (each sale, each click). Most common.
- Periodic snapshot facts: One row per entity per time period (daily inventory levels).
- Accumulating snapshot facts: One row per entity lifecycle (order lifecycle from placement to delivery).
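For contrast with the transaction fact above, a periodic snapshot fact might look like this (a sketch; the table and column names are assumed). Note that inventory levels are semi-additive — they sum meaningfully across products or stores, but not across days:

```sql
-- Periodic snapshot: one row per product per store per day
CREATE TABLE fct_inventory_daily (
    date_key          INTEGER REFERENCES dim_date(date_key),
    product_key       INTEGER REFERENCES dim_product(product_key),
    store_key         INTEGER REFERENCES dim_store(store_key),
    quantity_on_hand  INTEGER,  -- semi-additive: sum across stores, not across days
    quantity_on_order INTEGER,
    PRIMARY KEY (date_key, product_key, store_key)
);
```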
Dimension Tables
Dimension tables store descriptive attributes — the "who, what, where, when, why, how" context for the facts.
Characteristics:
- Contain textual, descriptive attributes
- Relatively small (thousands to millions of rows)
- Wide (many columns describing the entity)
- Change slowly over time (hence "slowly changing dimensions")
-- Example: Product dimension table
CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,  -- Surrogate key
    product_id      VARCHAR(20),          -- Natural/business key
    product_name    VARCHAR(200),
    brand           VARCHAR(100),
    category        VARCHAR(100),
    subcategory     VARCHAR(100),
    size            VARCHAR(50),
    color           VARCHAR(50),
    unit_cost       DECIMAL(10, 2),
    is_active       BOOLEAN,
    effective_date  DATE,
    expiration_date DATE
);
The date dimension is special — almost every fact table references it, and it's pre-populated with calendar attributes:
-- Example: Date dimension table
CREATE TABLE dim_date (
    date_key       INTEGER PRIMARY KEY,  -- YYYYMMDD format
    full_date      DATE,
    day_of_week    VARCHAR(10),
    day_of_month   INTEGER,
    day_of_year    INTEGER,
    week_of_year   INTEGER,
    month_number   INTEGER,
    month_name     VARCHAR(10),
    quarter        INTEGER,
    year           INTEGER,
    is_weekend     BOOLEAN,
    is_holiday     BOOLEAN,
    fiscal_quarter INTEGER,
    fiscal_year    INTEGER
);
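One way to pre-populate it is a recursive date spine. This sketch uses PostgreSQL-flavored SQL (TO_CHAR, EXTRACT, and date arithmetic vary by warehouse) and fills only a subset of the columns above:

```sql
-- Generate one row per calendar day for 2020-2030 (adapt syntax to your warehouse)
WITH RECURSIVE dates(d) AS (
    SELECT DATE '2020-01-01'
    UNION ALL
    SELECT d + 1 FROM dates WHERE d < DATE '2030-12-31'
)
INSERT INTO dim_date
    (date_key, full_date, day_of_week, day_of_month, month_number, month_name, quarter, year, is_weekend)
SELECT
    CAST(TO_CHAR(d, 'YYYYMMDD') AS INTEGER),  -- YYYYMMDD surrogate key
    d,
    TRIM(TO_CHAR(d, 'Day')),
    EXTRACT(DAY FROM d),
    EXTRACT(MONTH FROM d),
    TRIM(TO_CHAR(d, 'Month')),
    EXTRACT(QUARTER FROM d),
    EXTRACT(YEAR FROM d),
    EXTRACT(ISODOW FROM d) IN (6, 7)          -- Saturday or Sunday
FROM dates;
```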
The Star Schema
A star schema arranges data with a central fact table surrounded by denormalized dimension tables. When diagrammed, it looks like a star — the fact table at the center with dimensions radiating outward.
Structure
                dim_date
                    |
dim_product --- fct_sales --- dim_customer
                    |
                dim_store
Each dimension table connects directly to the fact table with a single join. Dimensions are denormalized — all descriptive attributes are flattened into a single table, even if that means some redundancy.
Example Query
The beauty of the star schema is query simplicity. A complex business question requires minimal joins:
-- Total revenue by product category and quarter for 2025
SELECT
    d.quarter,
    d.year,
    p.category,
    p.brand,
    SUM(f.sales_amount) AS total_revenue,
    SUM(f.profit_amount) AS total_profit,
    COUNT(DISTINCT f.customer_key) AS unique_customers,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2025
GROUP BY d.quarter, d.year, p.category, p.brand
ORDER BY d.quarter, total_revenue DESC;
Only two joins to answer a meaningful business question. That's the power of dimensional modeling.
Pros of Star Schema
- Simple queries: Fewer joins make SQL easier to write and understand
- Fast query performance: Query optimizers handle star joins extremely well
- BI tool friendly: Tools like Looker, Tableau, and Power BI are designed for star schemas
- Intuitive: Business users can understand the model without training
Cons of Star Schema
- Data redundancy: Denormalized dimensions store repeated data (e.g., "Electronics" category repeated for every product in that category)
- Larger dimension tables: More storage for redundant attributes
- Update complexity: Changing a category name requires updating many rows
The Snowflake Schema
A snowflake schema normalizes dimension tables into multiple related tables. Instead of a flat dim_product with category and subcategory columns, you'd have separate dim_category and dim_subcategory tables.
Structure
dim_category                       dim_date
       |                               |
dim_subcategory                        |
       |                               |
dim_product --- fct_sales ---+--- dim_customer --- dim_geography
                             |                           |
                         dim_store                   dim_region
The dimensions "snowflake out" into normalized sub-tables, resembling a snowflake shape.
Example: Normalized Product Dimension
-- Snowflake schema: Normalized product dimension
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name VARCHAR(100)
);

CREATE TABLE dim_subcategory (
    subcategory_key  INTEGER PRIMARY KEY,
    subcategory_name VARCHAR(100),
    category_key     INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,
    product_id      VARCHAR(20),
    product_name    VARCHAR(200),
    brand           VARCHAR(100),
    size            VARCHAR(50),
    color           VARCHAR(50),
    unit_cost       DECIMAL(10, 2),
    subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
);
Example Query
The same business question now requires more joins:
-- Same query as before, but with snowflake schema
SELECT
    d.quarter,
    d.year,
    c.category_name,
    p.brand,
    SUM(f.sales_amount) AS total_revenue,
    SUM(f.profit_amount) AS total_profit,
    COUNT(DISTINCT f.customer_key) AS unique_customers,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_subcategory sc ON p.subcategory_key = sc.subcategory_key
JOIN dim_category c ON sc.category_key = c.category_key
WHERE d.year = 2025
GROUP BY d.quarter, d.year, c.category_name, p.brand
ORDER BY d.quarter, total_revenue DESC;
Four joins instead of two. For this single query, the difference is minor. But across hundreds of queries and dashboards, the additional complexity adds up.
Pros of Snowflake Schema
- Less data redundancy: Normalized dimensions eliminate repeated data
- Smaller storage: Less disk space for dimension tables
- Easier dimension updates: Change a category name in one row, one table
- Enforced referential integrity: Normalization prevents inconsistencies
Cons of Snowflake Schema
- More complex queries: Additional joins make SQL harder to write and debug
- Slower queries: More joins can impact performance (though modern optimizers mitigate this)
- BI tool friction: Some tools struggle with multi-level dimension hierarchies
- Harder to understand: Business users find normalized structures confusing
Star vs Snowflake: Side-by-Side
| Criteria | Star Schema | Snowflake Schema |
|---|---|---|
| Dimension structure | Denormalized (flat) | Normalized (hierarchical) |
| Number of joins | Fewer | More |
| Query complexity | Simple | Complex |
| Query performance | Generally faster | Generally slower |
| Storage efficiency | Less efficient (redundancy) | More efficient |
| BI tool compatibility | Excellent | Good but more complex |
| Maintenance | Harder to update dimensions | Easier to update dimensions |
| Data integrity | Relies on ETL process | Enforced by normalization |
The Verdict
Star schemas are the standard choice for analytics data warehouses. The minor storage savings of snowflake schemas are not worth the query complexity, especially with modern columnar warehouses where storage is cheap and query simplicity directly impacts productivity.
Use a snowflake schema only when:
- Dimension tables are extremely large (millions of rows) and redundancy causes real storage concerns
- You need strict referential integrity that normalization enforces
- Your primary consumers are data engineers writing complex SQL, not analysts using BI tools
Slowly Changing Dimensions (SCD)
Dimensions change over time. A customer moves to a new city. A product's price changes. A store gets reclassified to a different region. Slowly Changing Dimensions (SCDs) define how your model handles these changes.
SCD Type 0: Fixed
The dimension value never changes. Once set, it stays forever.
Use case: Original signup date, birth date, natural keys.
SCD Type 1: Overwrite
The old value is overwritten with the new value. History is lost.
-- SCD Type 1: Simply update the record
UPDATE dim_customer
SET city = 'San Francisco', state = 'CA'
WHERE customer_id = 'C-1234';
Use case: Correcting data entry errors, attributes where history doesn't matter.
Trade-off: Simple, but you lose all historical context. If a customer was in New York when they made purchases in Q1, but moved to San Francisco in Q2, all their historical purchases now show San Francisco.
SCD Type 2: Add New Row
A new row is inserted with effective dates. The old row is marked as expired. This preserves full history.
-- Before the change:
-- customer_key | customer_id | city     | effective_date | expiration_date | is_current
-- 1001         | C-1234      | New York | 2024-01-15     | 9999-12-31      | TRUE

-- After the change:
UPDATE dim_customer
SET expiration_date = '2026-03-07', is_current = FALSE
WHERE customer_id = 'C-1234' AND is_current = TRUE;

INSERT INTO dim_customer
    (customer_key, customer_id, city, effective_date, expiration_date, is_current)
VALUES
    (1002, 'C-1234', 'San Francisco', '2026-03-08', '9999-12-31', TRUE);
Now the fact table can join to either version:
- Q1 sales join to customer_key = 1001 (New York)
- Q2 sales join to customer_key = 1002 (San Francisco)
Use case: When you need to analyze facts in the context of the dimension as it was at the time of the event.
Trade-off: More complex to implement and query. Dimension tables grow larger. Surrogate keys are required.
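Using the example tables from this guide, the two analysis modes look like this (a sketch). Because the fact row stores the surrogate key of the version that was current at the time of sale, "as-was" analysis needs no date logic at all; "as-is" analysis joins back through the natural key to the current version:

```sql
-- "As-was": attribute each sale to the customer's city at the time of sale.
-- The surrogate key in the fact row already points at the right version.
SELECT c.city, SUM(f.sales_amount) AS revenue
FROM fct_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY c.city;

-- "As-is": attribute each sale to the customer's *current* city,
-- by hopping from the historical version to the current one via the natural key.
SELECT cur.city, SUM(f.sales_amount) AS revenue
FROM fct_sales f
JOIN dim_customer hist ON f.customer_key = hist.customer_key
JOIN dim_customer cur  ON hist.customer_id = cur.customer_id
                      AND cur.is_current = TRUE
GROUP BY cur.city;
```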
SCD Type 3: Add New Column
Instead of a new row, add a column to store the previous value.
ALTER TABLE dim_customer ADD COLUMN previous_city VARCHAR(100);
UPDATE dim_customer
SET previous_city = city, city = 'San Francisco'
WHERE customer_id = 'C-1234';
Use case: When you only need to track one level of history (current vs previous).
Trade-off: Only stores one prior version. Not suitable when values change frequently.
Choosing an SCD Type
| SCD Type | History Preserved | Complexity | Storage Impact | Best For |
|---|---|---|---|---|
| Type 0 | None (fixed) | Minimal | None | Immutable attributes |
| Type 1 | None (overwrite) | Low | None | Error corrections |
| Type 2 | Full history | High | Rows grow | Time-sensitive analysis |
| Type 3 | One prior value | Medium | Columns grow | Before/after comparisons |
In practice, most warehouses use a combination: Type 2 for business-critical dimensions (customer location, product category) and Type 1 for non-critical attributes (name corrections, formatting fixes).
Practical Example: Building a Star Schema with dbt
Here's how you might implement a star schema using dbt in a modern ELT pipeline.
Staging Layer: Clean Raw Data
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}
)

SELECT
    id AS order_id,
    customer_id,
    product_id,
    store_id,
    CAST(order_date AS DATE) AS order_date,
    CAST(quantity AS INTEGER) AS quantity,
    CAST(unit_price AS DECIMAL(10,2)) AS unit_price,
    CAST(discount AS DECIMAL(10,2)) AS discount,
    quantity * unit_price AS gross_amount,
    quantity * unit_price - discount AS net_amount
FROM source
WHERE id IS NOT NULL
Dimension Models
-- models/marts/dim_customer.sql
WITH customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),

addresses AS (
    SELECT * FROM {{ ref('stg_addresses') }}
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['c.customer_id']) }} AS customer_key,
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    c.signup_date,
    a.city,
    a.state,
    a.country,
    a.postal_code,
    CASE
        WHEN c.signup_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'New'
        WHEN c.total_orders >= 10 THEN 'Loyal'
        WHEN c.last_order_date < CURRENT_DATE - INTERVAL '180 days' THEN 'Churned'
        ELSE 'Active'
    END AS customer_segment
FROM customers c
LEFT JOIN addresses a ON c.address_id = a.address_id
Fact Model
-- models/marts/fct_orders.sql
SELECT
    o.order_id,
    dc.customer_key,
    dp.product_key,
    ds.store_key,
    dd.date_key,
    o.quantity,
    o.unit_price,
    o.discount,
    o.gross_amount,
    o.net_amount,
    o.net_amount - (o.quantity * dp.unit_cost) AS profit
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_customer') }} dc ON o.customer_id = dc.customer_id
JOIN {{ ref('dim_product') }} dp ON o.product_id = dp.product_id
JOIN {{ ref('dim_store') }} ds ON o.store_id = ds.store_id
JOIN {{ ref('dim_date') }} dd ON o.order_date = dd.full_date
For hands-on practice building dimensional models, try our Analytics Engineering with dbt project. For foundational modeling concepts, see our Data Modeling and SQL Fundamentals guides.
Performance Optimization Tips
1. Partition Fact Tables
Partition your fact table by date to enable partition pruning — the query engine skips irrelevant partitions entirely.
-- BigQuery: Partition by order date
CREATE TABLE fct_sales
PARTITION BY DATE(order_date)
CLUSTER BY product_key, store_key
AS SELECT * FROM staging.sales;
2. Use Surrogate Keys
Surrogate keys (auto-incrementing integers or hashes) are smaller and faster to join on than natural keys (strings like email addresses or product codes).
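When no sequence generator is available, a deterministic hash of the natural key is a common substitute — essentially what dbt_utils.generate_surrogate_key produces. A sketch (MD5 shown; hash function names vary by warehouse, and note that a hash key is a string, unlike the integer surrogate keys in the DDL examples above):

```sql
-- Hash-based surrogate key: deterministic, no sequence required.
-- COALESCE guards against NULL natural keys collapsing to NULL hashes.
SELECT
    MD5(COALESCE(CAST(customer_id AS VARCHAR), '')) AS customer_key,
    customer_id,
    first_name,
    last_name
FROM stg_customers;
```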
3. Pre-Aggregate Common Queries
If your dashboards always query daily revenue, create a pre-aggregated daily summary table instead of scanning the transaction-level fact table every time.
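Using the example tables from this guide, such a summary might be built as follows (a sketch; how and when it is refreshed is up to your orchestrator):

```sql
-- Daily revenue summary: dashboards scan this instead of the raw fact table
CREATE TABLE agg_daily_revenue AS
SELECT
    d.full_date,
    p.category,
    SUM(f.sales_amount)  AS total_revenue,
    SUM(f.quantity_sold) AS units_sold
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.full_date, p.category;
```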
4. Materialize Strategically
In dbt, use table materialization for large fact tables and frequently queried dimensions. Use view for staging models that are only read by downstream models.
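In a model file this is a one-line config (folder-level defaults can also be set with +materialized in dbt_project.yml):

```sql
-- models/marts/fct_orders.sql: a large fact, so persist it as a physical table
{{ config(materialized='table') }}

SELECT * FROM {{ ref('stg_orders') }}
```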
Key Takeaways
- Dimensional modeling organizes data into fact tables (measurements) and dimension tables (descriptive context) for efficient analytics.
- Star schemas denormalize dimensions into flat tables — simpler queries, faster performance, better BI tool compatibility.
- Snowflake schemas normalize dimensions into hierarchies — less redundancy but more complex queries.
- Star schemas are the default choice for modern analytics warehouses. Use snowflake schemas only when storage is a real concern.
- SCD Type 2 preserves full history and is essential for time-sensitive analysis. Use Type 1 for non-critical attributes.
- Kimball's approach (build dimensional marts iteratively) aligns well with modern dbt-based workflows.
- Performance depends on partitioning, clustering, and materialization — not just the schema design.
Good dimensional modeling is a skill that compounds over time. Every warehouse you build on a solid star schema foundation will be easier to query, maintain, and extend. Invest in getting the model right, and the rest of your analytics platform becomes dramatically simpler.
Frequently Asked Questions
What is a star schema?
A star schema is a dimensional data model where a central fact table (containing measurements like revenue and quantity) is surrounded by denormalized dimension tables (containing descriptive attributes like customer name, product category, and date). It is called a "star" because the diagram resembles a star shape. Star schemas are the standard choice for analytics data warehouses due to their query simplicity and BI tool compatibility.
What is a snowflake schema?
A snowflake schema is a dimensional data model where dimension tables are normalized into multiple related sub-tables. For example, instead of a flat product dimension with category and subcategory columns, you would have separate dim_category and dim_subcategory tables. This reduces data redundancy but requires more joins for queries, making SQL more complex.
What is a slowly changing dimension?
A Slowly Changing Dimension (SCD) defines how a dimensional model handles changes to dimension attributes over time. The most common types are: Type 1 (overwrite the old value, losing history), Type 2 (add a new row with effective dates, preserving full history), and Type 3 (add a column for the previous value, storing limited history). SCD Type 2 is the most important for time-sensitive analytics.
When should I use star schema vs snowflake schema?
Use a star schema in most cases — it provides simpler queries, faster performance, and better BI tool compatibility. Use a snowflake schema only when dimension tables are extremely large and redundancy causes real storage concerns, when you need strict referential integrity enforced by normalization, or when your primary consumers are data engineers writing complex SQL rather than analysts using BI tools.