Star Schema vs Snowflake Schema: Data Modeling for Analytics

    Master dimensional modeling with star and snowflake schemas. Learn fact tables, dimension tables, SCD types, and when to use each approach.

    By Adriano Sanges · 13 min read
    star schema
    snowflake schema
    dimensional modeling
    data modeling
    fact tables
    dimension tables
    Kimball

    TL;DR: Star schemas use denormalized dimension tables for simpler queries and faster performance — they are the default choice for analytics data warehouses. Snowflake schemas normalize dimensions into hierarchies, reducing redundancy but adding query complexity. For most modern cloud warehouses, star schemas win because storage is cheap and query simplicity directly impacts team productivity.

    Dimensional modeling is the foundation of every analytics data warehouse. Whether you're building dashboards in Looker, running ad-hoc queries in BigQuery, or designing a dbt project, the schema you choose directly impacts query performance, data quality, and how easily your team can work with the data. This guide covers star schemas, snowflake schemas, and everything you need to model data for analytics effectively.

    Why Dimensional Modeling Matters

    Operational databases (OLTP) use normalized schemas — third normal form (3NF) — to minimize data redundancy and ensure consistency for transactional workloads. But normalized schemas are terrible for analytics. A simple business question like "What was total revenue by product category last quarter?" might require joining 8-10 tables in a normalized schema.

    Dimensional modeling reorganizes data into a structure optimized for analytical queries: few joins, fast aggregations, and intuitive navigation. The two primary dimensional models are the star schema and the snowflake schema.

    The Kimball vs Inmon Debate

    Before diving into schemas, it's worth understanding the two major philosophies of data warehouse design.

    Ralph Kimball's Approach (Bottom-Up)

    Kimball advocates building dimensional data marts first — focused on individual business processes (sales, inventory, customer service). These are then integrated through conformed dimensions (shared dimension tables used across marts).

    • Build data marts around business processes
    • Use star schemas with denormalized dimensions
    • Faster time to value — deliver a working mart in weeks
    • The data warehouse is the union of all data marts

    Bill Inmon's Approach (Top-Down)

    Inmon advocates building a centralized, normalized data warehouse first (the enterprise data warehouse or EDW), then deriving dimensional data marts from it.

    • Build the enterprise model first (3NF)
    • Derive dimensional marts from the central warehouse
    • More upfront effort, but a single source of truth
    • Better consistency across the organization

    In practice, most modern teams lean toward Kimball's approach because it's more pragmatic and delivers value incrementally. With tools like dbt, you can build dimensional models iteratively and refactor as the organization's needs evolve.

    Fact Tables and Dimension Tables

    Every dimensional model consists of two types of tables.

    Fact Tables

    Fact tables store measurements — the quantitative data that the business wants to analyze. Each row represents a business event or transaction.

    Characteristics:

    • Contain numeric, additive measures (revenue, quantity, cost)
    • Include foreign keys to dimension tables
    • Typically the largest tables in the warehouse (millions to billions of rows)
    • Narrow (relatively few columns, in contrast to wide dimension tables)
    -- Example: Sales fact table
    CREATE TABLE fct_sales (
        sale_id         BIGINT PRIMARY KEY,
        date_key        INTEGER REFERENCES dim_date(date_key),
        product_key     INTEGER REFERENCES dim_product(product_key),
        customer_key    INTEGER REFERENCES dim_customer(customer_key),
        store_key       INTEGER REFERENCES dim_store(store_key),
        promotion_key   INTEGER REFERENCES dim_promotion(promotion_key),
        quantity_sold   INTEGER,
        unit_price      DECIMAL(10, 2),
        discount_amount DECIMAL(10, 2),
        sales_amount    DECIMAL(12, 2),
        cost_amount     DECIMAL(12, 2),
        profit_amount   DECIMAL(12, 2)
    );
    

    Types of fact tables:

    • Transaction facts: One row per event (each sale, each click). Most common.
    • Periodic snapshot facts: One row per entity per time period (daily inventory levels).
    • Accumulating snapshot facts: One row per entity lifecycle (order lifecycle from placement to delivery).
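
    To make the snapshot variants concrete, here is a sketch of a periodic snapshot fact for daily inventory levels. Table and column names are illustrative, not taken from a real system:

    ```sql
    -- Sketch: periodic snapshot fact, one row per product per store per day
    CREATE TABLE fct_inventory_daily (
        date_key          INTEGER REFERENCES dim_date(date_key),
        product_key       INTEGER REFERENCES dim_product(product_key),
        store_key         INTEGER REFERENCES dim_store(store_key),
        quantity_on_hand  INTEGER,  -- semi-additive: sums across stores, not across dates
        quantity_on_order INTEGER,
        PRIMARY KEY (date_key, product_key, store_key)
    );
    ```

    Note that snapshot measures like quantity_on_hand are semi-additive: summing across products or stores for a single day is meaningful, but across dates you typically average rather than sum.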

    Dimension Tables

    Dimension tables store descriptive attributes — the "who, what, where, when, why, how" context for the facts.

    Characteristics:

    • Contain textual, descriptive attributes
    • Relatively small (thousands to millions of rows)
    • Wide (many columns describing the entity)
    • Change slowly over time (hence "slowly changing dimensions")
    -- Example: Product dimension table
    CREATE TABLE dim_product (
        product_key     INTEGER PRIMARY KEY,  -- Surrogate key
        product_id      VARCHAR(20),          -- Natural/business key
        product_name    VARCHAR(200),
        brand           VARCHAR(100),
        category        VARCHAR(100),
        subcategory     VARCHAR(100),
        size            VARCHAR(50),
        color           VARCHAR(50),
        unit_cost       DECIMAL(10, 2),
        is_active       BOOLEAN,
        effective_date  DATE,
        expiration_date DATE
    );
    

    The date dimension is special — almost every fact table references it, and it's pre-populated with calendar attributes:

    -- Example: Date dimension table
    CREATE TABLE dim_date (
        date_key           INTEGER PRIMARY KEY,  -- YYYYMMDD format
        full_date          DATE,
        day_of_week        VARCHAR(10),
        day_of_month       INTEGER,
        day_of_year        INTEGER,
        week_of_year       INTEGER,
        month_number       INTEGER,
        month_name         VARCHAR(10),
        quarter            INTEGER,
        year               INTEGER,
        is_weekend         BOOLEAN,
        is_holiday         BOOLEAN,
        fiscal_quarter     INTEGER,
        fiscal_year        INTEGER
    );
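
    Populating it is a one-time job. Here is a minimal Postgres-flavored sketch using a recursive CTE that fills in a few of the calendar columns; BigQuery would use GENERATE_DATE_ARRAY instead, and most warehouses have an equivalent date-series function:

    ```sql
    -- Sketch: pre-populate dim_date for 2024-2026 (Postgres syntax;
    -- only a subset of the calendar columns is shown)
    WITH RECURSIVE dates AS (
        SELECT DATE '2024-01-01' AS full_date
        UNION ALL
        SELECT full_date + 1 FROM dates WHERE full_date < DATE '2026-12-31'
    )
    INSERT INTO dim_date (date_key, full_date, day_of_month, month_number, year, is_weekend)
    SELECT
        CAST(TO_CHAR(full_date, 'YYYYMMDD') AS INTEGER),  -- YYYYMMDD key
        full_date,
        EXTRACT(DAY FROM full_date),
        EXTRACT(MONTH FROM full_date),
        EXTRACT(YEAR FROM full_date),
        EXTRACT(ISODOW FROM full_date) IN (6, 7)          -- Saturday or Sunday
    FROM dates;
    ```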
    

    The Star Schema

    A star schema arranges data with a central fact table surrounded by denormalized dimension tables. When diagrammed, it looks like a star — the fact table at the center with dimensions radiating outward.

    Structure

                      dim_date
                         |
    dim_product --- fct_sales --- dim_customer
                         |
                     dim_store
    

    Each dimension table connects directly to the fact table with a single join. Dimensions are denormalized — all descriptive attributes are flattened into a single table, even if that means some redundancy.

    Example Query

    The beauty of the star schema is query simplicity. A complex business question requires minimal joins:

    -- Total revenue by product category and quarter for 2025
    SELECT
        d.quarter,
        d.year,
        p.category,
        p.brand,
        SUM(f.sales_amount) AS total_revenue,
        SUM(f.profit_amount) AS total_profit,
        COUNT(DISTINCT f.customer_key) AS unique_customers,
        SUM(f.quantity_sold) AS units_sold
    FROM fct_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE d.year = 2025
    GROUP BY d.quarter, d.year, p.category, p.brand
    ORDER BY d.quarter, total_revenue DESC;
    

    Only two joins to answer a meaningful business question. That's the power of dimensional modeling.

    Pros of Star Schema

    • Simple queries: Fewer joins make SQL easier to write and understand
    • Fast query performance: Query optimizers handle star joins extremely well
    • BI tool friendly: Tools like Looker, Tableau, and Power BI are designed for star schemas
    • Intuitive: Business users can understand the model without training

    Cons of Star Schema

    • Data redundancy: Denormalized dimensions store repeated data (e.g., "Electronics" category repeated for every product in that category)
    • Larger dimension tables: More storage for redundant attributes
    • Update complexity: Changing a category name requires updating many rows

    The Snowflake Schema

    A snowflake schema normalizes dimension tables into multiple related tables. Instead of a flat dim_product with category and subcategory columns, you'd have separate dim_category and dim_subcategory tables.

    Structure

    dim_category              dim_date
         |                       |
    dim_subcategory              |
         |                       |
    dim_product --- fct_sales ---+--- dim_customer --- dim_geography
                         |                                  |
                     dim_store                          dim_region
    

    The dimensions "snowflake out" into normalized sub-tables, resembling a snowflake shape.

    Example: Normalized Product Dimension

    -- Snowflake schema: Normalized product dimension
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name VARCHAR(100)
    );
    
    CREATE TABLE dim_subcategory (
        subcategory_key  INTEGER PRIMARY KEY,
        subcategory_name VARCHAR(100),
        category_key     INTEGER REFERENCES dim_category(category_key)
    );
    
    CREATE TABLE dim_product (
        product_key     INTEGER PRIMARY KEY,
        product_id      VARCHAR(20),
        product_name    VARCHAR(200),
        brand           VARCHAR(100),
        size            VARCHAR(50),
        color           VARCHAR(50),
        unit_cost       DECIMAL(10, 2),
        subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
    );
    

    Example Query

    The same business question now requires more joins:

    -- Same query as before, but with snowflake schema
    SELECT
        d.quarter,
        d.year,
        c.category_name,
        p.brand,
        SUM(f.sales_amount) AS total_revenue,
        SUM(f.profit_amount) AS total_profit,
        COUNT(DISTINCT f.customer_key) AS unique_customers,
        SUM(f.quantity_sold) AS units_sold
    FROM fct_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_subcategory sc ON p.subcategory_key = sc.subcategory_key
    JOIN dim_category c ON sc.category_key = c.category_key
    WHERE d.year = 2025
    GROUP BY d.quarter, d.year, c.category_name, p.brand
    ORDER BY d.quarter, total_revenue DESC;
    

    Four joins instead of two. For this single query, the difference is minor. But across hundreds of queries and dashboards, the additional complexity adds up.

    Pros of Snowflake Schema

    • Less data redundancy: Normalized dimensions eliminate repeated data
    • Smaller storage: Less disk space for dimension tables
    • Easier dimension updates: Change a category name in one row, one table
    • Enforced referential integrity: Normalization prevents inconsistencies

    Cons of Snowflake Schema

    • More complex queries: Additional joins make SQL harder to write and debug
    • Slower queries: More joins can impact performance (though modern optimizers mitigate this)
    • BI tool friction: Some tools struggle with multi-level dimension hierarchies
    • Harder to understand: Business users find normalized structures confusing

    Star vs Snowflake: Side-by-Side

    Criteria                Star Schema                    Snowflake Schema
    ----------------------  -----------------------------  ----------------------------
    Dimension structure     Denormalized (flat)            Normalized (hierarchical)
    Number of joins         Fewer                          More
    Query complexity        Simple                         Complex
    Query performance       Generally faster               Generally slower
    Storage efficiency      Less efficient (redundancy)    More efficient
    BI tool compatibility   Excellent                      Good but more complex
    Maintenance             Harder to update dimensions    Easier to update dimensions
    Data integrity          Relies on ETL process          Enforced by normalization

    The Verdict

    Star schemas are the standard choice for analytics data warehouses. The minor storage savings of snowflake schemas are not worth the query complexity, especially with modern columnar warehouses where storage is cheap and query simplicity directly impacts productivity.

    Use a snowflake schema only when:

    • Dimension tables are extremely large (millions of rows) and redundancy causes real storage concerns
    • You need strict referential integrity that normalization enforces
    • Your primary consumers are data engineers writing complex SQL, not analysts using BI tools

    Slowly Changing Dimensions (SCD)

    Dimensions change over time. A customer moves to a new city. A product's price changes. A store gets reclassified to a different region. Slowly Changing Dimensions (SCDs) define how your model handles these changes.

    SCD Type 0: Fixed

    The dimension value never changes. Once set, it stays forever.

    Use case: Original signup date, birth date, natural keys.

    SCD Type 1: Overwrite

    The old value is overwritten with the new value. History is lost.

    -- SCD Type 1: Simply update the record
    UPDATE dim_customer
    SET city = 'San Francisco', state = 'CA'
    WHERE customer_id = 'C-1234';
    

    Use case: Correcting data entry errors, attributes where history doesn't matter.

    Trade-off: Simple but you lose all historical context. If a customer was in New York when they made purchases in Q1, but moved to San Francisco in Q2, all their historical purchases now show San Francisco.

    SCD Type 2: Add New Row

    A new row is inserted with effective dates. The old row is marked as expired. This preserves full history.

    -- Before the change:
    -- customer_key | customer_id | city     | effective_date | expiration_date | is_current
    -- 1001         | C-1234      | New York | 2024-01-15     | 9999-12-31      | TRUE
    
    -- After the change:
    UPDATE dim_customer
    SET expiration_date = '2026-03-07', is_current = FALSE
    WHERE customer_id = 'C-1234' AND is_current = TRUE;
    
    INSERT INTO dim_customer
        (customer_key, customer_id, city, effective_date, expiration_date, is_current)
    VALUES
        (1002, 'C-1234', 'San Francisco', '2026-03-08', '9999-12-31', TRUE);
    

    Now the fact table can join to either version:

    • Q1 sales join to customer_key = 1001 (New York)
    • Q2 sales join to customer_key = 1002 (San Francisco)

    Use case: When you need to analyze facts in the context of the dimension as it was at the time of the event.

    Trade-off: More complex to implement and query. Dimension tables grow larger. Surrogate keys are required.
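
    The effective-date columns also make point-in-time lookups straightforward. Using the dim_customer rows above, a range predicate finds the version that was current on any given date:

    ```sql
    -- Which version of customer C-1234 was in effect on 2025-06-15?
    SELECT customer_key, city
    FROM dim_customer
    WHERE customer_id = 'C-1234'
      AND DATE '2025-06-15' BETWEEN effective_date AND expiration_date;
    ```

    The '9999-12-31' sentinel on the current row means this predicate works without special-casing open-ended records.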

    SCD Type 3: Add New Column

    Instead of a new row, add a column to store the previous value.

    ALTER TABLE dim_customer ADD COLUMN previous_city VARCHAR(100);
    
    UPDATE dim_customer
    SET previous_city = city, city = 'San Francisco'
    WHERE customer_id = 'C-1234';
    

    Use case: When you only need to track one level of history (current vs previous).

    Trade-off: Only stores one prior version. Not suitable when values change frequently.

    Choosing an SCD Type

    SCD Type   History Preserved   Complexity   Storage Impact   Best For
    --------   -----------------   ----------   --------------   ------------------------
    Type 0     None (fixed)        Minimal      None             Immutable attributes
    Type 1     None (overwrite)    Low          None             Error corrections
    Type 2     Full history        High         Rows grow        Time-sensitive analysis
    Type 3     One prior value     Medium       Columns grow     Before/after comparisons

    In practice, most warehouses use a combination: Type 2 for business-critical dimensions (customer location, product category) and Type 1 for non-critical attributes (name corrections, formatting fixes).

    Practical Example: Building a Star Schema with dbt

    Here's how you might implement a star schema using dbt in a modern ELT pipeline.

    Staging Layer: Clean Raw Data

    -- models/staging/stg_orders.sql
    WITH source AS (
        SELECT * FROM {{ source('raw', 'orders') }}
    )
    SELECT
        id AS order_id,
        customer_id,
        product_id,
        store_id,
        CAST(order_date AS DATE) AS order_date,
        CAST(quantity AS INTEGER) AS quantity,
        CAST(unit_price AS DECIMAL(10,2)) AS unit_price,
        CAST(discount AS DECIMAL(10,2)) AS discount,
        quantity * unit_price AS gross_amount,
        quantity * unit_price - discount AS net_amount
    FROM source
    WHERE id IS NOT NULL
    

    Dimension Models

    -- models/marts/dim_customer.sql
    WITH customers AS (
        SELECT * FROM {{ ref('stg_customers') }}
    ),
    addresses AS (
        SELECT * FROM {{ ref('stg_addresses') }}
    )
    SELECT
        {{ dbt_utils.generate_surrogate_key(['c.customer_id']) }} AS customer_key,
        c.customer_id,
        c.first_name,
        c.last_name,
        c.email,
        c.signup_date,
        a.city,
        a.state,
        a.country,
        a.postal_code,
        CASE
            WHEN c.signup_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'New'
            WHEN c.total_orders >= 10 THEN 'Loyal'
            WHEN c.last_order_date < CURRENT_DATE - INTERVAL '180 days' THEN 'Churned'
            ELSE 'Active'
        END AS customer_segment
    FROM customers c
    LEFT JOIN addresses a ON c.address_id = a.address_id
    

    Fact Model

    -- models/marts/fct_orders.sql
    SELECT
        o.order_id,
        dc.customer_key,
        dp.product_key,
        ds.store_key,
        dd.date_key,
        o.quantity,
        o.unit_price,
        o.discount,
        o.gross_amount,
        o.net_amount,
        o.net_amount - (o.quantity * dp.unit_cost) AS profit
    FROM {{ ref('stg_orders') }} o
    JOIN {{ ref('dim_customer') }} dc ON o.customer_id = dc.customer_id
    JOIN {{ ref('dim_product') }} dp ON o.product_id = dp.product_id
    JOIN {{ ref('dim_store') }} ds ON o.store_id = ds.store_id
    JOIN {{ ref('dim_date') }} dd ON o.order_date = dd.full_date
    

    For hands-on practice building dimensional models, try our Analytics Engineering with dbt project. For foundational modeling concepts, see our Data Modeling and SQL Fundamentals guides.

    Performance Optimization Tips

    1. Partition Fact Tables

    Partition your fact table by date to enable partition pruning — the query engine skips irrelevant partitions entirely.

    -- BigQuery: Partition by order date
    CREATE TABLE fct_sales
    PARTITION BY DATE(order_date)
    CLUSTER BY product_key, store_key
    AS SELECT * FROM staging.sales;
    

    2. Use Surrogate Keys

    Surrogate keys (auto-incrementing integers or hashes) are smaller and faster to join on than natural keys (strings like email addresses or product codes).
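
    A hashed surrogate key can be derived deterministically at load time; a sketch, assuming a hypothetical staging.products table (hash function names vary by warehouse, MD5 shown here is a common choice):

    ```sql
    -- Sketch: derive a deterministic hash surrogate key at load time
    SELECT
        MD5(CAST(product_id AS VARCHAR)) AS product_key,  -- surrogate used for joins
        product_id,                                       -- keep the natural key for lineage
        product_name
    FROM staging.products;
    ```

    Hash-based keys avoid coordinating a central integer sequence, which is why dbt's generate_surrogate_key macro (used later in this guide) takes the same approach.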

    3. Pre-Aggregate Common Queries

    If your dashboards always query daily revenue, create a pre-aggregated daily summary table instead of scanning the transaction-level fact table every time.
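
    Using the tables defined earlier, such a summary might look like this (the table name agg_daily_revenue is illustrative):

    ```sql
    -- Sketch: daily revenue summary built once, then queried by dashboards
    CREATE TABLE agg_daily_revenue AS
    SELECT
        f.date_key,
        p.category,
        SUM(f.sales_amount)  AS total_revenue,
        SUM(f.quantity_sold) AS units_sold
    FROM fct_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY f.date_key, p.category;
    ```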

    4. Materialize Strategically

    In dbt, use table materialization for large fact tables and frequently queried dimensions. Use view for staging models that are only read by downstream models.
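
    In dbt this is a one-line setting, declared per model or as a folder-level default in dbt_project.yml:

    ```sql
    -- At the top of a model file, e.g. models/marts/fct_orders.sql
    {{ config(materialized='table') }}
    -- A staging model would instead use: {{ config(materialized='view') }}
    ```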

    Key Takeaways

    1. Dimensional modeling organizes data into fact tables (measurements) and dimension tables (descriptive context) for efficient analytics.
    2. Star schemas denormalize dimensions into flat tables — simpler queries, faster performance, better BI tool compatibility.
    3. Snowflake schemas normalize dimensions into hierarchies — less redundancy but more complex queries.
    4. Star schemas are the default choice for modern analytics warehouses. Use snowflake schemas only when storage is a real concern.
    5. SCD Type 2 preserves full history and is essential for time-sensitive analysis. Use Type 1 for non-critical attributes.
    6. Kimball's approach (build dimensional marts iteratively) aligns well with modern dbt-based workflows.
    7. Performance depends on partitioning, clustering, and materialization — not just the schema design.

    Good dimensional modeling is a skill that compounds over time. Every warehouse you build on a solid star schema foundation will be easier to query, maintain, and extend. Invest in getting the model right, and the rest of your analytics platform becomes dramatically simpler.

    Frequently Asked Questions

    What is a star schema?

    A star schema is a dimensional data model where a central fact table (containing measurements like revenue and quantity) is surrounded by denormalized dimension tables (containing descriptive attributes like customer name, product category, and date). It is called a "star" because the diagram resembles a star shape. Star schemas are the standard choice for analytics data warehouses due to their query simplicity and BI tool compatibility.

    What is a snowflake schema?

    A snowflake schema is a dimensional data model where dimension tables are normalized into multiple related sub-tables. For example, instead of a flat product dimension with category and subcategory columns, you would have separate dim_category and dim_subcategory tables. This reduces data redundancy but requires more joins for queries, making SQL more complex.

    What is a slowly changing dimension?

    A Slowly Changing Dimension (SCD) defines how a dimensional model handles changes to dimension attributes over time. The most common types are: Type 1 (overwrite the old value, losing history), Type 2 (add a new row with effective dates, preserving full history), and Type 3 (add a column for the previous value, storing limited history). SCD Type 2 is the most important for time-sensitive analytics.

    When should I use star schema vs snowflake schema?

    Use a star schema in most cases — it provides simpler queries, faster performance, and better BI tool compatibility. Use a snowflake schema only when dimension tables are extremely large and redundancy causes real storage concerns, when you need strict referential integrity enforced by normalization, or when your primary consumers are data engineers writing complex SQL rather than analysts using BI tools.

    About the Author

    Adriano Sanges is a data engineering professional and the creator of dataskew.io. With years of experience building data platforms at scale, he shares practical insights and hands-on guides to help aspiring data engineers advance their careers.

    Ready to Apply What You Learned?

    Take the next step in your data engineering journey with structured roadmaps and hands-on projects designed for real-world experience.