06-reference

book adwd ch8 how many fact tables 2026 04 13

Sun Apr 12 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·reference ·source: Agile Data Warehouse Design (book) ·by Lawrence Corr, Jim Stagnitto

Chapter 8: How Many — Fact Tables and Measures — Corr & Stagnitto

Why this is in the vault

Fact table design is the structural decision that determines whether a dimensional model performs well or collapses under query load and ETL complexity. This chapter covers the three core fact table types, additivity rules that silently corrupt reports when violated, and aggregation patterns that remain relevant even in modern column-store engines. For Ray Data Co’s consulting practice, these patterns appear in virtually every client engagement — especially when building SaaS metric models like ARR waterfalls, where the choice between transaction facts, periodic snapshots, and accumulating snapshots directly determines whether reconciliation is even possible.

Key concepts

Fact table types

Three fact table types correspond to three kinds of business events:

Granularity

Granularity is the single most important fact table property — it defines what each row means. It can be expressed in business terms (“one row per order line item”) or dimensionally (by listing the granularity dimensions that uniquely identify each row). Fuzzy or mixed granularity makes a fact table impossible to build or use correctly. For accumulating and period-to-date snapshots, the granularity dimensions (GD columns) also serve as unique update indexes for efficient ETL.

Modeling evolving events

Complex business processes can be modeled as multi-verb evolving events using BEAM* event tables. The key modeling tool is the event timeline — a visual chronology of milestones that surfaces duration measures between them. Duration naming follows a formula: number milestones DT1-DTn chronologically, then define durations as differences (e.g., Packing Time = DT2 - DT1). This keeps definitions unambiguous and portable.

State counts (1/0 flags recording whether a milestone has been reached) and event counts (number of repeated milestones like shipments per order) are essential derived measures that simplify process queries. Corr emphasizes that state logic can be surprisingly complex — the test for “delivered” is not just whether a delivery date exists, but whether delivered quantity equals ordered quantity.

Fact types (additivity)

Heterogeneous facts

When product lines have fundamentally different measures (banking: checking vs. mortgage accounts), a monolithic fact table becomes sparse and hard to manage. The pattern is to split into a core fact table with shared measures and custom fact tables per product family with their exclusive facts. Both share conformed dimensions. This avoids the “fact rich but data poor” anti-pattern.

Factless fact tables

Some events have nothing to measure except their occurrence — seminar attendance, eligibility coverage. These are modeled as factless fact tables containing only dimension keys. A dummy additive fact (always 1) can be added to support aggregate navigation. Factless tables also serve as coverage tables that enable “what didn’t happen” analysis (promoted but unsold products, covered but unclaimed benefits).

Fact table optimization

Four levers for performance:

  1. Downsizing — use integer surrogate keys, date keys instead of datetime, move text into dimensions, limit history to what BI actually needs (start with the most recent data and work backwards).

  2. Indexing — bitmap indexes on foreign keys for star join optimization; unique GD-based indexes on accumulating/PTD snapshots for efficient ETL updates. Prioritize query indexes over ETL indexes when there is a trade-off.

  3. Partitioning — by date key range. Enables fast ETL (load into empty, un-indexed partitions then swap), history pruning (truncate a partition vs. row-by-row delete), and real-time support (un-indexed hot partitions for intraday trickle feeds, merged nightly).

  4. Aggregation — pre-summarized derived fact tables that act as materialized group-by indexes. Three types:

    • Lost dimension — drop dimensions entirely (simplest, just a GROUP BY).
    • Shrunken dimension — roll up dimensions to coarser grain (Date to Month, Store to Region). More complex but covers broader queries.
    • Collapsed dimension — pre-join selected dimension attributes into a denormalized summary table. Fastest to query but the wider rows can negate size savings.

    Target aggregates ~20x smaller than base tables. Use DBMS aggregate navigation so they remain invisible to users. Budget up to 100% storage overhead.

Drill-across

When BI users need to combine measures from multiple fact tables (cross-process analysis), directly joining the tables produces the fan trap / chasm trap — overcounting caused by M:M relationships between joined fact tables. The solution is drill-across queries: separate SELECT statements against each star schema, aggregated independently to conformed row headers, then merged via full outer join. This is the only safe way to combine measures from different grain fact tables.

Mapping against Ray Data Co

ARR waterfall relevance

The ARR waterfall is a textbook accumulating snapshot problem. A subscription contract moves through milestones — booking, activation, renewal, expansion, contraction, churn — each with its own timestamp and amount. The chapter’s guidance maps directly:

Broader consulting application