Sun Apr 12 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · reference · source: Agile Data Warehouse Design (book) · by Lawrence Corr, Jim Stagnitto

“Chapter 1: How to Model a Data Warehouse” — Corr & Stagnitto

Why this is in the vault

This chapter provides the theoretical foundation for BEAM (Business Event Analysis & Modeling), the agile dimensional modeling method the book is built around. RDCO needs this because:

  1. Testing matrix grounding. Our Scope x Basis testing framework assumes the data under test lives in dimensional models. Corr’s chapter articulates why dimensional models exist and how they differ from OLTP schemas — the same distinction that makes our testing matrix necessary in the first place. You cannot define meaningful data quality checks without understanding fact/dimension separation and grain.

  2. MG harness positioning. The Mammoth Growth harness review identified a TDD gap. Corr explicitly advocates targeted data profiling as the agile alternative to exhaustive source analysis, and frames ETL test-driven development as a prerequisite for agile DW delivery. This validates the gap we flagged and gives us vocabulary for the fix.

  3. Consulting methodology. Corr’s argument that modeling should be collaborative, stakeholder-driven, and iterative maps directly to how RDCO runs client engagements — short discovery cycles, working prototypes, not waterfall requirements documents. The BEAM modelstorming technique (fast, whiteboard-based sessions with business users) is a concrete method we can reference or adopt.

Key concepts

OLTP vs. DW/BI: two different worlds

Operational systems optimize for transaction throughput (inserts, updates, deletes); data warehouses optimize for query performance and usability (selects, aggregations, historical comparisons). The chapter provides a crisp comparison table: OLTP uses ER modeling and 3NF; DW/BI uses dimensional modeling and star schemas. These are not competing approaches — they serve fundamentally different purposes.

The case against ER modeling for analytics

3NF is efficient for writes but hostile to reads. Normalization proliferates tables and join paths, making queries slow, hard to write correctly, and nearly impossible for business users to verify. History tracking compounds the problem by turning simple 1:M relationships into M:M relationships, adding even more physical tables. Corr argues that ER diagrams are visually overwhelming at warehouse scale, which blocks stakeholder collaboration.

The case for dimensional modeling

Dimensional models define business processes as measurable events (facts) surrounded by descriptive context (dimensions). Star schemas — a central fact table joined to dimension tables — minimize joins, maximize query performance, and are intuitive enough for business users to read. The key structural insight: fact tables contain numeric measures of business events; dimensions contain the textual attributes used to filter, group, and describe those measures.
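
To make the fact/dimension split concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical, invented for illustration rather than taken from the book.

```python
import sqlite3

# In-memory database; every table, column, and row here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,   -- textual attribute used to describe
    region        TEXT    -- textual attribute used to filter and group
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- The fact table holds foreign keys to the dimensions plus numeric measures.
CREATE TABLE fact_order (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    revenue      REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (20, 'Manual', 'Books');
INSERT INTO fact_order   VALUES (1, 10, 2, 40.0), (1, 20, 1, 15.0), (2, 10, 5, 100.0);
""")

# One join from the fact table to a dimension answers "revenue by region".
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_order AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(revenue_by_region)  # [('East', 55.0), ('West', 100.0)]
```

The query reaches its grouping attribute with a single join, which is the property the chapter contrasts with the longer join paths of a 3NF schema.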

The 7Ws framework

An extension of the journalist’s 5Ws, the 7Ws (who, what, when, where, how many, why, and how) are the interrogatives that structure every dimensional model.

Fact tables represent verbs (business activity); dimensions are nouns, each classifiable as one of the 7Ws. Star schemas typically contain 8-20 dimensions because each W can appear multiple times (e.g., an order fulfillment event might have three who dimensions: customer, employee, carrier).
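
As a rough illustration of how one W can account for several dimensions, the sketch below tags each dimension of an order fulfillment event with a W and counts them. Only the three who dimensions (customer, employee, carrier) come from the note above; the remaining dimension names and the helper `count_by_w` are hypothetical.

```python
from collections import Counter

# The seven interrogatives; each dimension is tagged with exactly one of them.
SEVEN_WS = {"who", "what", "when", "where", "how many", "why", "how"}

# Dimensions of a hypothetical order fulfillment event.
# customer, employee, and carrier are from the note; the rest are assumed.
order_fulfillment_dimensions = {
    "customer": "who",
    "employee": "who",
    "carrier": "who",
    "product": "what",
    "ship_date": "when",
    "warehouse": "where",
    "shipping_method": "how",
}


def count_by_w(dimensions):
    """Count dimensions per W, rejecting any tag that is not one of the 7Ws."""
    unknown = {w for w in dimensions.values() if w not in SEVEN_WS}
    if unknown:
        raise ValueError(f"not one of the 7Ws: {sorted(unknown)}")
    return Counter(dimensions.values())


w_counts = count_by_w(order_fulfillment_dimensions)
print(w_counts["who"])  # 3
```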

Data warehouse analysis approaches

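Corr identifies two traditional analysis methods, data-driven analysis (working forward from operational source data) and reporting-driven analysis (working backward from stakeholders’ report requests), both with significant limitations for proactive DW design.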

The chapter frames this as a “chicken or the egg” problem that traditional methods cannot solve.

Agile data warehouse design

Corr contrasts waterfall DW development (Big Design Up Front / BDUF) with agile delivery. The minimum deliverable unit in agile DW is a single star schema — queryable tables, ETL to populate them, and a BI tool to access them. The book uses JEDUF (Just Enough Design Up Front) for cross-iteration planning and JIT (Just-In-Time) detail modeling within each sprint.

BEAM introduction

BEAM (Business Event Analysis & Modeling) is the book’s core contribution. It combines:

The asterisk in BEAM✲ represents star schemas, the dimensional deliverable the method produces.

Agile dimensional modeling benefits

Corr lists six benefits of applying agile methods to dimensional modeling:

  1. Avoids analysis paralysis by focusing on business processes rather than reports.

  2. Produces flexible, report-neutral designs.

  3. Enables proactive influence on operational system development.

  4. Supports accretive requirements through evolutionary iteration.

  5. Teaches stakeholders to think dimensionally.

  6. Creates stakeholder ownership of the resulting data models.

Mapping against Ray Data Co

Testing matrix. The Scope x Basis framework assumes dimensional structure. Corr’s chapter clarifies why: fact tables have grain (a specific level of detail), and dimensions have hierarchies. Both are testable properties. Scope (row-level, column-level, cross-table) maps naturally to the fact/dimension/grain structure Corr describes. Basis (absolute, relative, temporal) maps to the kinds of assertions you can make about measures vs. attributes. The testing matrix is more powerful when teams understand the dimensional model underneath.
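
One way to see grain as a testable property is a duplicate check on the columns that declare it. The sketch below assumes a hypothetical order-line fact table whose grain is one row per (order_id, line_no). The function name and sample rows are invented, and this is not presented as part of the Scope x Basis framework itself.

```python
from collections import Counter


def find_grain_violations(rows, grain_columns):
    """Return {grain_key: row_count} for every grain key seen more than once."""
    counts = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical fact rows; declared grain is one row per (order_id, line_no).
fact_rows = [
    {"order_id": 1, "line_no": 1, "revenue": 40.0},
    {"order_id": 1, "line_no": 2, "revenue": 15.0},
    {"order_id": 2, "line_no": 1, "revenue": 100.0},
    {"order_id": 2, "line_no": 1, "revenue": 100.0},  # duplicate load breaks the grain
]

violations = find_grain_violations(fact_rows, ["order_id", "line_no"])
print(violations)  # {(2, 1): 2}
```

An empty result means every row is unique at the declared grain. Any entry signals double counting in every measure aggregated from that table.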

MG harness review. The harness currently lacks a formal approach to test-driven ETL development. Corr explicitly names targeted data profiling and ETL TDD as prerequisites for agile DW work — and warns that modeling what cannot be tested is wasted effort. This gives us a specific recommendation to bring back to MG: adopt BEAM-style profiling as a pre-sprint activity that feeds directly into dbt test definitions.
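
A minimal sketch of what "profiling that feeds test definitions" could look like: profile a source column, then propose candidate tests from what the profile shows. The strings `not_null` and `unique` match the names of dbt's built-in generic tests, but the mapping rules, function names, and sample data are assumptions made for illustration, not a method from the book or from the MG harness.

```python
def profile_column(values):
    """Basic profile of one column: row, null, and distinct (non-null) counts."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }


def suggest_tests(profile):
    """Propose candidate test names from a profile (assumed heuristic rules)."""
    if profile["row_count"] == 0:
        return []  # nothing observed, so nothing to suggest
    tests = []
    if profile["null_count"] == 0:
        tests.append("not_null")
    # Uniqueness is judged over non-null values only.
    if profile["distinct_count"] == profile["row_count"] - profile["null_count"]:
        tests.append("unique")
    return tests


# Hypothetical source columns.
customer_id = [1, 2, 3, 4]
email = ["a@example.com", "b@example.com", None, "a@example.com"]

id_tests = suggest_tests(profile_column(customer_id))
email_tests = suggest_tests(profile_column(email))
print(id_tests)     # ['not_null', 'unique']
print(email_tests)  # []
```

The suggestions are only candidates. A profile describes the data as it is today, so each proposed test still needs confirmation from a stakeholder that the rule is meant to hold.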

Consulting positioning. Corr’s collaborative modeling philosophy aligns with RDCO’s engagement style. The modelstorming technique — short, stakeholder-led sessions that produce immediately usable artifacts — is how we already run discovery. We can formalize this by adopting BEAM table notation as a deliverable format, giving clients something more structured than meeting notes but more accessible than ER diagrams.

Kimball alignment. This chapter is a companion to the Kimball Toolkit reference already in the vault. Where Kimball defines the canonical patterns (conformed dimensions, bus architecture, SCD types), Corr provides the process for discovering and documenting those patterns collaboratively. The two books are complementary: Kimball covers design, Corr covers discovery.