“Chapter 1: How to Model a Data Warehouse” — Corr & Stagnitto
Why this is in the vault
This chapter provides the theoretical foundation for BEAM (Business Event Analysis & Modeling), the agile dimensional modeling method the book is built around. RDCO needs this because:
- Testing matrix grounding. Our Scope x Basis testing framework assumes the data under test lives in dimensional models. Corr’s chapter articulates why dimensional models exist and how they differ from OLTP schemas, the same distinction that makes our testing matrix necessary in the first place. You cannot define meaningful data quality checks without understanding fact/dimension separation and grain.
- MG harness positioning. The Mammoth Growth harness review identified a TDD gap. Corr explicitly advocates targeted data profiling as the agile alternative to exhaustive source analysis, and frames ETL test-driven development as a prerequisite for agile DW delivery. This validates the gap we flagged and gives us vocabulary for the fix.
- Consulting methodology. Corr’s argument that modeling should be collaborative, stakeholder-driven, and iterative maps directly to how RDCO runs client engagements: short discovery cycles and working prototypes, not waterfall requirements documents. The BEAM modelstorming technique (fast, whiteboard-based sessions with business users) is a concrete method we can reference or adopt.
Key concepts
OLTP vs. DW/BI: two different worlds
Operational systems optimize for transaction throughput (inserts, updates, deletes); data warehouses optimize for query performance and usability (selects, aggregations, historical comparisons). The chapter provides a crisp comparison table: OLTP uses ER modeling and 3NF; DW/BI uses dimensional modeling and star schemas. These are not competing approaches — they serve fundamentally different purposes.
The case against ER modeling for analytics
3NF is efficient for writes but hostile to reads. Normalization proliferates tables and join paths, making queries slow, hard to write correctly, and nearly impossible for business users to verify. History tracking compounds the problem by turning simple 1:M relationships into M:M relationships, adding even more physical tables. Corr argues that ER diagrams are visually overwhelming at warehouse scale, which blocks stakeholder collaboration.
The case for dimensional modeling
Dimensional models define business processes as measurable events (facts) surrounded by descriptive context (dimensions). Star schemas — a central fact table joined to dimension tables — minimize joins, maximize query performance, and are intuitive enough for business users to read. The key structural insight: fact tables contain numeric measures of business events; dimensions contain the textual attributes used to filter, group, and describe those measures.
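As a rough illustration of the fact/dimension split described above, here is a minimal star schema built in an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical and do not come from the chapter; the sketch only shows the shape: one fact table of numeric measures, joined to dimension tables of descriptive text.

```python
import sqlite3

# All table names, column names, and rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: textual attributes used to filter, group, and describe
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        region        TEXT
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact: one row per order line, holding dimension keys plus numeric measures
    CREATE TABLE fact_order_line (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER,
        revenue      REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (11, 'Manual', 'Books');
    INSERT INTO fact_order_line VALUES (1, 10, 2, 40.0), (1, 11, 1, 15.0), (2, 10, 5, 100.0);
""")


def revenue_by_region(connection):
    """Aggregate a measure from the fact table, grouped by a dimension attribute."""
    rows = connection.execute("""
        SELECT c.region, SUM(f.revenue)
        FROM fact_order_line AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    return dict(rows)


result = revenue_by_region(conn)
print(result)
```

Note that the query needs only one join to answer a business question; the same question against a normalized OLTP schema would typically pass through several intermediate tables.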
The 7Ws framework
An extension of the journalist’s 5Ws, the 7Ws are the interrogatives that structure every dimensional model:
- Who is involved?
- What did they do?
- When did it happen?
- Where did it take place?
- How many? (These are the measures, which become the facts.)
- Why did it happen?
- How did it happen (in what manner)?
Fact tables represent verbs (business activity); dimensions are nouns, each classifiable as one of the 7Ws. Star schemas typically contain 8-20 dimensions because each W can appear multiple times (e.g., an order fulfillment event might have three who dimensions: customer, employee, carrier).
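One way to picture the classification is to tag each column of an event with its W type and then separate the measures from the dimensions. The sketch below is an assumption-laden illustration: the three who entries come from the order fulfillment example above, while every other column name is invented for the example.

```python
from collections import Counter
from enum import Enum


class W(Enum):
    WHO = "who"
    WHAT = "what"
    WHEN = "when"
    WHERE = "where"
    HOW_MANY = "how many"
    WHY = "why"
    HOW = "how"


# Hypothetical order fulfillment event. Only customer, employee, and carrier
# are taken from the note; the remaining columns are illustrative.
order_fulfillment = {
    "customer": W.WHO,
    "employee": W.WHO,
    "carrier": W.WHO,
    "product": W.WHAT,
    "ship_date": W.WHEN,
    "delivery_address": W.WHERE,
    "quantity_shipped": W.HOW_MANY,
    "promotion": W.WHY,
    "shipping_method": W.HOW,
}


def split_event(columns):
    """Separate 'how many' columns (measures) from the other Ws (dimensions)."""
    measures = [name for name, w in columns.items() if w is W.HOW_MANY]
    dimensions = [name for name, w in columns.items() if w is not W.HOW_MANY]
    return measures, dimensions


measures, dimensions = split_event(order_fulfillment)
who_count = Counter(order_fulfillment.values())[W.WHO]
print(measures, dimensions, who_count)
```

Because a single W can repeat, the dimension count grows quickly even for a simple event, which is consistent with the 8-20 range quoted above.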
Data warehouse analysis approaches
Corr identifies two traditional analysis methods, both with significant limitations for proactive DW design:
- Data-driven analysis (supply-side): profile source systems to discover what data exists. Fails when source systems are still under development or use opaque packaged schemas.
- Reporting-driven analysis (demand-side): interview stakeholders about desired reports. Fails because BI requirements are accretive — users cannot articulate future needs until they see current data.
The chapter frames this as a “chicken or the egg” problem that traditional methods cannot solve.
Agile data warehouse design
Corr contrasts waterfall DW development (Big Design Up Front / BDUF) with agile delivery. The minimum deliverable unit in agile DW is a single star schema — queryable tables, ETL to populate them, and a BI tool to access them. The book uses JEDUF (Just Enough Design Up Front) for cross-iteration planning and JIT (Just-In-Time) detail modeling within each sprint.
BEAM introduction
BEAM (Business Event Analysis & Modeling) is the book’s core contribution. It combines:
- Data stories: narratives that use the 7Ws to describe what a business process measures. Stakeholders tell data stories; modelers capture the dimensional structure from those stories.
- BEAM tables: tabular example-data formats that look like simple reports. They replace abstract ER diagrams with something stakeholders can read and validate. Built column-by-column on whiteboards from stakeholder responses to 7W questions.
- Modelstorming: fast, collaborative modeling sessions (hours, not days) that replace lengthy interview cycles.
The book styles the name with a trailing asterisk (BEAM✲). The asterisk represents star schemas, the dimensional deliverable the method produces.
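To make the idea of a report-like table of example data concrete, the sketch below renders a few example rows under column headers tagged with their W type. This is not the book's BEAM notation: the bracketed header format, the column names, and the sample rows are all assumptions chosen for illustration.

```python
# Hypothetical columns (name, W type) and example rows; not the book's notation.
columns = [
    ("customer", "who"),
    ("product", "what"),
    ("order_date", "when"),
    ("quantity", "how many"),
]
examples = [
    ("Acme", "Widget", "2024-01-05", 2),
    ("Globex", "Manual", "2024-01-06", 1),
]


def render_table(columns, examples):
    """Render example rows as a plain-text table, one column per 7W answer."""
    headers = [f"{name} [{w}]" for name, w in columns]
    rows = [[str(value) for value in row] for row in examples]
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(headers), separator] + [fmt(row) for row in rows])


table_text = render_table(columns, examples)
print(table_text)
```

The point of the format is that a stakeholder can check the example rows against what they know of the business, which is much harder to do with an abstract diagram.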
Agile dimensional modeling benefits
Corr lists six benefits of applying agile methods to dimensional modeling: avoids analysis paralysis by focusing on business processes rather than reports; produces flexible report-neutral designs; enables proactive influence on operational system development; supports accretive requirements through evolutionary iteration; teaches stakeholders to think dimensionally; and creates stakeholder ownership of the resulting data models.
Mapping against Ray Data Co
Testing matrix. The Scope x Basis framework assumes dimensional structure. Corr’s chapter clarifies why: fact tables have grain (a specific level of detail), and dimensions have hierarchies. Both are testable properties. Scope (row-level, column-level, cross-table) maps naturally to the fact/dimension/grain structure Corr describes. Basis (absolute, relative, temporal) maps to the kinds of assertions you can make about measures vs. attributes. The testing matrix is more powerful when teams understand the dimensional model underneath.
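As a hedged sketch of what "grain is a testable property" could mean in practice, the functions below check two things on in-memory rows: that no two fact rows share the same grain key, and that every fact foreign key resolves to a dimension row. The function names, column names, and sample data are hypothetical, and this is not the Scope x Basis template itself, only one possible shape for a grain check and a cross-table check.

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return grain-key tuples that appear on more than one fact row."""
    counts = Counter(tuple(row[c] for c in grain_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def orphan_keys(fact_rows, key_column, dimension_keys):
    """Return fact foreign keys that have no matching dimension row."""
    return sorted({row[key_column] for row in fact_rows} - set(dimension_keys))


# Hypothetical data: declared grain is one row per (order_id, line_no).
fact_order_line = [
    {"order_id": 1, "line_no": 1, "customer_key": 1, "revenue": 40.0},
    {"order_id": 1, "line_no": 2, "customer_key": 1, "revenue": 15.0},
    # Duplicate grain key and a customer_key missing from the dimension:
    {"order_id": 1, "line_no": 2, "customer_key": 9, "revenue": 15.0},
]
dim_customer_keys = [1, 2]

dupes = grain_violations(fact_order_line, ["order_id", "line_no"])
orphans = orphan_keys(fact_order_line, "customer_key", dim_customer_keys)
print(dupes, orphans)
```

Both checks depend on knowing the declared grain and the fact/dimension relationships up front, which is why the dimensional model has to be understood before the tests can be written.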
MG harness review. The harness currently lacks a formal approach to test-driven ETL development. Corr explicitly names targeted data profiling and ETL TDD as prerequisites for agile DW work — and warns that modeling what cannot be tested is wasted effort. This gives us a specific recommendation to bring back to MG: adopt BEAM-style profiling as a pre-sprint activity that feeds directly into dbt test definitions.
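A minimal sketch of how targeted profiling could feed test definitions, assuming the profile is computed on just the columns a planned model needs. The helper names and sample values are hypothetical. The suggested test names `not_null` and `unique` match two of dbt's built-in generic tests, but the mapping from profile to test is our assumption, not something the chapter or dbt prescribes.

```python
def profile_column(values):
    """Compute a small profile for one source column."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def suggest_tests(profile):
    """Suggest candidate test names from a profile (assumed mapping)."""
    tests = []
    non_null_rows = profile["rows"] - profile["nulls"]
    if profile["rows"] > 0 and profile["nulls"] == 0:
        tests.append("not_null")
    if non_null_rows > 0 and profile["distinct"] == non_null_rows:
        tests.append("unique")
    return tests


# Hypothetical source columns sampled before a sprint.
order_id_tests = suggest_tests(profile_column([101, 102, 103]))
region_tests = suggest_tests(profile_column(["West", None, "West"]))
print(order_id_tests, region_tests)
```

A profile only describes the sample it saw, so suggestions like these are candidates to confirm with stakeholders, not guarantees about the source system.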
Consulting positioning. Corr’s collaborative modeling philosophy aligns with RDCO’s engagement style. The modelstorming technique — short, stakeholder-led sessions that produce immediately usable artifacts — is how we already run discovery. We can formalize this by adopting BEAM table notation as a deliverable format, giving clients something more structured than meeting notes but more accessible than ER diagrams.
Kimball alignment. This chapter is a companion to the Kimball Toolkit reference already in the vault. Where Kimball defines the canonical patterns (conformed dimensions, bus architecture, SCD types), Corr provides the process for discovering and documenting those patterns collaboratively. Read together, Kimball covers design and Corr covers discovery.
Related
- 01-projects/data-quality-framework/testing-matrix-template — Scope x Basis testing framework; dimensional structure is the assumed substrate
- 06-reference/2026-04-13-mg-harness-review-cc-wrapped — MG harness TDD gap; Corr’s targeted profiling validates the finding
- 06-reference/2026-04-03-the-data-warehouse-toolkit — Kimball’s canonical dimensional design patterns; Corr provides the agile discovery process
- 06-reference/2026-03-30-founder-data-quality-framework — founder’s original data quality framework article; dimensional modeling is the structural prerequisite
- 06-reference/2026-04-07-dbt-semantic-layer-vs-text-to-sql-benchmark — semantic layer enforces dimensional definitions; BEAM tables are the upstream artifact
- 06-reference/2026-04-04-eric-weber-data-team-roi-ai-first — Weber’s ROI metrics for data teams; BEAM’s collaborative approach generates the stakeholder buy-in Weber says is required