06-reference

book adwd ch6 who what design patterns 2026 04 13

Sun Apr 12 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·reference ·source: Agile Data Warehouse Design (book) ·by Lawrence Corr, Jim Stagnitto

Chapter 6: Who and What — Design Patterns — Corr & Stagnitto

Part II of the book shifts from the BEAM modelstorming process (Chapters 1-5) into reusable dimensional design patterns. Chapter 6 covers the two most scrutinized dimension families — who (customers, employees) and what (products, services) — and the structural problems that emerge when these dimensions grow deep, wide, and volatile.

Why this is in the vault

Customer and product dimensions are the backbone of every analytics warehouse Ray Data Co builds or audits. This chapter provides a pattern language for recurring structural decisions: when to split a dimension, how to handle variable-depth hierarchies without recursive SQL, and how to offer both current-value and historical reporting without duplicating ETL. These patterns recur in client engagements and newsletter content on dimensional modeling.

Key concepts

Mini-dimensions

When a customer dimension is both very large (millions of rows) and contains many historic-value (HV) attributes, Type 2 SCD tracking causes explosive row growth. The mini-dimension pattern extracts the most volatile HV attributes into a separate table with its own surrogate key. The fact table carries both the main CUSTOMER key and the mini-dimension (DEMO) key.

Critical implementation details:

Sensible snowflaking

Customer dimensions can benefit from selective normalization when large collections of low-cardinality attributes (geodemographic codes, first-purchase date attributes) would otherwise bloat the main table. The authors frame this as an exception to the general denormalization rule — only justified when the outrigger table has far fewer records than the parent dimension. A FIRST PURCHASE DATE outrigger can be implemented as a role-playing view of the standard CALENDAR dimension.

Swappable dimensions

When a business serves mixed populations (B2B and B2C customers, or hardware vs. software products), a single dimension ends up wide and sparsely populated. Swappable dimensions share the same surrogate key but contain only the rows and columns relevant to a given subtype. They are defined by a defining characteristic (DC) attribute such as CUSTOMER TYPE. “Hot swappable” versions — those with identical column names — let BI users switch perspectives without rewriting queries. Useful for row-level security, national language translation, and CV/HV reporting views.

Embedded whos and recursive relationships

Who dimensions frequently reference other whos (e.g., a customer’s account manager, a customer’s parent company). These “embedded whos” should be remodeled as foreign keys to their own dimensions rather than inline text columns. Parent-company relationships introduce recursive (self-referencing) foreign keys. When the hierarchy has a fixed, shallow depth, flattening into named level columns is sufficient. When it does not, a hierarchy map is needed.

Hierarchy maps

The central structural pattern of the chapter. A hierarchy map is a separate table that pre-resolves every ancestor-descendant pair in a variable-depth recursive relationship. For a COMPANY STRUCTURE hierarchy map, each row pairs a PARENT KEY with a SUBSIDIARY KEY plus metadata: company level, sequence number, lowest-subsidiary flag, and highest-parent flag.

Key properties and tradeoffs:

The recursive key ripple effect

When a parent’s HV attribute changes in a dimension with an HV recursive key, the new surrogate key must propagate to every descendant’s PARENT KEY, and each of those descendants gets a new row too. A single micro-level change to a top-level parent can generate rows for every entity in the hierarchy. The authors warn this is manageable only when hierarchies are small and stable, and advise using effective dating on the hierarchy map instead for large or volatile structures.

Employee dimensions and hybrid SCD views

Employee dimensions are smaller than customer dimensions and generally tolerate Type 2 SCD processing. The challenge is supporting both “as was” (HV) and “as is” (CV) reporting simultaneously. The hybrid SCD view pattern creates a CV swappable dimension by self-joining the HV dimension to its own current records. Because the CV and HV versions share the same surrogate key, they are hot-swappable — BI users change the temporal perspective of any query by switching which dimension version they join. The authors advise defaulting to HV ETL processing even when stakeholders only ask for CV, because a CV view can be added later for free, while retrofitting HV tracking is expensive.

Previous value attributes (CV/PV)

A lightweight alternative to full Type 2 history: keep a PREVIOUS TERRITORY column alongside the current TERRITORY column. Implemented as Type 3 SCD columns. Useful for one-off comparisons after macro-level changes (branch relocations, territory realignments) but doesn’t scale beyond one or two versions.

Multi-valued hierarchy maps (HR hierarchies)

HR reporting structures are the hardest case because employees can report to multiple managers (solid-line and dotted-line). This requires a multi-valued hierarchy map (MV, HM) with additional attributes: ROLE TYPE (permanent vs. temporary) and WEIGHTING FACTOR (FTE allocation). Revenue or expense rollups use Sum(Revenue x Weighting_Factor). Distant relationship weighting factors are calculated by multiplying the intermediate direct factors. For full history tracking, the hierarchy map must use effective dating rather than HV recursive keys because the ripple effect across an entire interconnected employee population would be catastrophic.

Product and service dimensions

Product dimensions are complex not because of size but because of the variety of specialist attributes across product types. Swappable subset dimensions (per product type) help here, just as with mixed customer populations. Product hierarchies are typically fixed-depth but often ragged — the authors recommend balancing ragged hierarchies by filling missing levels (top-down, bottom-up, or combined balancing) rather than building hierarchy maps.

Multi-level dimensions

When different business processes associate facts with different levels of a product hierarchy (e.g., web page visits at category level vs. sales at SKU level), a multi-level dimension accommodates all levels using special surrogate keys for the higher-level entries. This lets a single PRODUCT dimension serve fact tables at any granularity.

Reverse hierarchy maps (bill of materials)

Product composition (bill-of-materials) is another variable-depth hierarchy, but navigated in the opposite direction — from parent product down into components. The same hierarchy map technique applies, but the analytical questions are reversed: instead of rolling up child facts to parents, you decompose parent assemblies into constituent parts.

Mapping against Ray Data Co