Chapter 6: Who and What -- Design Patterns -- Corr & Stagnitto

Part II of the book shifts from the BEAM modelstorming process (Chapters 1-5) into reusable dimensional design patterns. Chapter 6 covers the two most scrutinized dimension families -- who (customers, employees) and what (products, services) -- and the structural problems that emerge when these dimensions grow deep, wide, and volatile.

Why this is in the vault

Customer and product dimensions are the backbone of every analytics warehouse Ray Data Co builds or audits. This chapter provides a pattern language for recurring structural decisions: when to split a dimension, how to handle variable-depth hierarchies without recursive SQL, and how to offer both current-value and historical reporting without duplicating ETL. These patterns recur in client engagements and newsletter content on dimensional modeling.

Key concepts

Mini-dimensions

When a customer dimension is both very large (millions of rows) and contains many historic-value (HV) attributes, Type 2 SCD tracking causes explosive row growth. The mini-dimension pattern extracts the most volatile HV attributes into a separate table with its own surrogate key. The fact table carries both the main CUSTOMER key and the mini-dimension (DEMO) key.

Critical implementation details:

Band continuous values (income, credit score) into discrete ranges to keep the mini-dimension truly mini -- high cardinality attributes defeat the purpose.
Add a current demo key (CV, FK) back to the main customer dimension to support shortcut joins for current-value queries without touching fact tables.
Each mini-dimension adds a foreign key and index to every related fact table, so don't over-fragment.

Sensible snowflaking

Customer dimensions can benefit from selective normalization when large collections of low-cardinality attributes (geodemographic codes, first-purchase date attributes) would otherwise bloat the main table. The authors frame this as an exception to the general denormalization rule -- only justified when the outrigger table has far fewer records than the parent dimension. A FIRST PURCHASE DATE outrigger can be implemented as a role-playing view of the standard CALENDAR dimension.

Swappable dimensions

When a business serves mixed populations (B2B and B2C customers, or hardware vs. software products), a single dimension ends up wide and sparsely populated. Swappable dimensions share the same surrogate key but contain only the rows and columns relevant to a given subtype. They are defined by a defining characteristic (DC) attribute such as CUSTOMER TYPE. "Hot swappable" versions -- those with identical column names -- let BI users switch perspectives without rewriting queries. Useful for row-level security, national language translation, and CV/HV reporting views.

Embedded whos and recursive relationships

Who dimensions frequently reference other whos (e.g., a customer's account manager, a customer's parent company). These "embedded whos" should be remodeled as foreign keys to their own dimensions rather than inline text columns. Parent-company relationships introduce recursive (self-referencing) foreign keys. When the hierarchy has a fixed, shallow depth, flattening into named level columns is sufficient. When it does not, a hierarchy map is needed.

Hierarchy maps

The central structural pattern of the chapter. A hierarchy map is a separate table that pre-resolves every ancestor-descendant pair in a variable-depth recursive relationship. For a COMPANY STRUCTURE hierarchy map, each row pairs a PARENT KEY with a SUBSIDIARY KEY plus metadata: company level, sequence number, lowest-subsidiary flag, and highest-parent flag.

Key properties and tradeoffs:

Row explosion is modest -- the authors provide a quick estimate formula: members x (max_levels - 1).
The SUBSIDIARY KEY must be modeled as HV even in a CV hierarchy map so it can join to all historical fact rows tied to Type 2 SCD surrogate keys.
Hierarchy maps are typically easier to drop and rebuild than incrementally maintain, because node moves cascade through sequence numbers.
Querying multiple parent companies requires constraining on HIGHEST PARENT = 'Y' or fetching a distinct set of subsidiary keys before summing, to avoid double-counting revenue.
Tracking full hierarchy history (parent, hierarchy-shape, and child changes) is a design decision with escalating complexity. Child history tracking is the minimum default.

The recursive key ripple effect

When a parent's HV attribute changes in a dimension with an HV recursive key, the new surrogate key must propagate to every descendant's PARENT KEY, and each of those descendants gets a new row too. A single micro-level change to a top-level parent can generate rows for every entity in the hierarchy. The authors warn this is manageable only when hierarchies are small and stable, and advise using effective dating on the hierarchy map instead for large or volatile structures.

Employee dimensions and hybrid SCD views

Employee dimensions are smaller than customer dimensions and generally tolerate Type 2 SCD processing. The challenge is supporting both "as was" (HV) and "as is" (CV) reporting simultaneously. The hybrid SCD view pattern creates a CV swappable dimension by self-joining the HV dimension to its own current records. Because the CV and HV versions share the same surrogate key, they are hot-swappable -- BI users change the temporal perspective of any query by switching which dimension version they join. The authors advise defaulting to HV ETL processing even when stakeholders only ask for CV, because a CV view can be added later for free, while retrofitting HV tracking is expensive.

Previous value attributes (CV/PV)

A lightweight alternative to full Type 2 history: keep a PREVIOUS TERRITORY column alongside the current TERRITORY column. Implemented as Type 3 SCD columns. Useful for one-off comparisons after macro-level changes (branch relocations, territory realignments) but doesn't scale beyond one or two versions.

Multi-valued hierarchy maps (HR hierarchies)

HR reporting structures are the hardest case because employees can report to multiple managers (solid-line and dotted-line). This requires a multi-valued hierarchy map (MV, HM) with additional attributes: ROLE TYPE (permanent vs. temporary) and WEIGHTING FACTOR (FTE allocation). Revenue or expense rollups use Sum(Revenue x Weighting_Factor). Distant relationship weighting factors are calculated by multiplying the intermediate direct factors. For full history tracking, the hierarchy map must use effective dating rather than HV recursive keys because the ripple effect across an entire interconnected employee population would be catastrophic.

Product and service dimensions

Product dimensions are complex not because of size but because of the variety of specialist attributes across product types. Swappable subset dimensions (per product type) help here, just as with mixed customer populations. Product hierarchies are typically fixed-depth but often ragged -- the authors recommend balancing ragged hierarchies by filling missing levels (top-down, bottom-up, or combined balancing) rather than building hierarchy maps.

Multi-level dimensions

When different business processes associate facts with different levels of a product hierarchy (e.g., web page visits at category level vs. sales at SKU level), a multi-level dimension accommodates all levels using special surrogate keys for the higher-level entries. This lets a single PRODUCT dimension serve fact tables at any granularity.

Reverse hierarchy maps (bill of materials)

Product composition (bill-of-materials) is another variable-depth hierarchy, but navigated in the opposite direction -- from parent product down into components. The same hierarchy map technique applies, but the analytical questions are reversed: instead of rolling up child facts to parents, you decompose parent assemblies into constituent parts.

Mapping against Ray Data Co

Client warehouse audits: The mini-dimension and sensible-snowflaking patterns provide a concrete checklist for evaluating whether a client's monster customer dimension should be refactored. The "is it really mini?" question is a common audit finding.
Hierarchy map pattern: Directly applicable any time a client needs corporate ownership rollups, franchise reporting, or org-chart analytics. The drop-and-rebuild advice is practical for ETL pipeline design.
Hybrid SCD view: The advice to default to HV processing and deliver CV as views is a strong consulting recommendation -- it avoids the expensive retrofit scenario. Good newsletter material for the "build for the future you can see" angle.
Swappable dimensions: Relevant whenever a client serves mixed B2B/B2C populations or sells heterogeneous products. The hot-swappable property (identical column names) is an elegant usability feature worth highlighting in client deliverables.
Multi-valued hierarchy maps: The most complex pattern in the chapter. Relevant for any client with matrix-management org structures. The weighting-factor math is non-obvious and worth walking through in an audit finding.

[[2025-12-31-practical-data-modeling-kimball-vs-inmon]] -- Kimball vs. Inmon framing; Ch 6 patterns sit squarely in the Kimball tradition
[[2026-02-18-practical-data-modeling-mma-ch1-full]] -- MMA modeling foundations that underpin these patterns
[[2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp]] -- warehouse architecture context where these patterns are deployed
[[2026-04-04-dedp-history-state-de]] -- SCD history/state tracking complements the HV/CV/PV discussion here
[[2026-04-04-sql-patterns-xheblati]] -- SQL execution patterns relevant to hierarchy map queries

book adwd ch6 who what design patterns 2026 04 13