Chapter 9: Why and How -- Design Patterns -- Corr & Stagnitto

The final chapter of ADWD tackles the two remaining Ws: why (causal dimensions) and how (miscellaneous/degenerate dimensions). Together they complete the 7W dimensional description of a business event. The chapter is pattern-heavy: each section follows a Problem/Requirement, Solution, Consequences structure.

Why this is in the vault

Causal dimensions and multi-valued relationships are the hardest parts of dimensional modeling to get right -- and the most likely to bite us when we start building star schemas for client analytics. Bridge tables with weighting factors, pivoted dimensions for combination constraints, and audit dimensions for lineage are patterns we will reach for repeatedly. This chapter is the go-to reference for those designs.

Key concepts

Why dimensions (causal dimensions)

Causal dimensions capture the reasons business events happen. They split into two axes:

Direct vs. indirect -- Direct causal factors (e.g., a promo code on a transaction) are recorded with certainty. Indirect factors (e.g., weather, a sporting event happening nearby) are speculative -- stakeholders believe they had influence but cannot prove it.
Internal vs. external -- Internal factors are under the organization's control (price discounts, ad campaigns). External factors are not (weather, competitor actions, public events).

A PROMOTION dimension is the classic internal why -- mixing direct attributes like DISCOUNT TYPE with indirect ones like advertising CHANNEL. The default row ("No Promotion", key 0) will dominate most fact tables. Internal causes that are significant events in their own right (seminars, campaigns) may justify their own fact tables and conformed causal dimensions spanning multiple star schemas.

Unstructured why dimensions

Free-text reasons attached to transactions (sales comments, claim notes) should be extracted from fact tables into COMMENT dimensions. Even a bare text column is useful for keyword searching. Over time, ETL can enrich these with coded tags to improve filterability -- a progressive refinement approach the authors explicitly endorse.

External why dimensions

External factors like weather can be added as dimensions to any fact table whose granularity matches the external feed on time and location. The key insight: these dimensions do not alter fact granularity, so they can be added later once a reliable data source is found. The WEATHER dimension example stores general conditions rather than raw measurements -- it is designed for filtering and grouping, not for weather analysis.

Multi-valued dimensions

When more than one cause applies to a single fact (e.g., multiple sporting events overlapping a sales day), the causal attribute becomes multi-valued. Naive grouping by the multi-valued attribute causes over-counting -- the total of the parts exceeds the whole.

Weighting factors

The solution: expand the dimension into per-date (or per-fact) rows and assign a weighting factor that sums to 1.0 for each group. Revenue can then be weighted at query time: SUM(Revenue * Weighting_Factor). The authors note that the "correct" split is inherently political -- different stakeholders will disagree about allocation -- but the weighted grand total is always defensible. Impact reports can show both raw and weighted figures side by side.

Bridge tables (multi-valued bridge pattern)

When a direct causal attribute is genuinely multi-valued at the atomic grain (e.g., multiple diagnoses per medical claim), a bridge table resolves the M:M without inflating the fact table. The bridge stores each member of the group along with its weighting factor, keyed by a group surrogate. This avoids hard-coded fact allocations and preserves the natural atomic granularity. Queries that filter to a single value bypass the weighting entirely and produce clean, unambiguous results.

Key guard rail: before reaching for a bridge table, verify that the fact granularity is truly atomic. If the multi-valued appearance is caused by a too-coarse grain (e.g., one row per invoice instead of per line item), the fix is to go to the lower grain, not to add a bridge.

Optional bridge pattern

When a dimension is only barely multi-valued (e.g., most sales are individual, a few involve two-person teams), a full bridge table on every query is overkill. Instead, make the dimension multi-level (ML) by adding team-level pseudo-member rows directly in the dimension. An optional bridge table shares the same surrogate keys and is joined only when the weighting split is needed. Most queries take the shortcut join directly to the ML dimension. This pattern uses a MEMBERSHIP TYPE filter to control double-counting.

Pivoted dimensions

When users need to query complex combinations of multi-valued attributes (e.g., "products with options 2 AND 3 but NOT 5"), bridge table row-logic becomes unwieldy. A pivoted dimension (PD) transposes bridge rows into columns (one per option, with Y/N flags), turning row-based set logic into simple column predicates. Bridge and pivoted versions are swappable dimensions -- bridge is GROUP BY-friendly, pivoted is combination-WHERE-friendly. Build the bridge first; derive the pivoted version from it.

Caveat: pivoted dimensions are only practical when the value population is small and stable. With 20 columns there are already over a million possible combinations; 155,000 diagnosis codes would be unworkable.

How dimensions

How dimensions capture residual descriptive details not covered by the other Ws. The most common form is the degenerate dimension (DD) -- a transaction ID left in the fact table with no corresponding dimension table. Degenerate IDs provide source-system traceability and enable distinct counting in multi-valued scenarios.

When a fact table accumulates too many degenerates, the authors recommend consolidating them into a physical how dimension (often named after the fact table, e.g., CALL DETAIL for a CALL DETAILS FACT). These non-conformed "miscellaneous dimensions" reduce fact table width and make browsing easier. The authors advise calling them miscellaneous rather than "junk" when talking to stakeholders.

Guidelines for remodeling degenerates: if it will be grouped/filtered, move it to a dimension; if it is a large text comment, move it to a COMMENT dimension; if a Y/N flag will be counted, remodel as an additive fact; if a flag describes a type that splits other facts, split those facts into separate columns instead.

Range band dimensions

A general-purpose RANGE BAND dimension converts high-cardinality numeric facts (revenue, age, quantity) into discrete, report-friendly bands at query time via a BETWEEN join. Multiple band groups can coexist in the same dimension table. The tradeoff is slower query performance (BETWEEN joins are harder to optimize than equi-joins), mitigated by indexing the banded facts.

Step dimensions

A STEP dimension labels each event with its position in a sequence (step number, total steps, steps until last, last-step flag). This enables positional analysis of event sequences -- finding session-killers, purchase funnels, or events near pivotal cause-and-effect moments. Step dimensions can play multiple roles on the same fact table (e.g., step in session vs. step in purchase). They grow quadratically: n*(n+1)/2 rows for n steps, so set a practical maximum (200 steps covers 99% of cases).

Audit dimensions

Audit dimensions surface ETL metadata (module, version, start/end times, record counts, error counts) as queryable dimensional data. They also carry data quality indicators (out-of-bounds, expected-but-missing, changed-after-load, allocated). This elevates lineage information from being locked in ETL tool repositories to being available in BI reports -- answering the stakeholder questions "how did this data get here?" and "why are the figures unusual?"

Mapping against Ray Data Co

Multi-valued bridge + weighting factors will be directly relevant whenever we model attribution for client marketing analytics. The pattern of keeping facts atomic and pushing allocation into a bridge table is cleaner than the common approach of duplicating fact rows with split amounts.
Pivoted dimensions for combination queries -- if we encounter product-configuration or tag-combination use cases in client data, this is the pattern to reach for. The practical limit on column count is worth remembering before proposing it.
Audit dimensions align with our stance on data quality and lineage. For any warehouse build, adding an AUDIT KEY to fact tables from day one is low cost and high payoff -- it makes "why are the numbers wrong?" investigations self-service rather than requiring ETL log archaeology.
Range band dimensions are a lightweight alternative to BI-tool bucketing. Worth considering for any engagement where the BI layer is thin or the client wants band definitions to be data-driven and changeable without redeploying reports.
Step dimensions are relevant for any clickstream, funnel, or sequential-process modeling we do. The quadratic growth formula is a useful sizing heuristic.
Unstructured why / COMMENT dimensions -- a good reminder that free-text reasons attached to transactions should not live in the fact table, and that progressive enrichment (raw text first, coded tags later) is a valid iterative approach.

[[2026-02-18-practical-data-modeling-mma-ch1-full]] -- MMA Ch1 covers dimensional modeling foundations that this chapter builds on
[[2025-12-31-practical-data-modeling-kimball-vs-inmon]] -- Kimball vs Inmon context for where these patterns sit in the broader methodology debate
[[2026-04-04-dedp-mv-obt-dbt-olap-dwa]] -- Modern data engineering patterns; bridge tables and pivoted dimensions are the Kimball-era ancestors of some OBT and wide-table approaches
[[2026-04-04-sql-patterns-xheblati]] -- SQL patterns reference; range band BETWEEN joins and weighted-sum queries from this chapter connect to practical SQL technique

book adwd ch9 why how design patterns 2026 04 13