Counting and Aggregation: Controlling the Grain (Ch 9)
Chapter 9 argues that aggregation is not just a downstream calculation but a structural constraint baked into the model itself. A model that supports clean aggregation has well-defined grains, stable boundaries, and predictable mathematical behavior.
Core Framework
Four prerequisites must hold before a count means anything: (1) know what you are counting (identity), (2) establish existence and cardinality, (3) determine discreteness vs continuity, (4) account for context and scope. The “active users” example shows how the same question can yield counts that differ by 3x depending on the definition used.
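A minimal sketch of how the definition drives the count. The event log, the event types, and the three definitions below are all hypothetical and not taken from the chapter; the point is only that identity (distinct users), scope (which events qualify), and bounds (which date range) each change the answer.

```python
from datetime import date

# Hypothetical event log: (user_id, event_type, event_date). All values are made up.
EVENTS = [
    ("u1", "login", date(2024, 1, 3)),
    ("u1", "purchase", date(2024, 1, 5)),
    ("u2", "login", date(2024, 1, 10)),
    ("u3", "login", date(2024, 1, 28)),
    ("u3", "purchase", date(2024, 1, 29)),
    ("u4", "login", date(2023, 12, 20)),
    ("u5", "page_view", date(2024, 1, 30)),
    ("u6", "login", date(2024, 1, 30)),
]


def count_active(events, start, end, event_types=None):
    """Count distinct users with a qualifying event in the closed range [start, end]."""
    users = {
        user
        for user, event_type, day in events
        if start <= day <= end and (event_types is None or event_type in event_types)
    }
    return len(users)


# Three plausible readings of "active users", all assumed for illustration.
any_event_jan = count_active(EVENTS, date(2024, 1, 1), date(2024, 1, 31))
login_jan = count_active(EVENTS, date(2024, 1, 1), date(2024, 1, 31), {"login"})
purchase_last_week = count_active(EVENTS, date(2024, 1, 25), date(2024, 1, 31), {"purchase"})

print(any_event_jan, login_jan, purchase_last_week)  # 5 4 1
```

The same log answers "how many active users?" with 5, 4, or 1 depending on which definition is fixed first.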
Six Structural Principles
Reis introduces a stress-test checklist for safe aggregation: Grain alignment, Disjointness (mutually exclusive groups), Additivity (additive / semi-additive / non-additive measures), Decomposability (associativity and commutativity for distributed compute), Closure (output stays in the original domain), and Boundedness (time and dimensional scope).
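To make the additivity item concrete, here is a small sketch using hypothetical daily account snapshots (the table, column order, and values are assumptions, not from the chapter). Deposits are treated as an additive measure, while balance is treated as semi-additive: it can be summed across accounts on one day but not across days.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (day, account, end_of_day_balance, deposits_that_day)
SNAPSHOTS = [
    ("2024-01-01", "A", 100, 100),
    ("2024-01-01", "B", 50, 50),
    ("2024-01-02", "A", 120, 20),
    ("2024-01-02", "B", 50, 0),
]

# Additive measure: deposits can be summed across every dimension, including time.
total_deposits = sum(deposits for _day, _account, _balance, deposits in SNAPSHOTS)

# Semi-additive measure: balance sums across accounts within a single day...
balance_by_day = defaultdict(int)
for day, _account, balance, _deposits in SNAPSHOTS:
    balance_by_day[day] += balance

# ...but across time we take the latest snapshot instead of summing.
# ISO date strings sort chronologically, so max() picks the last day.
closing_balance = balance_by_day[max(balance_by_day)]

# Summing balances across days counts the same money once per snapshot.
naive_balance_sum = sum(balance_by_day.values())

print(total_deposits, closing_balance, naive_balance_sum)  # 170 170 320
```

In this toy data the accounts start empty, so total deposits and the closing balance agree at 170, while the naive sum over days (320) describes nothing real.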
The average-of-averages trap is a decomposability violation: AVERAGE is not associative, so pre-computed averages cannot be safely rolled up unless every group contains the same number of rows. Fix: track SUM and COUNT separately, and divide last.
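The trap and the fix can be sketched in a few lines. The group names and values are hypothetical; the only assumption is that the groups differ in size, which is what makes the naive roll-up wrong.

```python
# Hypothetical order values per region; the groups are deliberately unequal in size.
GROUPS = {
    "north": [10, 20, 30, 40],
    "south": [100],
}

# Trap: pre-compute an average per group, then average those averages.
group_averages = [sum(values) / len(values) for values in GROUPS.values()]
average_of_averages = sum(group_averages) / len(group_averages)

# Fix: carry (sum, count) partials, combine them, and divide only at the end.
partials = [(sum(values), len(values)) for values in GROUPS.values()]
total_sum = sum(s for s, _ in partials)
total_count = sum(c for _, c in partials)
true_average = total_sum / total_count

print(average_of_averages, true_average)  # 62.5 40.0
```

The (sum, count) pairs can be merged in any order or grouping, which is why they roll up safely where a stored average does not.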
Universality Beyond Tables
The principles apply across data forms: text (TF-IDF, topic models), images (pooling layers as aggregation), graphs (message-passing neighborhoods), and event streams (windowing as boundedness). Sliding windows create overlapping groups, which carries the same double-counting risk as many-to-many product categories: an item that belongs to several groups is counted once per group, so the group totals add up to more than the true total.
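A rough illustration of the windowing point, using hypothetical event timestamps and a helper written for this note (`window_counts` is not from any streaming library). Tumbling windows partition the stream, so their counts add back up to the event total; sliding windows overlap, so they do not.

```python
# Hypothetical event timestamps, in seconds.
TIMESTAMPS = [1, 4, 6, 11, 14]


def window_counts(timestamps, size, step, horizon):
    """Count events in each half-open window [start, start + size),
    for start = 0, step, 2 * step, ... while start < horizon."""
    return [
        sum(1 for t in timestamps if start <= t < start + size)
        for start in range(0, horizon, step)
    ]


# Tumbling: step == size, so windows are disjoint.
tumbling = window_counts(TIMESTAMPS, size=10, step=10, horizon=20)

# Sliding: step < size, so consecutive windows overlap by 5 seconds.
sliding = window_counts(TIMESTAMPS, size=10, step=5, horizon=20)

print(tumbling, sum(tumbling))  # [3, 2] 5
print(sliding, sum(sliding))    # [3, 3, 2, 0] 8
```

Summing the sliding-window counts gives 8 for a stream of 5 events, the same inflation seen when a product listed in several categories is totalled across all of them.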
RDCO Relevance
Directly maps to dbt consulting: grain audits, metric-layer additivity checks, and pre-aggregation trade-offs are daily SDG pipeline concerns. Cross-ref with SDG pipeline articles on incremental models and metric definitions.