Grain: Getting the Level Right (Ch 8)
Chapter 8 on grain — the fundamental level of detail captured in a dataset. Core question: what does one row represent?
Key rules:
- Grain works in one direction: You can always aggregate up from fine-grained data; you can never deterministically disaggregate from coarse-grained data
- Define grain early: Prevents costly redesigns
- When in doubt, go finer: Storage cost of extra detail is almost always cheaper than discovering you need detail you threw away
Common failure modes:
- Incompatible grains: Joining daily aggregates with transaction-level data. The “mixed-grain trap” where summary and detail rows coexist in one table, causing double-counting
- Fan-out: Joining tables with mismatched grains silently multiplies rows. A customer with 2 orders appears twice — count customers and you get wrong numbers
Grain across five camps: Relational (atomic row-level), Analytical (aggregated to answerable level), Application (event or document), ML/AI (feature grain per entity per snapshot date), Knowledge (the triple as atomic unit).
Introduces “grain routing” for AI agents: LLMs need a semantic layer that tags each dataset with its grain so the agent knows where to route queries before touching data.
The Four Questions for grain decisions and a Grain Audit Checklist for production deployment.
RDCO relevance
Grain is the #1 source of bugs in dbt projects. The fan-out section and audit checklist are directly deployable in client code reviews. The “grain routing” concept for AI agents is forward-looking and connects to semantic layer work.