Grain: Getting the Level Right (Ch 8)

Chapter 8 on grain -- the fundamental level of detail captured in a dataset. Core question: what does one row represent?

Key rules:

Grain works in one direction: You can always aggregate up from fine-grained data; you can never deterministically disaggregate from coarse-grained data
Define grain early: Prevents costly redesigns
When in doubt, go finer: Storage cost of extra detail is almost always cheaper than discovering you need detail you threw away

Common failure modes:

Incompatible grains: Joining daily aggregates with transaction-level data. The "mixed-grain trap" where summary and detail rows coexist in one table, causing double-counting
Fan-out: Joining tables with mismatched grains silently multiplies rows. A customer with 2 orders appears twice -- count customers and you get wrong numbers

Grain across five camps: Relational (atomic row-level), Analytical (aggregated to answerable level), Application (event or document), ML/AI (feature grain per entity per snapshot date), Knowledge (the triple as atomic unit).

Introduces "grain routing" for AI agents: LLMs need a semantic layer that tags each dataset with its grain so the agent knows where to route queries before touching data.

The Four Questions for grain decisions and a Grain Audit Checklist for production deployment.

RDCO relevance

Grain is the #1 source of bugs in dbt projects. The fan-out section and audit checklist are directly deployable in client code reviews. The "grain routing" concept for AI agents is forward-looking and connects to semantic layer work.