Attributes: Describing the Entity (Ch 6)
Chapter 6 covers how attributes manifest across data forms. In structured data, attributes are columns; in semi-structured data, they’re key-value pairs; in unstructured data, they must be actively manufactured through extraction pipelines (metadata, structural, and semantic attributes via NLP).
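A minimal sketch of the same attribute in all three forms. The order record, the field names, and the `extract_order_date` helper are hypothetical, and a simple regex stands in for the NLP extraction the chapter describes:

```python
import json
import re

# Structured: the attribute is a column (a dict stands in for a table row here).
structured_row = {"order_id": 1001, "order_date": "2024-03-15"}

# Semi-structured: the attribute is a key-value pair, possibly nested.
semi_structured = json.loads(
    '{"order_id": 1001, "details": {"order_date": "2024-03-15"}}'
)

# Unstructured: the attribute does not exist until something extracts it.
free_text = "Order 1001 was placed on 2024-03-15 and shipped two days later."


def extract_order_date(text):
    """Return the first ISO-formatted date found in text, or None.

    Hypothetical stand-in for a real extraction pipeline.
    """
    match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    return match.group(1) if match else None


extracted_date = extract_order_date(free_text)
```

In the first two forms the attribute is read directly; in the third it only exists once an extraction step has produced it, which is where quality and provenance questions start.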
Key practical guidance:
- Naming: Clarity (OrderDate not Date), consistency (pick one convention and stick to it), avoid reserved words
- Nulls: Scalar math with null yields null; aggregate functions silently skip nulls. Use explicit defaults like “missing” or “unknown” to reduce ambiguity
- Attribute bloat / cross-contamination: Gradual addition of unrelated attributes to a table (e.g., an Orders table that ends up carrying customer_shoe_size). If you can’t explain what an entity represents in one breath without “and,” you have bloat
- ML/AI artifacts as entities: A trained model has its own attributes (hyperparameters, weights, confidence scores, feature vectors) distinct from the data it analyzes
- Metadata as first-class attribute: Data catalogs and observability platforms treat datasets themselves as entities with attributes (schema evolution, usage frequency, data quality scores)
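The null behavior noted in the guidance above can be checked with a small sketch using Python's built-in `sqlite3` module. The `orders` table and its `discount` column are made up for illustration; other SQL engines follow the same standard semantics for these cases, but that is worth verifying per engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, discount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 20.0)],  # order 2 has a NULL discount
)

# Scalar math: NULL + 5 is NULL, which Python receives as None.
scalar_result = conn.execute(
    "SELECT discount + 5 FROM orders WHERE order_id = 2"
).fetchone()[0]

# Aggregates: AVG and COUNT(column) skip the NULL row; COUNT(*) does not.
avg_discount, count_discount, count_rows = conn.execute(
    "SELECT AVG(discount), COUNT(discount), COUNT(*) FROM orders"
).fetchone()

conn.close()
```

Here the average is computed over two rows, not three, and nothing in the result signals that a row was skipped. Comparing `COUNT(column)` against `COUNT(*)` is one cheap way to make the gap visible.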
RDCO relevance
The naming conventions and null-handling guidance are immediately useful in dbt style guides for clients. The attribute bloat anti-pattern maps directly to the “One Big Table gone wrong” situation we see regularly.