Beyond Rows and Columns: The Five Forms of Data (Ch 4)

First chapter of Part 2 (Building Blocks). Catalogs five forms of data that modern modelers must handle:

Structured -- tables, rows/columns, SQL, relational databases. Still the home turf but not the whole picture. Same data needs different models for OLTP, OLAP, and ML features.
Semi-structured -- JSON, XML, NoSQL. Flexibility at the cost of consistency. Recommends a hybrid approach: "shred stable fields, keep rest as raw."
Unstructured -- text, images, audio, video. Modeled through metadata, derived features, and a reference pattern (content in object storage, reference in the relational model, features in a vector DB).
ML/AI artifacts -- trained models, embeddings, feature vectors, agent traces, synthetic data. Provenance tracking is a modeling challenge.
Metadata -- business, operational, technical. The "connective tissue." Modern table formats (Iceberg, Delta, Hudi) are essentially metadata-as-model.
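
A minimal sketch of the hybrid approach noted under semi-structured data, assuming a hypothetical order event. The field names, the STABLE_FIELDS list, and the shred helper are illustrative and not taken from the chapter. Stable fields become columns; everything else stays in one raw JSON column.

```python
import json

# Hypothetical set of "stable" fields promoted to their own columns.
STABLE_FIELDS = ("order_id", "customer_id", "created_at")


def shred(record: dict) -> dict:
    """Split a semi-structured record into stable columns plus a raw remainder."""
    # Stable fields become columns; a missing field becomes None.
    row = {field: record.get(field) for field in STABLE_FIELDS}
    # Everything else is kept as a JSON string, as a raw/variant column would hold it.
    remainder = {k: v for k, v in record.items() if k not in STABLE_FIELDS}
    row["raw"] = json.dumps(remainder, sort_keys=True)
    return row


# Illustrative input, not real data.
event = {
    "order_id": 1001,
    "customer_id": "c-42",
    "created_at": "2024-01-15T10:30:00Z",
    "coupon": {"code": "WELCOME", "pct": 10},
    "tags": ["gift"],
}
row = shred(event)
```

Fields in the raw column can change shape without a schema change, while the shredded columns stay queryable and testable.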

Distinguishes form vs. format (relational table vs. Parquet file) and modeling intent (greenfield) vs. modeling exhaust (brownfield/reverse engineering).
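
A small sketch of the form vs. format distinction, under stated assumptions: the chapter's example is a relational table vs. a Parquet file, but CSV and JSON Lines are swapped in here so the code runs on the standard library alone. The rows and helper names are illustrative. The same tabular form round-trips through two different formats unchanged.

```python
import csv
import io
import json

# One tabular form: rows with the same two columns (illustrative data).
rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]


def to_csv(records):
    """Serialize the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into rows; CSV carries no types, so id is cast back to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


def to_jsonl(records):
    """Serialize the rows as JSON Lines text, one object per line."""
    return "\n".join(json.dumps(r) for r in records)


def from_jsonl(text):
    """Parse JSON Lines text back into rows."""
    return [json.loads(line) for line in text.splitlines()]


csv_text = to_csv(rows)
jsonl_text = to_jsonl(rows)
```

The two serialized strings differ, yet both parse back to the same rows: the format changed, the form did not.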

RDCO relevance

Expands the scope of what we should be thinking about in dbt projects. Most of our work is structured + semi-structured, but the metadata-as-first-class-citizen argument strengthens our case for investing in dbt documentation, descriptions, and tests as modeling artifacts, not afterthoughts.