Patterns of Data Engineering: Timeless Practices from Convergent Evolution
Summary
A freely available, continuously evolving online book exploring recurring design patterns in data engineering through the lens of convergent evolution — where different systems independently arrive at similar solutions. Organized into three parts: foundational concepts and design pattern theory, mastering patterns through architecture and modeling, and real-world implementation. Covers materialized views, data validation, OLAP cubes, dimensional modeling, ETL/ELT/Reverse ETL, open data platforms, and governance strategies. Targets intermediate-level data professionals.
Why This Was Bookmarked
“a freely available deep dive book on data engineering. Excellent material to feed into building up our knowledge base.”
This is a knowledge base source, not just a reference. The convergent evolution framing — seemingly new technologies repackage established concepts — maps to how we think about 06-reference/concepts/compounding-knowledge. Feed this into our vault as foundational data engineering reference material.
Key Ideas
- Convergent evolution as organizing principle: different data systems independently arrive at the same patterns, so learn the pattern, not the tool
- Timeless over trendy: materialized views, dimensional modeling, and validation patterns persist across technology generations
- Three-part structure: theory of patterns, architectural/modeling applications, real-world implementation
- Intermediate audience: assumes SQL and programming foundations, targets practitioners who need the “why” behind patterns
- Living book: continuously updated, so it keeps pace with the evolving landscape
Connections
Directly feeds 06-reference/2026-04-01-karpathy-llm-knowledge-bases. This is exactly the kind of structured domain knowledge that makes AI assistants more effective. If we ingest the key patterns, our agents should get better at data engineering work.
Relevant to consulting work under 01-projects/phdata/index: the convergent evolution framing helps explain to clients why the same patterns keep appearing across their stack.
The pattern-over-tool philosophy aligns with 06-reference/concepts/analytics-as-craft — mastering the craft means mastering the patterns, not the tools.
Open Questions
- Should we systematically ingest chapters into the vault as individual reference docs?
- Which patterns are most relevant to our current client work?
- Can we build a skill that references these patterns when doing dbt/data modeling work?