DEDP 1.2 — The History and State of Data Engineering
Comprehensive timeline from BI origins to the 2025 landscape. The densest historical reference in the book.
Historical Timeline
1970s-1980s: Foundations
- Edgar F. Codd proposed the relational model (1970), abstracting storage complexities behind a declarative query interface; SQL later became its standard language
- Bill Inmon formally defined “data warehouse,” establishing foundational BI principles
- SQL evolved into variants (T-SQL, PL/SQL) with procedural capabilities
1996: Dimensional Modeling
- Ralph Kimball’s The Data Warehouse Toolkit — dimensional modeling approaches still used today (see the sketch after this list)
- Inmon vs Kimball debate begins (and never really ends)
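A minimal sketch of what dimensional modeling means in practice, using DuckDB from Python; the star schema here (fact_sales plus dim_customer and dim_date) is a hypothetical example, not one taken from the book:

```python
import duckdb

con = duckdb.connect()

# Kimball-style star schema: a narrow fact table surrounded by descriptive dimension tables.
con.execute("CREATE TABLE dim_customer (customer_id INT, region TEXT)")
con.execute("CREATE TABLE dim_date (date_id INT, year INT, month INT)")
con.execute("CREATE TABLE fact_sales (customer_id INT, date_id INT, amount DOUBLE)")

# The typical dimensional query: aggregate the facts, then slice by dimension attributes.
rows = con.execute("""
    SELECT d.year, c.region, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_customer c USING (customer_id)
    JOIN dim_date d USING (date_id)
    GROUP BY d.year, c.region
    ORDER BY d.year, revenue DESC
""").fetchall()
```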
Early 2000s: Distributed Computing
- Massively parallel processing (MPP) databases emerged
- Google published GFS (2003) and MapReduce (2004) — two papers that changed everything
- Hadoop emerged (2006) as an open-source implementation of the GFS/MapReduce ideas, with Yahoo as its main early backer
Early 2010s: Cloud Revolution
- AWS, GCP, Azure transformed infrastructure economics
- Key tools: Amazon Redshift, Snowflake, Apache Airflow (2014), Superset (2015), dbt (2016)
2017-2018: Discipline Formation
- Maxime Beauchemin published “The Rise of the Data Engineer” — formally defining the discipline
- Transition from “big data engineer” to “data engineer”
- Functional Data Engineering paradigm established: pure, idempotent tasks over immutable, partitioned data (sketched below)
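A minimal sketch of the functional paradigm's central idea: each task is a pure function of its inputs that overwrites exactly one partition, so reruns and backfills are safe. The paths, the event_date column, and the DuckDB engine are assumptions for illustration, not prescribed tooling.

```python
from pathlib import Path
import duckdb

def build_daily_partition(ds: str, source_glob: str, target_dir: str) -> Path:
    """Pure function of its inputs: reads immutable source files and (re)writes
    exactly one date partition, so rerunning the same `ds` is idempotent."""
    out_dir = Path(target_dir) / f"ds={ds}"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "part-000.parquet"
    duckdb.sql(f"""
        COPY (
            SELECT *
            FROM read_parquet('{source_glob}')
            WHERE event_date = DATE '{ds}'  -- this task owns exactly one day
        ) TO '{out_file}' (FORMAT PARQUET)
    """)
    return out_file

# A backfill is just the same pure task re-applied to older partitions:
# for ds in ("2018-01-01", "2018-01-02"):
#     build_daily_partition(ds, "raw/events/*.parquet", "warehouse/events")
```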
State of the Field by Year
2022: Declarative Era
- Infrastructure-as-code, orchestration-as-code dominated
- Metadata management: cataloging, lineage, discovery
- Rust emerged for data-intensive apps (Ballista, Polars, DataFusion)
- Privacy regulations (GDPR, CCPA) heightened governance focus
2023: Renaissance and AI
- Data modeling renaissance amid Modern Data Stack (MDS) adoption challenges
- Vector databases (Pinecone, Qdrant) surged with GenAI
- Open Data Stack gained traction with open standards
- Open formats dominated: Parquet for files, Iceberg and Delta Lake as table formats
2024-2025: Return to Fundamentals
- AI strengthens rather than replaces data engineering roles
- Return to fundamentals: data modeling, SQL, lifecycle understanding
- Small data stack approaches gaining adoption (cost, speed, simplicity)
- Presentation and data quality remain critical bottlenecks
Key Insight
Beneath technological cycles lies an enduring need: “fresh, organized, and clean data.”
The book focuses on convergent design patterns recurring across eras rather than chasing technological trends.
Mental Models
- Technology waves, constant problems — every ~10 years the stack turns over, but the core needs (clean, fresh, organized, accessible data) never change. This is the strongest argument for pattern-based thinking.
- Paper → open-source → commercial → commodity — the lifecycle of data infrastructure (GFS paper → Hadoop → Snowflake → commodity cloud warehousing). Recognizing where a technology sits in this cycle informs build-vs-buy.
- Small data stack as counter-trend — after years of “big data” hype, 2024-2025 sees a pragmatic return to simpler, cheaper approaches: DuckDB, Polars, single-node processing (sketched below)
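A minimal sketch of the single-node approach, assuming a local directory of Parquet files (the path and column names are hypothetical): DuckDB queries the files in place with no cluster, warehouse, or ingestion step, and Polars' lazy API (scan_parquet plus group_by/agg/collect) expresses the same query on the same hardware.

```python
import duckdb

# One process, one machine: query Parquet files where they sit and aggregate.
top_customers = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM read_parquet('data/orders/*.parquet')
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""").fetchall()
```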
Related
- Intro to Data Engineering — overview and personal context
- Challenges in Data Engineering — lifecycle challenges
- DWH, MDM, Data Lake — architecture evolution details
- MV, OBT, dbt, OLAP, DWA — modeling approach evolution
- ETL Tool Comparisons — tool evolution from bash to Python
- Semantic Layer & BI — serving layer evolution
- Data Contracts & Schema Evolution — integration evolution