Fri Apr 03 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · book-chapter · source: https://www.dedp.online/part-2/4-ce/dwh-mdm-data-lake-reverse-etl-cdp.html · by DEDP / Simon Späti

DEDP 4.4 — DWH, MDM, Data Lake, Reverse ETL, CDP

The chapter covers five technologies that evolved independently but converge on the same underlying need: centralize data, apply business logic, and make it accessible. It frames them as convergent evolution: different species solving the same survival problem.

The Five Systems

Data Warehouse (1980s+): The original “single source of truth.” Integrates sources, applies business logic, and separates analytical from operational workloads. Follows the ETL pattern: transform before loading. The lineage runs from Devlin & Murphy (IBM, late 1980s) through dimensional modeling and Data Vault to cloud-native warehouses (BigQuery, Redshift), with dbt as the transformation layer on top. See 06-reference/2026-04-03-the-data-warehouse-toolkit for the Kimball foundation.

Master Data Management (1990s+): Governance-first approach. Centralizes master data (customers, products, locations) with fuzzy matching, stewardship, and cross-system consistency. Conceptually the oldest pattern of the five: the chapter traces it back to Hollerith punch cards (1890s), lateral files (1898), and Social Security master files (1936). Formal MDM systems emerged in the 1990s, driven by regulatory compliance.
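The fuzzy matching step is easier to picture with a small example. Below is a minimal sketch, not how any particular MDM product works: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, and the names `normalize`, `match_to_master`, the sample master records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_to_master(record_name: str, master: dict, threshold: float = 0.85):
    """Return the ID of the most similar master record, or None below threshold."""
    best_id, best_score = None, 0.0
    for master_id, master_name in master.items():
        score = similarity(record_name, master_name)
        if score > best_score:
            best_id, best_score = master_id, score
    return best_id if best_score >= threshold else None


# Hypothetical golden records and two incoming source-system names.
master = {"C001": "Acme Corporation", "C002": "Globex Inc."}
crm_match = match_to_master("ACME Corporation.", master)  # differs only in case and punctuation
unknown_match = match_to_master("Initech LLC", master)    # no close master record
```

In a real program, records that fall below the threshold would typically go to a data steward for review rather than being dropped, which is where the stewardship part of MDM comes in.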

Data Lake (2000s+): Store everything raw, transform later (ELT). Built on Hadoop, then cloud object storage (S3, Azure Blob, GCS). Modern stack = file formats (Parquet and ORC are columnar; Avro is row-based) + table formats (Delta Lake, Iceberg, Hudi) that add ACID transactions and schema enforcement. The Data Lakehouse (2021) merges lake flexibility with warehouse governance.
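The difference between ETL (warehouse) and ELT (lake) comes down to where the transformation runs. The sketch below uses an in-memory SQLite database purely as a stand-in for both systems; the table names (`dwh_orders`, `lake_orders_raw`, `lake_orders`) and the sample rows are hypothetical, and it does not model file or table formats.

```python
import sqlite3

# Raw source rows: amounts arrive as untrimmed strings, currency casing varies.
raw_orders = [("o1", " 19.99 ", "usd"), ("o2", "5.00", "USD")]

conn = sqlite3.connect(":memory:")

# ETL: transform in application code first, then load only the cleaned rows.
conn.execute("CREATE TABLE dwh_orders (order_id TEXT, amount REAL, currency TEXT)")
cleaned = [(oid, float(amt.strip()), cur.upper()) for oid, amt, cur in raw_orders]
conn.executemany("INSERT INTO dwh_orders VALUES (?, ?, ?)", cleaned)

# ELT: load the raw strings untouched, then transform later inside the engine.
conn.execute("CREATE TABLE lake_orders_raw (order_id TEXT, amount TEXT, currency TEXT)")
conn.executemany("INSERT INTO lake_orders_raw VALUES (?, ?, ?)", raw_orders)
conn.execute(
    """
    CREATE VIEW lake_orders AS
    SELECT order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(currency) AS currency
    FROM lake_orders_raw
    """
)

etl_rows = conn.execute("SELECT * FROM dwh_orders ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM lake_orders ORDER BY order_id").fetchall()
```

Both queries should return equivalent cleaned rows. The practical difference is that the ELT path keeps the raw strings around, so a new use case can define a different view later without re-extracting from the source.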

Reverse ETL (2021+): Push processed data from the warehouse back into operational SaaS tools (Salesforce, Marketo, Zendesk). The chapter describes it as “a modern approach to MDM”: same goal (consistent data in operational systems), different mechanism. Tools like Hightouch are already pivoting toward CDP/data activation positioning.
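The mechanism can be sketched as a diff-and-push loop: compare modeled warehouse rows against what was last synced and send only the changes to the operational tool. This is an illustrative sketch under stated assumptions, not the behavior of Hightouch or any vendor API; `FakeCrm`, `upsert_contact`, and `reverse_etl_sync` are hypothetical names, and the CRM is an in-memory stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCrm:
    """Hypothetical stand-in for a SaaS API client; stores contacts in memory."""
    contacts: dict = field(default_factory=dict)
    calls: int = 0

    def upsert_contact(self, email: str, fields: dict) -> None:
        # Merge new field values into the existing contact and count the call.
        self.contacts[email] = {**self.contacts.get(email, {}), **fields}
        self.calls += 1


def reverse_etl_sync(warehouse_rows, last_synced, crm):
    """Push only rows whose modeled fields changed since the previous sync."""
    new_state = {}
    for row in warehouse_rows:
        email = row["email"]
        fields = {k: v for k, v in row.items() if k != "email"}
        new_state[email] = fields
        if last_synced.get(email) != fields:
            crm.upsert_contact(email, fields)
    return new_state


crm = FakeCrm()
rows = [
    {"email": "a@example.com", "ltv_tier": "gold", "churn_risk": 0.1},
    {"email": "b@example.com", "ltv_tier": "silver", "churn_risk": 0.4},
]
state = reverse_etl_sync(rows, {}, crm)     # first run: both contacts are pushed
rows[1]["churn_risk"] = 0.7                 # a warehouse model updates one score
state = reverse_etl_sync(rows, state, crm)  # second run: only the changed contact is pushed
```

Keying on a stable identifier (here the email) and pushing only differences is what keeps the operational tool consistent with the warehouse without re-sending every row, which is the overlap with MDM the chapter points at.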

Customer Data Platform (2010s+): CRM on steroids — collects customer data from all channels, cleanses, segments, activates. Term coined by David Raab in 2013. Distinguished from CDI (Customer Data Infrastructure), which moves data but does not store or analyze. CDPs emphasize real-time processing and regulatory compliance (GDPR, CCPA).
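The collect, cleanse, segment flow can be illustrated in a few lines. This is a toy sketch with made-up events and an assumed spend threshold of 100.0; the function names `build_profiles` and `segment` are hypothetical, and real CDPs add identity resolution, consent handling, and real-time ingestion on top.

```python
from collections import defaultdict

# Events collected from several channels; the same customer appears with
# inconsistent email formatting.
events = [
    {"channel": "web", "email": "A@Example.com ", "action": "view", "value": 0.0},
    {"channel": "email", "email": "a@example.com", "action": "click", "value": 0.0},
    {"channel": "store", "email": "a@example.com", "action": "purchase", "value": 120.0},
    {"channel": "web", "email": "b@example.com", "action": "view", "value": 0.0},
]


def build_profiles(events):
    """Cleanse the identifier and fold all events into one profile per customer."""
    profiles = defaultdict(lambda: {"channels": set(), "spend": 0.0})
    for event in events:
        key = event["email"].strip().lower()  # cleanse: trim and lowercase
        profiles[key]["channels"].add(event["channel"])
        profiles[key]["spend"] += event["value"]
    return dict(profiles)


def segment(profiles, min_spend=100.0):
    """Return the customers whose total spend meets the threshold, sorted."""
    return sorted(k for k, p in profiles.items() if p["spend"] >= min_spend)


profiles = build_profiles(events)
high_value = segment(profiles)
```

The resulting segment is what would be handed to an activation step, for example a campaign tool, which is the part a CDI by itself does not provide.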

Four Shared Patterns

The chapter identifies four patterns that all five technologies implement:

  1. Data Sharing — make data universally accessible across platforms and users
  2. Reusability — transform once, use everywhere; reduce duplication of effort
  3. Business Transformation — raw data becomes actionable insight through applied logic
  4. In-Memory / Ad-Hoc Querying — flexible analysis without pre-aggregation

These are the same needs every 01-projects/phdata/index engagement surfaces: clients want a single source of truth (DWH), consistent master data (MDM), flexibility for new use cases (lake), operational activation (reverse ETL), and customer-level analytics (CDP). The question is always which pattern to lead with.

What Matters for Consulting

The convergence framing is the real value. When a client says “we need a CDP,” the underlying need is usually data sharing + business transformation on customer entities. When they say “we need reverse ETL,” the need is operational data activation. Naming the pattern underneath the technology prevents over-buying.

Reverse ETL as modern MDM is a useful reframe for enterprise clients who already have MDM programs. It de-escalates the “replace our MDM” conversation and positions reverse ETL as a complementary activation layer.

Data Lakehouse as convergence point. The chapter confirms what we see in the field: lakes are absorbing warehouse capabilities (via table formats + dbt), and warehouses are absorbing lake capabilities (via external tables + unstructured data support). The architecture decision is less “lake vs warehouse” and more “where does governance live.”

CDP market dynamics. The CDP market is projected to reach $20.5B by 2027. Relevant for 01-projects/data-marketplace/index: customer data products are the highest-value segment. The distinction between CDP and CDI maps to the 06-reference/2026-04-03-data-products-taxonomy split between data products (analytical value) and data infrastructure (plumbing).

What’s Academic

The historical timelines (Hollerith punch cards, lateral files) are interesting context but not actionable. The MDM history section is the weakest — it jumps from 1936 to the 1990s without explaining why formal MDM emerged when it did.

Key Takeaway

Every generation reinvents “centralize, transform, serve.” The technology changes; the pattern does not. Understanding the pattern makes you technology-agnostic in a way that 06-reference/concepts/systems-over-goals demands — you design for the outcome, not the tool.