ETL Is Dead
Ananth Packkildurai argues that ETL remains operationally active but is dead as the defining identity of data engineering. The shift from human-operated to AI-agent-operated data systems restructures which competencies matter.
Mental model
The core analogy: Amazon’s physical warehouse robotization. When Kiva robots replaced human pickers, Amazon didn’t optimize the existing layout — they rebuilt entirely. Wide aisles, logical grouping, signage — all designed for human cognition — became unnecessary. The data warehouse is at the same inflection point.
Current architectures optimize for humans:
- Star schemas enable visual navigation of relationships
- Data catalogs function as digital signage for browsing
- Medallion architecture (Bronze → Silver → Gold) assumes human inspection at each stage
- Naming conventions act as wayfinding
When agents become the primary operators, these affordances become overhead. Context erodes through each handoff “like a game of telephone.”
The ECL framework (Extract–Contextualize–Link)
Packkildurai proposes ECL as the successor mental model:
- Extract — data movement remains, but AI takes over more of the mechanical pipeline construction
- Contextualize — the new center of gravity: a dedicated pipeline maintaining a “Context Store” of context objects (long-lived semantic definitions with validation chains and confidence levels) and decision objects (audit trails of what agents inferred)
- Link — semantic relationships across entities (customer in CRM ↔ user in product ↔ session in support), not just table joins
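What a Context Store actually holds is left abstract in the article; a minimal sketch of the two object types, with the Link step expressed as cross-system entity mappings. Every name and field here is hypothetical, not from the source:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ContextObject:
    """Long-lived semantic definition with a validation chain and a confidence level."""
    name: str                      # e.g. "customer"
    definition: str                # agent-readable semantics, not just a column comment
    validation_chain: list[str]    # who or what verified this definition, in order
    confidence: float              # 0.0-1.0
    links: dict[str, str] = field(default_factory=dict)  # system -> entity name

@dataclass
class DecisionObject:
    """Audit trail of what an agent inferred and which context it relied on."""
    agent: str
    inference: str
    used_context: list[str]        # names of ContextObjects consulted
    timestamp: datetime = field(default_factory=datetime.now)

# Link: the same business entity under different names in different systems
customer = ContextObject(
    name="customer",
    definition="A billed account with at least one active subscription",
    validation_chain=["finance-team", "agent:schema-inference"],
    confidence=0.92,
    links={"crm": "Customer", "product": "user", "support": "session.owner"},
)

# A decision object records the inference, so bad joins are auditable later
decision = DecisionObject(
    agent="pipeline-builder",
    inference="joined crm.Customer to product.user on email",
    used_context=["customer"],
)
```

The design point the sketch makes: the context object is the durable artifact, while the decision object is the feedback channel — when an agent's inference goes wrong, the audit trail points back at the definition that misled it.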
This is the convergent evolution pattern again — the same pressure toward semantic precision keeps surfacing. Business glossaries, semantic layers, data catalogs, and knowledge graphs all promised to capture institutional meaning but failed because the economics weren’t aligned: humans bore maintenance costs while benefits were diffuse.
Why this time is different
The economic inversion: when AI agents are the consumer, bad context produces systematic hallucination at enterprise scale — not mere human frustration. The cost of not maintaining context now exceeds the cost of maintaining it. That feedback loop is structurally new and explains why similar efforts failed for two decades.
Dimensional modeling survives (sort of)
Kimball’s first two steps — identify the business process, select the grain — represent permanent “context architecture.” The subsequent steps (star schema, dimension tables, fact tables) were rendering choices optimized for human analysts querying relational databases. The thinking survives; the format may not. See *The Data Warehouse Toolkit* for the original framework.
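The “thinking survives, format may not” claim can be made concrete: the first two Kimball steps become data, and the star schema becomes just one of several render targets. A toy sketch under that framing — all names invented:

```python
# Kimball steps 1 and 2 captured as context, decoupled from physical rendering.
order_context = {
    "business_process": "retail order fulfillment",  # step 1: identify the process
    "grain": "one row per order line item",          # step 2: select the grain
}

def render(context: dict, target: str) -> str:
    """The same semantic grain rendered for different consumers."""
    if target == "star_schema":     # human analysts on a relational database
        return f"fact_order_line (grain: {context['grain']})"
    if target == "context_store":   # agent consumers
        return f"context object '{context['business_process']}' @ {context['grain']}"
    raise ValueError(f"unknown render target: {target}")

human_view = render(order_context, "star_schema")
agent_view = render(order_context, "context_store")
```

The grain statement appears verbatim in both renderings; only the packaging differs — which is the sense in which dimensional modeling “survives (sort of).”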
Historical pendulum
| Era | Tradeoff |
|---|---|
| Relational | Maximum semantic precision, operational rigidity |
| Hadoop | Operational flexibility, semantic collapse (data swamps) |
| Lakehouse | Compromise with incomplete semantic layers |
| ECL | Decouples semantic precision from physical rigidity |
This maps directly to the DEDP ETL tool evolution — every generation tries to resolve the same tension.
Connections
- 06-reference/concepts/products-for-agents — if agents are the new operators, every data product becomes a product-for-agents. ECL’s Context Store is exactly this: data structured for agent consumption rather than human browsing.
- 06-reference/concepts/analytics-as-craft — the professional identity crisis Packkildurai describes mirrors the analytics-as-craft tension. When the mechanical work gets automated, the craft becomes context architecture and semantic negotiation.
- 06-reference/2026-04-04-claude-code-not-replacing-data-engineers — complementary thesis. Claude Code augments rather than replaces, but what it augments shifts from pipeline execution to semantic reliability.
- 06-reference/2026-04-03-downfall-of-data-engineer — the professional pressure Packkildurai names has been building for a while.
- 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization — the semantic layer is the part of the stack ECL puts at center stage.
- 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp — the architectural lineage ECL descends from.
- 06-reference/2026-04-04-context-graphs-trillion-dollar-opportunity — context graphs as infrastructure for the Contextualize step.
- 06-reference/2026-04-04-dedp-data-contracts-schema-evolution — data contracts are one mechanism for the “semantic contracts between teams” Packkildurai calls for.
- 01-projects/phdata/index — consulting relevance: clients asking “what should our data team do next?” are hitting exactly this identity shift. ECL framing could shape engagement positioning.
Open questions
- Is ECL actually actionable? The framework names the shift well, but the Context Store is hand-wavy. What does a production implementation look like? Is it a knowledge graph? A vector store with structured metadata? Both?
- Who owns Contextualize? If the value migrates to semantic reliability, does that create a new role (context engineer?) or does it stay within data engineering? The organizational design question matters more than the technical one.
- How fast does the feedback loop actually close? Packkildurai assumes agents-as-consumers create immediate feedback on bad context. But most orgs are still in “agents as copilots” mode, not “agents as autonomous operators.” The forcing function may be years out for most enterprises.
- Does this change phData’s positioning? If clients are shifting from pipeline delivery to context architecture, the consulting value prop needs to shift with it. Worth a dedicated note.