“A Dispatch from the Jagged Frontier of Analytics Engineering” — Jason Ganz
Why this is in the vault
Direct hands-on field report from inside dbt Labs about exactly where coding agents succeed and fail in real analytics-engineering work — the most concrete “what is actually happening on the ground in April 2026” data point AER has filed in months. This is also the first vault doc that explicitly imports Ethan Mollick’s “jagged frontier” framing into the analytics-engineering domain — which makes it a primary reference for any Sanity Check piece, MAC marketing copy, or phData conversation that needs to talk about where AI helps an AE and where it ships clean SQL with wrong joins.
The core argument
Ganz applies Mollick’s jagged frontier — the observation that LLM capability is unevenly distributed across tasks and the demarcation line is invisible because output looks confident on both sides — to analytics engineering specifically. The shape of the AE frontier is different from the SWE frontier: the peaks and troughs sit in different places.
Where today’s models excel (peaks):
- Standard “fiddly SQL” work that fills the middle of an AE’s week. Ganz’s case study: a colleague (Benoit) asked an agent to build sessionization on dbt MCP server event data — five CTEs of lead/lag/window-function logic. One-shotted. The pattern generalizes: SCD tables, dedup models, rolling-window aggregations, first-cut staging models from a well-described source.
- Adjacent work collapses in the same direction once the agent knows the project shape. Data profiling drops from 3 minutes to 10 seconds. Schema exploration becomes conversational. Cross-referencing across the project happens automatically.
- Behavior shift: AEs start writing models they wouldn’t have bothered with before. The middle of the workday becomes exploratory in a way it hasn’t been in years. (Self-reported 2x–3x speedup, with appropriate caveat about self-reported speedups.)
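For reference, the sessionization pattern from the first bullet above can be sketched in a few lines. This is a minimal Python illustration of the lag-and-gap logic, not Benoit's actual model (which was SQL, five CTEs). The 30-minute inactivity threshold, the function name, and the event shape are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Hypothetical inactivity threshold; the source does not state one.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events):
    """Assign a per-user session index to each event.

    events: iterable of (user_id, timestamp) tuples.
    Returns a list of (user_id, timestamp, session_index),
    sorted by user, then time.
    """
    ordered = sorted(events, key=itemgetter(0, 1))
    out = []
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        prev_ts = None
        session_index = 0
        for _, ts in group:
            # Rough equivalent of LAG(ts) OVER (PARTITION BY user_id ORDER BY ts):
            # a gap larger than the threshold starts a new session.
            if prev_ts is not None and ts - prev_ts > SESSION_GAP:
                session_index += 1
            out.append((user_id, ts, session_index))
            prev_ts = ts
    return out


# Illustrative data only.
start = datetime(2026, 4, 1, 10, 0)
sample_events = [
    ("a", start),
    ("a", start + timedelta(minutes=10)),
    ("a", start + timedelta(minutes=60)),
    ("b", start + timedelta(minutes=5)),
]
sessions = sessionize(sample_events)
```

The point of the sketch is that the logic is mechanical once the grain and threshold are stated, which is why it sits on the "peak" side of the frontier.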
Where today’s models fail (troughs):
- Tacit knowledge gaps. The case study pivots: Benoit’s pipeline emits the raw user_id, while the analytics layer hashes user_id with tenant context (the deployment is multi-tenant, so raw IDs aren’t globally unique). Two systems, same field name, different meaning. The agent didn’t know and couldn’t have known: the knowledge lives with engineers and account-team people who’ve seen the collisions in production. The SQL was clean, the joins were wrong, and the tests passed. The bug only surfaced when Benoit asked the agent to look at the user-ID distribution and a few IDs showed anomalous frequency.
- Unknown-unknowns in source data. When the solution is outside the agent’s frame of reference and there’s no signal in code or docs that something is off.
- Complex operations across a large DAG.
- Cost awareness. The agent ran LLM-powered model functions without asking and burned a meaningful number of tokens. It had no instinct that those calls were expensive. The cost is small within a session, but not across a team-quarter.
- Tool-setup friction trap. Connecting a new MCP server (Datadog) takes 10 minutes and doing the search manually takes 5, so you do it manually. Six months later you still haven’t built the connective tissue. “I think a lot of us are quietly making that trade right now.”
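The user_id failure in the first bullet above is easier to see with a toy example. The sketch below assumes a hypothetical hashing scheme (SHA-256 over tenant and user ID) and a hypothetical median-based cutoff. Ganz does not say which hash or which check was involved, only that the analytics layer hashes with tenant context and that a distribution look exposed the problem.

```python
import hashlib
import statistics
from collections import Counter


def tenant_scoped_key(tenant_id, user_id):
    # Hypothetical scheme: the source only says the ID is hashed with tenant context.
    return hashlib.sha256(f"{tenant_id}:{user_id}".encode()).hexdigest()


def frequency_outliers(ids, factor=2.0):
    """Return IDs whose count exceeds `factor` times the median count."""
    counts = Counter(ids)
    median = statistics.median(counts.values())
    return {i for i, c in counts.items() if c > factor * median}


# Illustrative events: raw user_id "42" exists in three different tenants.
events = [
    {"tenant_id": "t1", "user_id": "42"},
    {"tenant_id": "t2", "user_id": "42"},
    {"tenant_id": "t3", "user_id": "42"},
    {"tenant_id": "t1", "user_id": "7"},
    {"tenant_id": "t2", "user_id": "8"},
]

# Keyed on the raw ID, three distinct people collapse into one "user".
raw_ids = [e["user_id"] for e in events]
raw_outliers = frequency_outliers(raw_ids)

# Keyed on the tenant-scoped hash, each person stays distinct.
scoped_ids = [tenant_scoped_key(e["tenant_id"], e["user_id"]) for e in events]
scoped_outliers = frequency_outliers(scoped_ids)
```

Nothing in the code or schema signals which key is correct. Both versions run and both pass a not-null or uniqueness-within-source test, which is the "code is correct, data is wrong" gap in miniature.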
The load-bearing line: “The SQL being correct is just one part of the data being correct. Agents are extremely good at the first axis and variable at the second, and the gap between ‘the code runs’ and ‘the data is right’ is where data incidents tend to live.”
Where AEs go next — shift left, shift right. Benoit’s framing: the left part of the DAG requires more knowledge of the company’s data systems (the tacit-knowledge work agents can’t do); the right part requires more business understanding (what should ARR be, when is a user “active”). The middle compresses. AEs add value at the edges.
Six-month wishlist: agents that (1) check their own assumptions about data — or surface them to a human — before acting, (2) know which tool calls are expensive and ask first, (3) connect to more of the data stack with lower friction.
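Wishlist item (2) could look something like the following. This is a minimal sketch of an ask-before-spending gate; the Tool type, the per-call cost estimate, the threshold, and the approve callback are all hypothetical and do not correspond to any dbt or MCP API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    run: Callable[[], str]
    estimated_cost_usd: float  # hypothetical per-call estimate


def call_with_cost_gate(
    tool: Tool,
    approve: Callable[[Tool], bool],
    threshold_usd: float = 0.50,  # assumed cutoff, chosen for illustration
) -> Optional[str]:
    """Run cheap tools directly; ask `approve` before running expensive ones.

    Returns the tool's result, or None if the human declined.
    """
    if tool.estimated_cost_usd > threshold_usd and not approve(tool):
        return None
    return tool.run()


# Illustrative tools only.
profile_table = Tool("profile_table", lambda: "profile ok", 0.01)
llm_classify = Tool("llm_classify", lambda: "classified", 5.00)
```

The hard part is not the gate but the estimate: the agent in Ganz's account had no sense that the LLM-powered calls were expensive in the first place.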
Mapping against Ray Data Co
- The single best citation for “jagged frontier in AE” we have. Mollick’s frame has been name-dropped in vault notes (2026-04-19-garry-tan-build-the-car-jepsen-response uses it as the strongest empirical point; 2026-04-15-every-claude-managed-agents-mini-vibe-check discusses Williams’s “AI-pilled vocabulary” piece) but never mapped to the AE workflow. Ganz does the mapping for us, with a real case study and named failure modes. This is a primary source for any Sanity Check piece arguing “AI is uneven in this specific way for AE work.”
- Tacit-knowledge moat thread reinforcement (very high signal). The user_id story is the cleanest AE-specific illustration of Cedric Chin’s tacit-knowledge thesis (2026-04-19-commoncog-the-tacit-knowledge-series, 2026-04-19-commoncog-all-that-is-rare-and-valuable) we’ve seen in the wild. Tacit knowledge × industry-specific data context × opaque-path skill = exactly the rare-and-valuable intersection Chin described. For phData positioning and MAC copy: this is the argument for why a senior AE with tenure inside a company isn’t replaceable by an agent in 2026. Ganz’s framing — “the kind of judgment a thoughtful human analyst would pause on, that the agent right now doesn’t” — is the single best phrase in the vault for marketing this point.
- Half-life-of-a-moat / data-layer thread (high signal). Pairs cleanly with Natkins (2026-04-21-semi-structured-half-life-moat-part-2-dig-faster) and Gupta (2026-04-13-jaya-gupta-ai-lock-in-state-moat). The agents-can’t-bridge-tacit-knowledge gap is exactly why “company state” (Gupta’s term) and proprietary data context (Natkins) remain durable moats even as the SQL-generation layer commoditizes.
- AE-as-job-shape thread. Continues the through-line from 2026-04-19-analytics-engineering-roundup-five-things-future-of-analytics (Handy: analyst workflow starts to look like front-end SWE) and 2026-04-05-ae-roundup-moving-up-the-stack. Ganz’s “shift left, shift right” is the operational version of Handy’s “the job changes.” Where Handy says what the job becomes, Ganz says which parts of the DAG to chase. Both reinforce the harness-thesis cluster.
- Direct phData seat application. Ganz’s six-month wishlist (agents that check assumptions, are cost-aware, connect frictionlessly) is essentially the consulting brief — these are the gaps a competent AE consultancy fills today by being the human-in-the-loop that catches the silent join bug, the runaway token spend, the missing MCP connector. Frame for client conversations: “Here is dbt Labs telling you exactly what their own agents can’t do yet.”
- Sanity Check candidate — but earn the angle. Don’t write a derivative “jagged frontier in AE” piece restating Ganz. The original re-frame here is: the demarcation line in AE isn’t where it is in SWE because data work has a “code is correct ≠ data is correct” axis that doesn’t exist in pure SWE. That’s the synthesis worth a piece — and only if it doesn’t already exist in the vault.
- SaaS-death-thesis adjacency. Indirectly relevant to 2026-04-01-every-saas-dead-linear / 2026-02-06-stratechery-weekly-saasmageddon-super-bowl — if the middle of the DAG compresses to commodity SQL generation, the dbt-vendor stack itself faces the question of where its durable value sits. Ganz (a dbt employee) is implicitly answering: at the harness layer (MCP server) and the workflow scaffolding, not the SQL generation. Worth tracking as the dbt incumbent narrative.
Related
- 2026-04-19-analytics-engineering-roundup-five-things-future-of-analytics — Handy’s strategic frame; Ganz’s piece is the on-the-ground tactical complement.
- 2026-04-15-every-claude-managed-agents-mini-vibe-check — Williams’s “we need new vocabulary for the AI-pilled” — overlapping move with Ganz importing Mollick’s vocabulary into AE.
- 2026-04-19-commoncog-the-tacit-knowledge-series — Chin’s foundational tacit-knowledge framework; the user_id story is the AE-specific illustration.
- 2026-04-19-commoncog-all-that-is-rare-and-valuable — rare-and-valuable skill intersections; tacit-knowledge × AE = strong moat candidate.
- 2026-04-19-garry-tan-build-the-car-jepsen-response — prior vault use of jagged-frontier framing (as argument FOR routing).
- 2026-04-21-semi-structured-half-life-moat-part-2-dig-faster — moat-half-life thread; tacit knowledge as residual moat after SQL-gen commoditizes.
- 2026-04-13-jaya-gupta-ai-lock-in-state-moat — “state” as moat; ties to “company-shaped knowledge” Ganz says agents can’t acquire.
- 2026-04-05-ae-roundup-moving-up-the-stack — earlier AER piece on AE moving up the stack; Ganz’s “shift left/right” is the next iteration.