“Datadog’s moat is the human at the keyboard” — Jonathan Natkins
Why this is in the vault
Third installment in Natkins’s running thesis that the durable layer of the AI stack is data infrastructure, not models. Where Part 1 / Part 2 of his “Half-life of a Moat” series (2026-04-21-semi-structured-half-life-moat-part-2-dig-faster) framed which companies survive frontier-model improvement (perpendicular vs. parallel), this piece runs the same logic against a single $60B+ category — observability — and lands on the load-bearing claim that a huge class of “moats” people pay for today are actually human-interface moats. Remove the human as the primary query author and the moat collapses. Directly relevant to the agent-deployer positioning and to any RDCO project where we’re choosing what to build for agents to use vs. for humans to look at.
Employer disclosure inside the piece: Natkins explicitly flags that he works for ClickHouse before pitching ClickStack as the architecture that resolves the structural problem. The disclosure is unambiguous and well-placed (he names it before the recommendation, not after). He also worked at Datadog earlier in his career — that bias cuts the other direction (he is critical of his ex-employer) and is worth noting for completeness, even though he doesn’t disclose it in the piece. No external sponsor block.
The core argument
The structural claim. Datadog’s pricing forces customers to throw away their own observability data to afford the platform — sample logs, shorten retention, drop dimensions. The bill scales with telemetry volume with no ceiling, so the only knob left to turn is fidelity. “You paid to collect it. Then you paid again by losing it.” Calling this a pricing problem misses the point: it is an architectural problem, and architectural problems don’t get solved by negotiating a cheaper contract.
Why Grafana doesn’t fix it. Grafana / Loki / Mimir / Tempo disaggregate the vendor layer but keep the same human-in-the-loop consumption pattern: human writes queries, human stares at dashboards, human gets paged at 3am. Cost gets better; operating model is identical.
Why AI SREs (current generation) don’t fix it. Today’s agentic SREs sit on top of Datadog/Grafana — they’re glorified query generators against the same expensive platforms. They reduce headcount, not telemetry cost. The Datadog line item doesn’t move; total observability spend goes up.
The reference-call moment that anchors the piece. Asked what engineers would say if Datadog were taken away: “they’d probably complain a bit” — i.e., a vitamin, not a painkiller. Then the kicker that Natkins says reorganized his worldview: “but in five years, I don’t think they’ll be able to live without it.” Atrophy of the underlying skill (read-the-data-source-directly) is what converts a vitamin into a painkiller. The Maps/MapQuest analogy: nobody can navigate without GPS anymore — not because GPS is irreplaceable but because the muscle is gone.
The reframe. Every dashboard, chart, heatmap exists for human eyes. Agents don’t have eyes. “Every pixel in your Datadog dashboard is a translation layer for a user that doesn’t need things translated anymore.” What the agent needs is data and a way to ask questions — i.e., a database. PromQL fluency, dashboard muscle memory, saved queries, alert syntax — these are all human-interface moats, not data moats. They evaporate when the human stops being the interface.
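To make “data and a way to ask questions” concrete: what an agent gets instead of a dashboard can be nothing more than a schema description and a read-only query function. A minimal sketch (SQLite as a stand-in telemetry store; the tool name, table, and columns are hypothetical, not from the piece):

```python
import sqlite3

# Stand-in telemetry store. In the thesis this would be a columnar DB;
# an in-memory SQLite table is enough to show the shape of the interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (service TEXT, duration_ms REAL, status TEXT)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?)",
    [("checkout", 120.0, "ok"), ("checkout", 950.0, "error"), ("search", 40.0, "ok")],
)

# The agent-facing "product": a schema description plus a query function.
# No charts, no heatmaps — the translation layer is gone.
AGENT_TOOL = {
    "name": "query_telemetry",
    "description": "Run read-only SQL against spans"
                   " (service TEXT, duration_ms REAL, status TEXT).",
}

def query_telemetry(sql: str):
    """Return raw rows; the agent, not a dashboard, does the interpretation."""
    return conn.execute(sql).fetchall()

rows = query_telemetry(
    "SELECT service, COUNT(*) FROM spans WHERE status='error' GROUP BY service"
)
```

The point of the sketch is what is absent: everything a human UI would add on top of `query_telemetry` is the “translation layer” the piece argues agents don’t need.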
The architecture that follows. OTel commoditizes collection (the “S3 of observability”). Columnar storage (ClickHouse / Honeycomb’s “Observability 2.0” unified-log-event model — Natkins cites Charity Majors’s claim that every observability startup founded post-2021 that still exists uses this pattern) handles raw, unsampled retention at storage economics rather than platform-margin economics. Sub-200ms query latency is what makes ReAct-style agent loops viable (a 30s response kills the loop). Visualization (HyperDX / dashboards) becomes a feature of the stack, not the product. Datadog’s own Bits AI is the tell — every improvement to it teaches Datadog’s customers to bypass the UI Datadog charges margin for.
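The latency constraint can be made mechanical: each observe-think-query step in a ReAct-style loop spends interaction budget, so a backend that answers in seconds instead of milliseconds makes the loop unusable. A toy sketch of that budget logic (hypothetical, assuming a 200ms per-query ceiling; not Natkins’s code):

```python
import time

LATENCY_BUDGET_S = 0.2  # sub-200ms per query is the viability line the piece cites

def run_react_loop(query_fn, max_steps=5):
    """Toy ReAct-style loop: the agent issues telemetry queries until it
    converges on a hypothesis; a slow backend aborts the loop."""
    evidence = []
    for step in range(max_steps):
        start = time.monotonic()
        rows = query_fn(step)
        elapsed = time.monotonic() - start
        if elapsed > LATENCY_BUDGET_S:
            # A 30s response doesn't just slow the loop, it kills it.
            return {"status": "aborted", "step": step, "evidence": evidence}
        evidence.append(rows)
        if rows and rows[-1].get("root_cause"):
            return {"status": "converged", "step": step, "evidence": evidence}
    return {"status": "exhausted", "step": max_steps, "evidence": evidence}

def fast_columnar_query(step):
    # Stand-in for a sub-200ms columnar scan; "finds" the root cause on step 2.
    return [{"service": "checkout", "p99_ms": 840, "root_cause": step == 2}]

result = run_react_loop(fast_columnar_query)
```

Swapping in a `query_fn` that sleeps for a few seconds per call flips the outcome to `"aborted"` on the first step — which is the whole argument for storage-layer latency being a product requirement, not a nicety.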
The honest counterargument he names. For a 20-person startup, raw telemetry in a database is not plug-and-play — schema design, retention policies, materialized views, and pipeline management are real work. Datadog may still genuinely be the right call at small scale. The thesis is about where that line moves as OTel adoption and better tooling collapse the setup cost.
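For a sense of what that “real work” looks like, here is the kind of ClickHouse-style DDL a small team would now own themselves — table layout, ordering key, and retention all become explicit engineering decisions rather than line items. Table and column names are hypothetical, rendered via a Python helper so the retention policy is a parameter:

```python
RETENTION_DAYS = 90  # retention is now a policy decision the team owns

def logs_table_ddl(table="otel_logs", retention_days=RETENTION_DAYS):
    """Render a minimal ClickHouse-style MergeTree table for OTel log records.
    Ordering key and TTL are the two decisions that dominate cost/fidelity."""
    return f"""
CREATE TABLE {table} (
    ts      DateTime64(3),
    service LowCardinality(String),
    level   LowCardinality(String),
    body    String,
    attrs   Map(String, String)
)
ENGINE = MergeTree
ORDER BY (service, ts)
TTL toDateTime(ts) + INTERVAL {retention_days} DAY
""".strip()

ddl = logs_table_ddl()
```

Each clause here is a choice Datadog used to make for you: `ORDER BY` determines which queries are fast, `TTL` determines what you keep, and neither has a default that is right for every team — which is exactly why the counterargument has teeth at small scale.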
Mapping against Ray Data Co
This is a load-bearing piece for RDCO positioning — it doesn’t introduce a new framework so much as give us a clean second worked example of the framework Natkins has been building, applied to a single category we already care about.
1. The “human-interface moat” reframe is the cleanest articulation we’ve seen of why agent-deployer work is durable. The agent-deployer (2026-04-14-levie-agent-deployer-role-jd) is, structurally, the person who decides which interfaces to keep humans in front of and which to remove. Every workflow we audit at MAC or for clients can be sorted: is the interface here a real moat (the customer’s domain knowledge, their judgment, their accountability) or a human-interface moat (PromQL muscle memory, dashboard reflexes, “we’ve always done it this way” UI lock-in)? The first survives agentification; the second is switching cost masquerading as differentiation. This is the operational definition we should be using when scoping client engagements.
2. Direct hit on the agent-deployer + algorithm-layer / sensors-actuators-algorithms framing we’ve been working. The piece is essentially a long-form argument that observability data — the raw logs/metrics/traces emitted by infra — is sensor output, and that the value chain is being restructured so the algorithm (the agent doing root-cause analysis) gets direct access to sensors via a commodity collection layer (OTel) and a commodity storage/query layer (columnar DB), with the actuator being the auto-opened rollback PR. The “dashboard” was never the algorithm — it was a presentation layer for a human algorithm. Swap in a non-human algorithm and the presentation layer is dead weight. This is the cleanest restatement we have in the vault of why margin accumulates at the algorithm layer. File alongside the existing harness-thesis cluster.
3. Reinforces the perpendicular/parallel framework with a worked example. Natkins’s Part 2 (2026-04-21-semi-structured-half-life-moat-part-2-dig-faster) named ClickHouse / Langfuse as canonical “perpendicular” companies. This piece is the receipts: it walks through why ClickHouse gets more valuable as agents get better (sub-second query on billions of rows is the substrate ReAct loops need), and why Datadog’s moat erodes (their best AI product trains their customers off the UI they charge for). RDCO’s own positioning should ask the same question of every project: as agents get more capable, does this thing get more or less valuable?
4. The “fidelity vs. cost” structural flaw maps onto our own data-quality work. RDCO’s state-ownership architecture and the audit-newsletter-outputs.py invariant pattern are both bets that retention of raw evidence is what makes the system trustworthy. Same instinct as ClickStack’s “keep the lot” pitch, scaled down to one founder’s vault. Worth a concept page on “fidelity-first storage as a precondition for agent reliability” — link this piece, link the half-life piece, link the deterministic-verification thread from the Garry-Tan-vs-Kingsbury exchange (2026-04-19-garry-tan-build-the-car-jepsen-response).
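A toy version of the fidelity-first invariant worth capturing on that concept page: any stored summary should be recomputable from the raw rows you retained, and sampling quietly breaks that. A sketch (hypothetical, not the actual audit-newsletter-outputs.py logic):

```python
def p99(values):
    """Nearest-rank p99 (toy quantile; integer arithmetic keeps it deterministic)."""
    s = sorted(values)
    return s[min(len(s) - 1, (len(s) * 99) // 100)]

raw = list(range(1, 1001))    # full-fidelity latency samples, ms
stored_summary = p99(raw)     # the derived view we would ship in a report

# Invariant: the summary is a pure function of retained raw evidence,
# so an auditor (human or agent) can recompute and verify it.
assert p99(raw) == stored_summary

# Sample down to 10% to cut the bill and the invariant no longer holds:
sampled = raw[:100]
reproducible = (p99(sampled) == stored_summary)
```

The agent-reliability angle is the second assertion: once `reproducible` goes false, nothing downstream of the summary can be independently checked, which is the “you paid again by losing it” failure mode in miniature.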
5. Sanity Check angle. “The moat you paid for is the muscle you lost” is a single-sentence hook that crosses domains — observability, GPS, MapQuest, but also IDE autocomplete, calculator-vs-mental-math, etc. Natkins already wrote the data-engineering version; the general version is unclaimed and would not be derivative (feedback_no_derivative_sanity_check_pieces). Worth queueing as a Sanity Check candidate.
Mapping strength: strong. This directly extends a thesis we are actively building on, with a worked second example in a category we plan to comment on, by an author already in our tracked-source list.
Related
- 2026-04-21-semi-structured-half-life-moat-part-2-dig-faster — Natkins Part 2, the perpendicular/parallel framework this piece operationalizes
- 2026-03-31-semistructured-data-layer-does-the-work — Natkins’s earlier “every AI app is fundamentally a data app” thesis that this piece references and extends
- 2026-04-14-levie-agent-deployer-role-jd — the agent-deployer JD; this piece is what they would actually do in an observability engagement
- 2026-04-10-akshay-pachaar-agent-harness-anatomy — the harness anatomy framing; “what data does the agent need” is the load-bearing question
- synthesis-harness-thesis-dissent-2026-04-12 — the harness-thesis cluster, where this piece fits
- 2026-04-19-garry-tan-build-the-car-jepsen-response — Tan’s verification-layer point pairs with the fidelity-first argument here
- concepts/products-for-agents — Datadog dashboard as the canonical example of what not to ship if your user is an agent
Copyright note
Paraphrased throughout. A handful of short direct quotes retained (the “every pixel” line, the “you paid to collect it” pricing line, and the reference-call lines), each well within fair-use length. All other text is summary or reframing in our voice.