Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis
Why this is in the vault
The /cross-check skill flagged that our vault had zero counter-arguments to the harness thesis, our core architectural belief. This doc fills that gap. Six distinct lines of dissent exist, none definitive, but their synthesis meaningfully qualifies the thesis: the harness matters but is necessary-not-sufficient; data is the durable moat; the harness you build today may be absorbed by the model provider tomorrow; and the verification layer you bolt on inherits the same statistical-plausibility failure mode it's supposed to filter.
The six counter-arguments
1. “Models will absorb the harness” — HIGH long-term threat
Ben Thompson makes this argument even while broadly supporting the thesis. Anthropic is already absorbing harness functionality: the native Memory Tool, MCP, the Agent SDK, 1M+ context windows. If the model provider ships native memory, tool orchestration, and context management, third-party harnesses become thin wrappers again.
RDCO implication: Our skills layer is more vulnerable than our vault layer. Skills describe process, and models could learn process from fine-tuning. The vault contains proprietary knowledge, and models can't absorb what they haven't seen. The defense is data density, not architectural cleverness.
2. “Skills are just prompts with extra steps” — MEDIUM threat
When Garry Tan open-sourced gstack, critics called it “a bunch of prompts in a text file.” The core claim: the systems-engineering vocabulary (skills, resolvers, diarization) is performative complexity over what’s really just prompt engineering.
RDCO implication: This criticism has legs for simple use cases. It weakens for our system because our skills include tool integration (Gmail MCP, Notion MCP, yt-dlp), retry logic (sub-agent delegation), and validation (BiasAudit, sponsor detection) — things that genuinely are not “just prompts.” But we should be honest that the SKILL.md files are closer to structured prompts than to compiled code.
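To make that distinction concrete, here is a minimal sketch of the retry-plus-validation loop that lives outside the prompt. Every name here is a hypothetical stand-in, not our actual harness API; the point is the deterministic control flow wrapped around the model call.

```python
import random

def run_subagent(task: str) -> dict:
    """Stand-in for a real sub-agent call; randomly drops a field
    to simulate a silent model failure."""
    out = {"task": task, "summary": "one-paragraph summary..."}
    if random.random() < 0.5:
        del out["summary"]
    return out

def validate(output: dict) -> tuple[bool, str]:
    """Deterministic shape check; no LLM judgment involved."""
    if not output.get("summary"):
        return False, "missing required field: summary"
    return True, ""

def run_with_retries(task: str, max_attempts: int = 3) -> dict:
    """Delegate to a sub-agent, retrying until the output validates."""
    error = ""
    for _ in range(max_attempts):
        output = run_subagent(task)   # the prompt-shaped part
        ok, error = validate(output)  # the part that is not a prompt
        if ok:
            return output
        task += f"\n\nPrevious attempt failed validation: {error}"
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")
```

The prompt lives inside run_subagent; everything around it is ordinary software, which is exactly the part the "just prompts" critique ignores.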
3. “The harness is premature optimization” — MEDIUM threat
Max Woolf, in his widely shared skeptic-tries-agents post, found that raw model improvements (Opus 4.5) were sufficient to convert him, suggesting the model did the heavy lifting, not the harness. Reportedly 40% of AI agent projects are failing in 2026, and the typical failure mode is too much complexity, not too little.
RDCO implication: We should track whether our skills are genuinely load-bearing or whether the model would produce equivalent results without them. The /improve skill’s “mediocre pattern” analysis is our defense — if we can show that skill X catches errors the model alone wouldn’t catch, the skill earns its keep. If we can’t, it’s premature optimization.
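One way to operationalize that test, with hypothetical names (run_task and count_errors stand in for whatever evaluation harness we actually wire up): run each benchmark task with and without the skill and compare deterministic error counts.

```python
def skill_is_load_bearing(tasks, run_task, count_errors, min_gain=0.1) -> bool:
    """Rough A/B check: does the skill catch errors the bare model misses?

    run_task(task, use_skill) -> output
    count_errors(output) -> int, a deterministic count, not an LLM judgment.
    """
    bare = sum(count_errors(run_task(t, use_skill=False)) for t in tasks)
    skilled = sum(count_errors(run_task(t, use_skill=True)) for t in tasks)
    if bare == 0:
        return False  # the bare model is already clean; the skill adds nothing
    gain = (bare - skilled) / bare
    return gain >= min_gain  # the skill earns its keep above this threshold
```

If gain sits near zero across a representative task set, the skill is the premature optimization Woolf describes.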
4. “Data is the moat, not architecture” — HIGH threat (reframes rather than refutes)
By some survey counts, over half of VCs identify data quality as the primary AI moat, and Gartner classifies foundation models as strategic commodities. The argument: the best harness in the world produces commodity output if it operates on commodity data.
RDCO implication: This is the most important counter-argument for us. Our vault (580+ items of processed, cross-linked, bias-flagged content) IS proprietary data. The harness (skills + check-board + sub-agents) is the mechanism for building and querying that data. Both matter, but if we had to pick one to defend, defend the data. The vault compounds; the harness can be rebuilt.
5. “Complexity kills” — MEDIUM threat (execution risk)
The AI startup collapse narrative: most AI products could be rebuilt by a junior dev in an hour. But the criticism cuts both ways: if thin wrappers are dead, fat harnesses may be the next wave of over-engineering, wiped out by the next platform shift. And multi-agent orchestration introduces coordination overhead that grows combinatorially with agent count.
RDCO implication: Today we spawned 50+ sub-agents across two content pipelines. If any of those agents had silently failed, corrupted data, or produced wrong vault entries, we'd have 50 bad documents propagating errors through the knowledge base. The PM1e confabulation was one instance of this. We need validation layers (the /cross-check skill) precisely because multi-agent complexity creates failure modes that don't exist in single-agent systems; a sketch of the quarantine-gate shape of that defense follows below.
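A sketch of that defense, assuming a hypothetical vault layout: sub-agent outputs land in a quarantine directory and promote into the vault only after deterministic checks pass, so one silent failure can't fan out into 50 bad documents.

```python
import json
from pathlib import Path

QUARANTINE = Path("vault/_quarantine")  # hypothetical layout
VAULT = Path("vault/notes")

def passes_gate(doc: dict) -> bool:
    """Deterministic gate; extend with schema, dedup, and link checks."""
    return all(doc.get(k) for k in ("title", "date", "source", "body"))

def promote_validated(run_id: str) -> tuple[int, int]:
    """Move passing docs into the vault; leave failures quarantined for review."""
    VAULT.mkdir(parents=True, exist_ok=True)
    promoted = failed = 0
    for path in sorted((QUARANTINE / run_id).glob("*.json")):
        doc = json.loads(path.read_text())
        if passes_gate(doc):
            (VAULT / path.name).write_text(json.dumps(doc, indent=2))
            promoted += 1
        else:
            failed += 1  # stays in quarantine for human review
    return promoted, failed
```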
6. “Verification-layer LLM contamination” — HIGH threat (added 2026-04-19 from Kingsbury)
Surfaced by Kyle Kingsbury’s “Future of Everything is Lies” essay (2026-04-19-kingsbury-future-of-everything-is-lies) and the side-by-side scoring against Garry Tan’s rebuttal (2026-04-19-garry-tan-build-the-car-jepsen-response). Of Kingsbury’s ten enumerated arguments, Tan ducked four; this is the strongest of the four.
The harness thesis says: the model is unreliable, but the harness wraps it in deterministic checks (skill files, tools, resolvers, verification), so the system becomes trustworthy even when the model isn't. Kingsbury's sharpest counter: the verification layer is itself LLM-contaminated. When the skill files defining acceptance criteria are themselves drafted with LLM help (as ours are), verification inherits the same statistical-plausibility failure mode it's supposed to filter. The skill might encode a wrong procedure that LOOKS right. The acceptance criteria might pattern-match plausibility instead of correctness. You can't prove your verifier with another LLM-written verifier; you just push the bullshit one layer deeper.
This is the form of the critique that survives Tan’s “harness fixes it” answer because Tan’s counter-examples (deterministic stock APIs, Pillow image processing) describe ground-truth tools where correctness IS verifiable. But the moment a skill file’s acceptance criteria are themselves a judgment call (Was this newsletter properly de-duped? Does this draft hit the founder’s voice? Is this sponsor disclosure adequate?), the verification depends on a human-or-LLM judgment that has the same failure modes as the production output.
RDCO implication: Our skills, MAC framework, and /audit-model checks are all partly LLM-drafted. We don’t yet have an external invariant test suite that runs deterministically against skill outputs. The strongest concrete answer is to build Jepsen-style invariant tests for our highest-volume skill (/process-newsletter is the natural starting point — it spawns hundreds of sub-agent runs whose outputs we currently never check against deterministic schemas). Until we have that artifact, our claim that “the harness verifies the model” is a story, not an audit. Kingsbury would call this the strongest single objection to the entire thesis as RDCO practices it.
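A first cut at that artifact, assuming one JSON document per processed item (field names illustrative, not our actual schema). The point is that every check is deterministic; no LLM appears anywhere in the loop.

```python
import json
import re
from pathlib import Path

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def check_invariants(docs: list[dict]) -> list[str]:
    """Jepsen-style: properties that must hold for every run, every doc."""
    violations = []
    seen_urls = set()
    for i, doc in enumerate(docs):
        # I1: required fields exist and are non-empty.
        for field in ("title", "date", "source_url", "summary"):
            if not doc.get(field):
                violations.append(f"doc {i}: missing {field}")
        # I2: dates are well-formed, so cross-links sort deterministically.
        if doc.get("date") and not DATE_RE.match(doc["date"]):
            violations.append(f"doc {i}: malformed date {doc['date']!r}")
        # I3: de-duplication actually happened; no source URL appears twice.
        url = doc.get("source_url")
        if url:
            if url in seen_urls:
                violations.append(f"doc {i}: duplicate source_url {url}")
            seen_urls.add(url)
    return violations

if __name__ == "__main__":
    docs = [json.loads(p.read_text()) for p in Path("runs/latest").glob("*.json")]
    for v in check_invariants(docs):
        print("INVARIANT VIOLATION:", v)
```

One caveat that answers Kingsbury rather than restating him: the invariant list itself has to be human-reviewed, even if a model drafts the first version, or the contamination argument applies to the test suite too.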
The synthesis
The harness matters, but it’s necessary-not-sufficient. Data is the durable moat. The harness you build today may get absorbed by the model provider tomorrow. And the verification layer you bolt on inherits the same statistical-plausibility failure mode it’s supposed to filter — until you can prove its outputs against deterministic invariants.
For RDCO specifically: our competitive position is the vault (proprietary data + cross-links + bias flags), not the skills (which could be replicated or absorbed). The skills are the means of building the vault; the vault is the asset. Garry Tan’s framework is correct as architecture guidance but incomplete as moat theory — and its claim that “the harness verifies the model” is a story until we have invariant tests against skill outputs.
RDCO cost model (concrete data the discourse lacks)
| Component | Monthly Cost |
|---|---|
| Claude Max + overage cap | $200-300 |
| Notion | $8 |
| ElevenLabs | $5-105 |
| 1Password | $3 |
| Mac Mini (amortized) | $36 |
| Electricity + internet | $25 |
| Total: core agent stack | $277-477 |
Cost per unit of work (April 12 example): ~650 items processed at ~$0.35-0.40/item, or roughly $228-260 for the run. That's 20-50x cheaper than the cheapest human alternative that could do half this work; the arithmetic is worked below.
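Worked through, the arithmetic on those figures (the human per-item line is derived from the 20-50x claim, not independently measured):

```python
items = 650
low, high = 0.35, 0.40  # $/item, April 12 example

print(f"run cost: ${items * low:.0f}-{items * high:.0f}")  # ~$228-260

# The 20-50x claim implies a human cost of roughly $7-20 per item:
print(f"implied human per-item cost: ${low * 20:.0f}-{high * 50:.0f}")
```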
Related
- 2026-04-11-garry-tan-thin-harness-fat-skills — the thesis this doc critiques
- 2026-04-19-garry-tan-build-the-car-jepsen-response — Tan’s expanded harness defense, scored against Kingsbury point-by-point
- 2026-04-19-kingsbury-future-of-everything-is-lies — the Kingsbury source essay (32-page primary dissent document)
- cross-checks/2026-04-12-cross-check-agent-architecture — the cross-check that identified the missing-voices gap
- 2026-04-10-jaya-gupta-anthropic-moat — trust thesis (complementary to data-moat argument)
- 2026-03-25-seattle-data-guy-know-nothing-and-be-happy — the comprehension-loss critique (related to complexity-kills argument)