“Externalization in LLM Agents”, Zhou et al. (arXiv 2604.08224)
Why this is in the vault
Academic validation of the harness thesis. This survey paper traces the same historical progression that practitioners such as Garry Tan, Harrison Chase, and Cobus Greyling describe from experience, and formalizes it as a structured taxonomy. The paper’s framing of “externalization” (capabilities moving from inside the model to the runtime around it) is the cleanest academic articulation so far of why harness engineering has become a central concern in agent design.
Paper details
Title: Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Key claim: LLM agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into four categories:
- Memory stores — persistent state that outlives a single call
- Reusable skills — packaged procedures the agent can invoke
- Interaction protocols — standards for how agents communicate with tools, users, and each other
- Harness engineering — the surrounding program that orchestrates model calls, manages context, and enforces safety
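A minimal Python sketch of how the four categories might fit together in one runtime. This is an illustration for the vault, not code from the paper. Every name here (`MemoryStore`, `SKILLS`, `stub_model`, `run_harness`) is hypothetical, the JSON message shape is an assumed toy protocol rather than any real standard, and `stub_model` stands in for a real LLM call.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MemoryStore:
    """Memory: persistent state that outlives a single model call."""
    notes: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, limit: int = 3) -> List[str]:
        # Return only the most recent notes to keep the context small.
        return self.notes[-limit:]


# Skills: packaged procedures the agent can invoke by name (hypothetical examples).
SKILLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Protocol (assumed): the model replies with a JSON message that is either
    a skill call or a final answer.
    """
    if "Result: " in prompt:
        answer = prompt.rsplit("Result: ", 1)[1].strip()
        return json.dumps({"type": "final", "answer": answer})
    task = prompt.splitlines()[0][len("Task: "):]
    return json.dumps({"type": "skill_call", "skill": "word_count", "input": task})


def run_harness(task: str, memory: MemoryStore,
                model: Callable[[str], str] = stub_model,
                max_steps: int = 4) -> str:
    """Harness: orchestrates model calls, manages context, enforces limits."""
    for _ in range(max_steps):  # safety: bounded number of model calls
        context = "\n".join(memory.recall())  # context management
        reply = json.loads(model(f"Task: {task}\nMemory:\n{context}"))
        if reply["type"] == "final":
            return reply["answer"]
        skill = SKILLS.get(reply["skill"])
        if skill is None:  # safety: only registered skills may run
            memory.remember(f"Error: unknown skill {reply['skill']}")
            continue
        memory.remember(f"Result: {skill(reply['input'])}")
    return "stopped: step limit reached"
```

With the stub above, `run_harness("count these four words", MemoryStore())` returns `"4"`: the first model call requests the `word_count` skill, the harness runs it and writes the result to memory, and the second call reads that result back from context. The model itself holds none of the state; swapping in a different model would leave memory, skills, and safety checks untouched, which is the point of the externalization framing.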
Core contribution
The paper positions these four categories as interconnected forms of the same underlying trend: externalization. It traces a historical progression:
- Weights — early approach: bake capability into model parameters
- Context — middle period: feed capability in via prompts and retrieval
- Harness — current: build capability into the orchestration layer
It analyzes trade-offs between parametric (internal) and externalized capability, and identifies emerging directions including self-evolving harnesses and shared agent infrastructure.
Assessment
Strengths:
- Provides a unified vocabulary across memory, skills, protocols, and harness — useful for Sanity Check content that needs precise terminology
- The externalization framing is elegant and maps cleanly to practitioner language
- Large author team, which suggests (but does not guarantee) broad literature coverage
Limitations:
- Survey papers lag practitioner reality by nature; the taxonomy may already be incomplete given how fast the harness space moves
- No original experiments — this is synthesis, not new evidence
Bias flags: None obvious. Academic survey, no commercial affiliation declared in the author list.
RDCO mapping
- Sanity Check utility: Use as the academic spine for “The Harness Era” article. The externalization framing is more precise than “things moved outward” — it names what moved and why.
- Vocabulary alignment: The paper’s four-category taxonomy (memory, skills, protocols, harness) maps almost exactly to Garry Tan’s framework and to the RDCO agent architecture.
- Cross-reference: Greyling’s three-layer timeline (weights/context/harness) appears to be derived from or inspired by this paper’s historical analysis.
Related
- 2026-04-12-cobus-greyling-harness-era-language-shift
- 2026-04-11-garry-tan-thin-harness-fat-skills
- 2026-04-12-harrison-chase-harness-blog
- 2026-04-10-akshay-pachaar-agent-harness-anatomy
- cross-check-agent-architecture