How I Built a Harness for My Agent Using Claude Code Leaks
Rohit reverse-engineered Claude Code’s source code — 55 directories, 331 modules — and wrote up what he found. The result is the most granular public teardown of Claude Code’s architecture to date. This note summarizes ten architectural findings from his teardown, with connections to how we’ve built at RDCO.
1. The Four-Layer Framework
Rohit’s central provocation: most teams building on LLMs think about three layers — the model, the context you put into it, and some kind of harness wiring it together. Claude Code has a fourth: infrastructure that handles multi-tenancy, role-based access control, resource isolation, state persistence across sessions, and distributed coordination. His claim is that “this is where products die” — everything works in a demo, but the fourth layer is what determines whether the system survives production scale and real organizational complexity.
This is a useful framing for any agent product, not just Claude Code. Layers 1-3 are table stakes. Layer 4 is the moat.
2. Async Generator Agent Loop
The core execution loop in query.ts is built on async generators rather than a conventional while loop. The practical difference: generators let the loop yield values mid-execution, which enables streaming output to the UI, clean cancellation at any yield point, composable step-by-step orchestration, and natural backpressure so fast producers don’t overwhelm slow consumers. A while loop that calls the API and waits for a full response can’t do any of these things gracefully — it’s blocking by nature. The generator approach turns the agent loop into a pull-based pipeline.
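The shape of such a loop can be sketched as an async generator. This is a minimal illustration of the pull-based pattern, not Claude Code’s actual query.ts — the event types and step logic are invented:

```typescript
// Minimal sketch of a pull-based agent loop built on an async generator.
// Event names and the step logic are illustrative, not Claude Code's code.
type AgentEvent =
  | { kind: "text"; chunk: string }
  | { kind: "tool_call"; name: string }
  | { kind: "done" };

async function* agentLoop(prompt: string): AsyncGenerator<AgentEvent> {
  let steps = 0;
  while (steps < 3) {
    // Each yield is a natural cancellation point: if the consumer stops
    // pulling (break / return), execution never resumes past this line.
    yield { kind: "text", chunk: `step ${steps} for: ${prompt}` };
    yield { kind: "tool_call", name: "read_file" };
    steps++;
  }
  yield { kind: "done" };
}

// The consumer pulls at its own pace -- backpressure falls out for free.
async function run(): Promise<AgentEvent[]> {
  const events: AgentEvent[] = [];
  for await (const ev of agentLoop("hello")) {
    events.push(ev);
    if (ev.kind === "done") break;
  }
  return events;
}
```

The key property: the consumer drives execution. A `break` in the `for await` loop cancels the generator cleanly, which is exactly the mid-response interrupt behavior described above.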
This explains something we’ve observed: Claude Code handles interrupts and context switches more cleanly than most agent frameworks, even mid-response.
3. Tool Execution at Scale
Claude Code has 45+ tools. Rohit’s key finding: they’re classified by concurrency profile, not just function. Read-only tools (file reads, searches, observations) run in parallel. Write operations run serially to prevent conflicts. The executor starts running tools mid-stream — before the model has finished generating its response — which cuts the latency gap between generation and action. This is “streaming tool execution” and it’s a meaningful architectural choice, not a detail.
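The concurrency-profile idea can be sketched in a few lines. The registry and tool names here are hypothetical, not Claude Code’s internals:

```typescript
// Sketch: dispatch tools by concurrency profile rather than one at a time.
// The registry and tool names are hypothetical.
type Profile = "parallel" | "serial";

const registry: Record<string, Profile> = {
  read_file: "parallel", // read-only: safe to race
  grep: "parallel",
  write_file: "serial",  // mutating: must not overlap
  bash: "serial",
};

async function runTools(
  calls: { tool: string; run: () => Promise<string> }[]
): Promise<string[]> {
  const parallel = calls.filter((c) => registry[c.tool] === "parallel");
  const serial = calls.filter((c) => registry[c.tool] !== "parallel");

  // Read-only tools run concurrently...
  const parallelResults = await Promise.all(parallel.map((c) => c.run()));

  // ...while writes run strictly one after another to avoid conflicts.
  const serialResults: string[] = [];
  for (const c of serial) serialResults.push(await c.run());

  return [...parallelResults, ...serialResults];
}
```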
The classification logic (parallel vs. serial, not just “safe vs. unsafe”) maps to how we should think about our MCP server tool allowlisting — it’s not just about security, it’s about what can safely run in parallel.
4. System Prompt Caching
The system prompt is split at a deliberate boundary. Static content — the base instructions, tool definitions, CLAUDE.md entries that don’t change — sits above the cache boundary and gets cached by Anthropic’s infrastructure. Dynamic content — the current session state, recent tool outputs, user messages — sits below and flows fresh with each request. Rohit estimates this architecture achieves roughly 80% prompt cache hit rates, which translates directly to cost reduction at volume.
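The split can be pictured as two halves of the request payload. A minimal sketch using the shape of Anthropic’s prompt-caching API — the cache_control breakpoint on a content block is the documented mechanism, but the content strings and structure below are invented for illustration:

```typescript
// Sketch: static instruction content sits above the cache breakpoint,
// per-session state flows below it. Content strings are invented.
function buildRequest(sessionState: string, userMsg: string) {
  return {
    model: "claude-sonnet-4-5",
    system: [
      // Stable across requests -> marked cacheable (Anthropic prompt caching).
      {
        type: "text",
        text: "Base instructions + tool definitions + CLAUDE.md entries",
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [
      // Fresh every request -> never cached.
      { role: "user", content: `${sessionState}\n\n${userMsg}` },
    ],
  };
}
```

The economics follow directly: everything above the breakpoint is billed at the cached-read rate on a hit, so the more of the prompt that is stable, the cheaper each request gets.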
This is why our SOUL.md and CLAUDE.md are worth investing in as durable artifacts: the more stable they are, the more cacheable they are. Churn in the instruction layer has a real cost.
5. CLAUDE.md Hierarchy
The instruction system has four tiers: enterprise (org-wide policy, highest priority), project (repo-level CLAUDE.md, second), user (~/.claude/CLAUDE.md, third), and local (session-level overrides, lowest). The tiers stack and override, with @include directives for composition. This is essentially multi-tenancy for agent behavior — the same model instance behaves differently depending on which org, project, and user context it’s operating in.
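The tier resolution can be sketched as a priority merge. The merge rule below (higher tier wins per key) is illustrative — the real system composes file contents and expands @include directives rather than merging key-value maps:

```typescript
// Sketch: resolve instruction tiers by fixed priority. Illustrative only;
// real composition concatenates file contents and expands @include.
type Tier = "enterprise" | "project" | "user" | "local";

// Ordered lowest to highest priority, so later applications overwrite.
const priority: Tier[] = ["local", "user", "project", "enterprise"];

function resolve(
  configs: Partial<Record<Tier, Record<string, string>>>
): Record<string, string> {
  const merged: Record<string, string> = {};
  // Apply lowest-priority tier first; higher tiers overwrite on conflict.
  for (const tier of priority) {
    Object.assign(merged, configs[tier] ?? {});
  }
  return merged;
}
```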
Our own SOUL.md + CLAUDE.md setup mirrors this. SOUL.md is our enterprise layer — identity and operating model. CLAUDE.md in the home directory is our user layer — session instructions and memory. Project-level CLAUDE.md files in individual repos act as the project tier. We’re using this hierarchy correctly; Rohit’s teardown confirms the design intent.
Connected: Agent Format explores formalizing this kind of instruction hierarchy across agent frameworks more broadly.
6. Context Compaction
When the context window fills up, Claude Code doesn’t just fail — it has a four-strategy cascade, ordered cheapest to most expensive:
- Microcompact: Trim low-signal content (whitespace, redundant tool outputs) in place.
- Snip: Drop older conversation turns, keeping recent history intact.
- Auto-compact: Summarize older turns into a condensed digest that stays in context.
- Context collapse: Full reset with a synthesized handoff document seeded into the new session.
The ordering matters. The system tries to preserve fidelity before resorting to lossy compression. Context collapse is only invoked when the cheaper strategies can’t recover enough headroom.
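The cascade’s ordering and early-exit behavior can be sketched as follows. The strategy internals are stubbed with invented recovery ratios — only the cheapest-first ordering and the stop-when-recovered logic mirror the description above:

```typescript
// Sketch of the cheapest-first compaction cascade. Recovery ratios are
// invented; only the ordering and early-exit logic mirror the design.
type Strategy = { name: string; recover: (used: number) => number };

const cascade: Strategy[] = [
  { name: "microcompact", recover: (u) => u * 0.95 },  // trim low-signal content
  { name: "snip", recover: (u) => u * 0.8 },           // drop older turns
  { name: "auto-compact", recover: (u) => u * 0.5 },   // summarize older turns
  { name: "context-collapse", recover: () => 0.1 },    // full reset + handoff doc
];

function compact(
  used: number,   // fraction of context window in use, 0..1
  target: number  // fraction we want to get back under
): { used: number; applied: string[] } {
  const applied: string[] = [];
  for (const s of cascade) {
    if (used <= target) break; // stop as soon as headroom is recovered
    used = s.recover(used);
    applied.push(s.name);
  }
  return { used, applied };
}
```

Because the loop exits as soon as usage drops below target, the lossy strategies at the bottom of the list only ever run when the cheap ones fail to recover enough.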
This explains why our long-running sessions in the always-on Mac Mini setup stay coherent across hours of work — the compaction system is managing degradation gracefully rather than letting context rot accumulate unchecked. The 60% compaction threshold Sankalp recommends in Claude Code Best Practices aligns with this: it triggers before the cascade has to resort to lossy strategies.
7. Permission System
Tool authorization runs through a seven-stage pipeline: intent classification, glob pattern matching against allowlists, progressive trust escalation (paranoid → default → acceptEdits → bypassPermissions), user confirmation prompts where required, hook interception points, audit logging, and final execution gate. Glob matching is what makes our settings.json allowlist entries work — mcp__qmd__* is a glob, not an exact match. Hooks are positioned in the pipeline as escape hatches: they let you intercept at the confirmation stage without modifying the permission rules themselves.
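The glob-matching stage can be sketched in a few lines. This handles only the `*` wildcard — real glob matchers support fuller syntax — and the allowlist entries mirror the settings.json style:

```typescript
// Sketch: allowlist glob matching. Only "*" is handled; real matchers
// support fuller glob syntax.
function globToRegex(glob: string): RegExp {
  // Escape regex metacharacters, then widen "*" into ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

function isAllowed(tool: string, allowlist: string[]): boolean {
  return allowlist.some((entry) => globToRegex(entry).test(tool));
}
```

This is why a single mcp__qmd__* entry covers every tool the qmd server exposes, present and future, without touching the config again.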
Our MCP Server Setup SOP implements the “allowlist tools” principle that this pipeline enforces — we’re working with the grain of the design.
8. Error Recovery
The retry system is 823 lines handling more than ten distinct error classes — rate limits, network timeouts, context overflow, tool failures, malformed responses, and more — each with a specific recovery path rather than a generic exponential backoff. Rate limit errors get jitter-delayed retries. Context overflow errors trigger the compaction cascade (see above). Malformed tool responses get re-prompted with corrected schemas. The specificity is what makes the system robust: a generic retry loop can’t distinguish a transient network blip from a hard model error.
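The principle can be sketched as dispatch on error class. The classes and recovery strings below are illustrative, not the actual 823-line implementation:

```typescript
// Sketch: per-class recovery instead of generic backoff. Error classes
// and recovery actions are illustrative.
type ErrorClass = "rate_limit" | "context_overflow" | "malformed_tool" | "unknown";

function recoveryPlan(err: ErrorClass): string {
  switch (err) {
    case "rate_limit":
      // Jittered delay so concurrent clients don't retry in lockstep.
      return `retry after ${Math.round(1000 + Math.random() * 500)}ms`;
    case "context_overflow":
      return "run compaction cascade, then retry";
    case "malformed_tool":
      return "re-prompt with corrected schema";
    default:
      // Unknown failures should surface, not retry blindly.
      return "surface to user";
  }
}
```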
This is worth understanding as a design principle: error handling that knows what kind of error it’s handling is categorically better than error handling that treats all failures the same way.
9. Sub-Agent Architecture
When Claude Code spawns a sub-agent via the Agent/Task tool, it gets isolated context (its own conversation history, not a slice of the parent’s), optionally a git worktree for filesystem isolation, and one of three spawn backends depending on the task type. Task coordination between sub-agents happens via disk-backed state with file locking — not in-memory, not a message queue, just files. This is deliberately simple and resilient: the coordination layer survives process crashes.
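The disk-backed pattern can be sketched with an exclusive lockfile. The `wx` flag makes file creation atomic — it throws EEXIST if another process already holds the lock. The paths and state shape here are hypothetical:

```typescript
// Sketch: file-based coordination with an exclusive lockfile. The "wx"
// flag makes creation atomic; paths and state shape are hypothetical.
import { writeFileSync, readFileSync, unlinkSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const stateFile = join(tmpdir(), "agent-state.json");
const lockFile = stateFile + ".lock";

function updateState(patch: Record<string, unknown>): void {
  // Acquire: atomic create-or-throw. A second writer fails here instead
  // of silently clobbering the first writer's update.
  writeFileSync(lockFile, String(process.pid), { flag: "wx" });
  try {
    const state = existsSync(stateFile)
      ? JSON.parse(readFileSync(stateFile, "utf8"))
      : {};
    writeFileSync(stateFile, JSON.stringify({ ...state, ...patch }));
  } finally {
    unlinkSync(lockFile); // always release the lock
  }
}
```

The resilience property the teardown describes comes from the state living on disk: if a process dies mid-task, the coordination record survives and a stale lockfile is easy to detect (the PID inside it no longer exists).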
This is the actual mechanism behind what we do when we use Agent tools in our skills. The plugin analysis documents the subagent fan-out patterns (parallel explorers, parallel architects, confidence-scored reviewers) — those patterns work because the isolation and coordination layer is solid underneath them.
10. Extensibility System
Four extension points:
- Skills (markdown): Loaded on-demand based on trigger description matching. The three-level progressive disclosure pattern (metadata always loaded, SKILL.md body on trigger, bundled scripts/references on demand) keeps baseline context lean.
- Hooks (event-driven): JSON-configured lifecycle interceptors at Stop, UserPromptSubmit, and other events. The ralph-loop pattern uses a Stop hook to create self-referential iteration.
- MCP (protocol): Standardized tool/resource protocol. Any MCP server speaks the same interface, so adding capabilities doesn’t require modifying Claude Code itself.
- Plugins (composition): Bundles of skills + hooks + MCP servers + commands. The unit of sharing in the official marketplace.
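The three-level progressive disclosure pattern from the Skills point can be sketched as a lazy loader. The skill shape and trigger matching below are illustrative — real trigger matching is done by the model against the description, not by substring:

```typescript
// Sketch of three-level progressive disclosure. The skill shape and the
// substring trigger check are illustrative simplifications.
type Skill = {
  // Level 1: always in context -- just name + trigger description.
  name: string;
  description: string;
  // Levels 2 and 3: loaded lazily, costing nothing until triggered.
  loadBody: () => string;      // SKILL.md body, loaded on trigger
  loadResources: () => string; // bundled scripts/references, on demand
};

function contextFor(skills: Skill[], userPrompt: string): string[] {
  // Level 1 metadata is always present for every skill.
  const parts = skills.map((s) => `${s.name}: ${s.description}`);
  for (const s of skills) {
    if (userPrompt.toLowerCase().includes(s.name)) {
      parts.push(s.loadBody()); // level 2, only for the matched skill
    }
  }
  return parts;
}
```

The design choice worth noting: levels 2 and 3 are thunks, not strings, so an unused skill costs one metadata line of context rather than its full body.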
Our skills architecture in ~/.claude/skills/ directly implements the Skills layer. Our MCP Server Setup SOP implements the MCP layer. The extensibility system is the part of Claude Code’s architecture we’re most deeply engaged with. Anthropic’s internal skills practice and the official plugin analysis are the complementary references here.
Actionable for RDCO
Validates what we already do:
- The SOUL.md + CLAUDE.md hierarchy maps directly onto Claude Code’s enterprise, user, and project tiers. We’re using the system as designed.
- Our 1Password wrapper pattern for MCP servers (no secrets on disk) aligns with the permission pipeline’s security model — credentials never touch the allowlist config.
- Skills as on-demand markdown files with progressive disclosure matches the internal Anthropic practice Thariq described. We’re consistent with how the tool was designed to be extended.
- The always-on Mac Mini session staying coherent is explained by the compaction cascade — it’s doing real work to preserve session continuity.
Patterns worth adopting:
- Parallel vs. serial tool classification: When we write skills that invoke multiple tools, we should think explicitly about which ones can run in parallel (reads, lookups) vs. must be serial (writes, mutations). We probably default to serial when we don’t need to.
- Explicit compaction triggers: Sankalp’s 60% threshold rule has a mechanistic basis now. Consider whether we should add an explicit compact instruction in long-running skill workflows rather than letting the cascade decide.
- Error class specificity: Our custom error handling in any scripts or agents we write should distinguish between error types and handle each specifically. Generic retry is a code smell once you know better.
- Disk-backed coordination for multi-agent tasks: When we build skills that spawn multiple sub-agents, using files (not in-memory state) for coordination is the right approach — it’s what Claude Code itself does, and it’s resilient to process interruptions.
- The fourth layer question: As RDCO’s agent usage grows, explicitly asking “where does our Layer 4 live?” is worth doing. State persistence (the vault), resource isolation (worktrees, isolated skill contexts), and coordination (file-backed) are all present but informal. Worth mapping them explicitly if we start building for multi-user or multi-project scenarios.