Agent Harness Landscape - May 2026
The question
Verbatim from founder, 2026-05-10 11:34 ET, after reading Addy Osmani’s harness-engineering piece:
“Shopify rolled their own [River]. I guess that’s what I’m trying to find. Is the customization/personalization so important that everyone will need to roll their own or how far can you bootstrap the setup with a productive solution. Ray is really just a thin wrapper around Claude Code.”
Direct strategic input to the Ray-as-a-Service / Ray-Starter-Kit bet decision.
What we already know (from the vault)
- 06-reference/2026-05-10-addy-osmani-agent-harness-engineering - the discipline name. Agent = Model + Harness. Ratchet pattern. CLAUDE.md as pilot’s checklist. HaaS framing.
- 06-reference/concepts/2026-05-10-harness-moat-two-layers-portability - the framing concept doc. Layer 1 universal (90% portable), Layer 1.5 config swap, Layer 2 personal-fit earned.
- 06-reference/2026-04-11-garry-tan-thin-harness-fat-skills - the canonical RDCO architecture: thin harness (~200 lines), fat skills (markdown), resolvers, latent vs deterministic, diarization. “If I have to ask you for something twice, you failed.”
- 06-reference/2026-04-12-harrison-chase-harness-blog - “Your Harness, Your Memory.” Memory is inseparable from the harness. Three-tier lock-in taxonomy (mild / bad / worst). Pitches Deep Agents as the open alternative.
- 06-reference/2026-04-07-claude-code-architecture-teardown - Rohit’s reverse engineering. Claude Code has 4 layers (model / context / harness / infrastructure). Async generator loop. 45+ tools classified by parallelism. Four extension mechanisms (skills / hooks / MCP / plugins).
- 06-reference/2026-04-13-moura-entangled-software-agent-harnesses-dead - the strongest dissent. CrewAI founder argues harnesses commoditize; “entangled software” (data + workflow + agent in one product) is the durable shape.
- 06-reference/2026-05-09-tobi-lutke-river-public-channel-agent - River as deployment-shape evidence. 36% to 77% merge rate in two months from osmosis learning, no model retraining. River refuses DMs.
The market today
The harness market in May 2026 has a clear shape: a small set of “general-purpose” coding/work harnesses (Claude Code, Cursor, OpenAI Codex CLI, Cline, Aider, Continue, OpenHands), a young “personal AI” harness category (Hermes Agent from Nous Research), an academic reference (SWE-agent), and a growing pattern of teams building thin orchestration ON TOP of one of these (Shopify’s Roast wraps Claude Code, not from scratch). All of them ship the same universal-layer kit: a loop, tools (file / bash / search), some sandbox boundary, MCP for external integrations, hooks or events, and a markdown-rules file for personalization. The differentiation is not “what’s in the box” - it’s “what’s the bootstrap floor” and “what does the personal-fit layer look like.”
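That shared universal-layer kit is small enough to sketch. A toy loop in Python, with `call_model` as a stub standing in for any provider API (every name here is illustrative, not any specific harness's internals):

```python
# Toy universal-layer loop: the model proposes a tool call, the harness
# executes it and feeds the observation back, until the model answers.
# Real harnesses add the rest of the kit: sandboxing, permissions, MCP,
# hooks, and a markdown rules file prepended to the prompt.
def run_agent(call_model, tools, task, max_turns=10):
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        step = call_model(transcript)  # {"tool": ..., "arg": ...} or {"done": answer}
        if "done" in step:
            return step["done"]
        observation = tools[step["tool"]](step["arg"])
        transcript.append({"role": "tool", "content": observation})
    raise RuntimeError("max turns exceeded")
```

Everything in the table below is some elaboration of this loop plus a personalization surface.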
| Harness | Universal layer (shipped) | Personal-fit layer (operator) | Bootstrap floor | Notable rolled-their-own escapes |
|---|---|---|---|---|
| Claude Code | Loop, 45+ tools, sandbox/permissions, MCP, hooks, skills, subagents, 5-strategy compaction, plugins | CLAUDE.md hierarchy (enterprise/project/user/local), ~/.claude/skills/, .claude/settings.json hooks | ~/.claude/CLAUDE.md + one skill = productive | Shopify Roast (workflow shell), RDCO/Ray (vault + 60+ skills), affaan-m’s “everything-claude-code” perf system |
| Cursor | IDE-integrated agent, semantic+grep search, browser, terminal, .cursor/hooks.json, MCP, worktree sandbox, Cursor SDK (2026, breaks agents out of the editor) | .cursor/rules/*.mdc (project), .cursor/commands/ (skills, nightly) | One .cursor/rules/ markdown file with build/test commands | Cursor SDK lets teams put Cursor’s agent in CI/runtime, replacing the IDE shell |
| OpenAI Codex CLI | Rust loop, tools, ~/.codex/config.toml, MCP via STDIO/HTTP, agent skills, subagents (explicit-only), Agents SDK escape | ~/.codex/AGENTS.md global + project AGENTS.md, [agents] config, MCP servers | AGENTS.md + codex install = productive | Codex-as-MCP-server pattern: orchestrate Codex from Agents SDK for deterministic pipelines |
| Aider | Repo map (PageRank over symbol graph), git auto-commit per change, edit-format coders (EditBlock, UnifiedDiff, Architect, etc.), LiteLLM 100+ provider routing | CONVENTIONS.md (read into prompt), YAML config, model selection | pip install aider-chat + aider in repo = productive | Aider is itself often the rolled-their-own atop OpenAI/Anthropic SDKs - very thin foundation |
| Cline | VS Code extension, ReAct loop, plan/act mode toggle, browser tool, tool-creation-on-the-fly, MCP first-class | .clinerules/*.md (workspace) + global rules, Memory Bank pattern, conditional rules with YAML frontmatter glob | One .clinerules/ file = productive | Memory Bank: operators wire their own persistent memory layer because Cline’s session memory is shallow |
| Continue | Agent / Chat / Plan modes, MCP, custom slash commands, async “Continuous AI” pivot | config.yaml rules (text or markdown), baseAgentSystemMessage, MCP additions, Mission Control central rule registry | Install + config.yaml + one rule = productive | Pivoted hard to async/CI agents in 2026, ceding interactive IDE ground to Cursor |
| OpenHands | V1 immutable event-log architecture, Docker sandbox, CodeAct agent, Pydantic-typed tools, MCP-aligned, GitHub integration, micro-agents | Custom agents, micro-agents (small task-specific), tool definitions | Docker + one config = productive; cloud version one-click | All-Hands-AI run their own platform on top; SWE-Bench 72% baseline is the rolled-their-own benchmark target |
| SWE-agent (Princeton) | The “ACI” academic reference: linter-gated edits, custom file viewer, history processors, tools for repo-scale navigation | Configurable via YAML, prompt templates | pip install + LM key + GitHub issue URL | Pure research harness; the reference everyone else implicitly compares to |
| Hermes Agent (Nous Research) | “First personal AI agent that ships with the harness already built in” - automated 5-layer harness (loop/tools/memory/skills/sandbox), self-improving skill writer (auto-generates ~/.hermes/skills/* after notable runs), multi-platform gateway (Telegram/Discord/Slack/WhatsApp/Signal/Email/CLI) | Skills accumulate automatically rather than being authored, persistent cross-session memory by design | One install command + auth = productive; harness ratchets itself | The first harness explicitly automating the ratchet. 27k+ GitHub stars by Apr 2026. |
| Shopify Roast | Ruby DSL workflow orchestrator on top of Claude Code. Convention over configuration (Rails philosophy). CodingAgent invokes Claude Code as a tool inside structured workflows | Workflow definitions in Ruby, prompt files, step composition | gem install roast + workflow.rb = productive | This IS the “rolled their own” - but it’s a thin shell on top of Claude Code, not a from-scratch harness. River the agent is built on top of Roast + Claude Code + Shopify’s MCP servers + LLM proxy. |
Per-harness deep-dive
Claude Code (Anthropic)
Universal layer is the most architecturally complete in the market: async-generator loop, 45+ tools classified by concurrency (read = parallel, write = serial), 7-stage permission pipeline, multi-strategy compaction cascade (microcompact / snip / auto-compact / context-collapse), four-tier instruction hierarchy (enterprise / project / user / local), four extension mechanisms (skills / hooks / MCP / plugins), subagent task isolation with disk-backed coordination. Personal-fit lives in CLAUDE.md files at four tiers + ~/.claude/skills/ markdown + .claude/settings.json hooks + per-project .mcp.json. Bootstrap floor is genuinely tiny: install Claude Code, add a CLAUDE.md, and you are productive in 10 minutes. RDCO/Ray is the proof: 60+ skills, 1490 vault docs, multi-MCP, deterministic audit hooks - all built ON TOP of Claude Code without forking anything. Sources: Anthropic docs, Rohit teardown, Alex Op full-stack writeup, vault: Claude Code architecture teardown.
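The hooks mechanism is what makes "deterministic audit hooks" possible: the harness pipes a JSON description of the pending tool call to the hook script, and a nonzero exit code vetoes the action. A minimal sketch in that style - the payload field names are assumptions, not verified against the current hooks schema:

```python
# Sketch of a deterministic audit hook: read the tool-call payload,
# veto writes into protected paths via a nonzero exit code.
# Field names ("tool_input", "file_path") are assumptions; check the
# hooks docs for the exact schema.
import json
import sys

BLOCKED = (".env", "secrets/")

def verdict(payload):
    path = str(payload.get("tool_input", {}).get("file_path", ""))
    if any(fragment in path for fragment in BLOCKED):
        return 2  # nonzero exit code blocks the tool call
    return 0

# A real hook script would end with: sys.exit(verdict(json.load(sys.stdin)))
```

The point is the determinism: this runs outside the model, so a rule enforced here can never be talked around.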
Cursor
Universal layer is IDE-first: agent runs inside the editor with semantic+grep codebase search, browser tool, terminal, .cursor/hooks.json for pre/post-action scripts, MCP for external services, git-worktree sandboxes for parallel agents. The 2026 surprise was the Cursor SDK which breaks the agent out of the IDE - operators can run Cursor agents in CI / runtime / arbitrary contexts. Personal-fit migrated from .cursorrules (legacy single file) to .cursor/rules/*.mdc (per-glob, version-controlled, scoped). Bootstrap is one rules file with build/test commands. Cursor’s published guidance treats rules as “the single biggest lever to make Cursor stop hallucinating.” Sources: Cursor agent best practices, Cursor docs rules, vault: AlphaSignal Cursor SDK followup.
OpenAI Codex CLI
Universal layer is a Rust-built loop with native tool execution, MCP via STDIO/HTTP servers in ~/.codex/config.toml, agent skills invoked via $skill-name, opt-in subagents configured in [agents] block, and an Agents SDK escape hatch that exposes the entire CLI as an MCP server (so larger orchestrators can invoke it deterministically). Personal-fit is AGENTS.md (the same standard Cline / others read), with AGENTS.override.md for per-machine overrides. Bootstrap is npm install -g @openai/codex (or brew) + an AGENTS.md file. Codex is the most "Anthropic-lookalike" of the competing harnesses - Claude Code shipped first, and OpenAI followed with a very similar shape. Sources: Codex CLI docs, AGENTS.md guide.
Aider
Universal layer is the smallest of the major harnesses but disproportionately load-bearing for one capability: the PageRank-based repo map that builds a directed graph of symbol definitions+references across the entire codebase, then ranks files by relevance and renders the top-ranked definitions as elided code views inside the token budget. This is the part Hermes-agent and others now publicly cite as the gold standard for repo-scale context selection. Plus: every agent change is an atomic git commit, multiple coder variants for different edit formats (EditBlockCoder, WholeFileCoder, UnifiedDiffCoder, ArchitectCoder), LiteLLM-routed model agnosticism (100+ providers). Personal-fit is CONVENTIONS.md read straight into the prompt + YAML config + model choice. Bootstrap floor: pip install aider-chat && aider inside a git repo. Aider is the most purist “fat skills, thin harness” implementation in market. Sources: Aider repo map docs, Simran Chawla’s architectural analysis.
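A toy version of the repo-map idea, with whole files as nodes instead of Aider's symbol-level tree-sitter graph (pure illustration, not Aider's code):

```python
# Toy repo-map ranking: file A points at file B when A references a symbol
# B defines; a few rounds of PageRank-style power iteration surface the
# load-bearing files. Aider works at symbol granularity via tree-sitter
# and elides the top-ranked definitions into the token budget; this sketch
# also skips dangling-node rank redistribution.
def rank_files(refs, damping=0.85, iters=30):
    files = set(refs) | {f for targets in refs.values() for f in targets}
    rank = {f: 1.0 / len(files) for f in files}
    for _ in range(iters):
        fresh = {f: (1 - damping) / len(files) for f in files}
        for src, targets in refs.items():
            for dst in targets:
                fresh[dst] += damping * rank[src] / len(targets)
        rank = fresh
    return rank
```

A utility module referenced by everything ends up ranked highest - exactly the file whose definitions you want inside the context window.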
Cline
Universal layer: VS Code extension running a ReAct (Reason-Act-Observe) loop with plan/act mode toggle (operator can force planning before action), browser tool, MCP first-class with the ability to create new MCP servers from inside Cline (self-extending toolkit), per-tool human approval gates. Personal-fit: .clinerules/*.md workspace rules + global rules (~/Documents/Cline/Rules), conditional activation via YAML frontmatter glob patterns, plus the Memory Bank pattern (operator-authored markdown structure that Cline reads at session start to recover state across forgettable sessions). Cline reads .clinerules/, .cursorrules, .windsurfrules, AND AGENTS.md - explicitly cross-tool compatible. Bootstrap: install extension + one rules file. Sources: Cline rules docs, Memory Bank pattern.
Continue
Universal layer: VS Code / JetBrains extension with Agent / Chat / Plan modes, MCP support, custom slash commands. Distinguishing 2026 move: pivoted to “Continuous AI” - async background agents that enforce standards in CI, conceding interactive IDE ground to Cursor. Personal-fit: config.yaml (or .md) rules, baseAgentSystemMessage model-level overrides, Mission Control central rule registry. Bootstrap is install + config.yaml + one rule. Continue is the harness whose moat moved fastest: started as Cursor competitor, became async-first-team-process tool. Sources: Continue docs rules, Continue.dev pivot review.
OpenHands (formerly OpenDevin)
Universal layer: V1 architecture with immutable event log (every action and observation is an event, enabling deterministic replay and pause/resume - a feature most other harnesses lack), Docker sandbox, CodeAct agent (the SWE-Bench 72% baseline against Claude Sonnet 4.5), Pydantic-typed tools, MCP-aligned sandboxing, GitHub integration, “micro-agents” for small task-specific work, cloud sandboxes for parallel agent execution. Personal-fit: micro-agent definitions, custom agent classes. Bootstrap: Docker + config OR one-click cloud. The most “operator owns the agent’s full execution history” of any open-source option. Sources: OpenHands.dev, OpenHands V1 architecture, arxiv 2407.16741.
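The event-log idea is simple enough to sketch: every action and observation is an append-only record, and agent-visible state is a pure fold over a prefix of the log, which is what makes replay and pause/resume cheap. Illustrative shape, not OpenHands' actual types:

```python
# Sketch of an immutable event log: append-only records, state rebuilt as
# a pure function of a log prefix. Replaying the first N events recovers
# the exact state at that point; pausing is just remembering N.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    kind: str      # "action" or "observation"
    payload: str

@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)  # append-only; the events themselves are frozen

    def replay(self, upto=None):
        return [(e.kind, e.payload) for e in self.events[:upto]]
```

Because nothing is ever mutated in place, two replays of the same prefix are guaranteed identical - the property "audit-grade" agent work needs.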
SWE-agent (Princeton)
Pure academic reference. The “Agent-Computer Interface” (ACI) thesis: how the agent talks to the computer matters more than which model it is. Innovations later adopted everywhere: linter runs on every edit and BLOCKS syntactically-broken code from being committed, custom file viewer instead of cat, history processors that compress context. Bootstrap: pip install + LM key + GitHub issue. Not a productized harness for daily operator use - it’s the citation other harnesses use to justify their tooling decisions. Sources: arxiv 2405.15793, SWE-agent docs.
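The linter gate is easy to reconstruct in miniature: apply the proposed edit to a copy, check the result still parses, and feed the error back instead of committing if it does not. Here `ast.parse` stands in for SWE-agent's linter:

```python
# Miniature of SWE-agent's linter-gated edit: apply the proposed edit to a
# copy, parse the result, and reject the edit (returning the error to the
# agent) if it no longer parses. ast.parse stands in for the linter.
import ast

def gated_edit(source, apply_edit):
    candidate = apply_edit(source)
    try:
        ast.parse(candidate)
    except SyntaxError as err:
        return False, f"edit rejected: {err.msg}"  # file left untouched
    return True, candidate
```

The ACI thesis in one function: the interface catches the broken edit before the model ever has to notice it.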
Hermes Agent (Nous Research)
The newcomer that matters most for RDCO. Universal layer: the harness already built in - all 5 layers automated (loop / tools / memory / skills / sandbox). Self-improving learning loop: after each task, Hermes evaluates whether to write a skill (triggers: tool called >5 times, mistake-then-fix, user correction, unobvious-but-effective path). Auto-writes to ~/.hermes/skills/* without operator authoring. Multi-platform gateway: native to Telegram, Discord, Slack, WhatsApp, Signal, Email, CLI - the “channels” architecture that Ray independently arrived at, but shipped as default. Persistent cross-session memory and “deepening model of who you are” by design. Personal-fit accumulates automatically rather than being hand-authored - this is a structural bet that the personal-fit layer can be auto-ratcheted, not just hand-curated. Bootstrap: install + auth = productive. 27k+ GitHub stars by Apr 2026. Sources: hermes-agent.nousresearch.com, DataCamp tutorial, DEV writeup. Caveat: I have not run Hermes; the “5 layers automated” claim is from their docs and a third-party review, not first-hand verified.
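The trigger list is concrete enough to sketch as a post-session check. This is a reconstruction from their docs, not Hermes code, and the event shapes are invented:

```python
# Reconstruction of a Hermes-style auto-ratchet check: after a session,
# scan the transcript for the documented skill-writing triggers and
# surface candidates. Event dict shapes are invented for illustration.
from collections import Counter

def skill_candidates(events):
    found = []
    tool_calls = Counter(e["tool"] for e in events if e.get("type") == "tool_call")
    if any(count > 5 for count in tool_calls.values()):
        found.append("repeated_tool")        # same tool called >5 times
    kinds = [e.get("type") for e in events]
    if "error" in kinds and "fix" in kinds[kinds.index("error"):]:
        found.append("mistake_then_fix")     # a mistake followed by its fix
    if "user_correction" in kinds:
        found.append("user_correction")
    return found
```

The structural bet is that this check runs automatically after every session, so the personal-fit layer ratchets without operator authoring.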
Shopify Roast (the “rolled their own” example)
This is the critical clarification for the founder’s question. Shopify did NOT build River from scratch as a competing harness to Claude Code. They built Roast - a Ruby workflow orchestration framework that follows Rails’ “convention over configuration.” Roast wraps Claude Code as a tool: CodingAgent is the integration point that invokes Claude Code from inside structured workflows. Workflows can interleave agentic Claude Code steps with deterministic non-AI Ruby code. Roast 1.0 (Apr 2026) replaced YAML configs with a pure Ruby DSL. River - the Slack-native agent behind 1,870+ PRs/week and the milestone of 50% of Shopify’s code being AI-generated - sits on top of: (a) Shopify’s internal LLM proxy, (b) “MCP everything” internal MCP servers (GSuite, Slack, Salesforce, internal data warehouses), (c) Roast for workflow shape, (d) Claude Code for the agentic execution underneath. Sources: Shopify Engineering: Introducing Roast, Shopify/roast GitHub, ZenML LLMOps DB writeup, vault: Tobi River public-channel agent, Bessemer Atlas: Shopify AI playbook, First Round AI feature.
The “rolled their own” pattern
The founder’s framing was that “Shopify rolled their own [River].” Sharp correction: Shopify rolled their own ORCHESTRATION SHELL (Roast in Ruby), but the agent execution underneath is Claude Code. This is the dominant pattern in 2026 - not “build a competing harness from scratch,” but “build a thin domain-shaped shell that invokes one of the universal harnesses for actual agent work.” The escape valves teams build are:
- Workflow orchestrator on top (Shopify Roast around Claude Code) - when the universal harness’s loop is too unstructured for a specific repeatable process. Adds determinism in Ruby/Python code, calls the agent for the latent steps.
- Public-channel deployment shell (River around Roast around Claude Code) - when the org needs apprenticeship-by-osmosis, the harness operator builds a Slack-native surface that forces the agent to refuse DMs.
- Domain-specific MCP server fleet (Shopify’s internal MCPs for GSuite/Slack/Salesforce/data warehouse) - the universal harness ships zero domain knowledge, so operators add MCP servers that expose their internal systems with the right authn and shape.
- CI / async deployment (Continue’s pivot, Cursor SDK, OpenHands cloud) - when the operator wants the agent off the dev’s machine and into background runtime.
- Auto-ratcheting skill accumulation (Hermes) - when the operator wants to automate the “every failure becomes a rule” discipline so the personal-fit layer self-builds.
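The first valve reduces to a simple shape: deterministic steps as plain code, with the agent invoked as just another step for the latent judgment. A Python sketch with the agent stubbed out (Roast expresses the same shape in a Ruby DSL):

```python
# The workflow-shell shape: deterministic checks stay in plain code, and
# the universal harness is just another step, called only for latent work.
# `agent` is a stub standing in for "invoke Claude Code on this prompt".
def review_workflow(pr_diff, agent):
    # Deterministic step: cheap structural gate, no model call needed.
    too_big = pr_diff.count("\n") > 500
    if too_big:
        return {"too_big": True, "review": "split this PR before review"}
    # Latent step: only the judgment call goes to the agent.
    review = agent("Review this diff:\n" + pr_diff)
    # Deterministic step: shape the result for the channel.
    return {"too_big": False, "review": review}
```

At hundreds of runs per week, pushing structure out of the prompt and into the shell is what makes the economics work.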
What teams almost NEVER do in 2026: write a competing model-loop-tools-context layer from scratch. Even Shopify - the most-cited “rolled their own” example - chose to wrap Claude Code rather than rebuild it. The economic gravity is decisive: the universal layer is too good and too cheap to rebuild. The interesting innovation moved up to orchestration / deployment-shape / personal-fit accumulation.
This matches Moura’s dissent (06-reference/2026-04-13-moura-entangled-software-agent-harnesses-dead): harnesses commoditize fast. The durable shape is “entangled software” - data + workflow + agent in one product surface. Roast is exactly that for Shopify; River is the deployment-shape on top. RDCO is the same shape for solo-founder COO work; Ray is the deployment-shape on top.
How far can bootstrap go?
The founder’s read is dead-on: Ray is a thin wrapper around Claude Code. Channels MCP turned on, knowledge base provisioned, first few skills configured (SOUL.md, CLAUDE.md), then everything else built up through dialogue. Mapping that against the harness-moat-two-layers framework:
- Layer 1 (universal harness, ~90%): Claude Code ships this. Loop, tools, sandbox, MCP, hooks, skills, subagents, compaction, permission pipeline - all out of the box. No bootstrapping required beyond installing the CLI.
- Layer 1.5 (config swap, ~50% adapter work): MCP server selection (which Gmail, which Calendar, which Notion), deployment target (Cloudflare vs Vercel), bets.json shape, skill ON/OFF. Ben configured Ray’s 1.5 layer in days.
- Layer 2 (personal-fit, the earned 10%): CLAUDE.md hard rules earned through specific founder failures, memory files (feedback_calibrate_overconfidence, feedback_no_em_dashes, etc.), voice match, queue calibration. This took 9 months of operating-time and cannot be shortcut.
Three categories of operator emerge:
- Bootstrap-and-stay (Ray, most current Claude Code users): Run the universal harness as-shipped, build personal-fit on top through markdown + skills + MCP picks. No fork, no shell, no orchestration layer. The “thin wrapper” pattern Ben describes. Productive in days, mature in months. The vast majority of operators can stay here forever and never hit a real wall.
- Workflow-shell (Shopify Roast, operators with repeatable structured processes): When you have a process that repeats hundreds of times per week with the same shape (Shopify code review at 1,870 PRs/week), the unstructured agent loop costs more than building a Ruby/Python orchestrator that calls the agent for latent steps and uses deterministic code for the rest. This is the threshold where teams “roll their own” - but they’re rolling their own SHELL, not their own harness.
- Full-stack shell (River, very large orgs): Apprenticeship-shape requirements (osmosis learning, public-channel constraints, multi-team skill sharing) force a deployment shell on top of the workflow shell. Only worth building when the org is large enough that the visibility flywheel actually compounds. Tobi’s dataset: 5938 employees, 4450 channels.
The bootstrap floor is genuinely productive. Two data points:
- RDCO/Ray, 9 months of operating time, has not needed to fork or build a competing harness. Every extension fits as a skill, MCP server, hook, or vault doc inside the universal Claude Code shape.
- Shopify, the most-cited “rolled their own” example, didn’t rebuild the harness either. They built an orchestrator on top.
The “must roll your own” threshold for solo-operator and small-team work is further out than founder intuition suggests. The threshold kicks in only when you have a high-frequency repeatable structured process where unstructured agent work is too noisy (Roast threshold) or when org-shape demands osmosis learning across many humans (River threshold). Neither applies to RDCO’s solo-founder COO surface today.
Synthesis for RDCO
For the Ray-as-a-Service / Ray-Starter-Kit decision, the research supports a productizable kit at the universal-harness + scaffolding-skills layer, not a from-scratch harness. The market reality is: every major harness ships a similar universal layer, every operator’s personal-fit layer is unique and earned, and the interesting product opportunity sits between them - the discipline + scaffolding + first-batch skills that compress the personal-fit accumulation period from months to weeks.
The right shape for the kit: package the universal-discipline layer (the ratchet pattern, hooks-as-enforcement, subagent routing for context rot, splits-for-evaluation, skill format, generative-UI return channel, vault-as-nervous-system, todo+loop vs Notion-queue distinction, memory file format) plus a Layer-1.5 swap kit (MCP server picks, deployment-target swaps, bets.json template, skill ON/OFF menu) plus a starter Layer-2 (10-20 baseline rules earned across multiple founders that are likely to apply broadly). What stays bespoke: each operator’s CLAUDE.md hard rules, voice, and accumulated memory files. The pitch is “we sold you the harness discipline and starter rules; your job is to operate it long enough to earn your own personal-fit layer.”
The escape valves to design INTO the kit (so operators can extend without forking): (1) a Roast-style optional workflow orchestrator slot for when a structured process emerges; (2) a public-channel surface scaffold (HQ + decisions click-back rail + iMessage return channel - already built for RDCO, generalizable) for when collaborators arrive; (3) an MCP “hot swap” registry so operators can add their domain stack without touching skill code; (4) an auto-ratchet hook that emulates Hermes’s pattern - flag candidate skills/rules from session events for human approval, accelerating the Layer 2 fill-in. The most defensible RDCO product is NOT a competing harness; it’s the operator’s playbook + scaffolding + ratchet automation that turns Claude Code (or whichever universal harness) from “powerful tool” into “Ray-class operator” in 6 weeks instead of 6 months.
One sharp risk to flag: Hermes’s “harness already built in, self-improving from day one” is direct competitive overlap with the RDCO Starter Kit thesis. If Hermes’s auto-ratchet + multi-platform gateway works as advertised, it eats the bottom of the market RDCO would otherwise serve. The differentiator must be: RDCO sells the operating discipline + earned-rule starter pack + the specific skill set for COO-class founder work (newsletter, deep research, vault hygiene, finance pulse, content production), not just “an agent that learns from you.” Worth a focused Hermes evaluation - install it, run it for two weeks alongside Ray, measure where the auto-ratchet succeeds and where it produces noise.
Open follow-ups
- Hermes Agent hands-on eval: install, run two weeks alongside Ray, measure auto-ratchet quality vs founder-curated ratchet. Direct competitor risk to Ray-Starter-Kit thesis.
- Roast pattern adoption test: would Ray benefit from a Roast-equivalent workflow orchestrator for the structured RDCO processes (newsletter pipeline, deep-research, video production)? Or are skills + subagents already the right abstraction?
- Cursor SDK as deployment escape: would running Ray-class workflows via Cursor SDK in CI/runtime unlock anything Claude Code’s headless mode doesn’t?
- OpenHands V1 event log: replay/pause/resume is genuinely missing from Claude Code today. Worth tracking whether Anthropic adds it or whether OpenHands becomes the right substrate for “audit-grade” agent work that needs full execution history.
- “OpenClaw” reference: founder mentioned in concept doc; only forum reference found is Hermes-vs-OpenClaw comparison in a Chinese tech outlet. Probably alternate name or transliteration. Defer until the founder clarifies.
- Tool-count audit at RDCO: Osmani’s “ten focused beats fifty overlapping” - we have 30+ MCP servers across the Mac mini. Worth a quarterly audit pass.
Sources
Vault
- ~/rdco-vault/06-reference/2026-05-10-addy-osmani-agent-harness-engineering.md - the discipline name (May 10 2026)
- ~/rdco-vault/06-reference/concepts/2026-05-10-harness-moat-two-layers-portability.md - two-layer framework (May 10 2026)
- ~/rdco-vault/06-reference/2026-04-11-garry-tan-thin-harness-fat-skills.md - thin harness, fat skills (Apr 11)
- ~/rdco-vault/06-reference/2026-04-12-harrison-chase-harness-blog.md - your harness, your memory (Apr 12)
- ~/rdco-vault/06-reference/2026-04-07-claude-code-architecture-teardown.md - Rohit’s Claude Code reverse engineering (Apr 7)
- ~/rdco-vault/06-reference/2026-04-13-moura-entangled-software-agent-harnesses-dead.md - the strongest dissent (Apr 13)
- ~/rdco-vault/06-reference/2026-05-09-tobi-lutke-river-public-channel-agent.md - River as deployment-shape evidence (May 9)
- ~/rdco-vault/06-reference/2026-04-15-thariq-claude-code-session-management-1m-context.md - context-rot guidance (Apr 15)
- ~/rdco-vault/06-reference/2026-04-30-alphasignal-warp-rust-codebase-followup.md - Cursor SDK + Warp open-source (Apr 30)
Web
- Anthropic Claude Code skills: https://code.claude.com/docs/en/skills
- Alex Op full-stack writeup: https://alexop.dev/posts/understanding-claude-code-full-stack/
- Cursor agent best practices: https://cursor.com/blog/agent-best-practices
- Cursor docs rules: https://cursor.com/docs/context/rules
- OpenAI Codex CLI docs: https://developers.openai.com/codex/cli
- AGENTS.md guide: https://developers.openai.com/codex/guides/agents-md
- Aider repo map: https://aider.chat/docs/repomap.html
- Simran Chawla Aider analysis: https://simranchawla.com/understanding-ai-coding-agents-through-aiders-architecture/
- Cline rules docs: https://docs.cline.bot/customization/cline-rules
- Cline Memory Bank: https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets
- Continue rules docs: https://docs.continue.dev/customize/rules
- Continue.dev pivot review: https://vibecoding.app/blog/continue-dev-review
- OpenHands platform: https://www.openhands.dev/
- OpenHands V1 architecture: https://www.openhands.dev/blog/openhands-index
- OpenHands paper: https://arxiv.org/abs/2407.16741
- SWE-agent paper: https://arxiv.org/abs/2405.15793
- SWE-agent docs: https://swe-agent.com/0.7/background/aci/
- Hermes Agent: https://hermes-agent.nousresearch.com/
- Hermes Agent docs: https://hermes-agent.nousresearch.com/docs/
- Hermes Agent DataCamp tutorial: https://www.datacamp.com/tutorial/hermes-agent
- Shopify Engineering: Introducing Roast: https://shopify.engineering/introducing-roast
- Shopify/roast GitHub: https://github.com/Shopify/roast
- ZenML Roast writeup: https://www.zenml.io/llmops-database/structured-workflow-orchestration-for-large-scale-code-operations-with-claude
- Bessemer Atlas Shopify AI playbook: https://www.bvp.com/atlas/inside-shopifys-ai-first-engineering-playbook
- First Round AI Shopify feature: https://www.firstround.com/ai/shopify