IndyDevDan — TOP 2% Engineering: /PLAN 2026
Why this is in the vault
This is Dan’s annual “bets” video — 11 predictions + a recap of his 2025 bets. It’s vault-worthy not because every bet is right (he’s already admitting misses on 2025), but because Dan is the single most-watched practicing agentic-coding voice on YouTube, he publishes bets and grades them publicly, and he’s one year ahead of mainstream developer discourse. Three reasons:
- The “year of trust” framing unifies every other pattern the vault has been tracking since January. Custom agents, evals, sandboxes, out-loop systems, hooks — Dan organizes all of them under a single primitive: how do you build and defer trust in autonomous agentic systems? This is the strongest single-word unifier the vault has for the 2026 agentic-coding moment.
- Dan’s 2025 bets graded honestly — roughly 12 of 15 hits. He nails the agentic-coding-in-terminal bet, the cost-of-code-to-zero bet, the skill-gap-earthquake bet (-25% entry-level roles), the no-wall bet, hyperspecialized LLMs, exponential slop, and data > UX > benchmarks. He misses on infinite memory (“I did not understand this problem deeply enough”) and OpenAI remaining #1 (they didn’t). That batting average earns the next 11 bets serious attention.
- Independent convergence with Tobi Lütke (same ingestion cycle) and Thariq (April 15 Anthropic guidance in vault). Dan saying “custom agents above all, private evals, context over prompts, out-loop trust-building” is the same claim set as Tobi saying “constitutions, Toby evals, context engineering” and Thariq saying “more context isn’t free, route long artifacts through subagents.” Three independent voices from three different communities (practicing engineer / public-company CEO / AI lab) are converging on the same architecture.
Core argument
The unifying frame: 2026 is the year of trust in agents. Every bet is about how top 2% engineers build and defer trust in increasingly autonomous systems. Dan’s 11 bets:
- Bet on the right labs. Anthropic owns coding, Google owns price+intelligence+speed at scale, OpenAI has dropped to third. Gemini 3 Flash is the specific anchor — top-three intelligence, top-five price, top speed (“this model should not exist”). Opus 4.5 is the coding baseline. Dan’s prescription: don’t be model-monogamous; use Opus for coding-heavy work, Gemini 3 Flash for breadth and cost.
- Tool calling is the foundation. The Core 4 = context, model, prompt, tool. Everything — agentic coding, orchestration, custom agents — reduces to the Core 4. Don’t get baited by new framework marketing; if you can’t trace it to the Core 4, it’s noise.
- Custom agents above all. The highest-ROI bet of 2026. Custom = your specific system prompt, your tool set, your context, your evals. Generic agents are commoditized; custom agents that know your codebase and your problem are not.
- Multi-agent orchestration (not parallelization). Lead agent + command/worker agents. The lead agent is itself a custom agent with CRUD-over-agents tool access. You talk to the orchestrator; the orchestrator handles routing, spawning, and coordination. Agentic Coding 2.0 is this pattern.
- Agent sandboxes — defer trust by giving agents their own dev environment. Best-of-N: spin up 10 agents in 10 sandboxes, only merge the winner. You don’t need trust until merge time. This is what senior engineers already do in staging/dev — now we give it to agents too.
- In-loop vs out-loop. In-loop = terminal/babysit/one-prompt-at-a-time. Out-loop = Slack/Discord/GitHub/your own system, agent ships a PR, you review. Top engineers maximize out-loop to free in-loop time for the highest-leverage work.
- Agentic Coding 2.0. The UI for coding becomes “talk to the lead agent.” No more sub-agent micromanagement. This will require a new UI/application. Dan doesn’t predict the shape but predicts the category.
- Public benchmarks get saturated (90–100% across models). Top engineers build private evaluation systems they never publish. Your private benchmark is your alpha. Without it, you can’t tell when a new model is actually a step-change for your use case. Tobi’s “Toby evals” is the same claim.
- UIs vs agents — agents eat SaaS. Any SaaS app whose value is CRUD-over-database is cooked. Either the company eats itself with agents first, or a competitor does. The Google search bar is the canonical example.
- AGI hype dies. Stop caring about AGI/ASI marketing. Focus on agents. “The decade of agents” (Karpathy) is the operative framing; AGI is vaporware. Top engineers stop responding to AGI discourse entirely.
- Bonus: first end-to-end agentic engineer blog post emerges. Someone writes a blog post describing an agent chain that ships a feature prompt-to-production with no human in the loop. Dan calls this the polar opposite of the “AI can’t engineer” crowd. This is the north star for the year-of-trust frame.
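Dan’s Core 4 framing (context, model, prompt, tool) can be made concrete with a minimal sketch. This is not Dan’s code — the class and tool names are illustrative — but it shows how a “custom agent” reduces to exactly four pieces, and why tool calling is the foundation: a tool call is the agent’s only lever on the world, and its result folds back into context.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CustomAgent:
    # The Core 4: model, prompt, context, tools. Everything else is framing.
    model: str                                           # which LLM backs the agent
    system_prompt: str                                   # your specific system prompt
    context: list[str] = field(default_factory=list)     # accumulated working context
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def call_tool(self, name: str, arg: str) -> str:
        # Tool calling: invoke, then fold the result back into context.
        result = self.tools[name](arg)
        self.context.append(f"{name}({arg!r}) -> {result!r}")
        return result

# Toy usage: a custom agent with one tool. Model name is illustrative.
agent = CustomAgent(model="opus-4.5", system_prompt="You review RDCO newsletters.")
agent.register_tool("word_count", lambda text: str(len(text.split())))
print(agent.call_tool("word_count", "the year of trust"))  # prints "4"
```

The test Dan proposes for new framework marketing falls out of this sketch: if a feature can’t be located in one of the four fields above, it’s noise.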
2025 bet grading (honest self-review): Hits: AI coding as standard (84% adoption), agentic coding begins, terminal as primary agentic surface (he expected UI, got CLI), cost of code declines, skill gap earthquake (-25% entry-level roles), no wall, hyperspecialized LLMs, small language models on-device, industry-breaking architecture (world models), exponential slop, big tech shrinks / SMB grows, data > UX > benchmarks. Misses: infinite memory (“got this wrong, I did not understand the problem deeply enough”), OpenAI remains #1.
Mapping against Ray Data Co
- Dan’s “custom agents above all” is the same claim as the RDCO harness-era thesis. The vault has been building toward this since February: skills over commands, custom agents for specific workflows, constitutions via CLAUDE.md. Dan saying “this is the #1 ROI move for 2026” is direct outside validation.
- “Private evals” is the missing discipline for RDCO right now. The vault has constitutions (CLAUDE.md, SOUL.md, skill descriptions) and has skills (the /check-board loop running right now) — but it does NOT have a private eval suite to run regressions against. When Opus 4.7 → Opus 5 ships, how do we know if it’s a step change for our use cases? Dan’s bet says we won’t know without private evals. This is a concrete follow-up to build.
- “Out-loop vs in-loop” describes the RDCO agentic ops architecture directly. In-loop = the founder typing at the terminal. Out-loop = the Mac Mini autonomous agent running cron jobs, /check-board cycles, /process-newsletter watch, /curiosity. The RDCO infrastructure has been quietly executing this bet since March. What’s missing is the explicit accounting: which tasks belong in-loop, which belong out-loop, and what’s the graduation criterion (trust-level) for moving something from in-loop to out-loop?
- Multi-agent orchestration already partly implemented in /process-newsletter and /process-youtube watch modes. Parent spawns sub-agents per article/video, collects summary lines, keeps full text out of parent context. That IS Agentic Coding 2.0 lite — the parent is the orchestrator, the sub-agents are command-level workers. The next step per Dan’s bet is to have the orchestrator itself be a custom agent (not the parent Claude Code session) and to give it CRUD-over-agents.
- Agent sandbox pattern applies to RDCO build-project and taste skills. Both involve generation + review. Dan’s best-of-N sandbox pattern says: spawn 10 landing-page drafts in parallel sandboxes (Vercel preview URLs already give us this), review all 10, merge winner. Currently the build-project skill is sequential. Parallelization via sandboxes would accelerate and improve quality.
- “UIs vs agents — agents eat SaaS” is the sharpest restatement of the Data Marketplace thesis. If your SaaS is CRUD-over-database with a UI, an agent-first competitor wins. RDCO positioning for Data Marketplace should explicitly lean into “agent-first data access” vs “catalog UI” — it’s the only defensible position.
- The AGI-hype-dies bet validates the RDCO editorial stance. Sanity Check has been positioning against AGI hype since January. Dan calling it “one of the greatest marketing schemes of all time” gives us the cleanest single-line quote to use.
- Counter-point to the Sorkin ACQ2 ingestion (same cycle). Sorkin argues human interviewers can’t be replaced because the moat is accumulated relational knowledge. Dan argues engineers will build end-to-end agentic systems that ship with no human in the loop. Both are probably right — engineering is automatable execution; interview work is relational accumulation. RDCO sits at the seam: we automate execution (check-board, newsletter ingestion, graph reingest) and reserve human judgment for positioning, taste, and relationships.
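The best-of-N sandbox pattern referenced above (spawn N isolated generations, review all, merge only the winner) can be sketched in a few lines. Everything here is a stand-in — `generate_draft` would really be an agent running in its own sandbox (e.g. a Vercel preview deploy) and `score` would really be an automatic taste/motion-review pass — but the shape of the pattern is accurate: trust is deferred until merge time.

```python
import concurrent.futures
import random

def generate_draft(seed: int) -> str:
    # Stand-in for "spawn one agent in its own sandbox and let it build a draft".
    rng = random.Random(seed)   # deterministic per sandbox, for the toy example
    return f"draft-{seed} quality={rng.random():.3f}"

def score(draft: str) -> float:
    # Stand-in for an automatic review score (taste, motion-review, evals).
    return float(draft.split("quality=")[1])

def best_of_n(n: int) -> str:
    # Fan out N isolated generations in parallel, review all, keep the winner.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        drafts = list(pool.map(generate_draft, range(n)))
    # No trust needed until this line: only the winner is ever merged.
    return max(drafts, key=score)

winner = best_of_n(10)
print(winner)
```

Note the design point: the N-1 losing drafts never touch the main branch, which is why the pattern builds trust cheaply — review cost scales with N, merge risk does not.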
Open follow-ups
- Build a private eval suite for RDCO before Opus 5 ships. Candidate eval cases: /process-newsletter on a known sample of 10 newsletters with gold-standard expected extractions; /research-brief on 3 known topics with expected angles; /draft-review on 3 newsletters with known issues. Store as a skill or in ~/.claude/evals/. Run against every new model release. This is Dan’s #8 bet materialized.
- Formalize the in-loop/out-loop graduation criterion. What trust-level metric does a task need to hit before it moves from founder-in-the-loop to autonomous? Candidate: a 90% acceptance rate over the last 20 invocations, with zero catastrophic failures. Document this in SOUL.md or a new rdco-autonomy-policy.md.
- Pressure-test the “agent sandbox for build-project” idea. Vercel preview URLs already give us ephemeral sandboxes per deploy. Could we fan out 3–5 landing-page drafts per request, score them automatically via taste+motion-review, and present only the winner? This would be a concrete realization of the best-of-N pattern.
- Write a Sanity Check angle on “the year of trust.” Not a recap of Dan’s bets, but a positioning piece on why trust (not capability) is the 2026 bottleneck — and how practicing operators (vs AI labs) solve it. Sources: Dan’s bets, Tobi’s constitutions + Toby evals, Thariq’s context-rot guidance, the Cobus Greyling harness-era piece.
- Grade Dan’s 2026 bets in December 2026 as part of the annual retrospective. Set a calendar event in /curiosity or /morning-prep for 2026-12-15 to re-read this note and assess which bets landed.
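The first two follow-ups above can be sketched as one tiny harness. All names are illustrative (not an existing RDCO tool); the 90%-over-20-invocations threshold is the candidate criterion proposed in the note, not a settled policy:

```python
from dataclasses import dataclass

@dataclass
class Invocation:
    # One recorded run of a skill: did the founder accept the output,
    # and did anything catastrophic happen (e.g. destructive side effect)?
    accepted: bool
    catastrophic: bool = False

def eval_pass_rate(outputs: list[str], gold: list[str]) -> float:
    # Private eval: compare skill outputs against gold-standard extractions.
    hits = sum(o == g for o, g in zip(outputs, gold))
    return hits / len(gold)

def graduates_to_out_loop(history: list[Invocation], window: int = 20) -> bool:
    # Candidate criterion: >= 90% acceptance over the last `window` invocations,
    # with zero catastrophic failures. Too little history means no graduation.
    recent = history[-window:]
    if len(recent) < window:
        return False
    if any(inv.catastrophic for inv in recent):
        return False
    return sum(inv.accepted for inv in recent) / window >= 0.90

# Usage: 19 accepted runs out of the last 20, none catastrophic -> graduates.
history = [Invocation(accepted=True)] * 19 + [Invocation(accepted=False)]
print(graduates_to_out_loop(history))  # prints "True" (19/20 = 0.95)
```

The point of writing it this small is that the graduation rule becomes a checkable artifact rather than a vibe — the same property Dan wants from private evals.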
Related
- ~/rdco-vault/06-reference/transcripts/2026-04-19-indydevdan-top-2-percent-plan-2026-transcript.md — raw transcript
- ~/rdco-vault/06-reference/2026-04-19-indydevdan-claude-code-deletes-production.md — Dan’s companion-week video on hooks as damage control, the trust-infrastructure side of the “year of trust” argument
- ~/rdco-vault/06-reference/2026-04-19-acquired-tobi-lutke-shopify.md — Tobi on constitutions + Toby evals — independent convergence from a public-company CEO on the same architecture Dan describes for practicing engineers
- ~/rdco-vault/06-reference/2026-04-15-thariq-claude-code-session-management-1m-context.md — Anthropic’s Thariq on context rot and subagent routing — third independent source on the architecture
- ~/rdco-vault/06-reference/2026-04-12-cobus-greyling-harness-era-language-shift.md — Greyling’s “harness era” framing — the vocabulary layer that ties these three voices together