06-reference

ann miura ko six levels ai pilled organizations

2026-04-30 · reference · source: X (X-native article by Ann Miura-Ko) · by Ann Miura-Ko (Floodgate co-founder, Stanford lecturer)
ai-native-org · autonomy-levels · agent-deployer · organizational-design · mac · phdata

“Everyone wants to be AI-pilled. Most Companies Are Still Level 1” — Ann Miura-Ko

Why this is in the vault

Miura-Ko proposes a 6-level autonomy framework (L0-L5) for AI-pilled organizations, modeled on the AV self-driving levels. The diagnostic is a four-question lens: What can AI see? What can AI do? Who can extend the system? How has the org changed? Directly load-bearing for: (1) RDCO’s own self-assessment as an agent-deployer org, (2) MAC positioning to phData clients (the framework gives a sales diagnostic), (3) Sanity Check material on the operator side of the agent-deployer thesis.

The core argument

“AI-pilled” is being used as binary when it should be a 6-level continuum analogous to AV self-driving. Companies differ on intensity (how deep AI goes) AND technical capability (what AI is allowed to see/do/change). The four-question lens (See / Do / Extend / Change) is the diagnostic. Most companies announce L4 ambitions while operating at L1.

The 6 levels (compressed)

L0 — AI as theater. Nothing structured to see, nothing of consequence to do, no one extending, org unchanged. Hard test: AI completes any recurring business process end-to-end? False positive: CEO speech about transformation while running the same exec staff meetings.

L1 — Personal productivity. Each user feeds AI individually. No org-level visibility. Power users are heroes whose workflows leave when they leave. Hard test: if your best AI user left, would the workflow remain? False positive: “80% use AI weekly” — meaningless.

L2 — Team workflow. Shared context per team (claude.md per team, function-specific MCP). Bounded actions within a team. Hard test: does the workflow cross team boundaries? False positive: “AI workflows in every department” but the workflows don’t connect — AI-enhanced silos.

L3 — Organizational infrastructure. Whole org queryable. Core systems of record exposed via CLI/MCP/well-defined APIs. Agents act across systems (update CRMs, open PRs, reconcile invoices). Non-engineers AUTHOR shareable skills, not just consume. Org chart materially different from 2023. Token-maxing over headcount-maxing. Hard test: agent answers across systems “what shipped, who asked, what broke, what customers said, what to do next” without convening a meeting. False positive: capture without synthesis.

L4 — Compounding operating system. Agents update agents. Skills marketplace propagates wins, removes duplicate effort. Policy-driven decision authority within scoped domains. Custom internal harnesses. Non-engineers ship production internal tools. Hierarchy collapses toward “channel managers” of agent workflows. Hard test: workflow that improved because the SYSTEM learned (not because a heroic person manually fixed it) PLUS three production tools shipped by non-engineers in the last quarter. False positive: agent sprawl — 100 brittle automations don’t equal compounding.
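“Policy-driven decision authority within scoped domains” can be sketched as a lookup: an agent acts only when the domain, the action, and the stakes all fall inside its delegated policy, and escalates otherwise. The domain names, fields, and limits below are hypothetical, invented for illustration:

```python
# Hypothetical policy table: each scoped domain delegates a bounded set of
# actions and a consequence ceiling. Nothing here is from the article.
POLICY = {
    "invoice-reconciliation": {"max_amount": 5_000, "actions": {"reconcile", "flag"}},
    "crm-hygiene":            {"max_amount": 0,     "actions": {"update_field", "merge_duplicate"}},
}

def authorized(domain: str, action: str, amount: float = 0.0) -> bool:
    """True only if the action falls entirely inside the domain's scope."""
    scope = POLICY.get(domain)
    return (scope is not None
            and action in scope["actions"]
            and amount <= scope["max_amount"])

print(authorized("invoice-reconciliation", "reconcile", 1_200))   # True: within scope
print(authorized("invoice-reconciliation", "reconcile", 12_000))  # False: exceeds authority, escalate
```

The point of the sketch is the shape, not the fields: at L4 the authority boundary is explicit data an agent can check, rather than a human approval step.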

L5 — Virtually self-driving organization. Six L5 markers: notices without being asked, synthesizes across sources, decides whether action is warranted, acts within delegated authority, escalates when uncertainty/consequence exceeds authority, updates shared memory so future behavior improves. Generative — the system asks its own questions. The L4→L5 leap: at L4 the system improves because humans direct it; at L5 because it notices it should. Hard test: what important thing did the company notice/decide/act on/learn from recently without a human initiating? False positive: “fake autonomy” — preconfigured rules + threshold alerts dressed as agency.
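The six L5 markers read naturally as a control loop: notice → synthesize → decide → act or escalate → update memory. A toy rendering, where every field, threshold, and return string is a placeholder invented for illustration (this is not Miura-Ko’s spec, and real synthesis would draw on multiple sources):

```python
def l5_cycle(events, authority_limit, memory):
    """One pass of the six L5 markers over a stream of events."""
    # 1. Notices without being asked: scan for anything above the learned baseline.
    signal = next((e for e in events if e["severity"] > memory.get("baseline", 0)), None)
    if signal is None:
        return "nothing noticed"
    # 2. Synthesizes across sources (trivially here: the event itself).
    # 3. Decides whether action is warranted.
    if signal["severity"] < 2:
        return "noted, no action"
    # 5. Escalates when consequence exceeds delegated authority.
    if signal["severity"] > authority_limit:
        return "escalated to human"
    # 4. Acts within delegated authority, then
    # 6. updates shared memory so future behavior improves.
    memory["baseline"] = signal["severity"]
    return "acted and learned"

memory = {}
print(l5_cycle([{"severity": 3}], authority_limit=4, memory=memory))  # acted and learned
print(memory["baseline"])                                             # 3
```

The L4→L5 distinction shows up in who calls this loop: at L4 a human schedules and tunes it; at L5 the system runs it unprompted and the memory update is what makes the next pass smarter.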

Asymmetry rule: Companies rarely answer all four questions at the same level. The asymmetry tells you where the next intervention should focus.
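A minimal way to operationalize the lens plus the asymmetry rule, assuming weakest-link scoring (overall level = the lowest of the four answers). That scoring is an interpretation, not something the article states:

```python
from dataclasses import dataclass

@dataclass
class LensAssessment:
    """Each of the four questions scored 0-5 against the level definitions."""
    see: int     # What can AI see?
    do: int      # What can AI do?
    extend: int  # Who can extend the system?
    change: int  # How has the org changed?

    def overall_level(self) -> int:
        # Weakest-link assumption: the org operates at the level of its worst answer.
        return min(self.see, self.do, self.extend, self.change)

    def next_intervention(self) -> list[str]:
        # The asymmetry rule: the lagging question(s) are where to invest next.
        floor = self.overall_level()
        scores = {"see": self.see, "do": self.do,
                  "extend": self.extend, "change": self.change}
        return [q for q, level in scores.items() if level == floor]

org = LensAssessment(see=3, do=2, extend=1, change=2)
print(org.overall_level())      # 1
print(org.next_intervention())  # ['extend']
```

The example encodes the common failure mode from the levels above: strong visibility (see=3) announced as L3, while extension is stuck at L1 and drags the whole org down with it.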

Mapping against Ray Data Co

This section is load-bearing for two parallel use cases (items 1-2), plus two cross-note thesis connections (items 3-4). The use cases are flagged for parent-handler synthesis (parent will write the RDCO level-assessment + phData take in iMessage; this section just stages the framework cleanly):

1. RDCO self-assessment surface — the four-question lens applied to RDCO is genuinely interesting because RDCO is structurally a one-founder + one-COO-agent org. Some questions translate cleanly (What can AI see? Vault + Notion + Gmail + Calendar + state files = strong L3). Some don’t (Who can extend the system? RDCO has no non-engineers, so the “non-engineers shipping production tools” criterion becomes “do skills get added by markdown edits, not engineering tickets?”, which is YES). Parent will deliver the explicit level-assessment; this vault note just establishes the framework for future cite-back.

2. phData / consulting playbook surface — directly relevant to phData’s positioning when rolling out agentic workflows at clients: the framework doubles as a client diagnostic.

3. Tickered / scarce-targeting connection — when LLMs make generic “AI-assisted code” abundant, the Tickered framing applies (per Packy McCormick’s Scarce Assets piece, 2026-04-30-not-boring-scarce-assets-abundance-driven-scarcity). Miura-Ko’s framework is the operational form: L3+ orgs are scarce because they require explicit structural choices that L1 announcements can’t fake. The targeting system is what moves you up the levels.

4. Compound Engineering / orchestration thesis cluster — Miura-Ko’s L4 description (“agents update agents, skills marketplace propagates wins, removes duplicate effort”) is exactly the architecture Trevin Chow + Kieran Klaassen are building in the Compound Engineering plugin (2026-01-09-trevin-chow-agent-orchestration-thesis, 2026-05-01-trevin-compound-engineering-v3-4). Miura-Ko gives the org-design language; Chow/Klaassen give the implementation pattern. Together they form a coherent agent-deployer playbook.

Open questions for founder