“You’re the Manager Now” — @lauraentis (Every Context Window)

Why this is in the vault

Hybrid roundup that crystallizes a thesis Every has been circling for months: developer UX is shifting from “type code” to “manage parallel agents,” and the supervising-via-CLI era is already over. Includes a directly stealable shipping workflow (the 1-100 confidence check) plus a “social norms for agents” pattern (Tact plugin) that maps onto RDCO’s own Slack/Discord/iMessage channel etiquette problem.

Sponsorship

GitBook ad inserted mid-issue between the “permission to skip” essay and the workflow section. Pitched as an “AI-era documentation” product: your docs plus YouTube/GitHub Discussions plus an embeddable AI assistant, with the angle that LLMs querying outdated sources is the new failure mode. Used by Nvidia/Zoom/n8n. Standard Every third-party paid placement, clearly labeled with a “Want to sponsor Every?” CTA. Doesn’t bias the surrounding editorial.

The core arguments (compact)

1. The new Claude Code desktop app confirms where dev work is heading. Anthropic shipped a redesigned desktop app with sidebar session management, drag-and-drop panes, and an integrated terminal/file editor — Cora’s Kieran Klaassen had already rigged the same setup himself. Naveen Naidu (Monologue) notes Cursor and Codex landed on similar layouts, so this is a converging UI grammar, not original design. The takeaway Every wants you to hold: CLI-first supervision (commands, logs, git diffs, terminal output) is no longer the right primary interface once agents are doing the writing. The right interface centers on parallel work management, git/task awareness, and — Kieran’s emphasis — a live preview of what’s being built.

2. Frame hierarchy beats benchmark debates. A cybersecurity researcher claimed smaller models could find the same vulnerabilities as Anthropic’s withheld Mythos model when pointed at the right code. Dan Shipper’s reframe: that’s not the comparison. Mythos found serious vulnerabilities autonomously, across major OSes/browsers, without being told where to look. Smaller models can solve a stated problem; Mythos chose which problem to solve. Shipper generalizes this to a frame-hierarchy argument — as models improve, your job moves up the stack from “describe the bug and propose fixes” to “there seems to be a problem, can you fix it?” Higher frames create more solution space and force the human to define what “the most important problem” even is. This is the same shift Every keeps returning to (cf. compound engineering, “stop coding start planning”) with a new vocabulary.

3. The 1-100 confidence check (Austin Tedesco’s growth workflow). Before Claude Code ships anything, ask: “How confident are you in this, on a scale of 1-100?” Anything under 90 goes back with “Find improvements and get to 90+.” Don’t chase 100; the returns diminish. Tedesco isn’t an engineer, and he says this single question changed shipped-work quality across growth experiments and PRs.

4. Tact — a plugin to teach agents to read the room. Every is “half agent now” and Slack got noisy with OpenClaws butting into threads uninvited. Hard channel-scoping rules (Claudie can only respond in the consulting team channel) helped but, per Willie Williams, are like banning a word from conversation — sometimes the word is right. Tact is a classifier built on real “bot spoke up in Slack” examples labeled appropriate/not, sitting in front of Plus Ones to gate responses. Framing: programming social norms instead of hard rules. “Like giving a human a little recorder with a light: green you can respond, red don’t.”

5. Token-usage data point. Mike Taylor (Every’s head of tech consulting) burned 2.2M Claude Code tokens in March; he says that’s typical for data and product roles. Engineers running agentic/subagent workflows go higher but rarely exceed Claude Max’s ~30M/mo cap. Self-check command: npx ccusage@latest monthly.

6. Mini-Vibe Check on Dia browser. Eleanor Warnock on the Browser Company’s Dia: the Good Morning tab pulls Slack/Notion/email todos plus calendar with a “Prep me” button per event. She concedes it doesn’t capture everything and her actual todos still live in her Plus One — but the tab is “beautiful” and gives a “moment of aesthetic orientation” she values over completeness. Frames the bet: in a world where every AI tool races on capability, Dia is betting the most pleasant one wins your morning.

7. Throwaway: the philosopher draft. Each lab gets a hired-philosopher pick — Nietzsche/xAI, Bentham/Anthropic, Plato/OpenAI, Leibniz/Google, Seneca/Meta. Joke, but the underlying observation is real: DeepMind just hired a philosopher, Anthropic has two, and “AI alignment” is increasingly a philosophy hiring market.
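
The confidence check in item 3 is essentially a small gate loop, and a minimal sketch helps show where the 90 threshold sits. Assumptions, none of which come from the issue: the `ask_agent` callable (a hypothetical stand-in for however the agent is actually prompted), the `max_rounds` cap, and the stub agent used for the demo. Only the two prompts and the 90 threshold are from the source.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface: takes a prompt, returns (work, self-rated confidence 1-100).
AgentFn = Callable[[str], Tuple[str, int]]


def confidence_gate(
    ask_agent: AgentFn, threshold: int = 90, max_rounds: int = 3
) -> Tuple[str, int, bool]:
    """Ask for a 1-100 self-rating; send the work back until it reaches the threshold.

    max_rounds is an assumed safety cap so the loop cannot run forever.
    """
    work, score = ask_agent("How confident are you in this, on a scale of 1-100?")
    rounds = 1
    while score < threshold and rounds < max_rounds:
        work, score = ask_agent(f"Find improvements and get to {threshold}+.")
        rounds += 1
    return work, score, score >= threshold


def make_stub_agent(scores: Iterable[int]) -> AgentFn:
    """Demo-only agent that replays a fixed sequence of self-ratings."""
    it = iter(scores)

    def agent(prompt: str) -> Tuple[str, int]:
        s = next(it)
        return f"draft rated {s}", s

    return agent


if __name__ == "__main__":
    # Two send-backs (72, then 85) before the rating clears 90.
    print(confidence_gate(make_stub_agent([72, 85, 93])))
```

The returned boolean is the numeric trigger: a caller can ship on `True` and escalate to a human on `False` rather than looping toward 100.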

Mapping against Ray Data Co

The “You’re the Manager Now” thesis is the operating reality of this very setup. Ray-the-COO is already a parallel-agent supervisor: /check-board cycles, batch newsletter processing via subagents, the autonomous loop. The Every framing is a clean external articulation of why the CLI-supervision metaphor breaks at scale: when you have 3+ subagents in flight processing newsletters, what you actually need is parallel-work awareness and previews, not a tail of stdout. Worth holding next to 2026-04-15-thariq-claude-code-session-management-1m-context (Thariq’s session management guidance): Thariq is talking about how to keep one session healthy; Entis is talking about how the next UI generation handles N sessions.
Frame-hierarchy argument reinforces the compound-engineering arc. Connects directly to 2026-01-28-every-stop-coding-start-planning, 2026-01-30-every-compound-engineering-framework, and 2026-02-09-every-compound-engineering-guide. Same family of claim: as model capability rises, the leverage shifts from “how well do you describe the task” to “how well do you choose which task is worth doing.” This is the founder-COO operating split in microcosm — Ben picks which problem matters, the agent chooses the approach.
Confidence check is a stealable RDCO pattern, today. The 1-100 self-rating with a 90 threshold is trivially portable into the build-project, draft-review, and process-newsletter skills as a final gate. Especially relevant for /draft-review and /voice-match where current output is qualitative — adding a “rate your confidence this draft is publish-ready, 1-100” before the skill returns gives Ben a numeric trigger. Worth considering as a skill update in the next compile-vault cycle.
Tact maps onto RDCO’s own channel-noise problem. RDCO already has hard rules (“respond via the channel’s reply tool,” explicit allowlists per channel) — but Willie’s observation is that hard rules over-block. Worth flagging for future skill design: a learned classifier (“should this agent surface in this thread right now?”) may eventually beat hand-coded etiquette rules. Not actionable today; bookmark for when the agent has enough channel history to train against.
Dia mini-Vibe Check reinforces the “design quality is a moat” thesis. Connects to the taste skill, 2026-02-06-every-what-is-taste-really, and the broader RDCO bet that automated review loops can close the design-quality gap. Eleanor’s admission — she keeps Dia’s morning tab even though her todos live elsewhere, because it’s beautiful — is exactly the “feeling matters” argument Ben uses for why RDCO sites need the qualitative review layer, not just the mechanical one.
Token data point is a sanity benchmark. 2.2M Claude Code tokens/mo for a non-engineer data/product role is a useful comparable when Ben is gut-checking whether RDCO’s Claude usage is in-band. npx ccusage@latest monthly is worth dropping into a status check skill.
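
The Tact point above (hard channel rules over-block, a learned gate might not) can be made concrete with a toy comparison. This is a sketch under stated assumptions, not how Tact works internally: the issue only says it is a classifier trained on labeled Slack examples. `Message`, `stub_score`, and the 0.5 threshold are all hypothetical, and `stub_score` stands in for a real trained model.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Message:
    channel: str
    text: str
    mentions_agent: bool


def hard_rule_allows(msg: Message, allowlist: FrozenSet[str]) -> bool:
    """Hard rule: the agent may speak only in allowlisted channels."""
    return msg.channel in allowlist


def learned_gate_allows(
    msg: Message, score_fn: Callable[[Message], float], threshold: float = 0.5
) -> bool:
    """Learned gate: respond when the classifier's 'appropriate' score clears the bar."""
    return score_fn(msg) >= threshold


def stub_score(msg: Message) -> float:
    """Stand-in for a trained classifier (assumption): direct mentions score high."""
    return 0.9 if msg.mentions_agent else 0.1


if __name__ == "__main__":
    allow = frozenset({"consulting"})
    direct = Message("general", "@agent can you pull the Q1 numbers?", True)
    chatter = Message("consulting", "anyone up for lunch?", False)
    for m in (direct, chatter):
        print(m.channel, hard_rule_allows(m, allow), learned_gate_allows(m, stub_score))
```

In this toy case the hard rule blocks a direct request in a non-allowlisted channel and permits a reply to idle chatter in an allowlisted one; the gate flips both decisions. Whether a real classifier does that depends on the labeled history, which is why this stays a bookmark.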

The GitBook sponsorship is third-party paid and properly disclosed, and the product itself (docs + AI assistant for LLM-grounded answers) is tangentially relevant to RDCO: when raydata.co eventually has product docs that LLMs need to ground against, this is the category. Not urgent; bookmark.

Related notes

2026-04-15-thariq-claude-code-session-management-1m-context
2026-01-28-every-stop-coding-start-planning
2026-01-30-every-compound-engineering-framework
2026-02-09-every-compound-engineering-guide
2026-04-15-every-claude-managed-agents-mini-vibe-check
2026-01-22-every-cursor-future-of-code
2026-01-26-every-claude-code-shipping
2026-02-06-every-what-is-taste-really
2026-01-20-every-ai-teaching-management

Tracked-author candidates

Austin Tedesco (Every head of growth): non-engineer shipping via Claude Code with concrete workflows. Worth watching as a “tasteful operator using AI” exemplar; mirrors the RDCO founder-COO setup.
Eleanor Warnock (Every): taste/design eye on AI products, complements Dan Shipper’s frame-hierarchy lens.

Copyright note

Paraphrase only. Direct quotes are ≤15 words and in quotation marks per the SDG/Every assessment-note pattern. Original at every.to.