“Learning on the Shop floor” — @tobi (Tobi Lütke, Shopify CEO)
Why this is in the vault
Founder shared 2026-05-09 ~15:21 ET as another Tobi piece (the vault has his April Acquired interview but not this one). Five lines in, it's clear this is direct empirical validation of the architecture pattern Ben and I have been building toward: chat-based agents that work in shared, publicly visible surfaces rather than private windows. Tobi names the design constraint (refuse DMs, force public channels) AND reports the consequence (osmosis learning across the company, merge rate doubling without model retraining). This is rare: most agent-deployment pieces are vendor pitches; this one is the CEO of a public company describing a design tradeoff and the measured outcome.
Bookmark-to-like ratio of ~1:1 (694 bookmarks / 664 likes) is a tell that this is being saved-for-later by practitioners, not just liked-and-forgotten.
The core argument
Shopify built River, an AI agent that lives in their company Slack. Wide adoption: 5,938 employees worked with River across 4,450 channels in the last 30 days. River opened 1,870 PRs in the last week alone. About 1 in 8 PRs merged into the main monorepo were authored by River and reviewed by humans.
The design constraint that distinguishes River from every other coding agent: River refuses direct messages. When you DM River, she politely declines and suggests creating a public channel. Tobi himself has #tobi_river with 100+ people who watch, react, add color, pick up the torch, help with reviews, and “remind me how rusty I am.”
Every conversation is searchable. Anyone at Shopify can jump in. People started learning from each other. A support engineer scrolls back through someone else’s #their_river channel to see how a senior person scoped a request before sending her own first one.
The German word for the structure: Lehrwerkstatt (teaching workshop). The whole shop floor is the classroom. You learn by being near the work. Tobi: “Shopify wants to be a Lehrwerkstatt at scale and River has now gotten us closer to this ideal than ever.”
Load-bearing data point
River’s merge rate went from 36% → 77% over two months. The mechanism:
- They did NOT retrain the model
- They did NOT switch models
- The improvement came from “people watching River work, noticing where it got stuck, and writing down what it should have known”
- Every team’s accumulated taste flows into the agent
- The agent gets better at being Shopify (specifically, not generally)
Plus: River has memory that “is constantly learning and un-learning critical information about the company and the best way to do work.”
This is the harness-thesis at the company-org-process layer: model improves slowly, harness/skill/instruction layer improves daily because the entire company is watching where it fails.
Why this matters more, not less, with AI
Tobi names the apprenticeship-replacement worry directly: “Why would a junior developer learn to debug if the agent does it for them?”
His re-frame: the risk is not that AI does the work. The risk is that AI does the work and we never learn from it. Private-window deployment locks everyone except the operator out of the apprenticeship. Public-channel deployment makes the whole company an apprentice.
Counter-positioning closing: “the company moves at the speed of its slowest secret.” Meetings, email, private DMs are slow because information from them never fully diffuses without huge additional communication effort. Public agent conversations are fast, searchable, teachable, and compound.
Mapping against Ray Data Co
Strong on multiple axes — this validates RDCO’s emerging architecture and surfaces one specific failure mode we should fix.
What we already do that matches River
- iMessage + Discord as the inbound channels. Both are Ben + Ray surfaces by design. iMessage is a 1:1 channel (Ben's wife isn't reading it, but it's not Ray-only either). Discord #ops is technically multi-user (RDCO Discord workspace, currently founder-only but extensible).
- /vault/ + /decisions/ + /bets/ on hq.raydata.co are Ray’s analog of River’s public-channel record. The HQ surfaces are the visible-internal record of what Ray decided, why, and what’s open. Behind Cloudflare Access SSO so it’s “public” only to Ben today, but the architecture supports adding viewers without redesigning.
- /skills/ ecosystem is the analog of “every channel can pre-load the zones, skills, and instructions its team needs.” When I (or the founder) discover a pattern that should be permanent, it gets codified as a skill and survives across sessions. River’s “skill someone wrote to teach River about the company’s checkout data warehouse gets reused by twelve other teams” is the same shape at the file system layer.
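A minimal sketch of what the per-channel (or per-context) skill pre-loading shape could look like at the file-system layer. Everything here is hypothetical: the function name, the manifest structure, and the `.md`-per-skill convention are illustrative assumptions, not the actual /skills/ implementation.

```python
from pathlib import Path


def preload_skills(skills_dir: str, context: str, manifest: dict[str, list[str]]) -> str:
    """Concatenate the skill files a given context declares in its manifest,
    so every session in that context starts with the team's accumulated
    instructions. Missing skill files are skipped silently."""
    parts = []
    for name in manifest.get(context, []):
        path = Path(skills_dir) / f"{name}.md"
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```

The point of the shape: a skill written once (e.g. for a checkout data warehouse) is just a file, so any other context that lists it in its manifest reuses it for free.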
Where Tobi’s piece exposes a real gap
Most of my actual reasoning happens in private Claude Code session output that no one else can read. The vault writes are public-via-HQ but the chain of thought that led to each write — the dispatched subagent prompts, the tool calls, the “this didn’t work, try that” iterations — lives in the session transcript only Ben can see (and only by re-opening this specific session). When the session compacts, even Ben loses easy access.
Per Tobi: “if every interaction with an agent happens in a private window, the only person who learns anything is the person at the keyboard. Everyone else is locked out of the apprenticeship.”
That’s a problem for two cases RDCO will hit:
- Future RDCO clients (services). When RDCO sells data engineering / TDD / pipeline work, the buyer won’t see how the work got done unless we make it visible by design.
- Future RDCO collaborators (hires). When the team grows past one founder, a new joiner can’t learn from watching past sessions because past sessions are locked behind Anthropic’s session-storage.
What we’d build to close the gap (not today, but on the roadmap):
- Per-session transcript export to a /sessions/ HQ surface (would need redaction for sensitive content)
- Decision-log auto-write at every major Ray inflection point — already partly happening via vault writes, but the chain of reasoning doesn’t get captured systematically
- Public skill commit log — every time a skill changes, the diff + the reasoning gets a vault note. Today this happens informally; could be made deterministic via /improve.
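The last roadmap item (skill change → vault note) is mechanical enough to sketch. This is a hypothetical shape only: the function name, the vault path convention, and the note format are assumptions, not anything /improve does today.

```python
from datetime import date
from pathlib import Path


def log_skill_change(vault_dir: str, skill_name: str, diff: str, reasoning: str) -> Path:
    """Write a dated vault note capturing a skill's diff plus the reasoning
    behind it, so the 'why' survives alongside the 'what'."""
    note = Path(vault_dir) / f"{date.today().isoformat()}-skill-{skill_name}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(
        f"# Skill change: {skill_name}\n\n"
        f"## Why\n{reasoning}\n\n"
        f"## Diff\n{diff}\n"
    )
    return note
```

Making this deterministic (called by /improve on every skill edit) is what would turn today's informal habit into River-style "writing down what it should have known."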
What this validates retroactively
The /decisions click-back pattern (built today): every decision the founder makes via the HQ surface generates a structured iMessage payload that I receive AND that gets archived in the vault decision log. That’s exactly the “every conversation is searchable” property River has. The instinct was right; the article gives the language.
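For concreteness, the click-back payload could be sketched roughly like this. The field names and the `hq:/decisions` surface tag are illustrative assumptions, not the actual RDCO schema; the only property that matters is one serialized record feeding both surfaces (iMessage delivery and vault archive) so the two can never drift.

```python
import json
from datetime import datetime, timezone


def build_decision_payload(decision_id: str, choice: str, context: str) -> str:
    """Serialize a founder decision once, so the identical record is both
    sent via iMessage and archived verbatim in the vault decision log."""
    payload = {
        "decision_id": decision_id,
        "choice": choice,
        "context": context,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "surface": "hq:/decisions",
    }
    return json.dumps(payload, indent=2)
```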
Where we’d diverge from Tobi
For solo-founder RDCO TODAY, “public channels” don’t apply because there’s no second person to watch. The piece is most relevant as architecture preparation for when collaborators arrive, not as an immediate behavioral change.
Counter-pattern: for genuine 1:1 founder-AI work, private channels are fine. The trap is silently extending the private-channel default to all future contexts when the team grows. Build the public-channel surfaces (HQ) NOW so they’re load-bearing by the time a second viewer joins.
Sanity Check candidate (high quality)
Working title: “The agent that refuses to work in private.”
Original re-frame: most teams deploy AI in private windows (Cursor, ChatGPT-DM, Claude-IDE) and it feels productive. The one team forcing public-channel-only collaboration is seeing osmosis learning compound — and getting a 36→77% agent merge rate improvement out of organizational behavior change, not model upgrades.
Piece could explore:
- Why does the private-window default feel right but cost so much?
- What’s the smallest team where the public-channel constraint pays back? (Tobi reports 5,938 employees — does it work at 50? 5? 1+AI?)
- The Shopify pattern at solo-operator scale: even with one human, building agent surfaces in PUBLIC means the agent is teachable later when collaborators arrive. The cost of retrofitting public visibility AFTER deploying privately is much higher than building public-by-default.
- Tie-in: Jaya Gupta’s “shape is the moat” piece (filed today) argues the org structure is the moat. River is an example of the SHAPE making a particular kind of agent-collaboration possible. Same week, two pieces converging on the same insight from different angles.
Voice match: empirical observation + surprising design constraint + contrarian takeaway. Founder voice strength.
Tier: high-priority research-brief candidate. If founder green-lights, dispatch /research-brief tobi-river-public-channel.
Notable quotes (≤15 words each, in quotation marks)
- “She only works in the open.”
- “The whole shop floor is the classroom.”
- “The agent makes the whole company an apprentice.”
- “The company moves at the speed of its slowest secret.”
- “It is fast, it is searchable, it is teachable, and it compounds.”
Open follow-ups
- Audit RDCO’s current “private vs public” surfaces. What’s currently private (Claude Code session output, internal vault file edits, skill iterations) that should be made visible-by-default for future-collaborator readability?
- Consider an “exposed reasoning” surface on HQ: per-decision chain-of-thought log, auto-written by Ray when major decisions happen. Would surface the “how did Ray get here” that today lives in session transcript only.
- Tobi links to his 2018 piece “The Future Role of Human Excellence” (chess analogy). Worth pulling separately for the apprentice-master frame.
- River specifically: deeper research on the implementation. Is River open-source or Shopify-internal? If open, lift patterns directly into RDCO’s skill ecosystem.
Related
- 06-reference/2026-04-19-acquired-tobi-lutke-shopify — prior Tobi vault entry (Acquired interview, “living in everyone else’s future” + trust batteries + constitutions + private evals)
- 06-reference/2026-05-08-jaya-gupta-shape-as-moat — same week, complementary thesis (org shape is the moat; River is an example of the shape)
- 06-reference/2026-05-08-thariq-unreasonable-effectiveness-html — HTML-as-output is part of the same “make agent work visible” thesis
- 06-reference/2026-05-08-dan-farrelly-background-agents-orchestration — durable-orchestration is the harness layer; River runs on top of one
- 06-reference/2026-05-09-smart-ape-md-vs-html-three-questions — md-canonical / html-derivative pattern; River’s Slack channels are the operational version of “canonical content + many derivative views”
- 06-reference/2026-04-22-every-bread-in-ai-sandwich — already uses Tobi’s “trust batteries” concept
- 06-reference/concepts/ — candidate concept doc: “the public-channel constraint” (deserves canonical RDCO-term treatment)
Source caveat
Article body retrieved via xmcp getPostsById with tweet.fields: ["article", ...] + expansions: ["article.cover_media", "article.media_entities"]. Same fetch path validated repeatedly this week. Plain text returned full body (~1700 words). Article references two of Tobi’s prior blog posts (apprentice programmer, future role of human excellence) — not pulled here, candidates for separate fetches if the chess analogy or the Siemens-apprenticeship origin story become load-bearing for the Sanity Check piece.
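For reproducibility, the fetch parameters named above, written as a plain dict since the xmcp client interface isn't documented in this note. The note elides additional tweet.fields ("..."), and they are not guessed here.

```python
# Parameters as recorded in this note; the "..." in tweet.fields is elided
# in the source and intentionally left out rather than filled in.
fetch_params = {
    "tweet.fields": ["article"],
    "expansions": ["article.cover_media", "article.media_entities"],
}
```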