“What are we scaling?” — Dwarkesh Patel
Episode summary
A solo essay narration in which Dwarkesh argues that short-AGI-timeline beliefs are inconsistent with the labs’ actual behavior of pre-baking skills via mid-training and RLVR (RL with verifiable rewards). His thesis: if a humanlike learner were near, none of this elaborate skill-baking would be necessary. The mismatch between how labs invest and what genuine AGI would require is the giveaway. Continual learning, not pure compute scaling, is the real bottleneck, and it likely takes 5-10 more years to crack at human level.
Key arguments / segments
- [00:00] Setup: Short timelines + bullishness on RLVR is a contradiction. Either models will soon learn on the job (making all the RL environment work pointless) or they won’t (so AGI isn’t imminent).
- [00:01] Robotics as illustration: Robotics is fundamentally an algorithms problem, not a hardware or data problem. A humanlike learner would essentially solve it. The fact that robots have to practice in 1,000 homes a million times over reveals the missing capability.
- [00:02] “Automate Ilya” rebuttal: The “we’ll RL our way to a superhuman researcher who solves AGI” plan is the AI version of the “lose money on every sale, make it up in volume” joke. It’s implausible that a system lacking the basic learning capabilities of a child would invent the learning algorithm humans have failed to find for 50+ years.
- [00:03] Macrophage anecdote: A biologist friend has long timelines because the lab work hinges on judgment calls like whether a dot on a slide is a macrophage. An AI researcher replies that “image classification is solved.” Dwarkesh: that’s the crux — humans are valuable precisely because we don’t need a custom training pipeline per microtask.
- [00:05] Diffusion-is-slow is cope: AI researchers blaming slow tech diffusion for limited deployment are wrong — if these were humans-on-a-server, diffusion would be near-instant. They could read your Slack/Drive in minutes and dodge the lemons-market problem of human hiring.
- [00:06] Trillion-dollar test: Knowledge workers earn tens of trillions/year in wages; lab revenues are orders of magnitude off that figure. The shortfall points to a capability gap, not an adoption lag.
- [00:07] Rational goalpost shifting: Some goalpost shift is justified. Gemini 3 in 2020 would have looked like “automates half of knowledge work.” We solved each prerequisite (general understanding, few-shot, reasoning) and still don’t have AGI. Reasonable conclusion: there’s more to intelligence than we thought.
- [00:08] His prediction: By 2030, labs will make significant continual-learning progress, models will earn hundreds of billions/year — but won’t have automated all knowledge work. “Models keep getting more impressive at the rate short-timelines people predict, but more useful at the rate long-timelines people predict.”
- [00:09] Toby Ord citation: Connecting the o-series benchmark dots suggests a ~1,000,000x scale-up in RL compute is needed for a one-GPT-level boost. Bearish trend.
- [00:10] Continual learning ≠ singular event: Will feel like in-context learning — gradual progression, not a discrete breakthrough. GPT-3 demonstrated ICL in 2020 but we’re still iterating on it.
- [00:11] No runaway from continual learning: Even when one lab cracks it, others reverse-engineer fast. Talent poaching + SF rumor mill + normal reverse engineering have neutralized every supposed flywheel (engagement, synthetic data, etc.) so far.
Notable claims
- Toby Ord calculation: ~1,000,000x scale-up in RL compute needed for a one-GPT-level boost from RLVR (versus pre-training scaling laws). [00:09]
- Revenue gap: Knowledge-worker wages = tens of trillions/year. Labs are “orders of magnitude off” — used as evidence of a capability gap, not an adoption gap. [00:06]
- Timeline forecast: Hundreds of billions in lab revenue by 2030, but no full automation of knowledge work. Continual learning solved at human level: 5-10 more years after first traction. [00:08, 00:11]
- Hive-mind continual learning model: Per Baron Miller — specialized continual-learning agents go out, do jobs, and bring learnings back to a central model that runs batch distillation (see the toy sketch after this list). Karpathy’s “cognitive core + skills” frame. [00:10]
- Diffusion claim: AI labor would diffuse into firms faster than human hiring if capability were genuinely AGI-level. [00:05-00:06]
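The hive-mind item above is the one place the episode describes an actual mechanism, so a toy sketch may help make it concrete. This is a minimal illustration, not Baron Miller’s or any lab’s design: the class names (Agent, CognitiveCore), the reward logic, and the batch_distill step are all hypothetical, and “distillation” is stubbed out as record-pooling rather than a real training run.

```python
# Toy illustration of the hive-mind loop described at [00:10].
# All names and logic are hypothetical; "distillation" is stubbed out
# as record-pooling rather than an actual training run.
from dataclasses import dataclass, field


@dataclass
class Learning:
    task: str        # what the agent was asked to do
    notes: str       # condensed record of what it tried
    outcome: float   # success signal from the real job


@dataclass
class Agent:
    """Specialized copy sent out to do real work and report back."""
    skill: str
    buffer: list[Learning] = field(default_factory=list)

    def work(self, task: str) -> None:
        # Stand-in for actually doing the job; a real agent would call a model.
        outcome = 1.0 if self.skill in task else 0.0
        self.buffer.append(Learning(task, f"tried '{task}' using {self.skill}", outcome))


class CognitiveCore:
    """Central model that absorbs field learnings via periodic batch distillation."""

    def __init__(self) -> None:
        self.knowledge: list[Learning] = []

    def batch_distill(self, fleet: list[Agent]) -> None:
        # In a real system this would be a training/distillation run over the
        # agents' experience; here we just pool the records and clear the buffers.
        for agent in fleet:
            self.knowledge.extend(agent.buffer)
            agent.buffer.clear()


if __name__ == "__main__":
    core = CognitiveCore()
    fleet = [Agent("spreadsheets"), Agent("histology")]
    for task in ("clean spreadsheets for Q3", "label histology slides"):
        for agent in fleet:
            agent.work(task)
    core.batch_distill(fleet)  # learnings flow back to the center
    print(len(core.knowledge), "learnings distilled into the core")
```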
Guests
Solo essay (no guests). References:
- Baron Miller (recent blog post on the cost of expert-written RL training data, and conversation about continual-learning hive-mind architecture)
- Toby Ord (RL compute scaling calculation)
- Andrej Karpathy (cognitive-core framing)
- Sam Altman (referenced as “SAT” — said continual learning would be “game, set, match”)
- Ilya Sutskever (referenced indirectly via the “automate Ilya” trope)
Mapping against Ray Data Co
Strong alignment with Sanity Check editorial stance: this essay is the most clearly RDCO-positioned Dwarkesh essay in the recent run. The thesis “models keep getting more impressive at short-timelines pace but more useful at long-timelines pace” is a near-perfect Sanity Check headline candidate.
Specific connections:
- “Diffusion is cope” frame — supports our skepticism of “the AI will eventually pay for itself, just be patient” arguments in enterprise contexts. Use this as ammo when pushing back on AI-deployment ROI hand-waves.
- Macrophage anecdote — perfect concrete illustration of why “we’ll just train it on your data” doesn’t work for most real-world knowledge work. File this for any future Sanity Check piece on RAG-vs-fine-tuning, AI-for-domain-specialists, or “why your PoC doesn’t generalize.”
- Trillion-dollar revenue test — borrowable framework: if a technology claim implies $X trillion in addressable wages, what fraction of $X is the technology actually capturing? If it’s <1%, the capability gap is the real story (a back-of-envelope version is sketched after this list). Useful for any data-quality / AI-tooling story.
- Continual learning as the missing piece — aligns with Ray’s existing thesis on context, memory, and personalization being the under-discussed frontier. Validates the “fat skills + long context” frame from the Thariq guidance (2026-04-15-thariq-claude-code-session-management-1m-context).
- No flywheel runaway — supports the “don’t bet on a single-vendor lock-in moat” frame. Useful when advising clients on AI vendor strategy.
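A minimal calculator for the trillion-dollar test above, so the framework is reusable in future pieces. The wage and revenue figures are placeholder assumptions chosen only to match the episode’s stated orders of magnitude (“tens of trillions” in wages, lab revenue “orders of magnitude off”), not sourced numbers.

```python
# Back-of-envelope "trillion-dollar test". Figures are placeholder assumptions,
# not sourced numbers; swap in real estimates before using this in a piece.

def capture_fraction(addressable_wages_usd: float, captured_revenue_usd: float) -> float:
    """Fraction of the implied wage pool the technology actually captures."""
    return captured_revenue_usd / addressable_wages_usd


if __name__ == "__main__":
    knowledge_worker_wages = 30e12  # "tens of trillions/year"; assume ~$30T for illustration
    lab_ai_revenue = 20e9           # assume ~$20B/year across labs (placeholder)

    frac = capture_fraction(knowledge_worker_wages, lab_ai_revenue)
    print(f"Capture fraction: {frac:.4%}")  # ~0.07%, well under 1%: a capability-gap story
```

If the capture fraction stays far below 1% while benchmark scores keep climbing, that divergence is itself the “impressive at short-timelines pace, useful at long-timelines pace” story from [00:08].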
Sanity Check candidate hook: “The AGI tell isn’t the demos. It’s the supply chain of PhDs that the labs are paying to write training questions.”
Related
- 2026-03-11-dwarkesh-most-important-question-about-ai — same author, later episode, on Anthropic-DoW supply chain risk and alignment-to-whom
- 2025-10-04-dwarkesh-sutton-interview-thoughts — same author, earlier essay, on Sutton’s bitter-lesson critique of LLM paradigm
- Sutton interview itself (Dwarkesh, ~Sep 2025) — the original episode that triggered the Sutton-thoughts essay
- Baron Miller’s referenced blog post on training-data economics (worth tracking down for separate vault entry)
- Karpathy on cognitive core (separate stub)