Moonshots EP 192: AI Insiders Reveal Elon Musk’s Master Plan to Win AI
Summary
A WTF episode with Peter Diamandis, Dave Blundin, and Alex Wissner-Gross (Salim Ismail away in India). The conversation centers on Elon Musk's infrastructure play: Colossus 2 is coming online as a 1 GW data center in Memphis fitted for 500,000 Nvidia Blackwell GPUs, with plans to double to 1 million in 2026; Grok 5 will be trained there. Alex frames this as Richard Sutton's "bitter lesson" applied to hardware scaling: brute force beats artisanal algorithm design, and the brute force is now being applied to physical infrastructure, not just software. Colossus 1 generates its own power via on-prem natural gas cogeneration, pointing to a future of data centers co-located with dedicated power sources because the grid cannot keep up. A striking detail: the Markley data center bought all remaining 3 MW generators in the country because the 5 MW units were already sold out.

xAI's Grok Code Fast 1 launched with input tokens roughly 6x cheaper and output tokens 10x cheaper than GPT-5's, and it is accessible only through coding environments (Cursor, Windsurf) rather than browsers; Alex notes this makes IDE coding environments emerging competitors to web browsers as distribution channels for superintelligence. Dave pushes back on the "race to the bottom" framing, calling it crack-dealer economics: give away the first hit free, users get addicted, and demand is effectively infinite.

The episode also covers Elon poaching 14 Meta engineers with purpose and equity rather than cash, education in the AI era (Alex advises freshmen to assume superintelligence in 2-3 years and plan accordingly), MIT's new 6E entrepreneurship track, and Elon's Master Plan Part 4 on "sustainable abundance," framed as abundance becoming a mainstream hyperscaler theme.
Key Segments
- [00:05-09:00] Education in the AI era: Alex advises assuming superintelligence in 2-3 years; MIT adds 6E entrepreneurship track
- [10:00-18:00] Colossus 2: 1 GW, 500K Blackwell GPUs, on-prem power generation; bitter lesson applied to hardware; training vs inference data center distinction
- [20:00-25:00] Grok Code Fast 1: roughly 6x cheaper than GPT-5 on input tokens; accessible only via IDE coding environments; Jevons paradox for code generation
- [25:00-30:00] Elon poaching Meta engineers with purpose + equity; Master Plan Part 4 “sustainable abundance”
Notable Claims
- Colossus 2: 1 GW data center, 500K Blackwell GPUs ($30B in chips alone), doubling to 1M GPUs in 2026 (implied per-GPU figures are sanity-checked after this list)
- Colossus 1 uses on-prem natural gas cogeneration, not grid power
- All 3 MW generators in the country are sold out; 5 MW units were already gone
- Grok Code Fast 1 pricing: $0.20/M input tokens vs GPT-5's $1.25/M (~6x cheaper) and $1.50/M output vs $15/M (10x cheaper), as quoted on the show (see the cost sketch after this list)
- Alex: assume superintelligence in 2-3 years; any conventional career plan is obsolete
- AI model routing optimization at OpenAI (confirmed by Kevin Weil) dramatically reduces inference costs by matching each query's complexity to the smallest sufficient model (see the routing sketch after this list)
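The Colossus 2 numbers imply round figures per GPU. A quick sanity check, assuming the $30B chip spend and the 1 GW capacity both refer to the 500K-GPU phase (the variable names are illustrative):

```python
# Sanity-check the implied per-GPU figures from the Colossus 2 claim.
# Assumes the $30B chip figure and 1 GW capacity both cover the 500K-GPU phase.
gpus = 500_000
chip_spend_usd = 30e9
facility_power_w = 1e9  # 1 GW

print(f"implied chip cost per GPU: ${chip_spend_usd / gpus:,.0f}")           # ~$60,000
print(f"implied facility power per GPU: {facility_power_w / gpus:,.0f} W")   # ~2,000 W incl. cooling/overhead
```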
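The pricing claim is easiest to feel as a per-request cost. A minimal sketch using the per-million-token prices quoted on the show; the 120K-token context and 4K-token patch sizes, and the PRICES/request_cost names, are illustrative assumptions rather than figures from the episode:

```python
# Back-of-envelope cost comparison at the per-million-token prices quoted
# in the episode (treat these as reported claims, not verified list prices).
PRICES = {
    "grok-code-fast-1": {"input": 0.20, "output": 1.50},   # $ per 1M tokens
    "gpt-5":            {"input": 1.25, "output": 15.00},  # as quoted on the show
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical agentic-coding request: a large repo context in, a patch out.
ctx, patch = 120_000, 4_000
for model in PRICES:
    print(f"{model}: ${request_cost(model, ctx, patch):.4f} per request")
# At these rates the input-token ratio is 1.25 / 0.20 ≈ 6.3x and the
# output-token ratio is 15 / 1.50 = 10x.
```

At these rates the sample request costs roughly $0.03 on Grok Code Fast 1 versus roughly $0.21 on GPT-5, which is the gap behind Dave's crack-dealer-economics point about demand expanding to absorb cheap tokens.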
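The routing claim describes a concrete mechanism: classify each query and send it to the cheapest model judged sufficient. A toy sketch of that idea; the tier names, prices, thresholds, and classify_complexity heuristic are invented for illustration and do not describe OpenAI's actual router:

```python
# Toy model router: send each query to the cheapest model judged sufficient.
# Tiers, thresholds, and the scoring heuristic are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1m_tokens: float  # blended $ per 1M tokens (assumed)
    max_complexity: float      # highest complexity score this tier handles

TIERS = [
    ModelTier("small-fast", 0.30, 0.3),
    ModelTier("mid", 2.00, 0.7),
    ModelTier("frontier", 12.00, 1.0),
]

def classify_complexity(query: str) -> float:
    """Stand-in heuristic: longer, more demanding queries score higher."""
    score = min(len(query) / 2000, 1.0)
    if any(k in query.lower() for k in ("prove", "refactor", "architecture")):
        score = max(score, 0.8)
    return score

def route(query: str) -> ModelTier:
    """Pick the first (cheapest) tier whose ceiling covers the query's score."""
    score = classify_complexity(query)
    return next(t for t in TIERS if score <= t.max_complexity)

print(route("What is the capital of France?").name)                      # small-fast
print(route("Refactor this service into a plugin architecture").name)    # frontier
```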