Jensen Huang and Andy Grove, Groq LPUs, Hotel California
- Source: Stratechery (Ben Thompson)
- Date: 2026-03-18
- Type: daily-update
- RDCO Relevance: Medium-High (AI compute architecture, inference pipeline)
Thompson analyzes GTC 2026 as a strategic inflection point for Nvidia, drawing parallels to Andy Grove’s “Only the Paranoid Survive.” After years of defending a single-GPU-architecture approach, Nvidia announced three distinct rack products: Vera Rubin (GPU), Groq LPUs (acquired architecture), and Vera CPUs.
The most useful part is the technical breakdown of Nvidia’s new disaggregated inference pipeline. Vera Rubin handles prefill (parallelizable, GPU-suited) and then the attention calculations over the KV cache (memory-intensive, needs the GPU’s larger memory). Groq LPUs handle the feed-forward computation over model weights (bounded memory requirements, benefits from Groq’s deterministic low latency). Vera CPUs handle agent orchestration, addressing the reality that agents need CPU compute between model calls.
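The division of labor described above can be sketched as a simple stage-to-hardware routing table. This is only an illustration of the summary, not Nvidia's software: the stage names, the `STAGE_TO_HARDWARE` mapping, and `plan_request` are all hypothetical, and the sketch assumes one attention step and one feed-forward step per generated token, which collapses the per-layer interleaving a real model would have.

```python
from dataclasses import dataclass

# Hypothetical mapping taken from the summary above; not an Nvidia API.
STAGE_TO_HARDWARE = {
    "orchestrate": "Vera CPU",       # agent logic between model calls
    "prefill": "Vera Rubin GPU",     # parallelizable prompt processing
    "attention": "Vera Rubin GPU",   # KV cache work, memory-intensive
    "feed_forward": "Groq LPU",      # weight computation, bounded memory
}


@dataclass(frozen=True)
class Step:
    stage: str
    hardware: str


def plan_request(decode_tokens: int) -> list[Step]:
    """Return the ordered hardware plan for one request (simplified)."""
    if decode_tokens < 0:
        raise ValueError("decode_tokens must be non-negative")
    stages = ["orchestrate", "prefill"]
    # Assumption: each generated token needs one attention step on the GPU
    # followed by one feed-forward step on the LPU.
    for _ in range(decode_tokens):
        stages += ["attention", "feed_forward"]
    return [Step(stage, STAGE_TO_HARDWARE[stage]) for stage in stages]


if __name__ == "__main__":
    for step in plan_request(decode_tokens=2):
        print(f"{step.stage:>12} -> {step.hardware}")
```

The point of the table form is that each stage is pinned to the hardware whose constraint it matches (memory capacity for attention, latency for feed-forward), which is the argument the breakdown makes.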
Thompson frames this through Grove’s lens: Nvidia is paranoid about losing customers. If GPUs are not fast enough for some workloads, Nvidia will offer a new architecture. If CPUs are not fast enough to keep GPUs busy, Nvidia will sell CPUs. The unifying theory is not technical elegance but market dominance through completeness.
RDCO note: The Vera CPU story is directly relevant to agent infrastructure. Thompson and Huang both acknowledge agents create CPU-bound bottlenecks between GPU inference calls. This validates our observation that agent orchestration cost is not trivial.
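To make the orchestration-cost point concrete, the following is a minimal sketch of how one might estimate the share of an agent run spent on CPU work between inference calls. The function `cpu_share` and the timings are hypothetical and made up for illustration; they are not measurements from Thompson, Nvidia, or our own systems.

```python
def cpu_share(steps: list[tuple[float, float]]) -> float:
    """Fraction of total time spent on CPU work between model calls.

    Each entry in `steps` is (inference_seconds, cpu_seconds) for one
    agent step. Assumes the two phases run back to back, not overlapped.
    """
    total_inference = sum(inference for inference, _ in steps)
    total_cpu = sum(cpu for _, cpu in steps)
    total = total_inference + total_cpu
    if total == 0:
        raise ValueError("steps must contain some non-zero time")
    return total_cpu / total


if __name__ == "__main__":
    # Made-up timings for three agent steps: (inference, CPU) in seconds.
    example_steps = [(2.0, 1.0), (1.0, 1.0), (3.0, 2.0)]
    print(f"CPU share of wall time: {cpu_share(example_steps):.0%}")
```

With these illustrative numbers, 4 of 10 seconds go to CPU work, so faster inference alone would leave a large part of the run untouched.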