“Nvidia at CES, Vera Rubin and AI-Native Storage Infrastructure” — Ben Thompson
Why this is in the vault
Explains the KV cache storage bottleneck for long-context agents and multi-agent systems — a hardware constraint that directly shapes what AI agents can do.
The core argument
The article covers three topics. First, Nvidia’s CES keynote focused entirely on AI rather than consumers, illustrating how AI is crowding out consumer tech investment: even memory component costs are rising as manufacturers shift capacity to HBM for AI chips.
Second, the Vera Rubin platform introduces “AI-native storage” via the BlueField-4 DPU: dedicated high-speed SSD racks that give each GPU 16TB of east-west storage for KV cache. Thompson explains that every token prediction starts from scratch, streaming the full model weights plus the KV cache for the entire conversation context through the GPU. Reasoning models and multi-turn agents massively amplify this, making KV cache the key bottleneck (a back-of-envelope sizing sketch follows the third point below). The new architecture enables dramatically longer context windows and shared context across GPUs for collaborating agents.
Third, Nvidia’s Alpamayo autonomous-driving system is end-to-end and vision-only (like Tesla’s FSD), offered as a modular platform to the entire auto industry. Thompson reads this as validation of the “Bitter Lesson” — scale and compute beat hand-coded heuristics.
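To make the KV cache point concrete, a back-of-envelope sizing sketch (mine, not Thompson’s; all model dimensions are illustrative assumptions, roughly a 70B-class model with grouped-query attention): each layer caches one key and one value vector per token, so cache size grows linearly with context length.

```python
# Back-of-envelope KV cache sizing for a decoder-only transformer.
# All dimensions are illustrative assumptions (roughly 70B-class with
# grouped-query attention), not figures from the article.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    Per token, each layer stores one key and one value vector of size
    n_kv_heads * head_dim, hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed dimensions: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
for seq_len in (8_192, 128_000, 1_000_000):
    gb = kv_cache_bytes(80, 8, 128, seq_len) / 1e9
    print(f"{seq_len:>9,} tokens -> {gb:8.1f} GB of KV cache")
```

At a million tokens, a single conversation’s cache runs to hundreds of gigabytes, beyond any single GPU’s on-board HBM — which is exactly the gap a per-GPU 16TB SSD tier is positioned to fill.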
Mapping against Ray Data Co
The KV cache / agent context discussion is directly relevant to agent architecture planning. Long-running autonomous agents (like RDCO’s always-on COO) will hit context limits; understanding that this is now a hardware-level problem being solved at the rack level helps frame expectations for what agent memory will look like in 12-18 months.
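A rough planning sketch of how quickly an always-on agent exhausts even a long context window; every figure below is a hypothetical assumption for RDCO planning, not from the article:

```python
# Rough planning arithmetic for an always-on agent's context growth.
# Every figure here is a hypothetical assumption for illustration.

TOKENS_PER_ACTION = 1_500   # assumed: prompt + tool output per agent step
ACTIONS_PER_HOUR = 40       # assumed: steps an autonomous agent takes
CONTEXT_WINDOW = 1_000_000  # assumed: a long-context model's window

tokens_per_day = TOKENS_PER_ACTION * ACTIONS_PER_HOUR * 24
hours_to_fill = CONTEXT_WINDOW / (TOKENS_PER_ACTION * ACTIONS_PER_HOUR)

print(f"Tokens accumulated per day: {tokens_per_day:,}")
print(f"Hours until the window fills: {hours_to_fill:.0f}")
# ~17 hours: under these assumptions, even a million-token window fills
# in less than a day of continuous operation, so long-lived agents need
# summarization, retrieval, or hardware-level cache tiering of the kind
# Vera Rubin targets.
```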
Related
- agent-architecture
- 2026-01-06-stratechery-nvidia-groq-deal