06-reference


Tue Jan 06 2026 19:00:00 GMT-0500 (Eastern Standard Time) · reference · source: Stratechery · by Ben Thompson
nvidia · ai-infrastructure · kv-cache · autonomous-vehicles · agent-architecture

“Nvidia at CES, Vera Rubin and AI-Native Storage Infrastructure” — Ben Thompson

Why this is in the vault

Explains the KV cache storage bottleneck for long-context agents and multi-agent systems — a hardware constraint that directly shapes what AI agents can do.

The core argument

The piece covers three topics. First, Nvidia’s CES keynote focused entirely on AI rather than consumers, illustrating how AI is crowding out consumer tech investment — even memory component costs are rising as manufacturers shift production to HBM for AI chips.

Second, the Vera Rubin platform introduces “AI-native storage” via the BlueField-4 DPU — dedicated high-speed SSD racks that give each GPU 16TB of east-west storage for KV cache. Thompson explains that, at inference time, every token prediction rereads the full model weights plus the cached keys and values for the entire conversation context. Reasoning models and multi-turn agents massively amplify this, making KV cache capacity the key bottleneck. The new architecture enables dramatically longer context windows and lets collaborating agents share context across GPUs.
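To make the bottleneck concrete, a rough back-of-the-envelope sketch of KV cache size: per token, the cache stores a key and a value vector for every layer and KV head. The model dimensions below (80 layers, 8 KV heads via grouped-query attention, head dim 128, fp16) are hypothetical, chosen to resemble a large open-weights model — not figures from the article.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    Per layer, the cache holds a K tensor and a V tensor, each of shape
    (seq_len, num_kv_heads, head_dim); the factor of 2 counts K and V.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 70B-class model with grouped-query attention, fp16 cache.
per_token = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=1)
print(f"{per_token / 1024:.0f} KiB per token")          # 320 KiB per token

# A 1M-token agent context at these dimensions:
full_ctx = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)
print(f"{full_ctx / 1024**3:.0f} GiB for 1M tokens")    # ~305 GiB
```

At these (assumed) dimensions a single million-token context already exceeds any GPU’s HBM, which is why pushing the cache out to dedicated per-GPU SSD racks changes what is feasible for long-running agents.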

Third, Nvidia’s Alpamayo autonomous driving system is end-to-end and vision-only (like Tesla FSD), offered as a modular platform for the entire auto industry. This validates the “Bitter Lesson” — scale and compute beat hand-coded heuristics.

Mapping against Ray Data Co

The KV cache / agent context discussion is directly relevant to agent architecture planning. Long-running autonomous agents (like RDCO’s always-on COO) will hit context limits; knowing that this is now a hardware-level problem being solved at the rack level helps set expectations for what agent memory will look like in 12-18 months.