Why this is in the vault
The AI Index is the canonical annual scorecard on the state of AI — Stanford HAI's 9th edition (2026), led by Yolanda Gil, Raymond Perrault, and Erik Brynjolfsson. Even granting Alexander Wissner-Gross's "annual cadence is sleeping through the singularity" critique (valid — most data points are 6-12 months old by publication), the report remains the consolidated, citable, brand-anchored reference that everyone outside the daily-AI-news bubble will see and quote for the next twelve months. RDCO needs it as the load-bearing source for any Sanity Check piece that touches the macro state of AI, any positioning argument about agent deployment, and any external pitch or memo that has to land with a reader who isn't already steeped in the discourse.
Note on extraction: the PDF itself exceeds the WebFetch size limit (>10 MB). The chapter list and headline takeaways here are reconstructed from the official 5 takeaways as walked through on Moonshots EP248 (2026-04-18, with Erik Brynjolfsson, a personal contact of the show), cross-referenced with the standard 9-chapter structure used in editions 2022-2025. When citing specific numbers in published RDCO work, pull directly from the PDF page citations rather than this digest.
Chapter structure (carried forward from prior editions, confirmed via EP248 content references)
- Research & Development — model count, publications, US-China balance
- Technical Performance — benchmarks (SWE-bench, GPQA, HLE, agent benchmarks)
- Responsible AI — model transparency index, AI incidents, red-teaming
- Economy — enterprise adoption, labor market impact, capex
- Science & Medicine — AI-for-science applications
- Education — CS enrollment, AI literacy
- Policy — legislation, executive actions, state-level activity
- Diversity — workforce composition
- Public Opinion — global trust and optimism survey
The core findings (top 5-8 net-new shifts)
1. The capability cliff is over: SWE-bench 60% to 97% in twelve months
The single most consequential delta. Software engineering — the closest proxy we have for “can an agent actually complete useful long-horizon work end-to-end” — went from “interesting research demo” to “saturated benchmark” in one annual cycle. Combined with the report’s claim that frontier models now beat top PhDs in science and math, the implication is that the headroom on existing benchmarks is essentially gone. Future editions will need to be measured against agent-task suites (long-horizon, multi-tool, multi-day) rather than single-shot evals.
2. GenAI hit 53% global adoption in three years — faster than PC or internet
The S-curve is not slowing. This is the killshot stat for anyone arguing AI is a hype bubble or that “real adoption” is years away. Faster than the personal computer. Faster than the internet. Three years from a niche research artifact to majority global use.
3. The transparency index dropped: 58 to 40
The most powerful frontier models are now the least accountable. Two compounding causes: (a) closed-weights commercial pressure, as the labs realize moats matter; (b) safety-driven self-censorship, as labs internalize that publishing capability research advances proliferation. The Wissner-Gross frame here is sharp — transparency and proliferation can be the same thing, so "more transparency = always better" is naive. RDCO's position should not be reflexively pro-transparency; the right frame is staged release with red-teaming, not weight-publishing.
4. The optimism gap: 23% US public vs 73% experts vs 80% China
This is the most Sanity-Check-able stat in the entire report. The gap between US public optimism (23%) and US AI expert optimism (73%) is fifty points. The gap between US public (23%) and Chinese public (80%) is fifty-seven points. AI is not unpopular globally — it is unpopular in the United States specifically, and the gap correlates with degree of contact with the technology. The companion stat: only 31% of Americans trust the government to regulate AI, which lines up almost exactly with general institutional trust (Congress ~21%, federal government ~33%) — meaning AI distrust is partly a proxy for institutional distrust, not AI-specific.
5. AI incidents jumped from 233 to 362 documented harms YoY (+55%)
Not surprising in direction; surprising in magnitude. The trajectory is exponential, and there's no reason to expect the curve to bend down as models grow more capable in bio/chem/cyber domains. This is the data dot that closes the loop with the public-opinion drop and with the physical backlash (Molotov at Altman's house, Maine data center ban, $98B in blocked projects).
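The derived figures above (the +55% here, the 50- and 57-point gaps in Finding 4, the SWE-bench delta in Finding 1) are the kind of numbers that get quoted verbatim in Sanity Check copy, so it's worth recomputing them from the raw inputs before shipping. A minimal check in plain Python, using only the numbers as quoted in this digest, not re-verified against the PDF:

```python
# Recompute the derived figures quoted in this digest from their raw inputs.
# All inputs come from Findings 1, 4, and 5 above, not from the PDF itself.

incidents_prev, incidents_now = 233, 362
yoy_growth = (incidents_now - incidents_prev) / incidents_prev
print(f"AI incidents YoY: +{yoy_growth:.0%}")  # +55%

us_public, us_experts, cn_public = 23, 73, 80  # optimism, % of respondents
print(f"US public vs US experts: {us_experts - us_public} points")     # 50
print(f"US public vs Chinese public: {cn_public - us_public} points")  # 57

swe_start, swe_end = 60, 97  # SWE-bench, % resolved
print(f"SWE-bench delta: +{swe_end - swe_start} points in twelve months")  # +37
```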
6. China leads research, US leads model deployment (and China is going closed)
~50 frontier models from the US, ~30 from China in the 2026 cycle; AI publications out of China continue to climb relative to the US. But the buried lede: Chinese frontier labs have started shipping closed, API-first models. The "China = open, US = closed" trope from 2024-2025 is dissolving. If China algorithmically leapfrogs the West, expect the equilibrium to flip entirely. China's training compute is roughly a tenth of the West's, per Epoch — they are getting more publications per FLOP, not more capability.
7. Youth labor displacement is real and politically invisible
Employment among US software developers age 22-25 has dropped ~20% since 2024, while headcount among developers 30+ continues to grow. The same pattern holds in customer service, legal support, and administrative roles. Critically: this is happening through hiring freezes, not layoffs — so it does not show up in unemployment statistics or the conventional labor-market dashboards. The trend line in the report ends September 2025; preliminary post-report studies suggest some self-correction since then, but the structural story (entry-level white-collar work is being automated first) is intact.
8. The genAI-use chart undercuts the report’s own annual cadence
A meta-finding: the report’s own genAI-adoption chart (the green line referenced on EP248) inflects sharply between 2024 and 2025, demonstrating that an annual publication cadence is genuinely too slow to capture the current rate of change. This validates RDCO’s bias toward higher-cadence content (weekly Sanity Check, daily skill execution) over the legacy-think-tank publication model.
Mapping against Ray Data Co
Agent-deployer thesis (skills + harness + memory layer)
Strongest fit. The SWE-bench saturation (Finding 1) is the single most important external validation of the thesis. If agents can now genuinely complete software engineering work end-to-end at near-human-expert level, the bottleneck shifts from “can they do it” to “how do you deploy them safely, with what scaffolding, with what memory, against what tasks.” That is exactly the surface area RDCO operates on. Use this finding in any pitch where the deployer thesis needs external sanction.
The transparency drop (Finding 3) also matters here: as frontier models go more closed, the moat moves from “having the best model” to “having the best deployment harness around any sufficiently good model.” Closed weights mean every operator is downstream of the same handful of providers; differentiation happens in skills, memory, agentic scaffolding, and operator know-how — RDCO’s exact patch.
MAC (productized data engineering / multi-source AI client reporting)
Moderate fit. The 53% global adoption stat (Finding 2) is the macro tailwind for any productized AI advisory — the addressable market for "we'll help you make sense of your AI usage and outputs across providers" is now measurable in single-digit billions. The closed-models trend (Finding 3) reinforces multi-source: clients will be running 3-5 model providers and need someone to integrate the telemetry. No specific finding overturns or invalidates MAC; the report just continues to validate the bet.
Sanity Check (newsletter, content/audience play)
Strong fit. This is the single richest source for newsletter material in 2026 so far. Three specific data dots below. The optimism gap (Finding 4) is a near-perfect Sanity Check column premise: “Why does America think AI is bad while everyone else loves it?” The youth-labor finding (Finding 7) is the most underplayed business story of the year. The transparency drop (Finding 3) gives a contrarian column: “Stop demanding transparency you don’t actually want.”
Squarely Puzzles (consumer puzzle product, founder’s dad’s IP)
No mapping. The AI Index does not bear on a consumer puzzle product. There is no AI-in-toys-and-games chapter and the public-opinion findings, while interesting, do not change the puzzle business in any actionable way. Skip for Squarely.
Sanity Check candidate hooks (data dots / column premises)
Hook 1: “The 50-Point Optimism Gap”
Stat: 23% of the US public is optimistic about AI vs 73% of US AI experts vs 80% of the Chinese public. Hook: AI isn’t unpopular — America is. Frame the gap as a national-character story (institutional distrust at all-time lows, with AI inheriting the load) rather than a technology story. Counter-frame to every “everyone’s worried about AI” narrative. Source: AI Index 2026, Chapter 9 (Public Opinion).
Hook 2: “The Hiring Freeze Nobody Noticed”
Stat: US software dev employment for ages 22-25 down ~20% since 2024, while headcount for devs 30+ keeps growing. Same pattern in customer service, paralegal, administrative work. No layoff spike — companies just stopped hiring. Hook: The labor displacement is happening, just not the way anyone predicted. The unemployment rate stays flat; the cohort that takes the hit is the kids who can't get a first job. This shows up in household formation and family formation data 5-10 years downstream, not in the next jobs report. Source: AI Index 2026, Chapter 4 (Economy).
Hook 3: “SWE-bench is Saturated. So What’s Next?”
Stat: SWE-bench performance went from 60% to 97% in twelve months. Hook: When the test is solved, what do you measure? The frame: software engineering isn’t the bottleneck anymore — orchestration, judgment, and deployment are. Pivot to RDCO’s harness/skills thesis. Pair with the transparency-index drop for the “the moats moved” framing. Source: AI Index 2026, Chapter 2 (Technical Performance).
Verdict
File as reference. This is a load-bearing citation for any future RDCO content that needs external macro-AI grounding. Do not deep-read the full PDF — Alex’s “too little, too late” critique is correct; most findings are 6-12 months stale by publication. But keep this synthesis live and pull specific page citations directly from the PDF when shipping public content.
Related
- 2026-04-18-moonshots-ep248-altman-attack-amazon-starlink-opus-47 — Moonshots panel walking through the same 5 takeaways, with Alex’s critique
- ai-landscape
- ai-backlash — Maine data center ban, Molotov at Altman, $98B blocked projects all tie to public-opinion findings
- organizational-singularity — Salim’s frame, validated by youth-labor displacement findings
- anthropic