Thu Apr 09 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · reference · source: Every · by Alex Duffy
ai-training-data · data-licensing · ai-market · specialized-models · enterprise-data

The Market for Making AI Better

Alex Duffy, founder of a company that sells data and training environments to AI labs, maps the growing market for AI training data. Reddit, Shutterstock, and News Corp are earning hundreds of millions of dollars annually by licensing data to AI companies, with contracts growing roughly 20 percent per year. News Corp’s CEO called the company “essentially an input company for AI.”

The piece challenges the assumption that general frontier models will dominate through scale alone. A small model trained on fewer than 2,000 examples from real lawyers, bankers, and consultants recently beat all but the best frontier models on corporate legal work at a fraction of the cost. This suggests the market is making a more nuanced bet: domain-specific data and curation can outperform raw compute for specialized tasks.

Duffy also notes the competitive scramble among data intermediaries: Mercor, Turing, Handshake, and SID.ai are actively reaching out to founders and companies to buy operational data. The implication for any company sitting on domain-specific data is that it may already hold a valuable AI asset it has not monetized.

RDCO mapping: Relevant to the data-as-moat thesis. The finding that small specialized models can beat frontier models on domain tasks reinforces our view that curation and domain expertise matter more than scale for most business applications. The data licensing revenue numbers are useful benchmarks for the Sanity Check newsletter. Sponsor note: external sponsor ad present (unidentified, via BuySellAds).