06-reference

technically inference providers

2026-04-15 · reference · source: Technically · by Justin Gage

“What’s an inference provider?” — Justin Gage (Technically)

Why this is in the vault

Clean explainer that names the category — “inference providers” — and locates it on a spectrum from frontier-lab APIs to raw infra. Useful as a vocabulary anchor when we evaluate vendor choices for any RDCO product that calls a model, and as a reference when explaining the AI stack to clients who have only heard “OpenAI” and “Anthropic.”

Self-promo disclosure

The top of the issue pitches Technically’s own redesigned site and a new “All-access subscription” bundling learning tracks plus the archive. This is author self-promo, not third-party sponsorship, so no bias on the article body itself; still worth noting that the issue opens with a sales beat before getting to the substance.

The core argument

Two things power every AI product: training (how a model learns) and inference (the model actually doing its job per request). Frontier labs (OpenAI, Anthropic, Google) are technically inference providers themselves — they trained the model and expose an API. But a separate category — dedicated inference providers like TogetherAI, Fireworks, Modal, Groq — has emerged to host open-weights models (Llama, Qwen, DeepSeek) behind a managed API.

Why pick a dedicated inference provider over the frontier-lab API? Cost and speed.

The author then flips the framing: given those advantages, why ever call the frontier-lab API directly? Three reasons: you genuinely need the latest model, you don’t care about cost or latency, or you didn’t know inference providers existed.

The piece then introduces a spectrum from “most managed” (first-party APIs) to “most raw” (you-configure-the-infra), with cloud hyperscalers as enterprise platforms in the middle. (Email body cut off mid-section at “Cloud Hyperscalers: Enterprise AI Platforms…” — the rest is on the web post.)

Market signal embedded: TogetherAI raising at $7.5B, Fireworks at $4B, Modal at $2.5B, and NVIDIA’s $20B Groq acquisition. The category is white-hot and no longer overshadowed by the frontier labs in pure dollar terms.

Mapping against Ray Data Co

Strong relevance. Three threads:

  1. Vendor-choice discipline for our own tooling. Anything we build that hits a model — Sanity Check generation, vault compilation, the autonomous COO loop itself — implicitly picks a point on this spectrum. Right now we’re frontier-lab-default (Claude). The article is a reminder that for the routine, high-volume slices (summarization of newsletter bodies, classification, data-extract-style work), an open-weights model on a dedicated host could be materially cheaper with no meaningful quality loss. Worth a future audit once API spend becomes a real budget line.

  2. Vocabulary for client-facing work. When RDCO advises on AI architecture, “inference provider” is a category most non-technical buyers haven’t heard. This article is a clean primer to point them at, or to lift the spectrum framing from when whiteboarding their stack.

  3. Reinforces the “AI lock-in” thesis (2026-04-13-jaya-gupta-ai-lock-in-state-moat). Gupta’s argument is that state and integration are where real moat lives; this article shows the inference layer itself is commoditizing fast (multiple providers, OpenAI-compatible API as the de facto standard, easy switching). Both pieces independently land on the same point: the model API is not the moat — what wraps it is.
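The “easy switching” point in thread 3 is concrete at the code level: because most dedicated hosts expose OpenAI-compatible endpoints, moving a workload between vendors is largely a base-URL and model-name change. A minimal sketch of that idea; the endpoint URLs and model names below are illustrative assumptions lifted from vendor conventions, not verified values, so check each provider’s docs before relying on them:

```python
# Sketch: swapping inference vendors behind an OpenAI-compatible interface.
# Only the routing details (base URL, model name) change per provider;
# the calling code stays identical. Values are illustrative, not verified.

PROVIDERS = {
    "frontier_lab": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o-mini",  # illustrative frontier model name
    },
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # assumed name
    },
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",  # assumed
    },
}

def client_config(provider: str, api_key: str) -> dict:
    """Return kwargs for an OpenAI-compatible client.

    The point of the sketch: switching hosts is a config change,
    not an application rewrite.
    """
    p = PROVIDERS[provider]
    return {"base_url": p["base_url"], "api_key": api_key, "model": p["model"]}

# Routing a routine, high-volume task (per thread 1) to a cheaper
# open-weights host while keeping the frontier API for frontier-only work:
cfg = client_config("together", api_key="sk-...")  # placeholder key
```

This is exactly the commoditization signal the article describes: when the interface is standard and switching is a two-field config change, the inference layer itself can’t be the moat.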

Gap to flag: we don’t have a vault doc that maps the AI stack layers explicitly (model lab → inference provider → orchestration/agent layer → app). This article is the cleanest “inference provider” definition we now own; if we’re going to write that stack-map concept article, this is the source for that layer.

Email body summarized and paraphrased; direct quotes kept under 15 words. Full article at the source URL.