

Wed Apr 22 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · concept · status: canonical-for-rdco · source: terminology-canonization

Generative Engine Optimization (GEO) — RDCO canonical term

Why this note exists

There’s a lot of terminology floating around for “make your content discoverable to LLMs / AI search.” We’ve seen and heard:

The founder picked GEO as the canonical term for RDCO internal docs and external content, after Eric Ciarla’s 2026-04-23 thread referenced it as one of two pivot bets at Firecrawl (“SEO and GEO, because more people are using AI to write code, and if Firecrawl is the top recommended web data API across AI tools, signups grow as those platforms grow”).

Use GEO. Not LLMO, not AEO, not AISO, not “LLM SEO.” Standardize the language so the vault, Sanity Check, and any RDCO-facing content all reinforce one term.

What GEO actually means

GEO is the discipline of optimizing content so that generative AI systems (Claude, ChatGPT, Gemini, Perplexity, Copilot, etc.) recommend or cite your content when their users ask relevant questions.

It is NOT (just):

It IS:

The Princeton paper origin

The GEO term originated in academic literature: “GEO: Generative Engine Optimization” (Aggarwal et al., 2024, Princeton/Georgia Tech). The authors formalized it as the discipline of optimizing content for visibility in generative engines. Worth tracking down as the foundational reference if we write our own content on the topic.

Tactics (operational)

These are the levers we already have or can deploy:

  1. Owned terms / named frameworks. AI tools surface named concepts more reliably than vague phrases. RDCO already owns MAC (Model Acceptance Criteria), and that name is leverage. The “underdog effect” referenced in the Princeton paper is that smaller, less-popular brands gain disproportionate visibility from GEO-optimized content because the field is less crowded.
  2. Markdown-first publishing. Each Sanity Check post should have a /posts/<slug>.md URL alongside the HTML version. AI crawlers can parse markdown more cleanly than rendered HTML.
  3. llms.txt at site root. Lists the canonical content surface for AI tools to fetch. Cloudflare-blessed convention.
  4. Structured data (JSON-LD). Article, Person (for the author), and Organization schemas help AI tools understand who wrote what.
  5. Cross-domain mentions. AI tools weight mentions across multiple reputable third-party domains. This is where partnership-driven distribution (Eric’s other bet) reinforces GEO.
  6. Topical authority concentration. Writing 30 articles on data quality discipline beats writing 30 articles across 30 unrelated topics. AI tools reward topical depth.
  7. Direct AI training data inclusion. Open-licensed content posted to GitHub, Reddit, Hacker News, etc. has a chance of entering training corpora. Paywalled or membership-gated content (including gated Substack posts) generally does not.
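
Tactics 2 through 4 can be sketched together. The Python below is a minimal illustration under stated assumptions, not RDCO's publishing pipeline: the `Post` fields, `BASE_URL`, and all function names are hypothetical, the llms.txt layout follows the commonly published shape (H1 title, blockquote summary, link list), and the JSON-LD uses schema.org's Article, Person, and Organization types.

```python
import json
from dataclasses import dataclass
from typing import List

BASE_URL = "https://example.com"  # hypothetical site root, not a real RDCO URL


@dataclass
class Post:
    """Hypothetical post record; field names are assumptions for this sketch."""
    title: str
    slug: str
    summary: str
    author: str
    published: str  # ISO 8601 date, e.g. "2026-01-01"


def markdown_url(post: Post) -> str:
    """Markdown twin of the HTML post, following the /posts/<slug>.md pattern."""
    return f"{BASE_URL}/posts/{post.slug}.md"


def build_llms_txt(site_name: str, site_summary: str, posts: List[Post]) -> str:
    """Render an llms.txt body: title, one-line summary, then a link list."""
    lines = [f"# {site_name}", "", f"> {site_summary}", "", "## Posts", ""]
    for post in posts:
        lines.append(f"- [{post.title}]({markdown_url(post)}): {post.summary}")
    return "\n".join(lines) + "\n"


def build_article_jsonld(post: Post, org_name: str) -> str:
    """Render a schema.org Article as JSON-LD for the post's HTML page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": post.title,
        "description": post.summary,
        "datePublished": post.published,
        "author": {"@type": "Person", "name": post.author},
        "publisher": {"@type": "Organization", "name": org_name},
        "url": f"{BASE_URL}/posts/{post.slug}",
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Placeholder content only; no real post is being described here.
    demo = Post("Example post", "example-post", "One-line summary.",
                "Example Author", "2026-01-01")
    print(build_llms_txt("Example Site", "What the site covers.", [demo]))
    print(build_article_jsonld(demo, "Example Org"))
```

The JSON-LD string would go inside a `<script type="application/ld+json">` tag on the HTML version of the post, while the llms.txt output would be served from the site root.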

Mapping against Ray Data Co

GEO is one of two compounding distribution bets we could prioritize (alongside integration partnerships per Ciarla’s framework — see ../2026-04-23-firecrawl-ciarla-treadmill-vs-flywheel when filed).

Where we already have GEO infrastructure or work-in-progress:

Where we have NOT yet invested:

The single-file agent-spec pattern (load-bearing for GEO + adjacent)

GEO sits inside a broader pattern that’s now multi-vendor confirmed: the single-file repo-resident agent spec.

| File | Domain | Authority |
| --- | --- | --- |
| llms.txt | Site-level content surface for AI tools | Cloudflare / Anthropic |
| AGENTS.md | Skill resolver / routing table | Garry Tan (gbrain) |
| CLAUDE.md | Project-level instructions for Claude Code | Anthropic |
| SKILL.md | Per-skill agent specifications | Anthropic Claude Code convention |
| design.md | Design tokens + rationale for AI design tools | Google Labs (released 2026-04-22, with Stitch as reference implementation) |
| mac.md (planned) | Data quality acceptance criteria for AI data agents | RDCO (planned 2026-05) |

Same shape every time: single file, lives in the repo, YAML frontmatter for structured tokens, markdown prose for the “why,” agent picks it up automatically. Compresses formerly-multi-file specs into one agent-readable artifact.
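
To make the shape concrete, here is a minimal sketch of how an agent-side loader might split such a file into structured tokens and prose. It is an assumption-laden illustration: `split_spec` and `EXAMPLE_SPEC` are hypothetical names, the example file content is a placeholder, and only flat `key: value` frontmatter is handled (nested YAML would need a real YAML parser).

```python
from typing import Dict, Tuple


def split_spec(text: str) -> Tuple[Dict[str, str], str]:
    """Split a single-file spec into (frontmatter tokens, markdown prose).

    Handles only flat `key: value` lines between two `---` delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is prose
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1)
            if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated frontmatter: treat everything as prose
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    prose = "\n".join(lines[end + 1:]).strip()
    return meta, prose


# Placeholder spec, not the contents of any real design.md or mac.md.
EXAMPLE_SPEC = """---
name: example-spec
version: 0.1
---
# Why

Prose explaining the rationale behind the tokens above.
"""

if __name__ == "__main__":
    tokens, why = split_spec(EXAMPLE_SPEC)
    print(tokens)
    print(why)
```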

Strategic implication for RDCO: the MAC pack should ship as mac.md (single-file spec) following this pattern, with explicit citation of DESIGN.md as prior art. Multi-vendor convergence on the pattern (Cloudflare + Anthropic + Tan + Google Labs) means the pattern is no longer speculative; it’s becoming the default. Being the first single-file agent spec for data quality is the wedge.

Cross-references

Glossary lookup discipline (operational)

When you see any of these terms in articles or conversations, mentally translate to GEO:

When writing for any RDCO surface (vault, Sanity Check, X, Discord/iMessage to founder, client-facing decks): use GEO. Spell out “Generative Engine Optimization” on first use in any external piece, then GEO thereafter.