Generative Engine Optimization (GEO) — RDCO canonical term
Why this note exists
There’s a lot of terminology floating around for “make your content discoverable to LLMs / AI search.” We’ve seen and heard:
- LLM SEO
- LLMO (Large Language Model Optimization)
- AEO (Answer Engine Optimization)
- AISO (AI Search Optimization)
- GEO (Generative Engine Optimization)
- “AI Search” (informal)
- “Markdown for Agents” (Cloudflare-flavored, a tactic-level term, not a category-level one)
The founder picked GEO as the canonical term for RDCO internal docs and external content, after Eric Ciarla’s 2026-04-23 thread referenced it as one of two pivot bets at Firecrawl (“SEO and GEO, because more people are using AI to write code, and if Firecrawl is the top recommended web data API across AI tools, signups grow as those platforms grow”).
Use GEO. Not LLMO, not AEO, not AISO, not “LLM SEO.” Standardize the language so the vault, Sanity Check, and any RDCO-facing content all reinforce one term.
What GEO actually means
GEO is the discipline of optimizing content so that generative AI systems (Claude, ChatGPT, Gemini, Perplexity, Copilot, etc.) recommend or cite your content when their users ask relevant questions.
It is NOT (just):
- Traditional SEO retargeted at AI crawlers
- Schema markup tweaking
- Adding `llms.txt` files (that's a tactic, not the strategy)
It IS:
- Content structured so AI tools can extract canonical answers (clear definitions, named frameworks, owned terms)
- Content distributed where AI tools’ training + retrieval pipelines see it (GitHub, Reddit, niche communities, paper repositories, structured-data sites)
- Authority signals AI tools weight (citations, mentions across reputable third-party domains)
- Direct AI-tool-friendly affordances (clean markdown URLs, llms.txt, structured data, OG metadata, semantic HTML)
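Of the affordances above, `llms.txt` is the most concrete. A minimal sketch of what one could look like for raydata.co, following the llmstxt.org convention (H1 title, blockquote summary, H2 sections of markdown links) — all URLs and titles below are hypothetical placeholders:

```markdown
# Ray Data Co

> Data quality discipline for the AI era. Canonical, AI-readable
> versions of our content live at the markdown URLs below.

## Posts

- [Example Sanity Check post](https://raydata.co/posts/example-slug.md): one-line summary

## Frameworks

- [MAC framework](https://raydata.co/mac.md): Model Acceptance Criteria overview
```

The file sits at the site root (`/llms.txt`) so AI crawlers have a single known entry point to the markdown surface.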
The Princeton paper origin
The GEO term originated in academic literature: “GEO: Generative Engine Optimization” (Aggarwal et al., 2024, Princeton/Georgia Tech). They formalized it as the discipline of optimizing content for visibility in generative engines. Worth tracking down for the foundational reference if we’re going to write our own content on the topic.
Tactics (operational)
These are the levers we already have or can deploy:
- Owned terms / named frameworks. AI tools surface named concepts more reliably than vague phrases. RDCO already owns MAC (Model Acceptance Criteria) — that name is leverage. The “underdog effect” referenced in the Princeton paper notes that smaller, less-popular brands gain disproportionate visibility from GEO-optimized content because the field is less crowded.
- Markdown-first publishing. Each Sanity Check post should have a `/posts/<slug>.md` URL alongside the HTML version. AI crawlers prefer the markdown.
- `llms.txt` at site root. Lists the canonical content surface for AI tools to fetch. Cloudflare-blessed convention.
- Structured data (JSON-LD). Article, Author, Organization schemas help AI tools understand who wrote what.
- Cross-domain mentions. AI tools weight mentions across multiple reputable third-party domains. This is where partnership-driven distribution (Eric’s other bet) reinforces GEO.
- Topical authority concentration. Writing 30 articles on data quality discipline beats writing 30 articles across 30 unrelated topics. AI tools reward topical depth.
- Direct AI training data inclusion. Open-licensed content posted to GitHub, Reddit, Hacker News etc. has a chance of entering training corpora. Substack / paywalled / membership-gated content does not.
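To make the JSON-LD tactic concrete, here is a minimal Article schema sketch of the kind that would go in a `<script type="application/ld+json">` tag on each post page. The schema.org types and properties are standard; the names, dates, and URLs are placeholders, not real RDCO content:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Sanity Check post title",
  "author": { "@type": "Person", "name": "Author Name" },
  "publisher": {
    "@type": "Organization",
    "name": "Ray Data Co",
    "url": "https://raydata.co"
  },
  "datePublished": "2026-05-01",
  "mainEntityOfPage": "https://raydata.co/posts/example-slug"
}
```

One block per post, emitted by the publish stack at build time, covers the Article/Author/Organization triad in one artifact.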
Mapping against Ray Data Co
GEO is one of two compounding distribution bets we could prioritize (alongside integration partnerships per Ciarla’s framework — see ../2026-04-23-firecrawl-ciarla-treadmill-vs-flywheel when filed).
Where we already have GEO infrastructure or work-in-progress:
- Cloudflare Pages stack for raydata.co (markdown URLs + llms.txt + structured data are queued for the publish stack rebuild — see hq-raydata-co/ proposal for adjacent infra)
- MAC framework is a named, owned term ready to be our flagship topical concentration
- Vault is a corpus that could feed an AI-tool-friendly knowledge surface (with care around what gets exposed publicly)
Where we have NOT yet invested:
- Topical concentration discipline on Sanity Check publishing (the v3 strategy needs to commit to a 6-12 month topic concentration, not range across topics)
- Cross-domain mention strategy (no current effort to seed mentions across third-party data-engineering communities, GitHub repos, etc.)
- `llms.txt` is not yet deployed
- JSON-LD structured data is not yet deployed
- Underdog-effect lever is unutilized (we ARE the underdog in the data-quality-discipline-for-AI-era category)
The single-file agent-spec pattern (load-bearing for GEO + adjacent)
GEO sits inside a broader pattern that’s now multi-vendor confirmed: the single-file repo-resident agent spec.
| File | Domain | Authority |
|---|---|---|
| `llms.txt` | Site-level content surface for AI tools | Cloudflare / Anthropic |
| `AGENTS.md` | Skill resolver / routing table | Garry Tan (gbrain) |
| `CLAUDE.md` | Project-level instructions for Claude Code | Anthropic |
| `SKILL.md` | Per-skill agent specifications | Anthropic Claude Code convention |
| `design.md` | Design tokens + rationale for AI design tools | Google Labs (released 2026-04-22, with Stitch as reference implementation) |
| `mac.md` (planned) | Data quality acceptance criteria for AI data agents | RDCO (planned 2026-05) |
Same shape every time: single file, lives in the repo, YAML frontmatter for structured tokens, markdown prose for the “why,” agent picks it up automatically. Compresses formerly-multi-file specs into one agent-readable artifact.
Strategic implication for RDCO: the MAC pack should ship as `mac.md` (single-file spec) following this pattern, with explicit citation of `design.md` as prior art. Multi-vendor convergence on the pattern (Cloudflare + Anthropic + Tan + Google Labs) means it is no longer speculative; it is becoming the default. Being the first single-file agent spec for data quality is the wedge.
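A skeleton of what a `mac.md` following this pattern could look like — every field name, criterion, and threshold below is hypothetical, shown only to illustrate the frontmatter-plus-prose shape described above:

```markdown
---
spec: mac
version: 0.1
owner: RDCO
criteria:
  - id: completeness
    threshold: 0.99
  - id: freshness
    max_age_hours: 24
---

# Model Acceptance Criteria

Prose rationale for each criterion goes here, so an agent gets both the
structured tokens (YAML frontmatter) and the "why" (markdown body) from
a single repo-resident file.
```

Structured tokens up top for machine parsing, prose underneath for the reasoning: the same shape as `llms.txt`, `SKILL.md`, and `design.md`.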
Cross-references
- ../2026-04-22-ayman-architect-mode-3as — Architect Mode framework. GEO is what compounding distribution looks like in Architect Mode for content publishing.
- conviction-assets-inventory — MAC framework as owned term is one of the conviction assets that powers the GEO bet.
- `01-projects/newsletter/` — Sanity Check v3 strategy doc; GEO should be the load-bearing distribution bet, not a side-tactic.
- Ciarla 2026-04-23 X thread — pin-the-pivot reference for GEO + integration partnerships as Firecrawl's compounding bets.
- Princeton/Georgia Tech (Aggarwal et al., 2024) “GEO: Generative Engine Optimization” — foundational academic reference. Worth a deeper read.
Glossary lookup discipline (operational)
When you see any of these terms in articles or conversations, mentally translate to GEO:
- LLMO → GEO
- AEO → GEO
- AISO → GEO
- LLM SEO → GEO
- “AI search optimization” → GEO
When writing for any RDCO surface (vault, Sanity Check, X, Discord/iMessage to founder, client-facing decks): use GEO. Spell out “Generative Engine Optimization” on first use in any external piece, then GEO thereafter.