Tue Apr 21 2026 · research-brief · source: deep-research
agent-seo · geo · llmo · aeo · content-strategy · sanity-check · distribution

Agent SEO — State of the Discipline (April 2026)

The question

Founder is rethinking content strategy. Today the engine is X-first: short posts, fast cadence, optimized for human social discovery. He’s considering inverting it to blog-first on raydata.co with explicit optimization for LLM citation — content that becomes the canonical source Claude / ChatGPT / Perplexity / Gemini / Google AI Mode pulls from. Thesis: agents are a meaningful share of the audience, and being cited by an LLM may be more valuable than ranking on Google for a human. He asked: (a) what is the industry calling this discipline, and (b) what techniques are documented as working?

The naming game

Five terms are in active circulation: GEO (generative engine optimization), AEO (answer engine optimization), LLMO (large language model optimization), AIO (AI optimization), and “agent SEO” itself. None has won outright; the field has not consolidated.

Winner-so-far: GEO. It has the academic anchor, the Wikipedia entry, the vendor tooling, and the search volume. AEO is a respectable second and the two are increasingly used as a compound. LLMO and AIO are losing.

The technique catalog

Twelve documented techniques, numbered below and organized by category. Evidence levels: Rigorous (controlled study), Anecdotal (vendor case studies, blog claims), Speculative (proposed but untested at scale), and Established (baseline hygiene that predates the discipline).

Content structure

  1. Answer capsules / front-loaded answer. A question-format H2 followed immediately by a 40–80 word direct answer, then supporting detail (see the sketch after this list). Why: extractable units map cleanly to what LLMs lift into their responses. Anecdotal — strong vendor consensus, no controlled study.
  2. Quotation Addition. Embedding direct quotes from authorities into your content. Princeton GEO measured +41% visibility (Position-Adjusted Word Count) on GEO-bench. Rigorous.
  3. Statistics Addition. Adding concrete numbers, percentages, dated figures. Princeton: +37–40%. Strongest in Law & Government, Debate, Opinion domains. Rigorous.
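
A minimal sketch of pattern 1; the heading and copy are placeholders, though the capsule here happens to describe the pattern itself:

```markdown
## What is an answer capsule?

An answer capsule is a 40-80 word direct answer placed immediately under a
question-format heading, before any supporting detail. The capsule is a
self-contained extractable unit: an LLM can lift it verbatim as the answer
to a matching query, while the rest of the article serves as evidence and
elaboration for readers who continue past it.
```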

Citation density and outbound linking

  4. Cite Sources. Linking out to primary sources. Princeton: +30–40% overall, and a striking +115% for rank-5 search results (vs −30% for rank-1) — meaning citation density helps underdogs disproportionately. Rigorous.
  5. Authoritative tone. Definitive claims, named frameworks, no hedging. Princeton: ~+10%. Smaller lift than expected. Rigorous.

Structured data

  6. Schema.org markup — FAQPage, Article, HowTo, Organization, Person. Vendor claim (geneo, Snezzi, others): pages with FAQPage schema are “3.2x more likely” to appear in AI responses. Anecdotal — the 3.2x figure traces to a single vendor study, not independently replicated.
  7. JSON-LD author entity. Named author with Person schema, linked to ORCID/LinkedIn/About page. Aligns with E-E-A-T signals Google already weights. Anecdotal but well-grounded.
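
A combined sketch of 6 and 7, assuming one JSON-LD block in the page head; the author name, date, ORCID, and every URL are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "Macro vs Micro Data Quality",
      "datePublished": "2026-04-21",
      "author": { "@id": "#author" },
      "publisher": { "@type": "Organization", "name": "raydata.co" }
    },
    {
      "@type": "Person",
      "@id": "#author",
      "name": "Example Author",
      "url": "https://raydata.co/about",
      "sameAs": [
        "https://www.linkedin.com/in/example-author",
        "https://orcid.org/0000-0000-0000-0000"
      ]
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is macro data quality?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "The 40-80 word answer capsule goes here, verbatim from the page."
          }
        }
      ]
    }
  ]
}
</script>
```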

Direct LLM-control conventions

  8. llms.txt. Markdown file at site root pointing crawlers to canonical URLs (example after this list). Adopted by Anthropic, Stripe, Zapier, Cloudflare, Mintlify, and most dev-tool companies. But — 2025 CDN audits show GPTBot, ClaudeBot, and PerplexityBot do NOT actually fetch llms.txt. Useful primarily for IDE-side agents (Cursor, Aider, Continue) and human-prompted retrieval. Speculative on production-LLM impact; real for dev-tool surfaces.
  9. AI-crawler robots.txt rules. Allow GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended explicitly. Standard hygiene; not optimization. Established.
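
A minimal llms.txt sketch for technique 8, in the proposed format (H1 site name, one-line blockquote summary, link sections); the URLs are hypothetical:

```markdown
# raydata.co

> Opinionated reference content on data quality, AI agents, and the data-platform thesis.

## Reference

- [Macro vs micro data quality](https://raydata.co/blog/macro-vs-micro-data-quality): canonical explainer
- [The agent-deployer thesis](https://raydata.co/blog/agent-deployer-thesis): positioning piece
```

And the matching robots.txt hygiene from 9, using the user-agent tokens named above:

```text
# Explicit allow rules for the major AI crawlers (hygiene, not optimization)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /
```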

Tone and framing

  10. Named concepts / citation-bait. Coining a term (“answer capsule,” “data observability,” “vibe coding”) so LLMs cite you whenever the term appears in a query. The most underrated lever in the catalog. Anecdotal — but Princeton’s “Quotation Addition” finding is the closest controlled analog.

Distribution

  11. Own-domain canonical, syndicate to high-authority surfaces. LLMs disproportionately cite Reddit, Wikipedia, GitHub, Stack Overflow, YouTube transcripts, and a small set of high-authority publications. Cross-posting summaries with canonical links back to your domain is the consensus play. Anecdotal — supported by SparkToro/Datos clickstream data on AI tool source mix.
  12. Reddit-and-forum participation. Several vendor studies (Profound, Scrunch) report Reddit threads as one of the top three citation sources for ChatGPT. Authentic participation, not spam, is the consensus tactic. Anecdotal but consistent across multiple sources.

What we actually know works (evidence base)

The Princeton/IIT-Delhi paper (Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande — KDD 2024) is still the only rigorous, replicable study in the public record. The setup: GEO-bench (10,000 queries, 8K train / 1K validation / 1K test) tested against GPT-3.5-turbo + Google Search top-5 sources, with secondary validation on Perplexity. Headline: up to +40% visibility in generative responses from the right combination of techniques.
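
To make “visibility” concrete: the headline metric is a position-adjusted word count, in which a source earns credit for the words in answer sentences that cite it, discounted the later those sentences appear. A rough Python sketch; the linear decay here is an assumption for illustration, not the paper’s exact weighting:

```python
# Rough sketch of a position-adjusted word-count visibility score.
# Assumption: linear position decay; the GEO paper's exact weighting differs.

def visibility(sentences: list[tuple[str, set[str]]], source: str) -> float:
    """sentences: (text, cited_sources) pairs, in answer order."""
    n = len(sentences)
    total = weighted = 0.0
    for pos, (text, cited) in enumerate(sentences):
        words = len(text.split())
        total += words
        if source in cited:
            weighted += words * (1.0 - pos / n)  # earlier sentences count more
    return weighted / total if total else 0.0

answer = [
    ("Quoting authorities lifted visibility by 41%.", {"raydata.co"}),
    ("Other sources report smaller effects.", {"example.com"}),
]
print(visibility(answer, "raydata.co"))  # ~0.55: position-weighted share of the answer
```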

The honest read: the strongest techniques are surprisingly old-school content craft — quote authorities, cite numbers, link out to primary sources. The weakest are the things vendors love to sell — keyword stuffing (declined), unique-words optimization (no effect). The Authoritative-tone lift was milder than expected (~10%), suggesting “write definitively” matters less than “write with receipts.”

The most interesting finding for RDCO: rank-5 sources gained +115% from Cite Sources while rank-1 sources LOST 30%. GEO is asymmetrically a tool for underdogs. For a domain like raydata.co with no organic authority yet, this is the buy signal — citation-dense, statistic-heavy writing can punch above the domain’s PageRank weight in a way pure-SEO cannot.

Everything else in the catalog is anecdotal: vendor case studies, single-site experiments, before/after screenshots from agencies selling the service. Useful, but not evidence in the Princeton sense. The “FAQ schema = 3.2x lift” claim circulating across the geneo / Snezzi / Pixelmojo blogs all traces to one unreplicated vendor study.

Tooling landscape

Five tools worth knowing: Profound, Peec, and Otterly for monitoring; AthenaHQ and Scrunch for optimization. All paid; nothing meaningfully open-source yet.

Signal vs noise. Monitoring (Profound, Peec, Otterly) is real and useful — you cannot optimize what you cannot measure. “Optimization” tools (AthenaHQ, Scrunch) are largely automated content-rewrite loops; the optimization work itself is still editorial judgment. No open-source option of note yet — a gap.

The credible skeptic position

Two skeptics worth steelmanning.

Rand Fishkin (SparkToro). His clickstream work with Datos suggests perceived AI-search adoption is inflated 10–100x relative to measured usage; traditional Google search shows no measurable decline; and even if AI-tool usage keeps doubling, it reaches Google parity only in 6–10 years. He rejects the GEO/AEO/LLMO acronym sprawl and prefers “Search Everywhere Optimization.” His critique: the discipline is being sold faster than it is being used. If most queries still go to Google, and Google itself increasingly injects LLM answers into its own results, the right move is good content widely distributed, not LLM-citation-optimized content.

Google’s Danny Sullivan and Nick Fox. Their position: optimizing for AI search is the same as optimizing for traditional search. The crawl, the index, the ranking signals, E-E-A-T — same machinery whether the surface is a blue link, an AI Overview, or a ChatGPT citation. Partly self-interest, but empirically defensible: AI Overviews pull from the same retrieval index as classic search, and Perplexity’s web search uses Bing. The GEO techniques that work — cite authorities, add stats, link out, name concepts, structured data — are recognizably 2018-era SEO craft with a new label.

The synthesis: GEO is real, but 80% rebranded SEO with three knobs turned up (citation density, structured data, named-entity authority) and 20% genuinely new (llms.txt, share-of-answer monitoring). A founder who does excellent content with discipline captures most of the GEO upside without buying tooling. Risk of going all-in on “agent SEO as a category”: paying tooling tax and writing for the wrong audience while the actual usage curve is still years from Google parity.

Synthesis for RDCO

Recommendation: do GEO as a constraint on existing content craft — not as a new identity. The “agent SEO” framing is mostly correct as a thesis, mostly oversold as a product category, and the techniques that move the needle (Princeton’s findings) are writing-quality upgrades you should be doing regardless. Don’t reposition raydata.co as “the agent-SEO blog.” Reposition it as opinionated reference content on data quality, AI agents, and the data-platform thesis — and bake the GEO-effective patterns into the editorial standard.

The asymmetry in the Princeton data is the strategic insight: citation-dense writing helps low-authority domains 3–5x more than high-authority ones. raydata.co has effectively zero domain authority today — that’s not a disadvantage, it’s the exact regime where GEO’s underdog effect is largest. The leverage move: ship 15–25 reference pieces (~2,000–3,000 words) that name concepts already developed in the vault — data-quality frameworks, the agent-deployer thesis, macro-vs-micro data observability — with heavy quotation, statistics, and outbound primary citation. Each piece is a coined-concept lure: when an LLM is asked about the concept, raydata.co is the canonical reference because no one else holds that handle.

This also resolves the burnout pattern. X-first cadence punishes you for not posting daily; reference content compounds — a piece written in April 2026 is still cited in October 2027. The vault is raw material for ~30 such pieces; the work is editorial conversion, not net-new generation.

Concrete first three moves:

  1. Ship one keystone prototype. Best candidate: “macro vs micro data quality” or “the agent-deployer positioning.” 2,500 words, GEO pattern (question H2s, 40–80 word answer capsules, ≥3 sourced statistics, ≥5 outbound citations, FAQPage + Article + Person schema); a skeleton follows this list. Add llms.txt + author entity. Test article, not a brand pivot.
  2. Stand up monitoring before optimization. One tool — Profound at $99 or Peec.ai equivalent — seeded with 20–30 brand and concept queries. Can’t optimize without a baseline. Run 60 days before judging.
  3. Cadence: 2 reference pieces/month, not 8. Two production days per month. X stays as distribution surface for those pieces, not the primary channel. If 90 days of monitoring shows zero citation movement, the thesis is wrong for raydata.co’s current authority and the right move is Reddit/HN/podcast guesting until authority exists.
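
A skeleton for move 1, assuming the macro-vs-micro candidate; the headings are illustrative and the thresholds mirror the checklist above:

```markdown
# Macro vs Micro Data Quality

<!-- Article + Person + FAQPage JSON-LD in <head>; entry in /llms.txt -->

## What is the difference between macro and micro data quality?
(40-80 word answer capsule, then supporting detail)

## Why does macro data quality matter for AI agents?
(capsule; at least one sourced statistic)

## How do teams measure macro data quality?
(capsule; outbound citations to primary sources)

<!-- Across the piece: >=3 sourced statistics, >=5 outbound citations -->
```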

The skeptic case is also right enough to honor: don’t buy the agency narrative, don’t buy more than one tool, don’t write for the LLM at the expense of the human. The Princeton-validated techniques are good writing techniques; optimize for both audiences with the same content. GEO-specific surface area is a checklist, not a strategy.
