
2026-04-13 · project-output

Graph DB Prototype - Results

Ingestion summary

Note: The Document vertex count (106) exceeds the 50 ingested files because wikilinks in Related sections create stub Document nodes for cited-but-not-ingested files. These stubs have no outbound edges of their own, but they let 1-hop citation queries resolve.
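The stub behavior can be sketched in plain Python. This is a minimal adjacency-map model, not the prototype's actual schema; the `Graph` class and document ids are illustrative.

```python
# Minimal sketch (assumed schema): ingesting a document's wikilinks
# creates stub Document vertices for cited-but-not-ingested targets,
# so 1-hop citation queries still resolve.

class Graph:
    def __init__(self):
        self.vertices = {}   # doc id -> {"stub": bool}
        self.out_edges = {}  # doc id -> set of cited doc ids

    def ingest(self, doc_id, wikilinks):
        """Register a real document and its outbound citation edges."""
        self.vertices[doc_id] = {"stub": False}
        self.out_edges[doc_id] = set(wikilinks)
        for target in wikilinks:
            # Cited-but-not-ingested targets become stubs with no
            # outbound edges of their own.
            self.vertices.setdefault(target, {"stub": True})
            self.out_edges.setdefault(target, set())

g = Graph()
g.ingest("half-life-of-a-moat", ["thin-harness-fat-skills"])
# The cited doc was never ingested, but the edge still lands somewhere:
assert g.vertices["thin-harness-fat-skills"]["stub"] is True
assert g.out_edges["thin-harness-fat-skills"] == set()
```

This is why stub counts inflate the vertex total without adding any outbound citation paths.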

Query 1 - Positioning evidence (state-ownership architecture)

Runtime: 5.48 ms

Target doc: rdco-state-ownership-architecture

1-hop: Documents that cite the state-ownership architecture doc

| Document | Path | Date |
|---|---|---|
| OpenAI’s Memos, Frontier, Amazon and Anthropic — Ben Thompson | 06-reference/2026-04-14-stratechery-openai-memos-anthropic.md | 2026-04-14 |
| “The Half-Life of a Moat (Part 1)” — Jonathan Natkins | 06-reference/2026-04-14-semistructured-half-life-of-a-moat-part-1.md | 2026-04-14 |
| “Fat Skills, Fat Code, Thin Harness” — Commentary on Garry Tan’s Architecture | 06-reference/2026-04-14-tan-fat-skills-fat-code-thin-harness-commentary.md | 2026-04-14 |
| Aaron Levie — “The Agent Deployer” Role JD (Apr 14, 2026) | 06-reference/2026-04-14-levie-agent-deployer-role-jd.md | 2026-04-14 |
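The 1-hop lookup above is just an inbound-edge scan. A minimal sketch over a flat edge list, with a toy subset of document ids for illustration:

```python
# Illustrative 1-hop query: which documents cite a given target?
# The edge list is a toy subset; real ids come from the vault.
edges = [
    ("stratechery-openai-memos-anthropic", "rdco-state-ownership-architecture"),
    ("semistructured-half-life-of-a-moat-part-1", "rdco-state-ownership-architecture"),
    ("semistructured-half-life-of-a-moat-part-1", "thin-harness-fat-skills"),
]

def cites(edges, target):
    """Return all source documents with a citation edge into `target`."""
    return sorted({src for src, dst in edges if dst == target})

assert cites(edges, "rdco-state-ownership-architecture") == [
    "semistructured-half-life-of-a-moat-part-1",
    "stratechery-openai-memos-anthropic",
]
```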

State-ownership doc’s own citations (the anchor set for 2-hop alignment)

2-hop: Documents that co-cite the same anchor docs (aligned-content discovery)

| Document | Shared citations | Shared anchors |
|---|---|---|
| “The Half-Life of a Moat (Part 1)” — Jonathan Natkins | 5 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “Fat Skills, Fat Code, Thin Harness” — Commentary on Garry Tan’s Architecture | 4 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura | 3 | “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “The AI Lock-In Is Beginning!” — Jaya Gupta | 3 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| OpenAI’s Memos, Frontier, Amazon and Anthropic — Ben Thompson | 3 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-alphasignal-claude-code-leak-harness-engineering | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-cobus-greyling-harness-era-language-shift | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-harrison-chase-harness-blog | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-lindstrom-board-ai-governance | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-13-solve-everything-master-synthesis | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| Aaron Levie — “The Agent Deployer” Role JD (Apr 14, 2026) | 2 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta |
| Mammoth Growth Agentic Harness Review — cc-wrapped | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| The Folder Is the Agent — Kieran Klaassen (Every) | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-03-25-seattle-data-guy-know-nothing-and-be-happy | 1 | SOUL |
| 2026-04-12-arxiv-2604-08224-agent-harness-study | 1 | “Thin Harness, Fat Skills” — Garry Tan |
| Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis | 1 | “Thin Harness, Fat Skills” — Garry Tan |
| Ray Data Co — phData vs Mammoth Growth Decision Analysis | 1 | SOUL |
| dbt Semantic Layer vs Text-to-SQL — 2026 Benchmark | 1 | “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins |
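The 2-hop co-citation ranking reduces to set intersection against the anchor set, counted per candidate document. A sketch with `collections.Counter` (the adjacency data and anchor ids are illustrative, not the prototype's real ids):

```python
from collections import Counter

def co_citers(out_edges, target):
    """Rank documents by how many of `target`'s citations they share.

    out_edges: doc id -> set of cited doc ids.
    Returns (ranked list of (doc, shared count), shared-anchor map).
    """
    anchors = out_edges.get(target, set())
    counts = Counter()
    shared = {}
    for doc, cited in out_edges.items():
        if doc == target:
            continue
        overlap = cited & anchors
        if overlap:
            counts[doc] = len(overlap)
            shared[doc] = sorted(overlap)
    return counts.most_common(), shared

# Toy data: A/B/C/D stand in for anchor docs like the Tan and Moura pieces.
out_edges = {
    "rdco-state-ownership-architecture": {"A", "B", "C"},
    "half-life-of-a-moat-part-1": {"A", "B", "C", "D"},
    "ai-lock-in": {"B"},
}
ranked, shared = co_citers(out_edges, "rdco-state-ownership-architecture")
assert ranked[0] == ("half-life-of-a-moat-part-1", 3)
assert shared["ai-lock-in"] == ["B"]
```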

Query 2 - Dissent cluster aggregation (authors in >=2 harness-thesis docs)

Runtime: 2.77 ms

Cluster size: 12 documents

No person authored >=2 cluster documents.
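The aggregation behind this result is a straightforward author-frequency count over the cluster membership. A sketch, with hypothetical `authored_by` data standing in for the Person-authored-Document edges:

```python
from collections import Counter

# Hypothetical authorship edges: doc id -> list of author names.
authored_by = {
    "doc-1": ["Jonathan Natkins"],
    "doc-2": ["Garry Tan"],
    "doc-3": ["Jonathan Natkins"],
}
cluster = {"doc-1", "doc-2"}  # stand-in for the 12-doc dissent cluster

counts = Counter(a for d in cluster for a in authored_by.get(d, []))
repeat_authors = [a for a, n in counts.items() if n >= 2]
# Mirrors the prototype result: no author appears in >= 2 cluster docs.
assert repeat_authors == []
```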

Query 3 - Decision-evidence audit (phData vs MG decision)

Runtime: 1.95 ms

Target doc: Ray Data Co — phData vs Mammoth Growth Decision Analysis

Citation tree (1-hop and 2-hop)
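The citation-tree expansion is a depth-limited traversal over the outbound-edge map. A minimal sketch (the adjacency data is illustrative):

```python
def citation_tree(out_edges, root, depth=2):
    """Return {doc: hop distance} for docs within `depth` citation hops."""
    seen = {root: 0}
    frontier = [root]
    for hop in range(1, depth + 1):
        nxt = []
        for doc in frontier:
            for cited in out_edges.get(doc, ()):
                if cited not in seen:  # keep the shortest hop distance
                    seen[cited] = hop
                    nxt.append(cited)
        frontier = nxt
    seen.pop(root)
    return seen

out_edges = {"decision": ["a", "b"], "a": ["c"]}
assert citation_tree(out_edges, "decision") == {"a": 1, "b": 1, "c": 2}
```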

What worked

What didn’t work / limits exposed

Edge types we NEED next (manual or LLM annotation, phase 2)

  1. supports-position - Document -> named RDCO position. Without it, Query 1 (positioning evidence) relies on citation-adjacency as a proxy, which over-returns noise.
  2. validates / contradicts / disputes-claim-in - these are the whole point of the cross-check skill output; we should extract them from the harness-thesis-dissent doc and similar cluster synthesis docs.
  3. informs-decision - Document -> Decision. We’d stop guessing which cited docs were actually weighted in the phData decision vs cited as background.
  4. part-of-cluster - explicit Cluster vertex + membership edges, so Query 2 doesn’t have to heuristically reconstruct ‘the harness-thesis cluster’ via title+topic pattern matching each time.
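One way the phase-2 typed edges could be modeled is a single edge record carrying a type tag, so each query filters on type instead of inferring intent from citation adjacency. This is a hypothetical schema sketch, not the prototype's implementation; all ids are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    type: str  # e.g. "cites", "supports-position", "validates",
               # "contradicts", "informs-decision", "part-of-cluster"

edges = [
    Edge("half-life-of-a-moat-part-1", "phdata-vs-mg-decision", "informs-decision"),
    Edge("harness-thesis-dissent", "thin-harness-fat-skills", "contradicts"),
    Edge("harness-thesis-dissent", "harness-thesis-cluster", "part-of-cluster"),
]

def by_type(edges, t):
    """Filter the edge list down to one semantic edge type."""
    return [e for e in edges if e.type == t]

assert len(by_type(edges, "informs-decision")) == 1
assert by_type(edges, "contradicts")[0].dst == "thin-harness-fat-skills"
```

With `informs-decision` edges in place, the Query 3 audit would read them directly rather than guessing from the citation tree.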

Recommendation

Yes - continue investment, but phase it. The prototype proves:

Phase 2 plan:

  1. Scale ingestion to the full vault (~1,426 docs) - expect ~20k edges based on current density.
  2. Add a lightweight LLM annotator that reads each cross-check synthesis doc and emits validates / contradicts edges between the docs it references. Start with the 10-15 synthesis docs we already have.
  3. Add Cluster vertices + part-of-cluster edges derived from existing cluster-synthesis docs (harness-thesis dissent, moat debate, data-quality sources).
  4. Write a /graph-query skill that wraps the 3 prototype queries and exposes them via natural-language intent.
  5. Re-evaluate after a month of use - if the founder isn’t invoking the graph queries in real work, prune or pivot.
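For step 2, one possible output contract for the LLM annotator is a small JSON payload per synthesis doc that ingestion turns into typed edges. The field names and doc ids below are assumptions for illustration, not a decided format.

```python
import json

# Hypothetical annotator output: one synthesis doc in, typed edges out.
annotation = json.loads("""
{
  "source_doc": "harness-thesis-dissent",
  "edges": [
    {"src": "harness-thesis-dissent",
     "dst": "thin-harness-fat-skills",
     "type": "contradicts"}
  ]
}
""")

# Ingestion would validate the type tag before writing the edge.
ALLOWED = {"validates", "contradicts", "disputes-claim-in"}
for e in annotation["edges"]:
    assert e["type"] in ALLOWED
```

Keeping the contract this small makes the annotator easy to rerun over the 10-15 existing synthesis docs and to diff between runs.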