Graph DB Prototype - Results
Ingestion summary
- Source files ingested: 50 (from `ingested-files.txt`)
- Total vertices: 394
  - Topic: 190
  - Document: 106
  - Person: 65
  - Publication: 33
- Total edges: 616
  - about-topic: 285
  - cites: 199
  - authored-by: 86
  - published-in: 46
Note: Document vertex count (106) exceeds 50 because wikilinks in Related sections create stub Document nodes for cited-but-not-ingested files. These stubs have no outbound edges of their own but let 1-hop citation queries succeed.
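The stub mechanism described in the note can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual extractor: the `[[target]]` / `[[target|label]]` wikilink syntax, the `build_cites` helper, and the sample document ids are all assumptions made for the example.

```python
import re

# Assumed wikilink syntax: [[target]] or [[target|label]]; the real extractor may differ.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_cites(docs: dict[str, str]) -> tuple[dict[str, bool], list[tuple[str, str]]]:
    """Return (Document vertices, cites edges).

    `docs` maps a document id to the text of its Related section.
    Each vertex maps id -> True if the file was ingested, False if it is a stub.
    """
    vertices = {doc_id: True for doc_id in docs}
    edges = []
    for doc_id, related in docs.items():
        for target in WIKILINK.findall(related):
            target = target.strip()
            # A cited-but-not-ingested target becomes a stub vertex with no outbound edges.
            vertices.setdefault(target, False)
            edges.append((doc_id, target))
    return vertices, edges


# Hypothetical input: two ingested docs, one of which links to a file that was never ingested.
docs = {
    "doc-a": "- [[doc-b]]\n- [[doc-c|Some label]]",
    "doc-b": "- [[doc-c]]",
}
vertices, edges = build_cites(docs)
stubs = sorted(v for v, ingested in vertices.items() if not ingested)
```

Here `doc-c` ends up as a stub, so a 1-hop "who cites `doc-c`" lookup still finds `doc-a` and `doc-b` even though `doc-c` itself was never read.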
Query 1 - Positioning evidence (state-ownership architecture)
Runtime: 5.48 ms
Target doc: rdco-state-ownership-architecture
1-hop: Documents that cite the state-ownership architecture doc
| Document | Path | Date |
|---|---|---|
| OpenAI’s Memos, Frontier, Amazon and Anthropic — Ben Thompson | 06-reference/2026-04-14-stratechery-openai-memos-anthropic.md | 2026-04-14 |
| “The Half-Life of a Moat (Part 1)” — Jonathan Natkins | 06-reference/2026-04-14-semistructured-half-life-of-a-moat-part-1.md | 2026-04-14 |
| “Fat Skills, Fat Code, Thin Harness” — Commentary on Garry Tan’s Architecture | 06-reference/2026-04-14-tan-fat-skills-fat-code-thin-harness-commentary.md | 2026-04-14 |
| Aaron Levie — “The Agent Deployer” Role JD (Apr 14, 2026) | 06-reference/2026-04-14-levie-agent-deployer-role-jd.md | 2026-04-14 |
State-ownership doc’s own citations (the anchor set for 2-hop alignment)
- “The AI Lock-In Is Beginning!” — Jaya Gupta
- “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura
- “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins
- “Thin Harness, Fat Skills” — Garry Tan
- Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis
- SOUL
2-hop: Documents that co-cite the same anchor docs (aligned-content discovery)
| Document | Shared-citations count | Shared anchors |
|---|---|---|
| “The Half-Life of a Moat (Part 1)” — Jonathan Natkins | 5 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “Fat Skills, Fat Code, Thin Harness” — Commentary on Garry Tan’s Architecture | 4 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura | 3 | “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “The AI Lock-In Is Beginning!” — Jaya Gupta | 3 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| OpenAI’s Memos, Frontier, Amazon and Anthropic — Ben Thompson | 3 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-alphasignal-claude-code-leak-harness-engineering | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-cobus-greyling-harness-era-language-shift | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-harrison-chase-harness-blog | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-12-lindstrom-board-ai-governance | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-04-13-solve-everything-master-synthesis | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| Aaron Levie — “The Agent Deployer” Role JD (Apr 14, 2026) | 2 | “Agent Harnesses Are Dead. Long Live Agent Harnesses.” — João Moura; “The AI Lock-In Is Beginning!” — Jaya Gupta |
| Mammoth Growth Agentic Harness Review — cc-wrapped | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| The Folder Is the Agent — Kieran Klaassen (Every) | 2 | “Thin Harness, Fat Skills” — Garry Tan; Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis |
| 2026-03-25-seattle-data-guy-know-nothing-and-be-happy | 1 | SOUL |
| 2026-04-12-arxiv-2604-08224-agent-harness-study | 1 | “Thin Harness, Fat Skills” — Garry Tan |
| Dissenting Opinions on the “Thin Harness, Fat Skills” Thesis | 1 | “Thin Harness, Fat Skills” — Garry Tan |
| Ray Data Co — phData vs Mammoth Growth Decision Analysis | 1 | SOUL |
| dbt Semantic Layer vs Text-to-SQL — 2026 Benchmark | 1 | “The LLMs Get the Publicity. The Data Layer Does the Work” — Jonathan Natkins |
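The 1-hop and 2-hop lookups behind these tables can be approximated with a self-join on a single edges table. The sketch below is an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies, and the table layout (`src`, `dst`, `type`), the index name, and the sample rows are all hypothetical rather than the prototype's actual schema.

```python
import sqlite3

# SQLite stands in for DuckDB here; the column names are assumed, not taken from the prototype.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, dst TEXT, type TEXT)")
con.execute("CREATE INDEX idx_edges_type_dst ON edges (type, dst)")
con.executemany(
    "INSERT INTO edges VALUES (?, ?, 'cites')",
    [
        ("target", "anchor-1"), ("target", "anchor-2"),  # the target doc's own citations
        ("doc-x", "target"),                             # doc-x cites the target directly
        ("doc-x", "anchor-1"), ("doc-x", "anchor-2"),
        ("doc-y", "anchor-2"),
        ("doc-z", "unrelated"),
    ],
)

# 1-hop: documents that cite the target.
one_hop = con.execute(
    "SELECT src FROM edges WHERE type = 'cites' AND dst = ? ORDER BY src",
    ("target",),
).fetchall()

# 2-hop: documents that cite the same anchors as the target, ranked by overlap.
two_hop = con.execute(
    """
    SELECT other.src, COUNT(DISTINCT other.dst) AS shared
    FROM edges AS anchor
    JOIN edges AS other
      ON other.dst = anchor.dst AND other.type = 'cites'
    WHERE anchor.type = 'cites'
      AND anchor.src = ?
      AND other.src <> anchor.src
    GROUP BY other.src
    ORDER BY shared DESC, other.src
    """,
    ("target",),
).fetchall()
```

In this toy data `doc-x` shares two anchors and `doc-y` shares one. The count only measures overlap in what is cited; it carries no signal about whether the co-citing document agrees with the target.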
Query 2 - Dissent cluster aggregation (authors in >=2 harness-thesis docs)
Runtime: 2.77 ms
Cluster size: 12 documents
No person authored >=2 cluster documents.
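A query of this shape reduces to a grouped count over `authored-by` edges restricted to the cluster. The following is a minimal sketch, again using the standard-library `sqlite3` as a stand-in for DuckDB; the edge-table columns, the document and person ids, and the explicit `cluster` list are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, dst TEXT, type TEXT)")
con.executemany(
    "INSERT INTO edges VALUES (?, ?, 'authored-by')",
    [
        ("doc-1", "person-a"),
        ("doc-2", "person-a"),
        ("doc-3", "person-b"),
        ("doc-4", "person-a"),  # authored by person-a but outside the cluster
    ],
)

# Hypothetical cluster membership; the prototype reconstructs this heuristically.
cluster = ["doc-1", "doc-2", "doc-3"]
placeholders = ",".join("?" * len(cluster))

rows = con.execute(
    f"""
    SELECT dst AS person, COUNT(DISTINCT src) AS n_docs
    FROM edges
    WHERE type = 'authored-by' AND src IN ({placeholders})
    GROUP BY dst
    HAVING COUNT(DISTINCT src) >= 2
    ORDER BY n_docs DESC, person
    """,
    cluster,
).fetchall()
```

Because Person vertices come from free-text author strings, an empty result may partly reflect unmerged name variants rather than a true absence of repeat authors.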
Query 3 - Decision-evidence audit (phData vs MG decision)
Runtime: 1.95 ms
Target doc: Ray Data Co — phData vs Mammoth Growth Decision Analysis
Citation tree (1-hop and 2-hop)
- 2026-03-25-seattle-data-guy-know-nothing-and-be-happy (`06-reference/2026-03-25-seattle-data-guy-know-nothing-and-be-happy.md`)
  - 2026-04-07-seattle-data-guy-noisy-data-quality-checks (`06-reference/2026-04-07-seattle-data-guy-noisy-data-quality-checks.md`)
  - 2026-04-10-jaya-gupta-anthropic-moat (`06-reference/2026-04-10-jaya-gupta-anthropic-moat.md`)
  - 2026-04-10-paddy-srinivasan-agentic-cloud (`06-reference/2026-04-10-paddy-srinivasan-agentic-cloud.md`)
  - README (`01-projects/process-newsletter/README.md`)
  - SOUL (`unresolved`)
- 2026-04-07-seattle-data-guy-noisy-data-quality-checks (`06-reference/2026-04-07-seattle-data-guy-noisy-data-quality-checks.md`)
  - 02-sops (`unresolved`)
  - 2026-04-10-jaya-gupta-anthropic-moat (`06-reference/2026-04-10-jaya-gupta-anthropic-moat.md`)
  - README (`01-projects/process-newsletter/README.md`)
  - eq3-pead-portfolio-simulator (`unresolved`)
- SOUL (`unresolved`)
- financial-overview (`unresolved`)
- index (`unresolved`)
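A depth-limited walk over outbound `cites` edges is one way to produce a tree like the one above. This is a small sketch with made-up document ids and a hypothetical `citation_tree` helper, assuming the citations have already been loaded into an adjacency map.

```python
def citation_tree(cites: dict[str, list[str]], root: str, depth: int = 2) -> dict:
    """Return a nested dict of outbound citations from `root`, `depth` hops deep."""
    if depth == 0:
        return {}
    return {
        # Stub (unresolved) documents have no entry in `cites`, so they end as leaves.
        child: citation_tree(cites, child, depth - 1)
        for child in cites.get(root, [])
    }


# Hypothetical adjacency map: document id -> ids it cites.
cites = {
    "decision": ["ref-a", "ref-b", "stub"],
    "ref-a": ["ref-b", "stub"],
    "ref-b": ["ref-c"],
}
tree = citation_tree(cites, "decision")
```

The depth bound also keeps the walk finite if two documents cite each other.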
What worked
- DuckDB schema + single indexed edges table delivered all 3 queries in <10 ms each on 616 edges.
- Wikilink extraction from Related sections produced clean `cites` edges (199 total) with zero manual work.
- Frontmatter-driven author/publication/tag extraction gave us `authored-by`, `published-in`, and `about-topic` edges for free.
- Stub Document nodes for unresolved wikilinks mean queries still answer correctly even when the cited doc isn’t in the ingested set.
What didn’t work / limits exposed
- Author field is free-text. `Garry Tan (CEO, Y Combinator)` is one Person; `Garry Tan` would be another. Person dedup needs canonicalization (probably a manual `people.yaml` or an LLM-assisted resolver).
- Publication extraction is crude. The `source` frontmatter field mixes publication + author + URL (`X article by @garrytan`). We need a separate `publication` frontmatter field or an extraction rule.
- `cites` is not `validates` / `contradicts` / `supports-position`. Query 1 had to fall back to co-citation as a proxy for aligned positioning. Co-citation finds related docs but can’t distinguish agreement from dissent.
- No author attribution on external sources. The state-ownership doc has author = founder/Ray; the X articles have the X author. We’d need an `author_type` field (founder / external / colleague) for queries like ‘external sources that validate our position.’
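To make the Person dedup problem concrete, here is one possible shape for a canonicalization step. It is a sketch under assumptions, not a proposed final resolver: the `canonical_person` helper and the `ALIASES` table are hypothetical, with the alias table standing in for the manual `people.yaml` idea.

```python
import re

# Hypothetical alias table; a manual people.yaml could play this role.
ALIASES = {"@garrytan": "Garry Tan"}


def canonical_person(raw: str) -> str:
    """Collapse a free-text author string to a single canonical name."""
    name = re.sub(r"\s*\([^)]*\)", "", raw)    # drop parenthetical titles such as "(CEO, ...)"
    name = re.sub(r"\s+", " ", name).strip()   # normalize whitespace
    return ALIASES.get(name.lower(), name)     # map known handles to a canonical name


variants = ["Garry Tan (CEO, Y Combinator)", "Garry Tan", "@garrytan"]
people = {canonical_person(v) for v in variants}
```

With this rule all three variants resolve to one Person. Rule-based cleanup like this would not catch misspellings or initials, which is where the LLM-assisted resolver might still be needed.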
Edge types we NEED next (manual or LLM annotation, phase 2)
- `supports-position` - Document -> named RDCO position. Without it, Query 1 (positioning evidence) relies on citation-adjacency as a proxy, which over-returns noise.
- `validates` / `contradicts` / `disputes-claim-in` - these are the whole point of the cross-check skill output; we should extract them from the harness-thesis-dissent doc and similar cluster synthesis docs.
- `informs-decision` - Document -> Decision. We’d stop guessing which cited docs were actually weighted in the phData decision vs cited as background.
- `part-of-cluster` - explicit Cluster vertex + membership edges, so Query 2 doesn’t have to heuristically reconstruct ‘the harness-thesis cluster’ via title+topic pattern matching each time.
Recommendation
Yes - continue investment, but phase it. The prototype proves:
- Schema design is sound; DuckDB handles this load with zero tuning.
- Automatic wikilink + frontmatter extraction gets us ~60% of the value for ~0% of the manual-annotation cost.
- The remaining 40% - validates/contradicts/supports-position/informs-decision - is exactly what justifies the graph over QMD: QMD can retrieve ‘docs about harnesses’; only the graph can answer ‘who dissents and how many times.’
Phase 2 plan:
- Scale ingestion to the full vault (~1,426 docs) - expect ~20k edges based on current density.
- Add a lightweight LLM annotator that reads each cross-check synthesis doc and emits `validates` / `contradicts` edges between the docs it references. Start with the 10-15 synthesis docs we already have.
- Add Cluster vertices + `part-of-cluster` edges derived from existing cluster-synthesis docs (harness-thesis dissent, moat debate, data-quality sources).
- Write a `/graph-query` skill that wraps the 3 prototype queries and exposes them via natural-language intent.
- Re-evaluate after a month of use - if the founder isn’t invoking the graph queries in real work, prune or pivot.
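As a sanity check on the edge estimate for the full vault, one simple reading of "current density" is edges per ingested source file, scaled linearly. That reading is an assumption; the report does not say how the ~20k figure was derived.

```python
# Figures from the ingestion summary and the Phase 2 plan.
source_files = 50
total_edges = 616
vault_docs = 1426  # "~1,426 docs"

edges_per_file = total_edges / source_files     # 12.32 edges per ingested file
projected_edges = edges_per_file * vault_docs   # about 17.6k under linear scaling
```

Linear scaling lands around 17.6k edges, the same order of magnitude as the ~20k estimate. Real growth could differ, since shared Topic vertices and stub targets may not scale in proportion to document count.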