LLM Wiki — Karpathy’s Idea File
This is the formalized version of Karpathy’s earlier tweet thread about LLM knowledge bases. He published it as an “idea file” — a new format designed for the agent era where you share the idea instead of the code, and the recipient’s agent builds a custom implementation.
The Mental Model: Compiled Knowledge vs. Retrieved Knowledge
The key distinction: RAG rediscovers knowledge from scratch on every query. A wiki compiles it once and keeps it current. Most systems (NotebookLM, ChatGPT file uploads) retrieve raw chunks at query time. The LLM Wiki pattern instead has the agent incrementally build and maintain a persistent, interlinked collection of markdown files — updating entity pages, revising summaries, flagging contradictions, strengthening synthesis. The wiki is a persistent, compounding artifact.
This is exactly what we built with the Obsidian vault + QMD stack. The vault is our compiled wiki. QMD is our search layer. The content intake SOP is our ingest workflow. The compounding knowledge concept article names the pattern.
Architecture: Three Layers
- Raw sources: immutable input documents. The LLM reads but never modifies. Our equivalent: the Readwise import, iMessage/Discord articles, Notion extractions. We process these into the vault but don’t modify the originals.
- The wiki: LLM-generated markdown files with summaries, entity pages, concept pages, cross-references. The LLM owns this layer entirely. Human reads, LLM writes. Our equivalent: everything in 06-reference/ and 06-reference/concepts/. The vault compiler (~/.claude/skills/compile-vault/SKILL.md) maintains cross-links and generates concept articles.
- The schema: configuration telling the LLM how the wiki is structured, what conventions to follow. Our equivalent: SOUL.md, content intake SOP, and the skill definitions. These co-evolve as we figure out what works.
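The layer boundary (raw sources are read-only, the wiki is the only place the LLM writes) can be enforced mechanically. The following is a minimal sketch, not part of Karpathy's file or our current skills; `is_writable` and `guarded_write` are hypothetical helper names, and the `wiki_root` argument stands in for whatever folder holds the compiled layer.

```python
from pathlib import Path


def is_writable(path: Path, wiki_root: Path) -> bool:
    """Return True only when path resolves to a location inside the wiki layer."""
    resolved = path.resolve()
    root = wiki_root.resolve()
    return resolved == root or root in resolved.parents


def guarded_write(path: Path, text: str, wiki_root: Path) -> None:
    """Write text to path, refusing anything outside the wiki layer."""
    if not is_writable(path, wiki_root):
        raise PermissionError(f"{path} is outside the wiki layer")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

Resolving both paths before comparing them means a relative path such as `wiki/../raw/article.txt` is rejected as well.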
Operations: Ingest, Query, Lint
Ingest — drop a source, LLM processes it, updates the wiki. A single source might touch 10-15 pages. Karpathy prefers one-at-a-time with human involvement. We do this via the content intake SOP — articles arrive via iMessage/Discord/inbox, get compiled with wikilinks, filed to the right location.
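To make the "one source touches many pages" idea concrete, here is a rough sketch of an ingest step. It assumes the LLM has already produced a summary and a list of entity names for the source; the `ingest` function, its parameters, and the page layout are illustrative assumptions, not the content intake SOP itself.

```python
from pathlib import Path


def ingest(raw: Path, summary: str, entities: list[str], wiki: Path) -> list[Path]:
    """File one source into the wiki and return every page touched.

    The raw source is only read for its name; it is never modified.
    """
    wiki.mkdir(parents=True, exist_ok=True)

    # 1. Write (or overwrite) the summary page for this source.
    source_page = wiki / f"{raw.stem}.md"
    links = ", ".join(f"[[{entity}]]" for entity in entities)
    source_page.write_text(
        f"# {raw.stem}\n\nSource: {raw.name}\n\n{summary}\n\nRelated: {links}\n",
        encoding="utf-8",
    )
    touched = [source_page]

    # 2. Add a backlink on each entity page, creating the page if needed.
    backlink = f"- [[{raw.stem}]]"
    for entity in entities:
        page = wiki / f"{entity}.md"
        if page.exists():
            text = page.read_text(encoding="utf-8")
        else:
            text = f"# {entity}\n\n## Mentions\n"
        if backlink not in text:  # keeps repeated ingests idempotent
            text += backlink + "\n"
        page.write_text(text, encoding="utf-8")
        touched.append(page)
    return touched
```

The returned list doubles as a review queue for the human-in-the-loop step: each touched page can be inspected before the next source is processed.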
Query — ask questions against the wiki. Key insight: good answers should be filed back into the wiki as new pages. Explorations compound just like ingested sources. This is the gap we identified in the earlier analysis — we’re partially missing the “file outputs back” step. Research answers that live only in chat context get lost at the 4am restart.
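One possible shape for the "file outputs back" step is sketched below. Everything here is an assumption for illustration: the `file_answer` helper, the `answers/` subfolder, and the date-plus-slug filename are not part of Karpathy's file or our current setup.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def file_answer(
    wiki: Path,
    question: str,
    answer: str,
    sources: list[str],
    today: Optional[date] = None,
) -> Path:
    """Save a research answer as a wiki page that links back to its sources."""
    today = today or date.today()
    # Turn the question into a short, filesystem-safe slug.
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")[:60]
    page = wiki / "answers" / f"{today.isoformat()}-{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    cited = "\n".join(f"- [[{source}]]" for source in sources)
    page.write_text(
        f"# {question}\n\n{answer}\n\n## Sources\n{cited}\n",
        encoding="utf-8",
    )
    return page
```

Calling this at the end of a research session would give the answer a durable home that survives a restart, and the wikilinks in the Sources section make the page reachable from the pages it drew on.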
Lint — health check the wiki periodically. Find contradictions, stale claims, orphan pages, missing concepts, data gaps. This is exactly our /vault-health skill running daily at 7am plus the /compile-vault skill for fixes.
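Two of the lint checks named above, orphan pages and links to missing pages, are easy to sketch without an LLM. This is not the /vault-health implementation; `lint_vault` is a hypothetical function, and it assumes page names are unique across folders, which would not hold for a vault with a README.md in every folder unless those are handled separately.

```python
import re
from pathlib import Path

# Captures the target of [[Page]], [[Page|alias]] and [[Page#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_vault(root: Path) -> dict[str, list[str]]:
    """Report pages nobody links to and wikilinks that point nowhere."""
    pages = {path.stem: path for path in root.rglob("*.md")}
    linked: set[str] = set()
    broken: list[str] = []
    for name, path in pages.items():
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target not in pages:
                broken.append(f"{name} -> {target}")
            elif target != name:  # a self-link does not rescue an orphan
                linked.add(target)
    return {
        "orphans": sorted(set(pages) - linked),
        "broken_links": sorted(broken),
    }
```

Contradictions and stale claims need the LLM to read the content, so a structural pass like this would only be the cheap first stage of a health check.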
Index and Log
Karpathy uses two special files:
- index.md — content-oriented catalog. We have these as README.md in each folder, plus the QMD search index.
- log.md — chronological append-only record. We don’t have this explicitly — our equivalent is the Notion task board history and Discord #ops reports. Worth considering adding a vault changelog.
He notes that index-based navigation works “surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.” We’re at 477 docs, within that range, and QMD handles it fine, which is consistent with his observation that you don’t need heavy RAG infrastructure at this scale.
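The two special files are simple enough to maintain with a few lines of code. The sketch below is one possible layout, not Karpathy's exact format: the `append_log` and `build_index` helpers, the pipe-separated log entry, and the flat wikilink list in the index are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_log(
    vault: Path, action: str, detail: str, now: Optional[datetime] = None
) -> str:
    """Append one timestamped entry to log.md; existing lines are never rewritten."""
    now = now or datetime.now(timezone.utc)
    entry = f"- {now:%Y-%m-%d %H:%M} UTC | {action} | {detail}\n"
    with (vault / "log.md").open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry


def build_index(vault: Path) -> str:
    """Regenerate index.md as a sorted list of wikilinks to every page."""
    lines = ["# Index", ""]
    for page in sorted(vault.rglob("*.md")):
        if page.name in {"index.md", "log.md"}:
            continue  # the special files do not list themselves
        relative = page.relative_to(vault).with_suffix("")
        lines.append(f"- [[{relative.as_posix()}]]")
    text = "\n".join(lines) + "\n"
    (vault / "index.md").write_text(text, encoding="utf-8")
    return text
```

Opening log.md in append mode is what makes it append-only in practice, while index.md is rebuilt from scratch each time so it cannot drift from the files on disk.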
The “Idea File” as a Format
The meta-insight: in the agent era, you share ideas rather than implementations. “Your agent customizes & builds it for your specific needs.” This is the skills-as-building-blocks pattern applied to knowledge transfer — share the skill definition, not the output.
How We Compare
| Karpathy’s Pattern | Our Implementation | Gap? |
|---|---|---|
| Raw sources (immutable) | Readwise, iMessage, Discord, Notion | ✅ |
| LLM-maintained wiki | Obsidian vault, compile-vault skill | ✅ |
| Schema/config | SOUL.md, SOPs, skill definitions | ✅ |
| Ingest workflow | /process-inbox, content intake SOP | ✅ |
| Query with filing back | Partially — research stays in chat | ⚠️ Need to file more outputs back |
| Lint/health check | /vault-health daily + /compile-vault | ✅ |
| Index navigation | README.md per folder + QMD | ✅ |
| Chronological log | Discord #ops reports, no vault log | ⚠️ Consider adding log.md |
| CLI search tools | QMD (BM25 + vector + re-ranking) | ✅ Better than his suggestion |
| Obsidian as IDE | Same | ✅ |
| Git version history | Not yet — vault isn’t a git repo | ⚠️ Consider adding |
Open Questions
- Should we add a log.md to the vault root as an append-only changelog? It would give us a timeline of the vault’s evolution and help with the “resume from summary” pattern after restarts.
- Should we put the vault under git version control? Karpathy suggests it for free version history. The archive + Obsidian Sync might be sufficient, but git would add diffing and branching.
- How do we systematically file research outputs back into the wiki? The compound engineering loop requires it, but we don’t have an automatic trigger.