LLM Wiki — Karpathy’s Idea File

This is the formalized version of Karpathy’s earlier tweet thread about LLM knowledge bases. He published it as an “idea file” — a new format designed for the agent era where you share the idea instead of the code, and the recipient’s agent builds a custom implementation.

The Mental Model: Compiled Knowledge vs. Retrieved Knowledge

The key distinction: RAG rediscovers knowledge from scratch on every query. A wiki compiles it once and keeps it current. Most systems (NotebookLM, ChatGPT file uploads) retrieve raw chunks at query time. The LLM Wiki pattern instead has the agent incrementally build and maintain a persistent, interlinked collection of markdown files — updating entity pages, revising summaries, flagging contradictions, strengthening synthesis. The wiki is a persistent, compounding artifact.

This is exactly what we built with the Obsidian vault + QMD stack. The vault is our compiled wiki. QMD is our search layer. The content intake SOP is our ingest workflow. The compounding knowledge concept article names the pattern.

Architecture: Three Layers

Raw sources — immutable input documents. The LLM reads but never modifies. Our equivalent: the Readwise import, iMessage/Discord articles, Notion extractions. We process these into the vault but don’t modify the originals.
The wiki — LLM-generated markdown files with summaries, entity pages, concept pages, cross-references. The LLM owns this layer entirely. Human reads, LLM writes. Our equivalent: everything in 06-reference/ and 06-reference/concepts/. The vault compiler (~/.claude/skills/compile-vault/SKILL.md) maintains cross-links and generates concept articles.
The schema — configuration telling the LLM how the wiki is structured, what conventions to follow. Our equivalent: SOUL.md, content intake SOP, and the skill definitions. These co-evolve as we figure out what works.
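The layer boundary above (the LLM reads raw sources but only writes to the wiki) can be sketched as a small guard. This is a minimal illustration, not Karpathy's code or our vault tooling: `WikiLayout`, `write_page`, and the directory names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WikiLayout:
    """Hypothetical three-layer layout; all names here are illustrative."""

    raw_dir: Path      # immutable sources: the LLM reads, never writes
    wiki_dir: Path     # LLM-owned markdown pages
    schema_path: Path  # conventions file the LLM consults before writing

    def write_page(self, relative_path: str, text: str) -> Path:
        """Write a page, refusing any target outside the wiki layer."""
        wiki_root = self.wiki_dir.resolve()
        target = (self.wiki_dir / relative_path).resolve()
        # is_relative_to requires Python 3.9 or newer.
        if not target.is_relative_to(wiki_root):
            raise PermissionError(f"{target} is outside the wiki layer")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        return target
```

Routing every write through one function like this makes "raw sources are immutable" an enforced rule rather than a convention the agent has to remember.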

Operations: Ingest, Query, Lint

Ingest — drop a source, LLM processes it, updates the wiki. A single source might touch 10-15 pages. Karpathy prefers one-at-a-time with human involvement. We do this via the content intake SOP — articles arrive via iMessage/Discord/inbox, get compiled with wikilinks, filed to the right location.
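One mechanical piece of ingest, linking mentions of existing pages, can be sketched as follows. This assumes the set of page titles is already known; `add_wikilinks` is a hypothetical helper, and in practice the LLM also decides which pages to update, which this sketch does not attempt.

```python
import re

# Matches an existing [[wikilink]] so its contents are never re-linked.
WIKILINK = re.compile(r"(\[\[[^\]]*\]\])")


def add_wikilinks(text: str, titles: list[str]) -> str:
    """Wrap the first unlinked mention of each known title in [[...]]."""
    # Longest titles first, so "compounding knowledge" wins over "knowledge".
    for title in sorted(titles, key=len, reverse=True):
        mention = re.compile(rf"\b{re.escape(title)}\b")
        parts = WIKILINK.split(text)
        # Even indexes are plain text; odd indexes are existing links.
        for i in range(0, len(parts), 2):
            parts[i], count = mention.subn(
                lambda m: f"[[{m.group(0)}]]", parts[i], count=1
            )
            if count:
                break
        text = "".join(parts)
    return text
```

Matching is case-sensitive and exact here; aliases and plurals would need handling beyond this sketch.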

Query — ask questions against the wiki. Key insight: good answers should be filed back into the wiki as new pages. Explorations compound just like ingested sources. This is the gap we identified in the earlier analysis — we’re partially missing the “file outputs back” step. Research answers that live only in chat context get lost at the 4am restart.
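The "file outputs back" step could look something like the sketch below: a research answer is written as a new page with links to the pages it drew on. The function names, the page layout, and the target folder are assumptions for illustration, not an existing skill.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Turn a question into a filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def file_answer(
    wiki_dir: Path,
    question: str,
    answer: str,
    sources: list[str],
    filed_on: Optional[date] = None,
) -> Path:
    """Save an answer as a new wiki page that links back to its sources."""
    filed_on = filed_on or date.today()
    lines = [f"# {question}", "", f"Filed: {filed_on.isoformat()}", "",
             answer.strip(), ""]
    if sources:
        lines.append("Sources:")
        lines.extend(f"- [[{name}]]" for name in sources)
        lines.append("")
    path = wiki_dir / f"{slugify(question)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" fails if the page exists, so an earlier answer is never
    # silently overwritten.
    with path.open("x", encoding="utf-8") as handle:
        handle.write("\n".join(lines))
    return path
```

Calling this at the end of a research session would mean the answer survives a restart as a page instead of living only in chat context.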

Lint — health check the wiki periodically. Find contradictions, stale claims, orphan pages, missing concepts, data gaps. This is exactly our /vault-health skill running daily at 7am plus the /compile-vault skill for fixes.
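Of the lint checks listed, orphan detection is the easiest to show in code. The sketch below assumes Obsidian-style `[[wikilinks]]` and treats a page as an orphan when no other page links to it; `find_orphans` is a hypothetical name, not the /vault-health implementation.

```python
import re
from pathlib import Path

# Captures the link target, stopping before an alias (|) or heading (#).
LINK_TARGET = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(wiki_dir: Path) -> list[str]:
    """Return the names of pages that no other page links to."""
    pages = {path.stem: path for path in wiki_dir.rglob("*.md")}
    linked = set()
    for name, path in pages.items():
        text = path.read_text(encoding="utf-8")
        for target in LINK_TARGET.findall(text):
            # Keep only the final path segment of links like [[folder/page]].
            target = target.strip().split("/")[-1]
            if target != name:  # a page linking to itself does not count
                linked.add(target)
    return sorted(set(pages) - linked)
```

Entry points such as a folder README would show up as orphans under this definition, so a real check would likely need an allowlist.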

Index and Log

Karpathy uses two special files:

index.md — content-oriented catalog. We have these as README.md in each folder, plus the QMD search index.
log.md — chronological append-only record. We don’t have this explicitly — our equivalent is the Notion task board history and Discord #ops reports. Worth considering adding a vault changelog.
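If we did add a vault changelog, an append-only writer is only a few lines. The entry format below (timestamp, event, detail separated by pipes) is an assumption made for this sketch; the note above does not say how Karpathy formats his log.md.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_log(
    log_path: Path,
    event: str,
    detail: str,
    when: Optional[datetime] = None,
) -> str:
    """Append one timestamped line to the log and return that line."""
    when = when or datetime.now(timezone.utc)
    # Collapse newlines so every entry stays on a single line.
    detail = " ".join(detail.split())
    line = f"{when.strftime('%Y-%m-%d %H:%M')} | {event} | {detail}\n"
    # Mode "a" only ever adds to the end; existing entries are untouched.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line)
    return line
```

Each ingest, filed answer, or lint run would call this once, giving a timeline that can be read back after a restart.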

He notes that index-based navigation works “surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.” We’re at 477 docs and QMD handles it fine, which is consistent with his observation that you don’t need heavy RAG at this scale. One caveat: QMD itself combines BM25 with vector search and re-ranking, so our setup is not a pure index-only test of his claim.
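Index-based navigation needs nothing more than a catalog the agent can read in one pass. As a rough sketch, assuming each page's first non-empty line works as its summary, a catalog could be generated like this; `build_index` and the output layout are hypothetical.

```python
from pathlib import Path


def build_index(wiki_dir: Path) -> str:
    """Build a one-line-per-page catalog from the pages under wiki_dir."""
    lines = ["# Index", ""]
    for path in sorted(wiki_dir.rglob("*.md")):
        if path.name == "index.md":
            continue  # do not list the index inside itself
        text = path.read_text(encoding="utf-8")
        # Use the first non-empty line as the summary.
        first = next(
            (line.strip() for line in text.splitlines() if line.strip()), ""
        )
        # Drop leading heading markers such as "# " from the summary.
        summary = first.lstrip("# ").strip() or "(empty)"
        lines.append(f"- [[{path.stem}]]: {summary}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to index.md after each ingest, so the catalog never drifts from the pages it describes.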

The “Idea File” as a Format

The meta-insight: in the agent era, you share ideas rather than implementations. “Your agent customizes & builds it for your specific needs.” This is the skills-as-building-blocks pattern applied to knowledge transfer — share the skill definition, not the output.

How We Compare

Karpathy’s Pattern	Our Implementation	Status
Raw sources (immutable)	Readwise, iMessage, Discord, Notion	✅
LLM-maintained wiki	Obsidian vault, compile-vault skill	✅
Schema/config	SOUL.md, SOPs, skill definitions	✅
Ingest workflow	/process-inbox, content intake SOP	✅
Query with filing back	Partially — research stays in chat	⚠️ Need to file more outputs back
Lint/health check	/vault-health daily + /compile-vault	✅
Index navigation	README.md per folder + QMD	✅
Chronological log	Discord #ops reports, no vault log	⚠️ Consider adding log.md
CLI search tools	QMD (BM25 + vector + re-ranking)	✅ Better than his suggestion
Obsidian as IDE	Same	✅
Git version history	Not yet — vault isn’t a git repo	⚠️ Consider adding

Open Questions

Should we add a log.md to the vault root as an append-only changelog? It would give us a timeline of the vault’s evolution and help with the “resume from summary” pattern after restarts.
Should we put the vault under git version control? Karpathy suggests it for free version history. The archive + Obsidian Sync might be sufficient, but git would add diffing and branching.
How do we systematically file research outputs back into the wiki? The compound engineering loop requires it, but we don’t have an automatic trigger.