Fri Apr 03 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · idea-file · source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f · by Andrej Karpathy

LLM Wiki — Karpathy’s Idea File

This is the formalized version of Karpathy’s earlier tweet thread about LLM knowledge bases. He published it as an “idea file”: a format for the agent era in which you share the idea instead of the code, and the recipient’s agent builds a custom implementation.

The Mental Model: Compiled Knowledge vs. Retrieved Knowledge

The key distinction: RAG rediscovers knowledge from scratch on every query. A wiki compiles it once and keeps it current. Most systems (NotebookLM, ChatGPT file uploads) retrieve raw chunks at query time. The LLM Wiki pattern instead has the agent incrementally build and maintain a persistent, interlinked collection of markdown files — updating entity pages, revising summaries, flagging contradictions, strengthening synthesis. The wiki is a persistent, compounding artifact.

This is exactly what we built with the Obsidian vault + QMD stack. The vault is our compiled wiki. QMD is our search layer. The content intake SOP is our ingest workflow. The compounding knowledge concept article names the pattern.

Architecture: Three Layers

  1. Raw sources — immutable input documents. The LLM reads but never modifies. Our equivalent: the Readwise import, iMessage/Discord articles, Notion extractions. We process these into the vault but don’t modify the originals.

  2. The wiki — LLM-generated markdown files with summaries, entity pages, concept pages, cross-references. The LLM owns this layer entirely. Human reads, LLM writes. Our equivalent: everything in 06-reference/ and 06-reference/concepts/. The vault compiler (~/.claude/skills/compile-vault/SKILL.md) maintains cross-links and generates concept articles.

  3. The schema — configuration telling the LLM how the wiki is structured, what conventions to follow. Our equivalent: SOUL.md, content intake SOP, and the skill definitions. These co-evolve as we figure out what works.
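The write boundary between the three layers can be sketched in a few lines. This is a minimal illustration, not how the vault compiler is actually implemented: the `WikiLayout` class, its field names, and the example paths are all hypothetical. The only rule it encodes is the one stated above, that the agent writes to the wiki layer and nowhere else.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WikiLayout:
    """Hypothetical layout: one location per layer (names are illustrative)."""

    raw_dir: Path      # layer 1: immutable sources, read-only for the LLM
    wiki_dir: Path     # layer 2: LLM-written markdown pages
    schema_file: Path  # layer 3: conventions the LLM follows

    def can_write(self, path: Path) -> bool:
        """Return True only for paths inside the wiki layer."""
        target = path.resolve()
        wiki = self.wiki_dir.resolve()
        return target == wiki or wiki in target.parents


# Usage with made-up paths:
layout = WikiLayout(
    raw_dir=Path("/tmp/vault/raw"),
    wiki_dir=Path("/tmp/vault/wiki"),
    schema_file=Path("/tmp/vault/SCHEMA.md"),
)
print(layout.can_write(Path("/tmp/vault/wiki/concepts/rag.md")))
print(layout.can_write(Path("/tmp/vault/raw/article.md")))
```

Resolving both paths before comparing means a target such as `wiki/../raw/x.md` is rejected as well.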

Operations: Ingest, Query, Lint

Ingest — drop a source, LLM processes it, updates the wiki. A single source might touch 10-15 pages. Karpathy prefers one-at-a-time with human involvement. We do this via the content intake SOP — articles arrive via iMessage/Discord/inbox, get compiled with wikilinks, filed to the right location.
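A rough sketch of the ingest step, under stated assumptions: `compile_source` is a hypothetical stand-in for the LLM call and is assumed to return a mapping from page name to the markdown to append. The function name `ingest` and the page format are illustrative and are not taken from Karpathy’s file or from our SOP.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-in for the LLM: given the source text, return
# {page_name: markdown_to_append}. A real agent decides which entity
# and concept pages a source should touch.
Compiler = Callable[[str], Dict[str, str]]


def ingest(source: Path, wiki_dir: Path, compile_source: Compiler) -> List[str]:
    """Process one raw source and append its contributions to wiki pages."""
    text = source.read_text(encoding="utf-8")  # the raw layer is read, never written
    updates = compile_source(text)
    wiki_dir.mkdir(parents=True, exist_ok=True)

    touched = []
    for page, addition in sorted(updates.items()):
        path = wiki_dir / f"{page}.md"
        # Start a new page with a heading, or extend the existing one.
        existing = path.read_text(encoding="utf-8") if path.exists() else f"# {page}\n"
        path.write_text(
            f"{existing}\n{addition}\n(source: [[{source.stem}]])\n",
            encoding="utf-8",
        )
        touched.append(page)
    return touched
```

The returned list of touched pages is what a human reviewer would look at in a one-at-a-time workflow.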

Query — ask questions against the wiki. Key insight: good answers should be filed back into the wiki as new pages. Explorations compound just like ingested sources. This is the gap we identified in the earlier analysis — we’re partially missing the “file outputs back” step. Research answers that live only in chat context get lost at the 4am restart.
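One way to close the “file outputs back” gap is to persist each worthwhile answer as its own page. The sketch below is an assumption about how that could look: `slugify`, `file_answer`, and the page layout are hypothetical names and conventions, not an existing skill.

```python
import re
from pathlib import Path
from typing import Iterable


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def file_answer(wiki_dir: Path, question: str, answer: str, cited: Iterable[str]) -> Path:
    """Persist a query answer as a wiki page so it outlives the chat session."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    page = wiki_dir / f"{slugify(question)}.md"
    # Link back to the pages the answer drew on, so the new page is not an orphan source-wise.
    links = "\n".join(f"- [[{name}]]" for name in cited)
    page.write_text(
        f"# {question}\n\n{answer}\n\nSources:\n{links}\n",
        encoding="utf-8",
    )
    return page
```

Writing the cited pages as wikilinks keeps the new page connected to the rest of the wiki, so a later lint pass can follow them.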

Lint — health check the wiki periodically. Find contradictions, stale claims, orphan pages, missing concepts, data gaps. This is exactly our /vault-health skill running daily at 7am plus the /compile-vault skill for fixes.
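The mechanical part of a lint pass, finding orphan pages and links to pages that do not exist, can be sketched as below. This is not the /vault-health implementation; it assumes pages are linked by file name using `[[Target]]`-style wikilinks and that file names are unique across the wiki. Contradictions and stale claims need an LLM pass and are out of scope here.

```python
import re
from pathlib import Path
from typing import Dict, Set

# Captures the target of [[Target]], [[Target|alias]] and [[Target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint(wiki_dir: Path) -> Dict[str, Set[str]]:
    """Report pages with no inbound links and links that point at missing pages."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}

    linked: Set[str] = set()
    broken: Set[str] = set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target == name:
                continue  # a self-link does not count as an inbound link
            if target in pages:
                linked.add(target)
            else:
                broken.add(target)

    return {"orphans": set(pages) - linked, "broken_links": broken}
```

A top-level index page will itself show up as an orphan unless something links to it, so a real check would probably exempt it.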

Index and Log

Karpathy uses two special files: an index used for navigation and a chronological log.

He notes that index-based navigation works “surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.” We’re at 477 docs and QMD handles it fine, which is consistent with his observation that you don’t need heavy RAG infrastructure at this scale.
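To make the index-and-log idea concrete, here is a minimal sketch. It assumes the index is a flat catalog with one line per page and the log is an append-only list of dated events; the file names `index.md` and `log.md`, the function names, and the line formats are assumptions for illustration, not Karpathy’s specification.

```python
from datetime import date
from pathlib import Path


def build_index(wiki_dir: Path) -> str:
    """Return a flat catalog: one wikilink per page, labeled with its first line."""
    lines = ["# Index", ""]
    for page in sorted(wiki_dir.rglob("*.md")):
        if page.name in {"index.md", "log.md"}:
            continue  # the special files do not list themselves
        first = page.read_text(encoding="utf-8").splitlines()[:1]
        # Use the first line (minus heading markers) as the title, else the file name.
        title = first[0].lstrip("# ").strip() if first else ""
        lines.append(f"- [[{page.stem}]]: {title or page.stem}")
    return "\n".join(lines) + "\n"


def append_log(log_file: Path, event: str, today: date) -> None:
    """Append one dated entry; earlier entries are never rewritten."""
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {today.isoformat()} {event}\n")
```

An agent can read the whole index in one pass at this scale and then open only the pages it needs, which is what makes the index a substitute for a retrieval step.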

The “Idea File” as a Format

The meta-insight: in the agent era, you share ideas rather than implementations. “Your agent customizes & builds it for your specific needs.” This is the skills-as-building-blocks pattern applied to knowledge transfer — share the skill definition, not the output.

How We Compare

| Karpathy’s Pattern | Our Implementation | Gap? |
| --- | --- | --- |
| Raw sources (immutable) | Readwise, iMessage, Discord, Notion | |
| LLM-maintained wiki | Obsidian vault, compile-vault skill | |
| Schema/config | SOUL.md, SOPs, skill definitions | |
| Ingest workflow | /process-inbox, content intake SOP | |
| Query with filing back | Partially: research stays in chat | ⚠️ Need to file more outputs back |
| Lint/health check | /vault-health daily + /compile-vault | |
| Index navigation | README.md per folder + QMD | |
| Chronological log | Discord #ops reports, no vault log | ⚠️ Consider adding log.md |
| CLI search tools | QMD (BM25 + vector + re-ranking) | ✅ Better than his suggestion |
| Obsidian as IDE | Same | |
| Git version history | Not yet: vault isn’t a git repo | ⚠️ Consider adding |

Open Questions