2026-04-01 · tweet thread · source: https://x.com/karpathy/status/2039805659525644595 · by Andrej Karpathy

LLM Knowledge Bases — Andrej Karpathy

The mental model: the compounding knowledge loop. Raw data goes in, an LLM compiles it into a structured wiki, every query against the wiki generates output that gets filed back in, and the whole thing gets better with use. The knowledge base isn’t a static artifact — it’s a flywheel where exploration and maintenance are the same activity.

The Architecture

Karpathy describes a deceptively simple pipeline:

  1. Ingest — source documents (papers, articles, repos, datasets, images) land in a raw/ directory. Obsidian Web Clipper handles web-to-markdown conversion; a hotkey pulls images local so the LLM can reference them.
  2. Compile — an LLM incrementally builds a wiki from raw sources: summaries, backlinks, concept articles, cross-references. The wiki is just .md files in a directory structure. At his current scale (~100 articles, ~400K words), the LLM auto-maintains index files and brief document summaries well enough that fancy RAG isn’t needed (sketched in code below).
  3. Query — ask complex questions against the wiki. The LLM researches answers across the compiled knowledge. Outputs render as markdown, Marp slideshows, or matplotlib images — all viewable in Obsidian.
  4. File back — query outputs get filed into the wiki, enriching it for future queries. Every exploration compounds.
  5. Lint — LLM health checks find inconsistent data, impute missing information via web search, surface interesting connections for new articles. The LLM suggests further questions to ask.
  6. Tooling — custom tools (e.g., a vibe-coded search engine with web UI and CLI) extend what the LLM can do against the data.

The IDE is Obsidian throughout — viewing raw data, the compiled wiki, and derived visualizations in one place.
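To make the compile step concrete, here is a minimal sketch, assuming the OpenAI Python client and plain-text sources. The raw/ and wiki/ layout comes from the thread, but the model name, prompts, and incremental check are illustrative guesses, not Karpathy’s actual code:

```python
# compile_wiki.py -- hypothetical sketch of the compile step (step 2).
# Assumes the OpenAI Python client and plain-text sources; the model name,
# prompt, and raw/ -> wiki/ layout are illustrative, not Karpathy's code.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
RAW, WIKI = Path("raw"), Path("wiki")

def compile_article(src: Path) -> None:
    """Ask the LLM to turn one raw source into a wiki article."""
    prompt = (
        "Rewrite this source as a concise wiki article in markdown, "
        "adding [[wikilinks]] to related concepts:\n\n" + src.read_text()
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    (WIKI / f"{src.stem}.md").write_text(resp.choices[0].message.content)

def rebuild_index() -> None:
    """Regenerate the flat index; at ~100 articles this stands in for RAG."""
    entries = [f"- [[{p.stem}]]" for p in sorted(WIKI.glob("*.md")) if p.stem != "index"]
    (WIKI / "index.md").write_text("# Index\n\n" + "\n".join(entries) + "\n")

if __name__ == "__main__":
    WIKI.mkdir(exist_ok=True)
    for src in RAW.iterdir():
        if src.is_file() and not (WIKI / f"{src.stem}.md").exists():  # incremental
            compile_article(src)
    rebuild_index()
```

Query (step 3) and file-back (step 4) are then just a chat session pointed at wiki/ plus this same write path.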

The Follow-Up

Karpathy extrapolates: every question to a frontier LLM could spawn a team of LLMs that automatically constructs an ephemeral wiki, lints it, loops a few times, then writes a full report. “Way beyond a .decode().” This is the agent-team version of the same loop — temporary knowledge bases spun up per query, not just persistent ones.

What We Already Do

We’re running a version of this. The Obsidian vault indexed by QMD is our compiled knowledge base — SOPs, project docs, decisions, reference material, all searchable via hybrid BM25 + vector + LLM re-ranking. The content intake SOP is our ingest pipeline: media comes in through channels, gets processed into structured markdown with wikilinks and frontmatter, gets embedded into QMD, and the vault grows. Every intake doc like this one is a step in the compounding loop.
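For reference, a toy sketch of the hybrid retrieval pattern (not QMD’s actual code): lexical scores from rank_bm25, cosine similarity over embeddings, fused by weighted sum. The embed() function is a stand-in for whatever embedding model the real index uses, and the final LLM re-rank is left as a stub:

```python
# Toy hybrid search: BM25 + vector similarity, fused by weighted sum.
# Not QMD's implementation; embed() is a placeholder for a real embedding model.
import numpy as np
from rank_bm25 import BM25Okapi

docs = ["content intake SOP ...", "project decision log ...", "reference notes ..."]
bm25 = BM25Okapi([d.split() for d in docs])

def embed(text: str) -> np.ndarray:
    # Placeholder: hash words into a unit bag-of-words vector.
    v = np.zeros(256)
    for w in text.split():
        v[hash(w) % 256] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

doc_vecs = np.stack([embed(d) for d in docs])

def search(query: str, alpha: float = 0.5) -> list[int]:
    """Return doc indices ranked by a blend of lexical and vector scores."""
    lexical = np.array(bm25.get_scores(query.split()))
    lexical = lexical / (lexical.max() or 1.0)   # normalize to [0, 1]
    semantic = doc_vecs @ embed(query)           # cosine (unit vectors)
    fused = alpha * lexical + (1 - alpha) * semantic
    return list(np.argsort(-fused))              # LLM re-rank would slot in here

print(search("intake pipeline"))
```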

The “intelligence lives in the system” pattern from Block’s hierarchy-to-intelligence model is the same structural argument. Block puts their world model in the system so the intelligence layer can compose solutions from it. We put our operational knowledge in the vault so I can execute against it. Karpathy puts his research in a wiki so his LLM can answer questions against it. Same pattern at three different scales.

Where We’re Behind

LLM-compiled structure. Our wiki is manually organized. We write the docs, maintain the links, decide on the directory structure. Karpathy’s key move is letting the LLM do the compilation — summaries, cross-references, concept articles, index maintenance. We could do this with a skill that periodically recompiles sections of the vault: regenerates indexes, surfaces broken links, finds orphaned docs, suggests new cross-references. The skills-as-building-blocks model is designed exactly for this kind of incremental automation.
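The deterministic core of such a skill is small. A sketch, assuming vault docs use [[wikilinks]] and are keyed by filename stem; paths and names are illustrative:

```python
# vault_check.py -- hypothetical core of a recompile skill:
# find broken wikilinks and orphaned docs across the vault.
import re
from pathlib import Path

VAULT = Path(".")  # illustrative; point at the vault root
WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # [[Target]], [[Target|alias]], [[Target#h]]

docs = {p.stem: p for p in VAULT.rglob("*.md")}  # keyed by filename stem
links = {stem: set(WIKILINK.findall(p.read_text())) for stem, p in docs.items()}

broken = {(src, tgt) for src, tgts in links.items() for tgt in tgts if tgt not in docs}
linked_to = {tgt for tgts in links.values() for tgt in tgts}
orphans = set(docs) - linked_to  # no inbound links from anywhere

print(f"{len(broken)} broken links, {len(orphans)} orphaned docs")
for src, tgt in sorted(broken):
    print(f"  broken: {src} -> [[{tgt}]]")
```

An LLM pass over this report could then draft the suggested cross-references and index updates.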

Filing query outputs back in. When I research something for the founder, the output usually goes to iMessage or Discord — it doesn’t always make it back into the vault. The compounding loop breaks when outputs evaporate into chat. We should be more disciplined about routing useful outputs into 06-reference/ or project docs.
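One way to make the routing cheap is a helper every research skill calls on its way out. A sketch; the frontmatter fields and destination folder are assumptions about our vault conventions:

```python
# file_back.py -- hypothetical helper: route a research output into the vault
# instead of letting it evaporate into chat. Frontmatter fields are illustrative.
from datetime import date
from pathlib import Path

def file_back(title: str, body: str, source: str, dest: str = "06-reference") -> Path:
    """Write a query output as a markdown doc with minimal frontmatter."""
    slug = "-".join(title.lower().split())
    path = Path(dest) / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"---\ndate: {date.today()}\nsource: {source}\n---\n\n# {title}\n\n{body}\n"
    )
    return path

file_back("karpathy llm knowledge bases", "notes go here",
          "https://x.com/karpathy/status/2039805659525644595")
```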

Linting. We don’t run health checks on the vault. Karpathy’s LLM linting — finding inconsistencies, imputing missing data, suggesting new articles — is a natural fit for a scheduled skill. Run it weekly, surface a short report of what needs attention, file fixes automatically or flag them for review.
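The weekly wrapper is mostly plumbing. A sketch, reusing checks like the link scan above; the schedule, report location, and check list are assumptions:

```python
# lint_vault.py -- hypothetical weekly lint: run checks, write a short report.
# Scheduling is external (cron / launchd); this script only produces the report.
from datetime import date
from pathlib import Path

def run_checks() -> list[str]:
    """Each check returns human-readable findings. Start with the broken-link
    and orphan scans; add LLM-driven checks (inconsistencies, missing data,
    suggested articles) later."""
    findings: list[str] = []
    return findings

def write_report(findings: list[str], dest: str = "06-reference") -> Path:
    path = Path(dest) / f"vault-lint-{date.today()}.md"
    body = "\n".join(f"- {f}" for f in findings) or "- vault is clean"
    path.write_text(f"# Vault lint {date.today()}\n\n{body}\n")
    return path

if __name__ == "__main__":
    print(write_report(run_checks()))
```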

Custom tooling over the wiki. QMD gives us search, but Karpathy is building additional tools (search engine, CLI tools) that extend what the LLM can do. As the vault grows, we’ll want more ways to slice the data — timeline views, dependency graphs across projects, trend analysis across reference docs.
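As one concrete slice, the vault’s wikilink graph exported as Graphviz DOT gives dependency views almost for free. A sketch under the same [[wikilink]] assumption as above:

```python
# wiki_graph.py -- hypothetical tool: export the vault's wikilink graph as DOT
# for dependency views across projects. Render with `dot -Tsvg graph.dot`.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def to_dot(vault: Path) -> str:
    edges = []
    for doc in vault.rglob("*.md"):
        for target in WIKILINK.findall(doc.read_text()):
            edges.append(f'  "{doc.stem}" -> "{target}";')
    return "digraph vault {\n" + "\n".join(edges) + "\n}\n"

print(to_dot(Path(".")))
```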

Open Questions