The File System Is the New Database: How I Built a Personal OS for AI Agents
Muratcan Koylan (@koylanai) is a Context Engineer at Sully.ai, where he designs context engineering systems for healthcare AI. His open-source work on context engineering has 8,000+ GitHub stars and is cited in academic research alongside Anthropic. This is his full architecture writeup for “Personal Brain OS” — a file-based personal operating system that lives in a Git repository and gives AI assistants persistent, modular context without a database, API keys, or build step.
The core reframe: context engineering, not prompt engineering. Prompt engineering asks “how do I phrase this question better?” Context engineering asks “what information does this AI need to make the right decision, and how do I structure that information so the model actually uses it?”
The Core Problem: Context, Not Prompts
The bottleneck with AI assistants isn’t phrasing — it’s that every conversation starts from zero. You re-explain who you are, what you’re working on, your style guide, your goals. Then 40 minutes in, the model forgets your voice and starts writing like a press release.
Two architectural principles fix this:
The Attention Budget. Language models have a finite context window, and not all of it is created equal. Dumping everything into a system prompt degrades performance — every token competes for the model’s attention. Models have a measurable U-shaped attention curve where the middle blurs. This means you design for progressive loading, not bulk injection.
Progressive Disclosure. Instead of one massive system prompt, the system uses three levels:
- A lightweight routing file always loaded — tells the AI which module is relevant
- Module-specific instructions — load only when that module is needed
- Actual data (JSONL logs, YAML configs, research docs) — load only when the task requires it
Koylan’s routing file is SKILL.md. Module instruction files (CONTENT.md, OPERATIONS.md, NETWORK.md) are 40-100 lines each. Data files load last. Maximum of two hops to any piece of information.
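The three-level loading path can be sketched as a tiny assembler: routing file first, then exactly one module file, then only the data the task needs. This is an illustration of the pattern, not Koylan’s actual code; the file names follow the article, the loader itself is assumed.

```python
# Hypothetical sketch of three-level progressive disclosure.
# Level 1: SKILL.md (always loaded) -> level 2: one module file -> level 3: data.
from pathlib import Path

def load_context(task_module: str, data_files: list[str], root: Path = Path(".")) -> list[str]:
    """Assemble context in at most two hops from the routing file."""
    context = [(root / "SKILL.md").read_text()]               # level 1: always loaded
    context.append((root / f"{task_module}.md").read_text())  # level 2: one module only
    for name in data_files:                                   # level 3: task-specific data
        context.append((root / name).read_text())
    return context
```

The point of the sketch is what it refuses to do: there is no "load everything" branch, so every token in the window was requested by the task.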
The Instruction Hierarchy
Three scoped layers eliminate the “conflicting instructions” problem that plagues large AI projects:
- CLAUDE.md (repository level): onboarding document — every AI tool reads it first, gets the full map
- AGENT.md (brain level): seven core rules and a decision table mapping common requests to exact action sequences
- Module-level files (domain level): domain-specific behavioral constraints, scoped so rules can’t contradict each other
AGENT.md is a decision table. “User says ‘send email to Z’” maps to: Step 1, look up contact in HubSpot. Step 2, verify email address. Step 3, send via Gmail. Module files define priority levels (P0: do today, P1: this week, P2: this month, P3: backlog) so the agent triages tasks consistently. The agent follows the same priority system the founder uses because the system is codified, not implied.
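A decision table of this kind is just data: each trigger maps to an ordered action sequence. The "send email" steps and the P0-P3 labels come from the article; the exact-match lookup is a deliberate simplification (a real agent matches intent, not strings).

```python
# Decision-table sketch modeled on AGENT.md, with assumed structure.
DECISION_TABLE = {
    "send email to contact": [
        "look up contact in HubSpot",
        "verify email address",
        "send via Gmail",
    ],
}

# Priority levels from the module files, used for consistent triage.
PRIORITY = {"P0": "do today", "P1": "this week", "P2": "this month", "P3": "backlog"}

def actions_for(request: str) -> list[str]:
    # Falling back to "ask user" keeps unmapped requests from improvising.
    return DECISION_TABLE.get(request, ["ask user to clarify"])
```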
The File System as Memory
No database. No vector store. No retrieval system beyond Cursor/Claude Code’s native features. Just files on disk, versioned with Git. Every format was chosen for a specific reason:
JSONL for logs — append-only by design, stream-friendly (agent reads line by line without parsing the whole file), every line is self-contained valid JSON. An agent can only add lines. Deletion is done by marking "status": "archived" — preserving full history. This is non-negotiable: Koylan lost three months of post engagement data early on when an agent rewrote posts.jsonl instead of appending to it.
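Append-only discipline fits in two functions: the only write operations are "append a line" and "append a tombstone". The `id` and `status` field names are assumptions based on the article's description.

```python
# Minimal append-only JSONL writer: nothing here can overwrite history.
import json

def append_entry(path, entry: dict) -> None:
    with open(path, "a", encoding="utf-8") as f:   # "a" mode: append, never truncate
        f.write(json.dumps(entry) + "\n")

def archive_entry(path, entry_id: str) -> None:
    # "Deletion" is a new line marking the id archived; the original survives.
    append_entry(path, {"id": entry_id, "status": "archived"})
```

Note the contrast with opening a file in `"w"` mode, which is exactly the failure that cost three months of engagement data.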
YAML for configuration — handles hierarchical data cleanly, supports comments (context the agent reads but that doesn’t pollute the data structure), readable by both humans and machines.
Markdown for narrative — LLMs read it natively, renders everywhere, produces clean diffs in Git.
The system has 11 JSONL files (posts, contacts, interactions, bookmarks, ideas, metrics, experiences, decisions, failures, engagement, meetings), 6 YAML files (goals, values, learning, circles, rhythms, heuristics), and 50+ Markdown files. Every JSONL file starts with a schema line so the agent always knows the structure before reading data.
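Schema-first reading might look like the sketch below. The `_schema` key and line format are assumptions; the article says only that every JSONL file opens with a schema line.

```python
# Read a JSONL log whose first line declares the structure, so the agent
# knows the field names before it touches any data.
import json

def read_log(path):
    with open(path, encoding="utf-8") as f:
        schema = json.loads(f.readline())                 # line 1: assumed {"_schema": [...]}
        rows = [json.loads(line) for line in f if line.strip()]
    return schema, rows
```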
Episodic Memory is the key differentiator from a standard “second brain.” The memory/ module contains three append-only logs: experiences.jsonl (key moments with emotional weight scores 1-10), decisions.jsonl (key decisions with reasoning, alternatives considered, and outcomes tracked), and failures.jsonl (what went wrong, root cause, prevention steps). The difference between an AI that has your files and an AI that has your judgment. Facts tell the agent what happened. Episodic memory tells the agent what mattered, what you’d do differently, and how you think about tradeoffs.
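One entry in decisions.jsonl might look like this. The field names are guesses from the article's description (reasoning, alternatives considered, outcomes tracked); the content is made up for illustration.

```python
# Hypothetical decisions.jsonl entry: judgment, not just facts.
import json

decision = {
    "id": "2026-04-10-pricing",
    "decision": "keep the newsletter free",
    "reasoning": "audience growth compounds faster than early revenue",
    "alternatives": ["paid tier at launch", "sponsorship-only"],
    "outcome": "pending",   # updated later by appending a follow-up line
}
line = json.dumps(decision)   # one self-contained JSON line, ready to append
```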
Cross-Module References create a flat-file relational model. contact_id in interactions.jsonl points to entries in contacts.jsonl. pillar in ideas.jsonl maps to content pillars in identity/brand.md. The modules are isolated for loading but connected for reasoning. “Prepare for my meeting with Sarah” triggers: find Sarah in contacts, pull her interactions, check pending todos, compile brief. Three files, no manual assembly.
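The "prepare for my meeting with Sarah" flow is a flat-file join on `contact_id` as the foreign key. The records below are made up; only the field names and the join itself follow the article.

```python
# Flat-file relational join across two logs, keyed on contact_id.
contacts = [{"contact_id": "c1", "name": "Sarah"}]
interactions = [
    {"contact_id": "c1", "note": "intro call"},
    {"contact_id": "c2", "note": "unrelated"},
]

def brief_for(name: str) -> dict:
    contact = next(c for c in contacts if c["name"] == name)
    history = [i["note"] for i in interactions
               if i["contact_id"] == contact["contact_id"]]
    return {"contact": contact, "history": history}
```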
The Skill System
Files store knowledge. Skills encode process. Two types of skills solve two different problems:
Reference skills (voice-guide, writing-anti-patterns) — set user-invocable: false in YAML frontmatter. The agent reads the description and injects them automatically whenever the task involves writing. Never invoked manually; they activate silently every time. Solves the consistency problem.
Task skills (/write-blog, /topic-research, /content-workflow) — set disable-model-invocation: true. The agent can’t trigger them on its own. Manual invocation makes them the agent’s complete instruction set for that task. Solves the precision problem.
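The two skill types can be told apart purely from their frontmatter flags. The flag names come from the article; the parser below is a toy that assumes simple `key: value` frontmatter between `---` fences, not a full YAML implementation.

```python
# Classify a skill file as "reference" (auto-injected) or "task" (manual only)
# from its YAML frontmatter flags.
def parse_frontmatter(text: str) -> dict:
    _, block, _ = text.split("---", 2)   # assumes ---\n...\n--- fencing
    pairs = (line.split(":", 1) for line in block.strip().splitlines())
    return {k.strip(): v.strip() for k, v in pairs}

def skill_type(text: str) -> str:
    fm = parse_frontmatter(text)
    if fm.get("user-invocable") == "false":
        return "reference"   # activates silently whenever relevant
    if fm.get("disable-model-invocation") == "true":
        return "task"        # slash-command invocation only
    return "unknown"
```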
When Koylan types /write-blog context engineering for marketing teams, five things happen automatically: voice guide loads, anti-patterns load, blog template loads, persona folder is checked, research folder is checked. One slash command triggers a full context assembly. The skill file references source modules — never duplicates content. Single source of truth.
The Voice System encodes voice as structured data, not adjectives. Five attributes rated 1-10: Formal/Casual (6), Serious/Playful (4), Technical/Simple (7), Reserved/Expressive (6), Humble/Confident (7). The anti-patterns file has 50+ banned words in three tiers, banned openings, structural traps (forced rule of three, copula avoidance, excessive hedging), and a hard limit of one em-dash per paragraph. “Professional but approachable” is useless for an AI. A 7 on the Technical/Simple scale tells the model exactly where to land.
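As structured data, the voice system reduces to a dict of scores plus a banned-word check. The five scores are from the article; the banned words here are stand-ins, since the real file has 50+ words in three tiers.

```python
# Voice as data: scored dimensions an agent can calibrate against, plus a
# mechanical banned-word filter. BANNED is illustrative, not the real list.
VOICE = {
    "formal_casual": 6,
    "serious_playful": 4,
    "technical_simple": 7,
    "reserved_expressive": 6,
    "humble_confident": 7,
}
BANNED = {"delve", "leverage", "game-changer"}   # stand-in for the tiered file

def flag_banned(text: str) -> list[str]:
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sorted(words & BANNED)
```

A draft that trips the filter gets sent back before a human ever reads it; "professional but approachable" can't be checked this way, `technical_simple: 7` can.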
Templates as Structured Scaffolds define 7-section blog structures with word count targets, 11-post thread structures, 4-phase research templates. The research template outputs to knowledge/research/[topic].md with an Evidence Bank (statistics, quotes, case studies, papers — each cited with source and date). The output of one skill becomes the input of the next.
Daily Operation
Content Pipeline: Seven stages — Idea → Research → Outline → Draft → Edit → Publish → Promote. Ideas scored 1-5 across five dimensions; proceed if total hits 15+. Batched Sunday content creation (3-4 hours, 3-4 posts drafted). The publication log feeds the promotion skill.
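The idea gate is simple arithmetic: five dimensions scored 1-5, proceed at 15+. The scale and threshold are from the article; the dimension names below are assumptions for illustration.

```python
# Idea-scoring gate from the pipeline's first stage.
def should_proceed(scores: dict[str, int], threshold: int = 15) -> bool:
    assert len(scores) == 5 and all(1 <= s <= 5 for s in scores.values())
    return sum(scores.values()) >= threshold

# Hypothetical dimension names; only the 1-5 scale and 15+ gate are sourced.
idea = {"audience_fit": 4, "novelty": 3, "effort": 3, "evidence": 3, "timeliness": 3}
```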
Personal CRM: Contacts in four circles with different maintenance cadences (inner: weekly, active: bi-weekly, network: monthly, dormant: quarterly). Each contact has can_help_with and you_can_help_with fields enabling intro matching. A stale_contacts script cross-references contacts + interactions + circles to surface outreach needs — a 30-second weekly scan.
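The stale_contacts cross-reference can be sketched as a cadence lookup: each circle maps to a maximum gap in days, and a contact surfaces when its last interaction is older than that. The day values translate the article's weekly/bi-weekly/monthly/quarterly rhythm; the function shape is assumed.

```python
# Sketch of the stale-contacts check: circles -> cadence in days.
from datetime import date, timedelta

CADENCE_DAYS = {"inner": 7, "active": 14, "network": 30, "dormant": 90}

def stale_contacts(contacts, last_seen, today=None):
    """contacts: {id: circle}; last_seen: {id: date of last interaction}."""
    today = today or date.today()
    return [
        cid for cid, circle in contacts.items()
        if today - last_seen[cid] > timedelta(days=CADENCE_DAYS[circle])
    ]
```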
Automation Chains: Five scripts handle recurring workflows. Sunday weekly review chains three scripts: metrics snapshot → stale contacts flag → weekly review document (completed vs. planned, metrics trends, next week’s priorities). Scripts output to stdout in agent-readable format. The weekly review isn’t a report — it’s the starting point for next week’s planning.
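The Sunday chain is plain composition: each step emits agent-readable text and the review step consumes the earlier outputs. The step contents below are placeholders; only the three-step shape and the stdout convention come from the article.

```python
# Sketch of the Sunday chain: snapshot -> stale flags -> review document.
def metrics_snapshot() -> str:
    return "followers: 1200"        # placeholder for the real script's output

def stale_flags() -> str:
    return "stale: 2 contacts"      # placeholder output

def weekly_review() -> str:
    # One fact per line, stdout-style, so the agent can plan from it directly.
    parts = [metrics_snapshot(), stale_flags(), "priorities: draft 3 posts"]
    return "\n".join(parts)
```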
What He Got Wrong
Over-engineered schemas. Initial JSONL schemas had 15+ fields per entry; most were empty. Agents struggle with sparse data — they fill in fields or comment on absence. Cut to 8-10 essential fields.
Voice guide was too long. Version one was 1,200 lines. The agent drifted by paragraph four as voice instructions fell into the “lost in the middle” zone. Restructured to front-load distinctive patterns in the first 100 lines. Critical rules at the top, not the middle.
Module boundaries matter more than expected. Initially combined identity and brand in one module. The agent loaded the entire bio when it only needed the banned words list. Splitting cut token usage for voice-only tasks by 40%. Every module boundary is a loading decision.
Append-only is non-negotiable. Lost three months of data when an agent overwrote a JSONL file. The append-only pattern isn’t a convention — it’s a safety mechanism. The agent can add data; it cannot destroy data.
Connections to Our Setup
This is the most architecturally rigorous implementation of patterns the RDCO vault already uses — but with several ideas worth incorporating:
| Koylan’s system | RDCO equivalent |
|---|---|
| SKILL.md routing file | QMD semantic search + skill trigger matching |
| Module-level instruction files | Project-level CLAUDE.md files |
| YAML frontmatter on every file | Our frontmatter schema (type, date, source) |
| Episodic memory (decisions.jsonl, failures.jsonl) | Partially: decisions.md in vault; not systematic |
| Append-only JSONL logs | Not yet — most of our data is markdown, not JSONL |
| Anti-patterns file for voice | Not yet — voice is in SOUL.md but less structured |
Ideas to steal directly:
- Episodic memory logs — decisions.jsonl and failures.jsonl with structured fields are immediately buildable; the compounding value accumulates over time
- Numeric voice attributes — converting SOUL.md’s voice description into scored dimensions gives AI precise calibration targets rather than adjectives
- Skill cross-referencing pattern — skill files should reference source modules, not duplicate content. We may have some duplication in our skills ecosystem worth auditing.
- 40-token schema line at the top of every JSONL file so agents always know structure before reading data
Connections
- 06-reference/2026-01-19-arscontexta-vault-agent-series — parallel architecture; arscontexta does this for Obsidian vaults and vault-building; Koylan does it for personal productivity and content creation. Both arrive at progressive disclosure + modular context + wikilink traversal.
- 06-reference/2026-04-07-claude-code-architecture-teardown — the SKILL.md routing file, module instruction files, and progressive disclosure layers are exactly the architecture described in the Claude Code teardown (skills, hooks, CLAUDE.md hierarchy)
- 06-reference/2026-04-04-talking-to-agents-is-all-you-need — “context is the new moat” — Koylan’s system is a concrete implementation of this thesis at the personal level
- 06-reference/concepts/skills-as-building-blocks — auto-loading vs. manual invocation is a clean articulation of the composable skill pattern; reference skills vs. task skills map to what we call ambient context vs. explicit skills
- 06-reference/2026-04-04-planning-with-files-skill — planning with files as a skill pattern — Koylan’s system is essentially this philosophy applied holistically to an entire personal OS
- SOUL.md — the voice system (numeric attributes + anti-patterns + banned words) is a more rigorous encoding of what SOUL.md does for this operating model; worth considering a SOUL.md v2 that adds numeric voice dimensions