06-reference

koylanai personal brain os

2026-02-20 · article · source: x.com/@koylanai · by Muratcan Koylan

tags: context-engineering · personal-os · file-system · ai-agents · skill-system · memory · progressive-disclosure · jsonl · claude-code

The File System Is the New Database: How I Built a Personal OS for AI Agents

Muratcan Koylan (@koylanai) is Context Engineer at Sully.ai, where he designs context engineering systems for healthcare AI. His open-source work on context engineering has 8,000+ GitHub stars and is cited in academic research alongside Anthropic. This is his full architecture writeup for “Personal Brain OS” — a file-based personal operating system that lives in a Git repository and gives AI assistants persistent, modular context without a database, API keys, or build step.

The core reframe: context engineering, not prompt engineering. Prompt engineering asks “how do I phrase this question better?” Context engineering asks “what information does this AI need to make the right decision, and how do I structure that information so the model actually uses it?”


The Core Problem: Context, Not Prompts

The bottleneck with AI assistants isn’t phrasing — it’s that every conversation starts from zero. You re-explain who you are, what you’re working on, your style guide, your goals. Then 40 minutes in, the model forgets your voice and starts writing like a press release.

Two architectural principles fix this:

The Attention Budget. Language models have a finite context window, and not all of it is created equal. Dumping everything into a system prompt degrades performance — every token competes for the model’s attention. Models have a measurable U-shaped attention curve where the middle blurs. This means you design for progressive loading, not bulk injection.

Progressive Disclosure. Instead of one massive system prompt, the system uses three levels:

  1. A lightweight routing file always loaded — tells the AI which module is relevant
  2. Module-specific instructions — load only when that module is needed
  3. Actual data (JSONL logs, YAML configs, research docs) — load only when the task requires it

Koylan’s routing file is SKILL.md. Module instruction files (CONTENT.md, OPERATIONS.md, NETWORK.md) are 40-100 lines each. Data files load last. Maximum of two hops to any piece of information.
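
A plausible layout for the three levels (directory placement is my assumption; the file names SKILL.md, CONTENT.md, OPERATIONS.md, NETWORK.md, posts.jsonl, and contacts.jsonl come from the writeup):

```
brain/
├── SKILL.md              # level 1: always-loaded routing file
├── content/
│   ├── CONTENT.md        # level 2: module instructions (40-100 lines)
│   └── posts.jsonl       # level 3: data, loaded only when the task needs it
├── operations/
│   └── OPERATIONS.md
└── network/
    ├── NETWORK.md
    └── contacts.jsonl
```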


The Instruction Hierarchy

Three scoped layers eliminate the “conflicting instructions” problem that plagues large AI projects:

AGENT.md is a decision table. “User says ‘send email to Z’” maps to: Step 1, look up contact in HubSpot. Step 2, verify email address. Step 3, send via Gmail. Module files define priority levels (P0: do today, P1: this week, P2: this month, P3: backlog) so the agent triages tasks consistently. The agent follows the same priority system Koylan uses because the system is codified, not implied.
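
An illustrative sketch of what such a decision-table entry might look like inside AGENT.md (the Markdown shape is my guess; the trigger, steps, and priority levels are from the writeup):

```markdown
## Routing table
| Trigger | Steps |
| --- | --- |
| User says "send email to Z" | 1. Look up contact in HubSpot. 2. Verify email address. 3. Send via Gmail. |

## Priorities
- P0: do today
- P1: this week
- P2: this month
- P3: backlog
```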


The File System as Memory

No database. No vector store. No retrieval system beyond Cursor/Claude Code’s native features. Just files on disk, versioned with Git. Every format was chosen for a specific reason:

JSONL for logs — append-only by design, stream-friendly (agent reads line by line without parsing the whole file), every line is self-contained valid JSON. An agent can only add lines. Deletion is done by marking "status": "archived" — preserving full history. This is non-negotiable: Koylan lost three months of post engagement data early on when an agent rewrote posts.jsonl instead of appending to it.
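
A minimal sketch of the append-only pattern in Python (function and field names are illustrative; archiving by appending a superseding line is one way to honor the no-rewrite rule):

```python
import json

def append_record(path, record):
    """Append one self-contained JSON line; the file is never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def archive_record(path, record_id):
    """'Delete' by appending a superseding line marked archived,
    preserving the full history on disk."""
    append_record(path, {"id": record_id, "status": "archived"})

def read_records(path):
    """Stream line by line; each line parses on its own."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```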

YAML for configuration — handles hierarchical data cleanly, supports comments (context the agent reads but that doesn’t pollute the data structure), readable by both humans and machines.

Markdown for narrative — LLMs read it natively, renders everywhere, produces clean diffs in Git.

The system has 11 JSONL files (posts, contacts, interactions, bookmarks, ideas, metrics, experiences, decisions, failures, engagement, meetings), 6 YAML files (goals, values, learning, circles, rhythms, heuristics), and 50+ Markdown files. Every JSONL file starts with a schema line so the agent always knows the structure before reading data.
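
For example, a hypothetical posts.jsonl could open with its schema line (the schema-first convention is from the writeup; the field names here are invented for illustration):

```jsonl
{"_schema": {"id": "string", "date": "YYYY-MM-DD", "platform": "string", "status": "draft|published|archived"}}
{"id": "p-001", "date": "2026-02-01", "platform": "x", "status": "published"}
```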

Episodic Memory is the key differentiator from a standard “second brain.” The memory/ module contains three append-only logs: experiences.jsonl (key moments with emotional weight scores 1-10), decisions.jsonl (key decisions with reasoning, alternatives considered, and outcomes tracked), and failures.jsonl (what went wrong, root cause, prevention steps). This is the difference between an AI that has your files and an AI that has your judgment. Facts tell the agent what happened. Episodic memory tells the agent what mattered, what you’d do differently, and how you think about tradeoffs.

Cross-Module References create a flat-file relational model. contact_id in interactions.jsonl points to entries in contacts.jsonl. pillar in ideas.jsonl maps to content pillars in identity/brand.md. The modules are isolated for loading but connected for reasoning. “Prepare for my meeting with Sarah” triggers: find Sarah in contacts, pull her interactions, check pending todos, compile brief. Three files, no manual assembly.
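
A sketch of how that three-file join could work. Only contact_id is confirmed by the writeup; the todos file, the id/name/status fields, and the function shape are all assumptions:

```python
import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def meeting_brief(name, contacts_path, interactions_path, todos_path):
    """Join three flat files on contact_id: a relational query without a database."""
    contact = next(c for c in load_jsonl(contacts_path) if c["name"] == name)
    interactions = [i for i in load_jsonl(interactions_path)
                    if i["contact_id"] == contact["id"]]
    todos = [t for t in load_jsonl(todos_path)
             if t.get("contact_id") == contact["id"] and t["status"] == "pending"]
    return {"contact": contact, "interactions": interactions, "todos": todos}
```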


The Skill System

Files store knowledge. Skills encode process. Two types of skills solve two different problems:

Reference skills (voice-guide, writing-anti-patterns) — set user-invocable: false in YAML frontmatter. The agent reads the description and injects them automatically whenever the task involves writing. Never invoked manually; they activate silently every time. Solves the consistency problem.

Task skills (/write-blog, /topic-research, /content-workflow) — set disable-model-invocation: true. The agent can’t trigger them on its own. Manual invocation makes them the agent’s complete instruction set for that task. Solves the precision problem.
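
A hedged sketch of the two frontmatter patterns as YAML (the user-invocable and disable-model-invocation flags are named in the writeup; the descriptions and everything else are illustrative):

```yaml
# reference skill: read automatically whenever the task involves writing
name: voice-guide
description: Voice and style rules for any writing task.
user-invocable: false
---
# task skill: only a manual /write-blog invocation can trigger it
name: write-blog
description: Complete instruction set for drafting a blog post.
disable-model-invocation: true
```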

When Koylan types /write-blog context engineering for marketing teams, five things happen automatically: voice guide loads, anti-patterns load, blog template loads, persona folder is checked, research folder is checked. One slash command triggers a full context assembly. The skill file references source modules — never duplicates content. Single source of truth.

The Voice System encodes voice as structured data, not adjectives. Five attributes rated 1-10: Formal/Casual (6), Serious/Playful (4), Technical/Simple (7), Reserved/Expressive (6), Humble/Confident (7). The anti-patterns file has 50+ banned words in three tiers, banned openings, structural traps (forced rule of three, copula avoidance, excessive hedging), and a hard limit of one em-dash per paragraph. “Professional but approachable” is useless for an AI. A 7 on the Technical/Simple scale tells the model exactly where to land.
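
The five ratings above could be encoded as configuration roughly like this (key names and the comment convention are my guesses; the numbers and the em-dash limit are Koylan's):

```yaml
voice:
  formal_casual: 6        # 1 = fully formal, 10 = fully casual
  serious_playful: 4
  technical_simple: 7
  reserved_expressive: 6
  humble_confident: 7
constraints:
  max_em_dashes_per_paragraph: 1   # hard limit from the anti-patterns file
```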

Templates as Structured Scaffolds define 7-section blog structures with word count targets, 11-post thread structures, 4-phase research templates. The research template outputs to knowledge/research/[topic].md with an Evidence Bank (statistics, quotes, case studies, papers — each cited with source and date). The output of one skill becomes the input of the next.


Daily Operation

Content Pipeline: Seven stages — Idea → Research → Outline → Draft → Edit → Publish → Promote. Ideas scored 1-5 across five dimensions; proceed if total hits 15+. Batched Sunday content creation (3-4 hours, 3-4 posts drafted). The publication log feeds the promotion skill.
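
The scoring gate is simple enough to sketch (the dimension names are invented; the 1-5 scale across five dimensions and the 15+ threshold are from the writeup):

```python
def should_proceed(scores, threshold=15):
    """Score an idea 1-5 on five dimensions; proceed when the total
    reaches the threshold."""
    assert len(scores) == 5 and all(1 <= s <= 5 for s in scores.values())
    return sum(scores.values()) >= threshold

# hypothetical dimension names for illustration
idea = {"audience_fit": 4, "novelty": 3, "effort": 3,
        "evidence": 4, "distribution": 2}   # total 16 -> proceed
```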

Personal CRM: Contacts in four circles with different maintenance cadences (inner: weekly, active: bi-weekly, network: monthly, dormant: quarterly). Each contact has can_help_with and you_can_help_with fields enabling intro matching. A stale_contacts script cross-references contacts + interactions + circles to surface outreach needs — a 30-second weekly scan.
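
A sketch of what the stale_contacts cross-reference could look like (the four circles and their cadences are from the writeup; the data shapes and function signature are assumptions):

```python
from datetime import date, timedelta

# maintenance cadence per circle, in days (weekly / bi-weekly / monthly / quarterly)
CADENCE = {"inner": 7, "active": 14, "network": 30, "dormant": 90}

def stale_contacts(contacts, last_touch, today=None):
    """Flag contacts whose last interaction is older than their circle's
    cadence. `last_touch` maps contact id -> date of last interaction."""
    today = today or date.today()
    stale = []
    for c in contacts:
        limit = timedelta(days=CADENCE[c["circle"]])
        last = last_touch.get(c["id"])
        if last is None or today - last > limit:
            stale.append(c["id"])
    return stale
```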

Automation Chains: Five scripts handle recurring workflows. Sunday weekly review chains three scripts: metrics snapshot → stale contacts flag → weekly review document (completed vs. planned, metrics trends, next week’s priorities). Scripts output to stdout in agent-readable format. The weekly review isn’t a report — it’s the starting point for next week’s planning.


What He Got Wrong

Over-engineered schemas. Initial JSONL schemas had 15+ fields per entry; most were empty. Agents struggle with sparse data — they invent values for empty fields or comment on their absence. He cut to 8-10 essential fields.

Voice guide was too long. Version one was 1,200 lines. The agent drifted by paragraph four as voice instructions fell into the “lost in the middle” zone. Restructured to front-load distinctive patterns in the first 100 lines. Critical rules at the top, not the middle.

Module boundaries matter more than expected. Initially combined identity and brand in one module. The agent loaded the entire bio when it only needed the banned words list. Splitting cut token usage for voice-only tasks by 40%. Every module boundary is a loading decision.

Append-only is non-negotiable. Lost three months of data when an agent overwrote a JSONL file. The append-only pattern isn’t a convention — it’s a safety mechanism. The agent can add data; it cannot destroy data.


Connections to Our Setup

This is the most architecturally rigorous implementation of patterns the RDCO vault already uses — but with several ideas worth incorporating:

| Koylan’s system | RDCO equivalent |
| --- | --- |
| SKILL.md routing file | QMD semantic search + skill trigger matching |
| Module-level instruction files | Project-level CLAUDE.md files |
| YAML frontmatter on every file | Our frontmatter schema (type, date, source) |
| Episodic memory (decisions.jsonl, failures.jsonl) | Partially: decisions.md in vault; not systematic |
| Append-only JSONL logs | Not yet — most of our data is markdown, not JSONL |
| Anti-patterns file for voice | Not yet — voice is in SOUL.md but less structured |

Ideas to steal directly:


Connections