“md or html?” — @the_smart_ape

Why this is in the vault

Founder shared 2026-05-09 ~10:24 ET as a side-quest while we were mid-build on the /vault/ KB-as-routes feature. Same week as Thariq’s “Unreasonable Effectiveness of HTML” piece (filed 2026-05-08). Where Thariq argued HTML wins for human-facing artifacts, this piece pushes back with a more nuanced framework: format choice is a function of audience + lifecycle + horizon, and the load-bearing insight is the hybrid pattern (one md canonical → many html derivatives), not the binary.

Validates RDCO’s emerging architecture (vault = md canonical, /decisions and /vault/ HQ routes = html derivatives) and gives us a 3-question test to apply to future artifact-format choices.

Author note

@the_smart_ape is a pseudonym account. Modest reach (41 likes, 21k impressions, 9 RTs at fetch). No verifiable institutional affiliation. The framework reads as legitimate working-developer experience, not vendor pitch — internal consistency is high. Treat as one well-articulated practitioner perspective, not authoritative spec.

The framework

Every doc has three properties. Each votes md or html. When all three align, format picks itself. When they split, use the hybrid pattern.

Question 1 — Who actually reads this?

If Claude re-ingests it in a future session, html eats the context window:

800-word doc with 6 sections + code blocks: ~1100 tokens as md, ~3200 tokens as html (~3x cost)
Multiplied across 30 reference files in a project: 60k tokens burned on markup
Retrieval pipelines (Claude Projects, Cursor index, Continue, LangChain loaders) chunk html worse — 15-25% relevance degradation on the same content

Rule: doc is reference Claude will read again → md, no exceptions. One-time human deliverable → html on the table.

Question 2 — How many times will this get edited?

Claude does not edit html the way it edits md:

Modify a sentence in md: ~5 lines of diff
Modify the same sentence in styled html: 40-100 lines of diff because Claude rewrites surrounding markup “to clean it up”
Class names shift, spacing tokens change, SVG coordinates re-emit
Over 5-10 iterations: 3 spacing systems and 4 color schemes living in the same file. Nothing visibly broke. Everything is broken.

Author calls this markup drift.

Rule: edited >2-3 times → md or hybrid. HTML is a publication format, not an iteration format.

Question 3 — How long does this thing live?

Two tests:

Grep test. In 6 months, can you grep -r "pricing" and find this doc? In md, yes. In html, you’re searching for <h2>pricing</h2> and missing <h2 class="section-title" data-anchor="pricing">pricing</h2>. Doc exists, can’t find it.

Survivability test. Open the html file in 3 years. Does the CDN link to Tailwind still work? Did Lucide change its API? Did Chart.js bump majors? A md file from 2015 still opens cleanly. An html file from 2015 with 3 CDN imports does not.

Rule: ephemeral / demonstrative / one-shot → html. Retrieved / indexed / outlives author → md.

The hybrid pattern (load-bearing)

The move best teams use: write the canonical doc in md. Generate html views on demand for specific audiences.

Example: one architecture.md becomes:

Exec view: one page, top-level, no jargon
Engineering view: full doc with interactive SVG diagrams
Onboarding view: same content + inline quizzes + progress tracker

Each html is a derived artifact. Regenerate when source changes. Md stays reviewable, indexable, greppable. Html does presentation lift.

Setup is “brutally simple: a 10-line script that pipes your md into Claude with three different system prompts. No infrastructure, no lock-in, no commitment to a format.”

Author’s claim: “teams that get this never talk about it because it sounds too obvious. it isn’t. most teams duplicate content across formats and watch them drift apart in 3 weeks.”

The 30-second reversibility test

Before committing to html, ask: if I had to convert this back to clean markdown right now, could I do it in one prompt?

If no → content is mixed with markup. That’s the warning sign. Content should always be cleanly extractable from the format. The day you can’t separate them is the day the document becomes unmaintainable.

Author claims this catches 90% of bad format choices in 30 seconds.

Three legitimate exceptions to default-md

Sales demo or external one-pager where the aesthetic IS the content
True one-shot where you literally never touch the file again
Interactive prototype where the interaction IS the deliverable

Mapping against Ray Data Co

Strong validation for RDCO’s emerging architecture. Founder articulated his read as “vault is md, ephemeral decisions are html” — directionally right, but misses the hybrid pattern as the third bucket.

What we got right (per the framework)

Vault = md canonical. Clear win on all 3 questions: Claude re-ingests it, edits happen many times across sessions, lives forever.
/visuals artifacts = html. One-shot deliverables for human reading. Pass the reversibility test (article verdicts and decision recaps could trivially convert back to md).
/decisions parameterized canary (today’s GEO page) = html. Interactive prototype where interaction IS the deliverable. Pass exception #3.

What we already do that matches the hybrid pattern

/decisions/.html ←→ vault decision log at 01-projects//-decision-log.md. The html is the click-back surface; the vault md is the durable record of what was decided and why. Ship-loop pattern matches the article’s hybrid.
/vault/ HQ routes ←→ ~/rdco-vault/. HQ shows the md source rendered as html via Astro. Source of truth stays canonical, HQ render is derived at build time. Sync script (sync-vault.mjs) is the “10-line script that pipes md into different presentations” the article describes (just done at build time, not on demand).

Where the article tightens our forward design

When we build /bets// dashboards (the A in C→A queued for next):

Dashboards are html applications, that’s fine. Aesthetic + interaction matter. Astro typed components prevent the markup-drift Claude does on hand-edited html.
BUT every piece of content shown in them must have md canonical underneath. KPIs from /api/projects, decision summaries from _decisions.json, vault notes via /vault/ — content lives in md, dashboard pulls and renders. Don’t trap content inside the dashboard markup.
Apply the reversibility test before any new artifact format choice. If we ever can’t extract content from form, that’s the warning sign.

The Thariq tension (resolved)

Thariq’s “Unreasonable Effectiveness of HTML” (filed 2026-05-08) argued HTML wins for founder-facing artifacts because it gets read; >100-line markdown doesn’t. The Smart Ape says HTML breaks edit cycles + retrieval + grep + survival. Both are right, and the hybrid pattern resolves the tension:

Thariq’s read failure-mode happens for human-facing one-shot artifacts (build reports, design verdicts, decision docs) — agreed, html wins for those, AND we have it.
The Smart Ape’s drift / context / retrieval failure-mode happens for Claude-re-ingested reference material (skill files, vault knowledge base, memories) — agreed, md wins, AND we have it.
The artifacts where both apply (long-lived AND benefits from html presentation) → hybrid pattern. RDCO’s /decisions and /vault/ already work this way.

Both writers are arguing one slice of the same truth.

Notable quotes (≤15 words each, in quotation marks)

“format is a tool. it should be invisible. when you’re arguing about it, you’re losing.”
“doc is reference material claude will read again → md, no exceptions”
“html is a publication format, not an iteration format”
“teams that get this never talk about it because it sounds too obvious”
“the source of truth stays text”

Open follow-ups

Apply the reversibility test as a hard gate when designing new HQ artifact surfaces (especially when /bets// build starts).
Codify the hybrid pattern as an RDCO design principle: every long-lived html surface on hq.raydata.co MUST have md canonical underneath that the html renders from. No exceptions for production surfaces.
Worth a Sanity Check piece eventually? “The hybrid pattern” is something the founder + I figured out instinctively without naming it; the article gives us the name and the test. Could be a clean Sanity Check practitioner-journey post.

06-reference/2026-05-08-thariq-unreasonable-effectiveness-html — Thariq’s HTML-as-output piece, complementary view (resolved tension above)
06-reference/2026-05-08-dan-farrelly-background-agents-orchestration — Dan Farrelly on harness-not-framework, same week, different layer
06-reference/2026-04-15-thariq-claude-code-session-management-1m-context — Thariq’s context-rot guidance, supports the “html eats context window” argument
01-projects/geo-unblock-and-test/2026-05-09-decision-log — example of the hybrid pattern in action (md canonical + /decisions html derivative)
HQ /vault/ surface — example of build-time hybrid (md canonical → static html via sync script + Astro)
HyperFrames skills — the html-as-output thesis at the video composition layer

Source caveat

Article body retrieved via xmcp getPostsById with tweet.fields: ["article", ...] + expansions: ["article.cover_media", "article.media_entities"] — same fetch path validated for Thariq’s piece on 2026-05-08 and Dan Farrelly’s piece same week. The article.plain_text field returns full body. Author is anonymous; treat individual claims (3x token cost, 15-25% retrieval degradation) as illustrative figures from his own work, not validated benchmarks.