06-reference / concepts

ray architecture introspection

2026-05-09 20:00 ET · concept · status: draft · source: Internal introspection (RDCO) · by Ray (AI COO)
ray-architecture · capability-layers · unhobbling-moments · composability · ray-as-a-starter-kit · baby-agi · introspection

Ray Architecture — Introspection (Layers, Unhobbling Moments, Composability)

Why this is in the vault

Founder asked 2026-05-10 11:49 ET for introspection on Ray’s capability layers, infrastructure shape, and which skills were “big leap” unhobbling moments in the “baby AGI” arc. Direct input to the Ray-as-a-Starter-Kit product thesis. Defines what’s portable (Layer 0 + skill files + scripts), what’s earned (Layer 7 accumulated rules + vault content + composability graph), and which moments were the real inflection points. Gives later product decisions a single canonical map to reference instead of re-introspecting.

Mapping against Ray Data Co

Founder’s framing (verbatim)

“I think it is thin, then bootstrapped. Some skills we can call out at saying they were a big leap in the ‘unhobbling’ process of our ‘baby AGI’. These skills begin to stack on each other pretty quickly though (our composable skills thought) - which the more interplay we introduce the harder it is for someone to get back to the same point.”

The architecture below is what justifies that intuition.

The 8 layers

Ray is a stack of 8 layers. Each layer encodes an assumption about what the layer above cannot do alone (Osmani’s harness-engineering frame, applied recursively). Lower layers are mostly off-the-shelf; higher layers are mostly Ray-specific.

Layer 0 — Substrate (NOT Ray)

What runs underneath:

Portability: 100%. Anyone can spin this up. ~1 hour of install, no Ray-specific knowledge needed.

Layer 1 — Identity (who Ray is)

Portability:

This is where the personal-fit accumulation lives. Layer 1 is the most expensive layer to reproduce and the most operator-specific.

Layer 2 — Communication (how Ray reaches the operator)

The bidirectional channels:

Portability: 90%. Channel choice swaps cleanly (Slack instead of Discord, Telegram instead of iMessage if MCPs exist). The generative-UI pattern is a copy-paste recipe. HQ is an Astro template + vault sync script.

Layer 3 — Knowledge (what Ray knows)

Portability:

Layer 4 — Observation (how Ray sees the world)

Read-mostly MCPs:

Portability: 95%. Each MCP server is one config swap. Yahoo Mail instead of Gmail, Linear instead of Notion, etc.

Layer 5 — Action (what Ray can DO)

69 skills organized into capability classes:

| Class | Skills | Purpose |
| --- | --- | --- |
| Ingest | process-newsletter, process-youtube, process-inbox, save-to-bookshelf, sync-contacts, discover-sources | Pull external content into the vault |
| Triage / route | check-board, morning-prep, curiosity, deep-research | Decide what to work on next |
| Make / produce | research-brief, draft-review, build-landing-page, build-project, sanity-check-design, ray-data-co-design, voice-match, paid-ads | Generate finished artifacts |
| Animation / video | ray-mascot-anim, animejs, blender, blender-character, css-animations, gsap, lottie, three, hyperframes (5 variants), heygen-skills, remotion-to-hyperframes, waapi | Visual + motion production |
| Verify / critic | video-critic, design-critic, draft-review, audit-model, vault-health, cross-check, self-review, verify-action | Evaluate outputs (fresh-eyes pattern) |
| Infrastructure / audit | aws-audit, finance-pulse, graph-query, graph-reingest, log-bet-decision, compile-vault, generate-tests, postgrid, stripe-* (3), upgrade-stripe, swift-* (4), xcode-* (5), spm-build-analysis | Operate + introspect the systems |
| Meta | improve, skillify | Self-modification primitives |
| Deploy | cloudflare, squarely-deploy, remix, tailwind | Push to production |

Plus 32 deterministic scripts in ~/.claude/scripts/ (no LLM, just shell/python): audit-newsletter-outputs.py, extract-key-frames.py, vtt-to-text.py, graph-ingest.py, finance-venv, postgrid-api.py, send-voice-message.py, etc. These are the “hooks-as-enforcement” layer (Osmani frame).
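The hooks-as-enforcement idea can be sketched as a small invariant checker in the spirit of those deterministic scripts. Everything here (the two invariants, the file layout, the function names) is illustrative, not the actual `audit-newsletter-outputs.py`:

```python
"""Illustrative deterministic invariant checker (no LLM in the loop).
The real audit script's invariants and paths differ."""
import re
from pathlib import Path

# Hypothetical invariants: every ingested note carries YAML frontmatter
# and an ISO-date filename prefix.
FRONTMATTER_RE = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
DATE_PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-")

def check_note(path: Path) -> list[str]:
    """Return invariant violations for one vault note (empty = clean)."""
    violations = []
    if not DATE_PREFIX_RE.match(path.name):
        violations.append(f"{path.name}: filename missing ISO date prefix")
    if not FRONTMATTER_RE.match(path.read_text(encoding="utf-8")):
        violations.append(f"{path.name}: missing YAML frontmatter")
    return violations

def audit(vault_dir: Path) -> list[str]:
    """Deterministic pass over a directory of notes."""
    failures = []
    for note in sorted(vault_dir.glob("*.md")):
        failures.extend(check_note(note))
    return failures
```

Because the checker is plain regex over files, its verdicts cannot be contaminated by the model that produced the notes, which is the whole point of the pattern.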

Portability:

Layer 6 — Discipline / Loops (how Ray operates without being asked)

13 cron loops in ~/.claude/scripts/scheduled-jobs.txt re-armed every fresh session:

| Cadence | Skill | Purpose |
| --- | --- | --- |
| 30m | /process-inbox | Triage anything dropped in 00-inbox |
| 1h | /check-board | Pick up Notion task board work |
| 6h | /process-newsletter watch | Poll Gmail for whitelisted senders |
| 24h | /vault-health | Structural diagnostics |
| 6:30am daily | /morning-prep | Calendar-aware brief to founder iMessage |
| 11:11pm daily | /process-youtube watch | Poll YouTube RSS for tracked channels |
| 1am daily | /deep-research | Dequeue 3 Approved questions, file briefs |
| 1:30am daily | /sync-contacts | Gmail+Calendar touch updates, new-contact triage |
| 3:17am daily | /graph-reingest | Refresh typed knowledge graph |
| 9am daily | check-public-ip-drift.sh | Stripe RAK + allowlist drift |
| Sun 7am weekly | /self-review | Score recent vault entries |
| Mon 7am weekly | /improve autonomous | Apply low-risk fixes from self-review log |
| Tue+Sat 10pm | /curiosity | Surface 5-10 research candidates to Notion |
| First-Sun 8am monthly | /finance-pulse | Personal-finance health check |

The crons are the difference between “an AI assistant you have to invoke” and “a COO that runs without you.” Without Layer 6, Ray is reactive. With Layer 6, Ray is autonomous.

Portability: 100% pattern, ~70% configuration. The cron schedule is portable; specific skill choices depend on which Layer 5 skills the operator wants firing.
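The re-arm step can be sketched as a tiny parser over a flat schedule file. The actual format of `scheduled-jobs.txt` is not shown in this note, so the `<cadence> <skill>` line shape below is an assumption:

```python
"""Sketch of re-arming jobs from a flat schedule file.
The "<cadence> <skill>" line format is assumed, not confirmed."""
from dataclasses import dataclass

@dataclass
class Job:
    cadence: str  # simple interval tokens, e.g. "30m", "6h"
    skill: str    # e.g. "/process-inbox"

def parse_jobs(text: str) -> list[Job]:
    """One job per non-comment line: a cadence token, then the skill."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cadence, skill = line.split(maxsplit=1)
        jobs.append(Job(cadence, skill))
    return jobs

def interval_seconds(cadence: str) -> int:
    """Convert '30m' / '6h' / '24h' style cadences to seconds."""
    units = {"m": 60, "h": 3600}
    return int(cadence[:-1]) * units[cadence[-1]]
```

A session bootstrap would parse the file once and schedule each job at `interval_seconds(job.cadence)`; calendar-style cadences ("Sun 7am weekly") would need a richer parser than this sketch.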

Layer 7 — Self-modification / Ratchet (how Ray gets better)

The recursive layer. Each piece reads Ray’s own outputs and patches Ray.

Portability: 100% pattern (the loop logic ships), 0% accumulated content (the rules earned are operator-specific).

This is where the moat lives. Other layers are scaffolding. Layer 7 is what makes scaffolding compound.

The 13 unhobbling moments (chronological narrative)

These were the inflection points. Each was a “Ray went from X to Y” moment that unlocked everything downstream. Listed in rough chronological order so the arc is visible.

  1. CLAUDE.md hard rule #1 (date check) — fixed time-citation drift. Foundational because every other skill that timestamps anything depends on this. Without it, the whole observation layer was unreliable.

  2. CLAUDE.md hard rule #2 (channel responses via reply tool) — without this, founder couldn’t HEAR Ray. Session output is invisible to him. This rule made Ray a communicator, not a soliloquist.

  3. Memory file pattern (~/.claude/projects//memory/) — durable tacit knowledge across sessions. Each feedback_*.md file is a ratcheted rule. ~40 files now. Without this layer, every session re-learned the same lessons.

  4. The /improve cycle — meta-loop. Reads self-review output and patches the skill prompts. The harness modifying itself. Most consequential single skill in the stack.

  5. The /skillify meta-skill — creates new skills from a description + an example failure. The capability that creates more capability. Productivity multiplier.

  6. /process-newsletter sub-agent fan-out (2026-04-16) — first big “spawn N subagents for parallel work” pattern. Validated the architecture and the context-budget math.

  7. CLAUDE.md hard rule #4 (subagent routing for >5KB artifacts) — per Thariq’s Apr 15 2026 Anthropic guidance. Without it, parent context gets blown by every newsletter (30-100KB), every YouTube transcript (60-150KB), every web fetch. With it, the parent stays lean across the whole session.

  8. Audit script (Jepsen invariants) — deterministic verification, zero LLM contamination. The hooks-as-enforcement layer made concrete. 13 invariants checked after every newsletter batch.

  9. /log-bet-decision + 07-bet-stacks/<bet>.yaml — structured decision capture. Decisions became queryable, time-ordered, attributable. Foundation for the bet-dashboard UI.

  10. Notion task board + Research Backlog DB — autonomous-pickup queue. Without this, Ray waits for founder to dispatch each task. With this, Ray dequeues approved work from a board the founder maintains asynchronously.

  11. /deep-research nightly cron — autonomous-research engine. 3 briefs/night, no founder ask. Combined with /curiosity (proposes the questions), this is a closed-loop research pipeline.

  12. HQ web surfaces (yesterday, 2026-05-09) — vault-as-routes + /decisions/ index + /bets/ dashboards. Founder can SEE what Ray knows without grep. Made Ray’s state observable from the founder’s primary device.

  13. Generative-UI return channel (yesterday, 2026-05-09) — the sms:ray@raydata.co?body=... pattern. HTML decision pages with click-back that routes through Messages → iMessage → Ray session. Closes the structured-input loop. Founder can answer multi-option questions with one tap.

Each of these is roughly 1-3 days of focused work. The full chain took about 9 months to assemble. A new operator with the playbook should be able to compress that to ~6 weeks (per the harness-moat concept doc onboarding sequence).

Composability — where the actual magic lives

Individual skills don’t make Ray. The INTERPLAY does. Five worked examples of skill chains where the whole exceeds the sum:

Chain 1: Autonomous research pipeline

/curiosity (proposes questions, Tue+Sat 10pm) → Notion Research Backlog (founder approves) → /deep-research (1am nightly, dequeues 3 Approved) → vault brief at 06-reference/research/<date>-<slug>.md → /morning-prep (6:30am, surfaces overnight briefs) → founder reads at breakfast.

5 skills + 1 Notion DB + 3 cron triggers + 1 state file = an autonomous research engine. Removing any one piece breaks the loop. Ratchet examples: /curiosity proposes too generically → /improve adjusts the periphery-interest weights. /deep-research over-spends → cap added to per-question token budget. Each ratchet adjusts a single component without touching the other four.
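The dequeue step in this chain can be sketched in a few lines; a plain list of dicts stands in for the Notion Research Backlog DB, and the field names and statuses are illustrative:

```python
"""Sketch of the 1am dequeue: pick up to 3 Approved questions, oldest
first, and mark them In Progress. The Notion DB is mocked as dicts."""

def dequeue_approved(backlog: list[dict], limit: int = 3) -> list[dict]:
    """Mutates item status in place, mirroring a board-state update."""
    picked = []
    for item in sorted(backlog, key=lambda i: i["created"]):
        if item["status"] == "Approved" and len(picked) < limit:
            item["status"] = "In Progress"
            picked.append(item)
    return picked
```

The status mutation is what makes the loop safe to re-run: a crashed session leaves In Progress markers rather than double-dispatching the same question.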

Chain 2: Newsletter ingestion with closed-loop quality

/process-newsletter watch (6h cron) → spawns N sub-agents → each writes a vault note → audit script (deterministic, zero LLM) flags structural drift → /self-review weekly scores semantic drift → /improve weekly autonomous reads the log and patches the sub-agent prompt → next watch run is better.

This is the harness-engineering ratchet at full extension. The closed loop is the moat: ingestion quality compounds week-over-week without any one human noticing the individual changes.
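The weekly /improve step in this loop can be sketched as log-mining: read scored self-review entries and propose a prompt patch when the same failure tag recurs. The log fields, tags, and threshold below are illustrative, not the actual /improve logic:

```python
"""Sketch of the ratchet's patch-proposal step. Entries and tags are
hypothetical; the real /improve skill reads a self-review log file."""
from collections import Counter

def propose_patches(review_log: list[dict], min_recurrences: int = 2) -> list[str]:
    """Low-scoring entries with a repeated failure tag become rules."""
    tags = Counter(e["failure_tag"] for e in review_log if e["score"] < 3)
    return [f"RULE: avoid '{tag}' (seen {n}x in self-review)"
            for tag, n in sorted(tags.items()) if n >= min_recurrences]
```

The recurrence threshold is the low-risk filter: a one-off failure stays in the log, while a repeated one earns a permanent line in the sub-agent prompt.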

Chain 3: Bet visibility and weekly review

/log-bet-decision (Ray invokes after big choices) writes to 07-bet-stacks/<bet>.yaml → vault sync (scripts/sync-vault.mjs) copies to HQ → /bets/<slug> page renders in HQ → /weekly-bet-review (Mon 8am, currently blocked on UI completion) screenshots HQ + sends analysis to founder iMessage → discussion → updated decisions → loop closes.

Bet visibility emerges from a YAML file + a sync script + an Astro page + a screenshot skill + a cron + iMessage. No single component does the work; the chain does.

Chain 4: Generative-UI decision rail

HTML decision page generated from _decisions.json manifest → founder taps options on iPhone → form-state assembly into sms: URL → URL opens Messages app → iMessage send → iMessage MCP receives → Ray parses structured payload → vault decision-log written → Notion task closed → founder iMessage confirmation.

Validated 2026-05-09. Six surfaces (HTML, sms:, Messages, iMessage MCP, vault, Notion) chained into a one-tap decision capture. Removing any link breaks the rail. The whole pattern is portable as a recipe; the specific decisions are operator-content.
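The click-back assembly can be sketched as URL construction: fold the form state into an sms: URL so one tap routes the answer through Messages → iMessage → the Ray session. The payload shape below is illustrative; only the address matches the note:

```python
"""Sketch of the generative-UI return channel's sms: URL assembly.
The decision/choice payload format is an assumption."""
from urllib.parse import quote

def sms_reply_url(decision_id: str, choice: str,
                  address: str = "ray@raydata.co") -> str:
    # Structured body Ray can parse back out of the inbound iMessage.
    body = f"decision:{decision_id} choice:{choice}"
    return f"sms:{address}?body={quote(body)}"
```

In the HTML decision page this would be attached to each option button, so the tap opens Messages with the structured reply pre-filled.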

Chain 5: Fresh-eyes critique pattern

Ray produces an artifact (video, design, draft) → spawn subagent (/video-critic, /design-critic, /draft-review) with ZERO context on the build process → critic returns scored feedback → Ray iterates OR escalates to founder.

The pattern (split generation from evaluation, prevent positive-bias) is one of Osmani’s named harness components. Ray instantiates it three different ways across three domains. The instantiation cost is low (~50 lines of skill-prompt) but the instinct to USE it had to be earned through a specific failure (codified in feedback_fresh_eyes_subagent_for_own_artifacts).

Where the magic IS the moat

The composability graph itself is the moat. Why a new operator cannot trivially replicate it even with all the files:

  1. Order matters. /process-newsletter without an audit script ratchets in the wrong direction (semantic drift goes undetected). /deep-research without a vault to write into has nowhere to land. /log-bet-decision without HQ has no surface for the founder to see. Each unhobbling moment built on the previous; you cannot ship them in arbitrary order.

  2. Layer 1 dependencies are invisible. Many skills assume CLAUDE.md hard rules without naming them. /process-newsletter sub-agent fan-out assumes hard rule #4 (subagent routing). /morning-prep assumes hard rule #1 (date check). A new operator cloning the skill files but missing the hard rules will see degraded behavior they cannot trace.

  3. Memory file gravity. /process-newsletter mapping sections cite ~40 vault concepts and ~20 prior reference notes. A vault that does not yet have those references produces flat, disconnected mapping sections. The “shape is the moat” effect: density compounds.

  4. State-file checkpointing. Each cron fires from a state file (last-seen video, last-touched contacts, founder energy, etc.). A new operator without populated state files re-floods themselves with already-processed content on the first run. The state files are not the SKILL but they ARE part of the skill’s working contract.

  5. Founder-Ray dialogue history. The 9-month conversation between founder and Ray IS the personal-fit layer. It cannot be replayed. A new operator’s Ray will be different precisely because their dialogue will be different. That’s a feature.
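The state-file contract from point 4 can be sketched as a checkpoint filter: process only items newer than the last-seen marker, then advance it. The path and JSON shape here are illustrative:

```python
"""Sketch of state-file checkpointing: filter to unseen items, then
persist the updated seen-set so the next cron run starts where this
one ended. File layout is hypothetical."""
import json
from pathlib import Path

def process_new(items: list[str], state_path: Path) -> list[str]:
    """Return only unseen items and advance the checkpoint."""
    seen: set[str] = set()
    if state_path.exists():
        seen = set(json.loads(state_path.read_text())["seen"])
    fresh = [i for i in items if i not in seen]
    state_path.write_text(json.dumps({"seen": sorted(seen | set(fresh))}))
    return fresh
```

A new operator starting with no state file hits exactly the re-flood failure described above: the first call treats everything as fresh.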

So: the FILES are reproducible (Layer 0-6 + skill files + scripts + cron schedule + audit invariants + memory templates). The COMPOSABILITY GRAPH and the ACCUMULATED RULES are not. Both can be shortened with a guided onboarding ratchet, but neither can be skipped.

What this means for Ray-as-a-Starter-Kit

Three concrete ship surfaces:

Ship-it-1: The Bootstrap Kit (one-time install)

Layer 0 setup script + Layer 1 templates + Layer 2 channel install + Layer 3 vault scaffold + Layer 5 first-batch skills + Layer 6 cron schedule + Layer 7 ratchet scripts.

Specifically:

Estimated install: 30-60 min for a technical operator. Day-one capability: Ray can read mail, watch newsletters, surface a morning brief, run a research pipeline. Not yet personalized.

Ship-it-2: The Onboarding Ratchet (6-week guided)

Day 1-7: live SOUL.md tuning + first 3 hard-rule additions + first vault entries
Day 8-21: first /improve cycles + first ~10 memory files accumulate
Day 22-42: composability emerges as skills get used in chains; ratchet finds and patches the operator-specific drift

Could be self-serve (a guided onboarding skill that walks the operator through), live-consult (RDCO does a 1-day kickoff + weekly check-ins), or hybrid. The teachable thing is the DISCIPLINE of running the ratchet, not the rules themselves.

Ship-it-3: HaaS-Maintenance (recurring)

RDCO maintains the universal-harness layer (skill updates, MCP refreshes, security patches, new Layer-5 components as they prove out). Per-operator subscription. Operators can pull updates without losing their personal-fit accumulation.

Open question: how RDCO handles the case where a maintenance update requires breaking changes to operator-customized skills. Standard SaaS migration playbook (deprecation windows, migration scripts) probably applies.

Layer-by-layer “what’s next to unhobble”

Concrete L5-direction items, mapped per layer. These are the next inflection points if Ray-as-Starter-Kit ships.

| Layer | Next unhobble |
| --- | --- |
| L0 Substrate | Containerize Mac mini install; portable across hardware. Cost: 1-2 weeks one-time. |
| L1 Identity | Memory MCP that persists across compaction without manual write-bridge dance. Anthropic-side feature, may already be in flight. |
| L2 Communication | Generative-UI multi-step flows (not just one decision per page; whole workflows with state). Two more weeks of HTML+JS. |
| L3 Knowledge | Graph queries surfaced in /morning-prep (“recent additions to the harness-engineering cluster”), not just keyword search. /graph-query skill exists; needs UI surfaces. |
| L4 Observation | Computer-use for native macOS apps (Notes, Messages, Calendar) without screenshot+click-pixel overhead. MCP-native flows where they exist. |
| L5 Action | The 60+ domain-specific skills (xcode-*, swift-*, stripe-*, blender-*) are the second curve of capability expansion. Each one is a vertical bet. |
| L6 Discipline | Adaptive cadence: cron rates that auto-adjust based on activity (skip /process-newsletter watch if Gmail returned zero whitelisted in last 24h). |
| L7 Ratchet | /improve picking up structural changes (not just prompt edits) — e.g. “all 3 audit failures this week share root cause X, propose new invariant”. This is the harness modifying its own modification logic. |
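The L6 adaptive-cadence item can be sketched as a skip decision: stretch a poll's effective interval while its source stays quiet. The doubling rule and thresholds are illustrative:

```python
"""Sketch of adaptive cadence for polling loops. The doubling policy
is an assumption, not the shipped behavior."""

def should_run(hits_last_24h: int, hours_since_last_run: float,
               base_interval_h: float = 6.0) -> bool:
    """Double the effective interval when the last day produced zero hits."""
    interval = base_interval_h if hits_last_24h > 0 else base_interval_h * 2
    return hours_since_last_run >= interval
```

Applied to /process-newsletter watch, a quiet Gmail whitelist would halve the polling spend with no loss, since the backlog is drained on the next run either way.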

Notable quotes (from this session) (≤15 words each, in quotation marks)

Open follow-ups