04-tooling

verify action v2 flash review extension

Mon May 04 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·tooling ·status: proposal
verify-actionhooksquality-reviewflash-reviewfat-cat-council-adjacentslop-cannon-prevention

verify-action v2 - flash-review extension proposal

Founder asked 2026-05-05 20:05 ET: “Is there something more meaningful we should add to that hook? Like a quality check / review? The hook is a bit like the flash board meeting we were discussing with the ‘fat cat’ agents. Kind of. Same vein.”

He’s right - verify-action v1 (mechanical regex on em-dashes + chat_id formats) is the thinnest version of a much bigger pattern. This is the v2 design.

The architectural insight

verify-action and the 2026-05-01-fat-cat-council-architecture-proposal sit on the same axis: agents reviewing other agents’ output before the action ships. Three scales of review:

ScaleTriggerLatencyCostToday
Tier 1 - Mechanical hookEvery reply tool call<100ms$0verify-action v1 (R001-R004 active)
Tier 2 - Flash reviewHigher-stakes outbound (length, scope, asset-class)<10s<$0.01/fireNOT BUILT (this proposal)
Tier 3 - Council meetingStrategic decisions, on-demand30-90s~$2.50/meetingDesigned (Fat Cat Council), not built

Each tier appropriate to its scope. Tier 2 is the missing middle - the “flash board meeting” the founder named.

v2 design - two-tier hook

Tier 1 (mechanical, unchanged from v1)

Already shipped, awaiting hook activation. Pure Python, regex-mechanical, no LLM, fires on every iMessage / Discord reply tool call. Rules R001-R004:

Tier 2 (semantic flash-review, NEW)

Fires when triggers cross a higher-stakes threshold. Spawns 1-3 narrow reviewer agents in parallel. Each returns {verdict: PASS|REVISE, confidence: 0..1, redlines: [string]}. Action proceeds only if quorum is PASS (2/3 default). On REVISE, the consolidated redline list is surfaced back to Ray for revision before send.

Triggers (any of these conditions fires Tier 2):

  1. Outbound channel reply length >800 chars (current R004 soft-warn becomes a flash-review trigger instead)
  2. Vault write to project folder (~/rdco-vault/01-projects/<bet>/) - public-facing-adjacent
  3. Vault write to a published-content folder (~/rdco-vault/06-reference/concepts/, ~/rdco-vault/06-reference/research/, anywhere that informs newsletter or course content)
  4. Sub-agent dispatch prompts for build-class work (any prompt with “build”, “implement”, “deploy”, “ship”) - verify the Ray-side prompt is well-formed BEFORE fan-out
  5. First reply in a new thread OR after >4hr silence (high-context-loss risk)
  6. Outbound to non-founder user (Discord channel reply where inbound user_id != founder)
  7. Founder explicitly tagged the action as production-mode (per the ../../.claude/projects/-Users-ray/memory/feedback_ic_vs_production_mode rule)

Reviewer specialists (each narrow, each <5s, each <$0.005/fire):

Default Tier 2 panel = 3 reviewers chosen by trigger:

Tier 3 (Fat Cat Council - separate)

Already designed. Stays on-demand, multi-provider, strategic-decision scope. Doesn’t fire automatically; Ray or founder calls a meeting. See 2026-05-01-fat-cat-council-architecture-proposal.

Cost framing

Tier 1: $0/month (pure regex, current verify-action.py).

Tier 2: assume 20-30 fires/day average across all triggers. 3 reviewers per fire. Each reviewer is a tiny model call (Haiku-class, ~500 input tokens, ~100 output tokens) at ~$0.0005-0.001 per call. Total: 60-90 reviewer calls/day × $0.0007 avg = ~$0.05/day = ~$1.50/month.

If we want richer reviews (more context per call, longer reasoning, occasional Sonnet-class for hard cases), cap at $5-10/month and call it acceptable.

Tier 3 (Fat Cat Council): ~$15-20/month at 4 meetings/month per the existing proposal.

All tiers combined: ~$20-30/month total for the full review architecture. Cheap insurance against the slop-cannon failure mode.

Build sequencing

Phase 1 (immediate, ~3hr build) - Activate Tier 1 + ship the confidentiality reviewer

Phase 2 (~6-8hr build) - Tier 2 hook architecture

Phase 3 (~4hr build) - Reviewer specialist library

Phase 4 (~2hr) - Trigger tuning + observability

Total build cost: ~15-17hr across 4 phases. Phase 1 alone (~3hr) materially raises the quality floor; Phase 2 unlocks the flash-board-meeting pattern the founder named.

Why this matters strategically

Per ../../.claude/projects/-Users-ray/memory/feedback_ic_vs_production_mode (just filed): RDCO’s reputational moat depends on production quality being visibly multi-step + multi-specialist + critique-gated. The verify-action v2 hook IS the always-on enforcement of that moat at the action layer.

This is the operationalization of “no slop cannon” at the lowest cost-per-action tier. Tier 3 (Fat Cat Council) is the strategic-decision version. Tier 2 (this proposal) is the per-action version. Tier 1 is the regex floor.

Three tiers, one philosophy: agents review agents before the founder sees the output. The mode of review scales with the stakes of the action.

Open questions for founder

  1. Phase 1 activation: greenlight the existing verify-action v1 hook into ~/.claude/settings.json AND extend Phase 1 to include the Confidentiality reviewer (R005) as a deterministic add? Or stage them separately?
  2. Phase 2 trigger tuning: 7 trigger conditions named above. Any to drop, any to add, any thresholds to set differently?
  3. Reviewer panel size: default 3 per fire (quorum 2/3). Cheaper at 2-of-2 (any REVISE blocks). Slower / richer at 5-of-5 with majority quorum. Lean: 3-of-3 for now.
  4. Where the cost ceiling lives: $5/mo? $10/mo? Hard cap that drops to Haiku-only when exceeded?
  5. Phase ordering: ship Phase 1 immediately (3hr) and queue Phases 2-4 as Notion tasks for next week? Or treat as a single multi-evening project and ship as a connected v2 release?

Cross-references