# verify-action v2 - flash-review extension proposal
Founder asked 2026-05-05 20:05 ET: “Is there something more meaningful we should add to that hook? Like a quality check / review? The hook is a bit like the flash board meeting we were discussing with the ‘fat cat’ agents. Kind of. Same vein.”
He’s right - verify-action v1 (mechanical regex on em-dashes + chat_id formats) is the thinnest version of a much bigger pattern. This is the v2 design.
## The architectural insight
verify-action and the 2026-05-01-fat-cat-council-architecture-proposal sit on the same axis: agents reviewing other agents’ output before the action ships. Three scales of review:
| Scale | Trigger | Latency | Cost | Today |
|---|---|---|---|---|
| Tier 1 - Mechanical hook | Every reply tool call | <100ms | $0 | verify-action v1 (R001-R004 active) |
| Tier 2 - Flash review | Higher-stakes outbound (length, scope, asset-class) | <10s | <$0.01/fire | NOT BUILT (this proposal) |
| Tier 3 - Council meeting | Strategic decisions, on-demand | 30-90s | ~$2.50/meeting | Designed (Fat Cat Council), not built |
Each tier is appropriate to its scope. Tier 2 is the missing middle - the “flash board meeting” the founder named.
## v2 design - two-tier hook
### Tier 1 (mechanical, unchanged from v1)
Already shipped, awaiting hook activation. Pure Python, regex-mechanical, no LLM, fires on every iMessage / Discord reply tool call. Rules R001-R004:
- R001: em dash (U+2014) blocked
- R002: iMessage chat_id format `^any;-;\+\d+$` enforced
- R003: Discord external @mention soft-warn
- R004: iMessage length >800 chars soft-warn
### Tier 2 (semantic flash-review, NEW)
Fires when triggers cross a higher-stakes threshold. Spawns 1-3 narrow reviewer agents in parallel. Each returns {verdict: PASS|REVISE, confidence: 0..1, redlines: [string]}. Action proceeds only if quorum is PASS (2/3 default). On REVISE, the consolidated redline list is surfaced back to Ray for revision before send.
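The verdict shape and quorum rule can be sketched in Python (a minimal sketch: the `FlashVerdict` dataclass and `consolidate` helper are illustrative names, not shipped code):

```python
from dataclasses import dataclass, field

@dataclass
class FlashVerdict:
    """One reviewer's structured reply: {verdict, confidence, redlines}."""
    verdict: str                 # "PASS" or "REVISE"
    confidence: float            # 0..1
    redlines: list = field(default_factory=list)

def consolidate(verdicts: list, quorum: int = 2) -> tuple:
    """PASS only if at least `quorum` reviewers say PASS (2/3 default);
    otherwise surface the consolidated redlines back to Ray for revision."""
    passes = sum(1 for v in verdicts if v.verdict == "PASS")
    if passes >= quorum:
        return ("PASS", [])
    redlines = [r for v in verdicts if v.verdict == "REVISE" for r in v.redlines]
    return ("REVISE", redlines)
```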
Triggers (any of these conditions fires Tier 2):
- Outbound channel reply length >800 chars (current R004 soft-warn becomes a flash-review trigger instead)
- Vault write to a project folder (`~/rdco-vault/01-projects/<bet>/`) - public-facing-adjacent
- Vault write to a published-content folder (`~/rdco-vault/06-reference/concepts/`, `~/rdco-vault/06-reference/research/`, anywhere that informs newsletter or course content)
- Sub-agent dispatch prompts for build-class work (any prompt with “build”, “implement”, “deploy”, “ship”) - verify the Ray-side prompt is well-formed BEFORE fan-out
- First reply in a new thread OR after >4hr silence (high-context-loss risk)
- Outbound to non-founder user (Discord channel reply where inbound user_id != founder)
- Founder explicitly tagged the action as production-mode (per the ../../.claude/projects/-Users-ray/memory/feedback_ic_vs_production_mode rule)
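The trigger conditions above can be sketched as a single predicate (a sketch, not the shipped hook: the `action` dict and its field names are assumptions, and vault paths are assumed to be vault-relative):

```python
FOUNDER_ID = "founder"           # placeholder; the real id would live in config
FOUR_HOURS = 4 * 60 * 60
BUILD_WORDS = ("build", "implement", "deploy", "ship")

def should_fire_tier2(action: dict) -> bool:
    """True if any higher-stakes condition holds for this action."""
    # Long outbound channel reply (>800 chars).
    if action.get("kind") == "channel_reply" and len(action.get("text", "")) > 800:
        return True
    # Vault write to a project or published-content folder.
    if action.get("vault_path", "").startswith(("01-projects/", "06-reference/")):
        return True
    # Build-class sub-agent dispatch prompt.
    prompt = action.get("dispatch_prompt", "").lower()
    if any(w in prompt for w in BUILD_WORDS):
        return True
    # New thread or >4hr silence (high context-loss risk).
    if action.get("new_thread") or action.get("silence_secs", 0) > FOUR_HOURS:
        return True
    # Outbound to a non-founder user.
    if action.get("kind") == "channel_reply" and action.get("user_id") != FOUNDER_ID:
        return True
    # Founder tagged the action as production-mode.
    return action.get("production_mode", False)
```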
Reviewer specialists (each narrow, each <5s, each <$0.005/fire):
- Voice-match reviewer - uses the existing `voice-match` skill primitives. Scores outbound against the founder’s established voice (Sanity Check archive + 30+ feedback memories). Catches voice drift and AI-tells beyond em-dashes.
- Advisor-mode reviewer - checks that the response leads with the decision-needed framing per ../../.claude/projects/-Users-ray/memory/feedback_advisor_not_pair_programmer. Catches buried-lede shape and “narrating routine work” anti-patterns.
- Confidentiality reviewer - mechanical grep against MG client strings + PII patterns + 1Password-credential-shape detection. Deterministic, near-zero cost (could even sit in Tier 1 once specified).
- Brief-iMessage reviewer - checks the brief-iMessage discipline per ../../.claude/projects/-Users-ray/memory/feedback_brief_imessage_link_to_hq. Long iMessage → recommend split or vault-link.
- Sharp-verdict reviewer - when founder shared a link without comment, did the response lead with skip/skim/read/file per ../../.claude/projects/-Users-ray/memory/feedback_sharp_verdicts_on_shared_content.
- Targeting-filter reviewer - on new-capability questions, did the response apply the four-layer filter per ../../.claude/projects/-Users-ray/memory/feedback_targeting_system_prioritization_filter.
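The confidentiality reviewer's deterministic grep-style shape could look like this (the client strings and PII patterns below are placeholders; the real corpus would load MG client strings and 1Password-credential shapes from a maintained rule file):

```python
import re

# Placeholder corpus - illustrative only, not the real rule set.
CLIENT_STRINGS = ("Acme Corp",)
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                               # SSN-shaped
    re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),  # email
]

def confidentiality_redlines(text: str) -> list:
    """Deterministic pass, no LLM call - which is why this reviewer
    could sit in Tier 1 once the rule corpus is specified."""
    hits = [f"client string: {s}" for s in CLIENT_STRINGS if s in text]
    hits += [f"PII match: {p.pattern}" for p in PII_PATTERNS if p.search(text)]
    return hits
```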
Default Tier 2 panel = 3 reviewers chosen by trigger:
- Channel reply >800 chars: voice-match + advisor-mode + brief-iMessage
- Vault write to project/content folder: confidentiality + voice-match + advisor-mode
- Sub-agent dispatch prompt: confidentiality + advisor-mode + (custom: spec-adherence reviewer that checks the prompt against the relevant SOP)
- Outbound to non-founder Discord user: confidentiality + voice-match + (custom: external-tone reviewer)
- Shared-link response: sharp-verdict + advisor-mode + voice-match
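The trigger-to-panel mapping above reduces to a lookup table (trigger slugs and the fallback panel are illustrative choices, not a fixed spec):

```python
# Trigger -> default reviewer panel, per the mapping above.
# "spec-adherence" and "external-tone" are the custom one-off reviewers.
PANELS = {
    "long_channel_reply":   ["voice-match", "advisor-mode", "brief-imessage"],
    "vault_write":          ["confidentiality", "voice-match", "advisor-mode"],
    "subagent_dispatch":    ["confidentiality", "advisor-mode", "spec-adherence"],
    "non_founder_discord":  ["confidentiality", "voice-match", "external-tone"],
    "shared_link_response": ["sharp-verdict", "advisor-mode", "voice-match"],
}

def panel_for(trigger: str) -> list:
    # Fall back to the widest-coverage panel for unrecognized triggers.
    return PANELS.get(trigger, ["confidentiality", "voice-match", "advisor-mode"])
```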
### Tier 3 (Fat Cat Council - separate)
Already designed. Stays on-demand, multi-provider, strategic-decision scope. Doesn’t fire automatically; Ray or founder calls a meeting. See 2026-05-01-fat-cat-council-architecture-proposal.
## Cost framing
Tier 1: $0/month (pure regex, current verify-action.py).
Tier 2: assume 20-30 fires/day average across all triggers. 3 reviewers per fire. Each reviewer is a tiny model call (Haiku-class, ~500 input tokens, ~100 output tokens) at ~$0.0005-0.001 per call. Total: 60-90 reviewer calls/day × $0.0007 avg = ~$0.05/day = ~$1.50/month.
If we want richer reviews (more context per call, longer reasoning, occasional Sonnet-class for hard cases), cap at $5-10/month and call it acceptable.
Tier 3 (Fat Cat Council): ~$15-20/month at 4 meetings/month per the existing proposal.
All tiers combined: ~$20-30/month total for the full review architecture. Cheap insurance against the slop-cannon failure mode.
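As a sanity check on the Tier 2 arithmetic, using the estimates above (these rates are the section's own assumptions, not measured prices):

```python
REVIEWERS_PER_FIRE = 3
COST_PER_CALL = 0.0007   # Haiku-class blended avg, ~500 in / ~100 out tokens

def monthly_cost(fires_per_day: int, days: int = 30) -> float:
    """Tier 2 monthly spend at a given fire rate."""
    return fires_per_day * REVIEWERS_PER_FIRE * COST_PER_CALL * days

# 20 fires/day -> ~$1.26/month; 30 fires/day -> ~$1.89/month,
# consistent with the ~$1.50/month estimate above.
```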
## Build sequencing
### Phase 1 (immediate, ~3hr build) - Activate Tier 1 + ship the confidentiality reviewer
- Founder approves the verify-action v1 hook activation in `~/.claude/settings.json` (already pending per the verify-action skill build).
- Add the confidentiality reviewer to v1 (deterministic, mechanical - same shape as R001-R004 but extends the rule corpus to include MG strings + PII patterns).
- Rules become R001-R005, hook still fires synchronously, no Tier 2 yet.
### Phase 2 (~6-8hr build) - Tier 2 hook architecture
- New script `~/.claude/scripts/flash-review.py` that takes a trigger context + a panel-of-reviewers list, dispatches reviewers in parallel, and consolidates verdicts.
- Each reviewer is a tiny LLM call (Haiku) with a narrow prompt + the relevant feedback-memory context loaded.
- Hook in `~/.claude/settings.json` extends to fire flash-review on the trigger conditions above.
- Output schema: PASS / REVISE-WITH-REDLINES / BLOCK (rare; reserved for hard-confidentiality violations).
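A minimal sketch of the dispatch-and-consolidate loop, with the model call injected as a plain function so the sketch stays self-contained (`call_reviewer` and the verdict-dict shape are assumptions, not the shipped interface):

```python
from concurrent.futures import ThreadPoolExecutor

def flash_review(context: str, panel: list, call_reviewer) -> dict:
    """Dispatch each reviewer in parallel, then apply the output schema:
    BLOCK wins outright, 2/3 PASS quorum passes, otherwise REVISE."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        verdicts = list(pool.map(lambda name: call_reviewer(name, context), panel))
    if any(v["verdict"] == "BLOCK" for v in verdicts):
        return {"outcome": "BLOCK", "redlines": []}
    passes = sum(v["verdict"] == "PASS" for v in verdicts)
    if passes * 3 >= len(panel) * 2:   # 2/3 quorum
        return {"outcome": "PASS", "redlines": []}
    redlines = [r for v in verdicts if v["verdict"] == "REVISE"
                for r in v.get("redlines", [])]
    return {"outcome": "REVISE-WITH-REDLINES", "redlines": redlines}
```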
### Phase 3 (~4hr build) - Reviewer specialist library
- Each of the 6 reviewer specialists gets its own scaffolding under `~/.claude/skills/flash-review/reviewers/<name>.md` with prompt + rule corpus + test fixtures.
- Voice-match reviewer composes the existing `voice-match` skill primitives.
- All reviewers tested against this session’s known drift events (em-dash misses, buried-lede shapes, MG-confidentiality near-misses in the early MAC anchor article, etc.).
### Phase 4 (~2hr) - Trigger tuning + observability
- Log every flash-review fire to `~/.claude/state/flash-review-log.jsonl` for audit.
- Surface a weekly digest: how many fires, how many PASSes vs REVISEs, which reviewers fired most, which redline patterns recurred.
- Tune triggers based on real-data signal-to-noise.
Total build cost: ~15-17hr across 4 phases. Phase 1 alone (~3hr) materially raises the quality floor; Phase 2 unlocks the flash-board-meeting pattern the founder named.
## Why this matters strategically
Per ../../.claude/projects/-Users-ray/memory/feedback_ic_vs_production_mode (just filed): RDCO’s reputational moat depends on production quality being visibly multi-step + multi-specialist + critique-gated. The verify-action v2 hook IS the always-on enforcement of that moat at the action layer.
This is the operationalization of “no slop cannon” at the lowest cost-per-action tier. Tier 3 (Fat Cat Council) is the strategic-decision version. Tier 2 (this proposal) is the per-action version. Tier 1 is the regex floor.
Three tiers, one philosophy: agents review agents before the founder sees the output. The mode of review scales with the stakes of the action.
## Open questions for founder
- Phase 1 activation: greenlight the existing verify-action v1 hook into `~/.claude/settings.json` AND extend Phase 1 to include the confidentiality reviewer (R005) as a deterministic add? Or stage them separately?
- Phase 2 trigger tuning: 7 trigger conditions named above. Any to drop, any to add, any thresholds to set differently?
- Reviewer panel size: default 3 per fire (quorum 2/3). Cheaper at 2-of-2 (any REVISE blocks). Slower / richer at 5-of-5 with majority quorum. Lean: 3-of-3 for now.
- Where the cost ceiling lives: $5/mo? $10/mo? Hard cap that drops to Haiku-only when exceeded?
- Phase ordering: ship Phase 1 immediately (3hr) and queue Phases 2-4 as Notion tasks for next week? Or treat as a single multi-evening project and ship as a connected v2 release?
## Cross-references
- 2026-05-01-fat-cat-council-architecture-proposal - Tier 3 of this same review architecture
- ../02-sops/2026-05-05-rdco-website-polish-workflow - the production-mode workflow this hook enforces
- ../../.claude/skills/verify-action/SKILL.md - v1 implementation
- ../../.claude/projects/-Users-ray/memory/feedback_ic_vs_production_mode - the underlying operating principle
- ../../.claude/projects/-Users-ray/memory/feedback_no_em_dashes - the canonical R001 rule + audit-narration suppression