
Fri Apr 10 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · project · status: prototype-complete

Process Newsletter — project README

Goal

Ingest email newsletters from a whitelisted set of senders into the vault as structured assessment notes, with bias and sponsor flagging. This is separate from the Substack-specific tooling: it works on any newsletter that lands in ben@raydata.co, regardless of publisher (Substack, Ghost, Mailchimp, self-hosted).

Companion skill: ~/.claude/skills/process-newsletter/SKILL.md.

Status

Whitelist — locked in 2026-04-11

K = keep, backfill history + ongoing watch

Sender | Newsletter | Typical format | Notes
--- | --- | --- | ---
email@stratechery.com | Stratechery (Ben Thompson) | thought-leadership | 60 actual in inbox (2026-04-12 discovery; 3 batch tasks created). The original “201+ in history” estimate was archive-based, NOT inbox-bounded.
seattledataguy@substack.com | SeattleDataGuy (Ben Rogojan) | hybrid | ✅ Backfill complete (8 articles + 1 pre-existing).
notboring@substack.com | Not Boring (Packy McCormick) | hybrid (long-form + Friday optimism curation) | 28 actual in inbox (2026-04-30 discovery; 1 batch task created, 26 already filed). The original “201+ in history” estimate was archive-based.
practicaldatamodeling@substack.com | Practical Data Modeling (Joe Reis) | thought-leadership, series-based | 32 actual in inbox (2026-04-30 discovery; 1 new batch task created, 23 already filed). The original “~20+” estimate was closer but still off. ⚠️ Paid sub lapsed 2026-04-24; flagged to founder for a resubscribe decision.
analyticsengineeringroundup@substack.com | Analytics Engineering Roundup | curation | ~20+ in history (estimate, not yet discovery-scanned; assume lower per the inbox-bounded pattern below).
hello@every.to | Every | multi-author thought-leadership | 201+ in history (estimate, not yet discovery-scanned; assume MUCH lower per the inbox-bounded pattern below).
hello@ship30for30.com | Ship30for30 (Start Writing Online) | writing craft / marketing | 30+ per 180 days
writewithai@substack.com | Write With AI | writing with AI tools | 8+ per 180 days
michaeldean9@substack.com | Essay Architecture (Michael Dean) | essay writing craft | curation-heavy
ark@arkinvest.com | ARK Invest (Cathie Wood) | investment commentary | weekly stock commentary
newsletter@commoncog.com | Commoncog (Cedric Chin) | thought-leadership, series-based | ~201+ in history (archive estimate); the actual count is inbox-bounded because the founder re-subscribed 2026-04-15, so the inbox only holds issues from that date forward. An operator’s field manual on tacit knowledge, expertise, and sensemaking. Highly relevant to RDCO agent-deployer positioning.

⚠️ Count-before-budget rule (added 2026-04-30)

The original “N+ in history” estimates in the K table were Substack-archive-based, NOT Gmail-inbox-bounded, and real inbox counts have diverged from them: Stratechery came back at 60 against a 201+ estimate and Not Boring at 28 against 201+, while PDM came in at 32 against a ~20+ estimate. The large drops happen because Gmail only holds messages from the subscription date forward, not the sender’s full archive.

Rule for remaining un-scanned senders (Every, Commoncog, Analytics Engineering Roundup, ARK Invest, Write With AI, Ship30for30, Essay Architecture): run discovery with --dry-run semantics first and count the messages actually in the inbox BEFORE planning batch sizes. If the count is small (<20), skip the batch-task overhead and process inline via Mode 3 (Backfill, legacy small-sender path). Don’t allocate 10+ batch tasks for a sender that only has 25 messages in the inbox.
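A minimal Python sketch of the count-before-budget rule, assuming a dry-run discovery pass has already produced the inbox count and the number of issues already filed. The function and field names are hypothetical, and the batch size of 20 messages per task is an assumption, not a documented setting; only the <20 inline threshold comes from the rule above.

```python
from dataclasses import dataclass

INLINE_THRESHOLD = 20  # from the rule: fewer than 20 inbox messages -> process inline
BATCH_SIZE = 20        # assumption: messages per batch task (hypothetical)


@dataclass
class BackfillPlan:
    sender: str
    to_process: int   # inbox messages not yet filed in the vault
    mode: str         # "skip", "inline" (Mode 3), or "batch"
    batch_tasks: int


def plan_backfill(sender: str, inbox_count: int, already_filed: int,
                  batch_size: int = BATCH_SIZE) -> BackfillPlan:
    """Size backfill work from a dry-run inbox count, never an archive estimate."""
    to_process = max(inbox_count - already_filed, 0)
    if to_process == 0:
        # Everything in the inbox is already filed.
        return BackfillPlan(sender, 0, "skip", 0)
    if inbox_count < INLINE_THRESHOLD:
        # Small sender: skip batch-task overhead and process inline.
        return BackfillPlan(sender, to_process, "inline", 0)
    batch_tasks = -(-to_process // batch_size)  # ceiling division
    return BackfillPlan(sender, to_process, "batch", batch_tasks)
```

Under these assumptions, plan_backfill("notboring@substack.com", 28, 26) plans one batch task for the 2 unfiled issues, and a 12-message sender is routed inline with no batch tasks at all.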

Implication: the total backfill work is significantly smaller than the README originally implied. Prioritize the per-sender deep-fetch quality (sponsor detection, RDCO mapping discipline) over volume planning.

F = follow-forward only, no backfill (watch from now onward)

Sender | Newsletter | Typical format | Notes
--- | --- | --- | ---
theinnermostloop@substack.com | Innermost Loop (Alex Wissner-Gross) | thought-leadership |
dataengineeringcentral@substack.com | Data Engineering Central | thought-leadership |
dataengineeringweekly@substack.com | Data Engineering Weekly (Ananth Packkildurai) | curation |
technically@substack.com | Technically | thought-leadership |
news@alphasignal.ai | AlphaSignal | curation (AI/ML news) |
lon@dataelixir.com | Data Elixir | curation (data science/ML news) | Currently arrives at the founder’s personal inbox; founder will resubscribe using ben@raydata.co.
semistructured@substack.com | Semi-Structured (Jonathan Natkins) | thought-leadership (data infrastructure for AI agents) | Added 2026-04-12.
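The two tiers can be expressed as a lookup keyed on sender address. This is a sketch only: it assumes the From header is the match key, lists an excerpt of the tables above rather than the full whitelist, and the names (Tier, WHITELIST, tier_for, should_backfill) are hypothetical rather than taken from SKILL.md.

```python
from email.utils import parseaddr
from enum import Enum
from typing import Optional


class Tier(Enum):
    KEEP = "K"    # backfill history + ongoing watch
    FOLLOW = "F"  # follow-forward only, no backfill


# Excerpt of the K and F tables; a real config would list every sender.
WHITELIST: dict[str, Tier] = {
    "email@stratechery.com": Tier.KEEP,
    "seattledataguy@substack.com": Tier.KEEP,
    "newsletter@commoncog.com": Tier.KEEP,
    "news@alphasignal.ai": Tier.FOLLOW,
    "semistructured@substack.com": Tier.FOLLOW,
}


def tier_for(from_header: str) -> Optional[Tier]:
    """Return the sender's tier, or None if the sender is not whitelisted."""
    _, address = parseaddr(from_header)  # strips a display name like "Name <addr>"
    return WHITELIST.get(address.strip().lower())


def should_backfill(from_header: str) -> bool:
    """Only K senders get history backfilled; F senders are watched going forward."""
    return tier_for(from_header) is Tier.KEEP
```

Returning None for unknown senders keeps the whitelist strict: anything not listed is ignored rather than guessed at.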

Known sender-specific gotchas (learned from the SDG, PDM, and Not Boring backfills)

SDG — SeattleDataGuy (fully backfilled)

PDM — Practical Data Modeling (Joe Reis) — partial backfill, paid sub lapsed 2026-04-24

Not Boring (Packy McCormick) — partial backfill, ongoing watch

Patterns to watch for in other senders (not yet backfilled)

Skill design — key decisions

  1. Skill over cron for now. The /process-newsletter watch invocation is user-triggered, not scheduled, until we have confidence it doesn’t burn context or generate noise.
  2. Always-flag, never-filter. Sponsors and bias get disclosed in frontmatter and body, never used to skip an article. The reader needs to see the angle.
  3. Deep-fetch cautiously. Max 2 link follows per curation issue, only for third-party links clearly relevant to RDCO topics. No paywall traversal.
  4. Vault-path discipline. All notes under 06-reference/<YYYY-MM-DD>-<sender-slug>-<topic-slug>.md. No exceptions. The filename convention is what lets the skill detect “already-filed” without duplicating work.
  5. Tracked authors feed Task #4. When a guest post or curation link surfaces a new author worth following (Dylan Anderson from the Mar 25 SDG issue, Olga Berezovsky from Jan 23), they go to the CRM candidate list, not directly into a contact file.
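Decisions 3 and 4 are mechanical enough to sketch. The Python below is illustrative, not the skill’s implementation: slugify, note_path, already_filed, and pick_deep_fetch_links are hypothetical names; the already-filed check assumes one issue per sender per day, so the date and sender-slug prefix is enough to spot a duplicate; and the relevance test and paywalled-host set stand in for judgment the skill makes per link.

```python
import re
from collections.abc import Callable, Iterable
from datetime import date
from pathlib import Path
from urllib.parse import urlparse

MAX_LINK_FOLLOWS = 2  # decision 3: at most 2 link follows per curation issue


def slugify(text: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def note_path(vault: Path, issue_date: date, sender_slug: str, topic: str) -> Path:
    """Build 06-reference/<YYYY-MM-DD>-<sender-slug>-<topic-slug>.md (decision 4)."""
    name = f"{issue_date.isoformat()}-{slugify(sender_slug)}-{slugify(topic)}.md"
    return vault / "06-reference" / name


def already_filed(vault: Path, issue_date: date, sender_slug: str) -> bool:
    """True if a note for this sender and date exists (assumes one issue per day)."""
    ref_dir = vault / "06-reference"
    prefix = f"{issue_date.isoformat()}-{slugify(sender_slug)}-"
    return ref_dir.is_dir() and any(ref_dir.glob(prefix + "*.md"))


def pick_deep_fetch_links(
    links: Iterable[str],
    sender_host: str,
    is_relevant: Callable[[str], bool],
    paywalled_hosts: set[str],
    limit: int = MAX_LINK_FOLLOWS,
) -> list[str]:
    """Keep third-party, non-paywalled, relevant links, capped at `limit`."""
    picked: list[str] = []
    for url in links:
        if len(picked) >= limit:
            break
        host = urlparse(url).hostname or ""
        if host == sender_host or host.endswith("." + sender_host):
            continue  # sender's own link, not third-party
        if host in paywalled_hosts:
            continue  # no paywall traversal
        if not is_relevant(url):
            continue  # only links clearly relevant to RDCO topics
        picked.append(url)
    return picked
```

Matching on the filename prefix rather than the full name means a re-run that picks a different topic slug for the same issue still counts as already filed.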

Lessons from the SDG prototype

What worked well:

What to improve for the next sender:

Next actions (pending founder approval)

  1. Backfill the next K sender. Recommend Stratechery: 60 messages in the inbox per the 2026-04-12 discovery (down from the original 201+ archive estimate), highest expected signal density, and a single author for a consistent assessment voice.
  2. Set up the watch cron loop once we’ve done 2-3 senders and trust the pattern.
  3. Revisit the 06-reference/ folder structure question at ~50 files.
  4. Handle Data Elixir once founder re-subscribes.