06-reference

bookshelf discovery source material inventory

Wed Apr 29 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · reference · source: filesystem + vault citation scan · by Ray (discovery agent)

Bookshelf discovery — what’s already on disk + what we synthesized but never bookshelf’d

Why this exists

Founder pivoted on the bookshelf-build thread (2026-04-30 morning) from “scaffold it now” to “do discovery first — what source material is already on the computer that we can organize into the bookshelf? What did we synthesize but never bookshelf?” This note is the inventory + the resulting decision queue.

Bucket 1 — ON DISK, ready to organize NOW

These are physical PDFs / extracted-text files we already have. Bookshelf-build can absorb these in the first migration pass.

Wheeler material (already in ~/rdco-vault/private/ + ~/wheeler-extract/)

3 books as OCR-extracted text (purchased from SPC Press → VitalSource on 2026-04-21):

Plus the OCR pipeline artifacts at ~/wheeler-extract/{msd,uspc,uv}/{cropped,text}/ (~597MB of intermediate screenshots + extracted text). KEEP these for re-OCR in case the txt versions get challenged; they don’t need to ship into the primary bookshelf.

Plus 28 PDFs from Wheeler’s Metrics Masterclass course at ~/wheeler-extract/masterclass/source/ (also duplicated in ~/Downloads/):

Note: ~/Downloads has the masterclass PDFs DUPLICATED. After bookshelf migration, the ~/Downloads copies can be cleaned up (founder call).
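Before the ~/Downloads copies are cleaned up, it may be worth confirming that they really are byte-identical to the masterclass source files. The following is a minimal Python sketch, assuming that "duplicate" means an identical SHA-256 content hash; the function names `sha256_of` and `find_duplicates` are hypothetical, and the sketch only reports matches and never deletes anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(canonical_dir: Path, candidate_dir: Path,
                    pattern: str = "*.pdf") -> list[tuple[Path, Path]]:
    """Return (candidate, canonical) pairs whose file contents are identical.

    canonical_dir is searched recursively; candidate_dir only at its top level.
    """
    by_hash: dict[str, Path] = {}
    for path in sorted(canonical_dir.rglob(pattern)):
        if path.is_file():
            by_hash.setdefault(sha256_of(path), path)

    pairs = []
    for path in sorted(candidate_dir.glob(pattern)):
        if path.is_file():
            match = by_hash.get(sha256_of(path))
            if match is not None:
                pairs.append((path, match))
    return pairs
```

Under these assumptions, a call such as `find_duplicates(Path.home() / "wheeler-extract/masterclass/source", Path.home() / "Downloads")` would list which Downloads files are safe candidates for the founder's cleanup call. Matching on content rather than filename catches renamed copies such as "file (1).pdf".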

Boyd & Vandenberghe Convex Optimization

~/rdco-vault/06-reference/textbooks/bv_cvxbook.pdf (6.7MB) — Stephen Boyd + Lieven Vandenberghe, Convex Optimization, the canonical free textbook (~700pp). Already in vault, just not in bookshelf structure.

Agile Data Warehouse Design (Lawrence Corr & Jim Stagnitto)

~/Downloads/851309126-Agile-Data-Warehouse-Design-eBook-1.pdf: Lawrence Corr & Jim Stagnitto, Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema (the BEAM* modelling book). Data warehousing canon, directly RDCO-domain. Source: not noted (likely purchased or rendered from Manning/equivalent). Should move to books/.

367 YouTube/podcast transcripts

~/rdco-vault/06-reference/transcripts/ — 367 files (Moonshots, Acquired, Tim Ferriss, Lex Fridman, Dwarkesh, etc.). Already a partial bookshelf for video/audio content. Bookshelf migration can either MOVE this folder under 07-source-material/transcripts/ or SYMLINK to preserve existing references. Recommend symlink to avoid breaking ~50+ vault notes that reference 06-reference/transcripts/....
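The symlink option can be sketched as follows. This is a minimal Python sketch under one reading of the recommendation: the transcripts folder stays where it is, and a symlink at the new bookshelf path points back to it, so existing references keep resolving. The function name `link_into_bookshelf` is hypothetical, and the sketch refuses to overwrite anything that already exists at the link path.

```python
from pathlib import Path


def link_into_bookshelf(existing_dir: Path, link_path: Path) -> Path:
    """Expose existing_dir at link_path via a symlink, leaving existing_dir untouched."""
    existing_dir = existing_dir.resolve()
    if not existing_dir.is_dir():
        raise NotADirectoryError(f"{existing_dir} is not a directory")

    if link_path.is_symlink():
        # Idempotent: accept a link that already points at the right folder.
        if link_path.resolve() == existing_dir:
            return link_path
        raise FileExistsError(f"{link_path} already links somewhere else")
    if link_path.exists():
        raise FileExistsError(f"{link_path} already exists and is not a symlink")

    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(existing_dir, target_is_directory=True)
    return link_path
```

For example, `link_into_bookshelf(vault / "06-reference/transcripts", vault / "07-source-material/transcripts")`, where `vault` is assumed to be the vault root. Whether the indexing and sync tools in use follow symlinks is not covered here and would need checking separately.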

Cedric Chin “PBC: More Than You Need To Know” raw article

~/rdco-vault/_scratch_pbc_body.md (239 lines) — leftover scratch from when we processed Cedric Chin’s PBC explainer. Should move to web-archives/cedric-chin-process-behaviour-charts-more-than-you-need-to-know/extracted.md.

Bucket 2 — HEAVILY SYNTHESIZED in vault, NOT on disk, FREE TO ACQUIRE

These are works we cite in many vault notes but don’t have raw source for. All free / publicly available — fair use to bookshelf for personal reference.

| Work | Author | Citations | URL | Why bookshelf |
| --- | --- | --- | --- | --- |
| Probabilistic ML Vol I + II | Kevin Murphy | 7 vault files | https://probml.github.io/book{1,2}.html | Most-cited graduate ML reference; would let Ray ground claims about probability/inference |
| Reinforcement Learning: An Introduction | Sutton & Barto | 18 vault files | http://incompleteideas.net/book/the-book.html | RL canon; relevant to agent architecture work |
| Deep Learning | Goodfellow, Bengio, Courville | 2 vault files (low cite but high relevance) | https://www.deeplearningbook.org | Foundational; explains model behavior |
| Algorithms for Decision Making | Kochenderfer, Wheeler, Wray | 2 vault files | https://algorithmsbook.com | Decision theory under uncertainty; directly relevant to MAC severity tiers + the agent-deployer thesis |
| Foundations of Machine Learning | Mohri et al. | 0 direct cites | http://mlbook.cs.nyu.edu | Theoretical math backbone; bookshelf foundation for any ML claim |
| Understanding Deep Learning | Simon Prince | 0 direct cites | https://udlbook.github.io/udlbook | Clearer pedagogy than Goodfellow for prompt-design implications |
| Multi-Agent RL | Albrecht et al. | 1 cite | https://marl-book.com | Directly relevant to multi-agent RDCO architecture |
| Distributional RL | Bellemare et al. | 1 cite | https://www.distributional-rl.org | Heavy-tailed distributions; relevant to risk reasoning |
| Fairness and ML | Barocas, Hardt, Narayanan | 1 cite | https://fairmlbook.org | When NOT to trust model output |
| MLSys Book | community / MIT-adjacent | 0 direct cites | https://mlsysbook.ai | Production deployment; relevant to MAC + agent-deployer ops |

Total acquisition cost: $0. Time: ~30 min for Ray to wget all 11 PDFs (10 works; Murphy is two volumes).

Bucket 3 — HEAVILY SYNTHESIZED in vault, NOT on disk, REQUIRES PURCHASE

Founder-judgment territory. Each of these is a work we cite repeatedly, often with chapter/page references, suggesting we’ve referenced the source even though we don’t have the raw text. Acquisition would unlock passage-level retrieval.

| Work | Author | Citations | Approx cost | Why high-priority |
| --- | --- | --- | --- | --- |
| Out of the Crisis | W. Edwards Deming | 21 vault files (Deming author tag) | $30 paperback / $20 Kindle | Operating-statistics canon; cited heavily via Wheeler + Chin; the source for “operational definitions” + 14 points + system of profound knowledge |
| Thinking, Fast and Slow | Daniel Kahneman | 8 vault files | $15 | Decision theory + cognitive bias canon; relevant to MAC eval design + agent overconfidence calibration |
| The Innovator’s Dilemma | Clayton Christensen | 12 vault files | $15 | Strategic positioning canon; cited in agent-deployer + RDCO positioning |
| High Output Management | Andy Grove | 7 vault files | $20 | Operating discipline canon; directly applicable to RDCO ops + future client engagements |
| Fundamentals of Data Engineering | Reis & Housley | 35 files mention Reis | $45 (O’Reilly) | RDCO domain canon; Joe Reis is one of our most-tracked authors |
| The Mythical Man-Month | Fred Brooks | 12 vault files | $30 | Software ops canon; relevant to RDCO architecture |
| Thinking in Systems | Donella Meadows | 2 vault files (underused; bookshelf would surface more) | $20 | Systems-thinking canon; bookshelf would let Ray ground systemic claims |
| Superforecasting | Tetlock & Gardner | 2 vault files (underused) | $15 | Forecasting calibration; directly relevant to MAC + Ray overconfidence work |
| Principles of Product Development Flow | Don Reinertsen | 0 direct cites today (worth surfacing) | $50 | Operations canon; queueing theory in product orgs |

Total if you acquire all 9: ~$240. Or pick 2-3 high-priority to start.
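The ~$240 figure can be checked, and a smaller starter set priced, with a few lines of Python. This is only an arithmetic sketch: the prices are the approximate ones from the Bucket 3 table (Deming at the paperback price), the `cost_of` helper is hypothetical, and the starter pair is an illustration using the two most-cited entries, not a recommendation.

```python
# Approximate prices in USD, copied from the Bucket 3 table.
# "Out of the Crisis" uses the $30 paperback price rather than $20 Kindle.
APPROX_COST_USD = {
    "Out of the Crisis": 30,
    "Thinking, Fast and Slow": 15,
    "The Innovator's Dilemma": 15,
    "High Output Management": 20,
    "Fundamentals of Data Engineering": 45,
    "The Mythical Man-Month": 30,
    "Thinking in Systems": 20,
    "Superforecasting": 15,
    "Principles of Product Development Flow": 50,
}


def cost_of(titles) -> int:
    """Sum the approximate price of the given titles."""
    return sum(APPROX_COST_USD[title] for title in titles)


total_all = cost_of(APPROX_COST_USD)  # all 9 titles: 240
starter = cost_of([
    "Fundamentals of Data Engineering",  # 35 files mention Reis
    "Out of the Crisis",                 # 21 vault files
])  # 75
```

Swapping Deming to the Kindle price would bring the full total to about $230.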

Bucket 4 — WEB-ONLY sources, heavy synthesis, no archive

These are blog series / web articles we cite repeatedly but never archived raw. Bookshelf migration could include a one-time backfill scrape.

| Source | Author | Vault citations | Acquisition path |
| --- | --- | --- | --- |
| Jepsen analyses (jepsen.io/analyses) | Kyle Kingsbury (aphyr) | 15 vault files | wget + html-to-markdown the public analyses; ~50 articles, ~100MB |
| Commoncog articles (commoncog.com) | Cedric Chin | 131 vault files (!) | We have heavy synthesis; could consolidate the source HTML for the ~30-40 most-cited posts. Cedric’s ENTIRE Operations triad is publicly readable |
| Garry Tan posts/talks | Garry Tan | 76 vault files | Most via YT transcripts already; the blog posts could be added |
| Buterin / Ethereum.org posts | Vitalik Buterin | 2 vault files | Specific URLs cited, easy archive |
| Stratechery articles | Ben Thompson | 33 source_url cites | PAYWALLED; current process is assessment-only. Bookshelf would only have the public free posts. |
| Newsletter bodies (already discovered set) | Various | 60 (Stratechery) + 32 (PDM) + 28 (Not Boring) + ~150 others | These are Gmail-stored; bookshelf migration extends /process-newsletter to also save raw body alongside assessment note |

Tagged OUT — not source material

For the record (so the bookshelf doesn’t accidentally absorb these):

Top author citations across vault (file count, deduped) — gap diagnostic

| Author | Files mentioning | On disk? |
| --- | --- | --- |
| Cedric Chin | 131 | Partial (1 article scratch + many newsletter assessments) |
| Garry Tan | 76 | Partial (YT transcripts) |
| Joe Reis | 35 | NO (newsletter assessments only; book not acquired) |
| Deming | 21 | NO |
| Sutton (& Barto) | 18 | NO (FREE) |
| Wheeler | 15 | YES (3 books + 28 masterclass PDFs) |
| Kingsbury (Jepsen) | 15 | NO (web series) |
| Christensen | 12 | NO |
| Brooks | 12 | NO |
| Kahneman | 8 | NO |
| Andy Grove | 7 | NO |
| Murphy | 7 | NO (FREE) |
| Goodfellow | 2 | NO (FREE) |
| Tetlock | 2 | NO (under-cited; bookshelf would surface more) |
| Meadows | 2 | NO (under-cited) |
| Boyd | 2 | YES (Convex Optimization PDF in vault) |
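A "file count, deduped" scan of this kind could be reproduced roughly as follows. This is a hedged Python sketch, not necessarily how the counts above were produced: it assumes vault notes are `.md` files and that a case-insensitive substring match on the author name is good enough, and the function name `count_files_mentioning` is hypothetical. Each file counts at most once per author, however many times the name appears in it.

```python
from pathlib import Path


def count_files_mentioning(vault_root: Path, names: list[str]) -> dict[str, int]:
    """Count how many .md files under vault_root mention each name at least once."""
    counts = {name: 0 for name in names}
    for note in vault_root.rglob("*.md"):
        if not note.is_file():
            continue
        # Ignore undecodable bytes so one odd file does not abort the scan.
        text = note.read_text(encoding="utf-8", errors="ignore").casefold()
        for name in names:
            if name.casefold() in text:
                counts[name] += 1  # one increment per file, regardless of repeats
    return counts
```

Substring matching will over-count short or common surnames (for example "Brooks" or "Boyd" inside unrelated words or names), so counts from a scan like this are best read as a gap diagnostic rather than exact citation totals.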

Decision queue for founder

The actionable decisions surface from the buckets above. Founder picks; Ray executes.

A. Bookshelf scaffold first or migration first?

B. Bucket 2 — free acquisitions (FREE; ~30 min Ray work):

C. Bucket 3 — paid acquisitions (~$15-50 each):

D. Bucket 4 — web archive scrapes (free; varies):

E. Wheeler material logistics:

F. ~/Downloads cleanup:

G. QMD indexing: