Bookshelf discovery — what’s already on disk + what we synthesized but never bookshelf’d
Why this exists
Founder pivoted on the bookshelf-build thread (2026-04-30 morning) from “scaffold it now” to “do discovery first — what source material is already on the computer that we can organize into the bookshelf? What did we synthesize but never bookshelf?” This note is the inventory + the resulting decision queue.
Bucket 1 — ON DISK, ready to organize NOW
These are physical PDFs / extracted-text files we already have. Bookshelf-build can absorb these in the first migration pass.
Wheeler material (already in ~/rdco-vault/private/ + ~/wheeler-extract/)
3 books as OCR-extracted text (purchased from SPC Press → VitalSource on 2026-04-21):
- `wheeler-understanding-variation.txt` (5,899 lines): Donald Wheeler, Understanding Variation (2nd ed), 161pp
- `wheeler-making-sense-of-data.txt` (17,734 lines): Wheeler, Making Sense of Data: SPC for the Service Sector, 396pp
- `wheeler-understanding-spc.txt` (17,370 lines): Wheeler & Chambers, Understanding Statistical Process Control (3rd ed), 406pp
Plus the OCR pipeline artifacts at `~/wheeler-extract/{msd,uspc,uv}/{cropped,text}/` (~597MB of intermediate screenshots + extracted text). KEEP these for re-OCR in case the txt versions get challenged, but they don't need to ship into the primary bookshelf.
Plus 28 files (22 PDFs plus companion .xlsx and .docx files) from Wheeler’s Metrics Masterclass course at `~/wheeler-extract/masterclass/source/` (also duplicated in ~/Downloads/):
- “1. Why Are We Here.pdf”
- “2.1.1 Why Are You Here.pdf”
- “2.2.1 Variation.pdf”
- “2.3.1 Continuous Improvement.pdf”
- “2.4.1 Process Control vs Optimisation Worldviews.pdf”
- “2.5.1 Input vs Output Metrics.pdf”
- “2.6.1 Putting Into Practice Inside Your Organisation .pdf”
- “5.1 How To Improve Your Process.pdf”
- “5.2 Improving Variation.pdf”
- “Calculation Effect.pdf” + .xlsx
- “Chunky Data.pdf”
- “Common Transformations - Trended Data.pdf”
- “Concept of Variation + 3 Rules.pdf”
- “Count Data.pdf”
- “Double Humped Distributions.pdf”
- “Linear Regression Case and Tutorial.pdf” + .xlsx
- “Metrics Masterclass PDSA Tracker.pdf” + .docx
- “Rare Events.pdf”
- “Running the PDSA Loop at United Schools Network.pdf”
- “Xmrit 3.1 Before and After.pdf”
- “Xmrit 3.2 Monitoring.pdf”
- “Xmrit Prediction.pdf”
- “Height and Weight Linear Regression.xlsx”
- “Shopping Mall Footfall Linear Regression.xlsx”
- “United Schools Network PDSA Tracker.docx”
Note: ~/Downloads has the masterclass PDFs DUPLICATED. After bookshelf migration, the ~/Downloads copies can be cleaned up (founder call).
Boyd & Vandenberghe Convex Optimization
~/rdco-vault/06-reference/textbooks/bv_cvxbook.pdf (6.7MB) — Stephen Boyd + Lieven Vandenberghe, Convex Optimization, the canonical free textbook (~700pp). Already in vault, just not in bookshelf structure.
Agile Data Warehouse Design (Lawrence Corr)
~/Downloads/851309126-Agile-Data-Warehouse-Design-eBook-1.pdf: Lawrence Corr & Jim Stagnitto, Agile Data Warehouse Design (BEAM* modelling). Data warehousing canon, directly RDCO-domain. Source: not noted; how this copy was obtained isn't recorded (the book is published by DecisionOne Press, not Manning). Should move to `books/`.
367 YouTube/podcast transcripts
~/rdco-vault/06-reference/transcripts/ — 367 files (Moonshots, Acquired, Tim Ferriss, Lex Fridman, Dwarkesh, etc.). Already a partial bookshelf for video/audio content. Bookshelf migration can either MOVE this folder under 07-source-material/transcripts/ or SYMLINK to preserve existing references. Recommend symlink to avoid breaking ~50+ vault notes that reference 06-reference/transcripts/....
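One way to do the symlink option is move-then-link: move the folder into the bookshelf, then leave a relative symlink at the old path so existing `06-reference/transcripts/...` links keep resolving. A minimal sketch, assuming a POSIX filesystem; `move_and_symlink` is a hypothetical helper, not an existing vault script.

```python
import os
import shutil
from pathlib import Path


def move_and_symlink(old_dir: Path, new_dir: Path) -> None:
    """Move old_dir to new_dir, then leave a symlink at old_dir pointing at new_dir.

    Hypothetical helper: notes that reference old_dir/... keep resolving via the link.
    """
    if old_dir.is_symlink():
        return  # already migrated; running twice is a no-op
    if new_dir.exists():
        # shutil.move would nest old_dir inside an existing new_dir, so refuse instead.
        raise FileExistsError(f"{new_dir} already exists")
    new_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_dir), str(new_dir))
    # A relative target keeps the link valid if the vault is cloned to another path.
    target = os.path.relpath(new_dir, old_dir.parent)
    old_dir.symlink_to(target, target_is_directory=True)
```

Usage against the real vault would be `move_and_symlink(vault / "06-reference" / "transcripts", vault / "07-source-material" / "transcripts")` with `vault = Path("~/rdco-vault").expanduser()`. The relative link target (`../07-source-material/transcripts`) is the design choice that keeps the vault portable.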
Cedric Chin “PBC: More Than You Need To Know” raw article
~/rdco-vault/_scratch_pbc_body.md (239 lines) — leftover scratch from when we processed Cedric Chin’s PBC explainer. Should move to web-archives/cedric-chin-process-behaviour-charts-more-than-you-need-to-know/extracted.md.
Bucket 2 — HEAVILY SYNTHESIZED in vault, NOT on disk, FREE TO ACQUIRE
These are works we cite in many vault notes but don’t have raw source for. All free / publicly available — fair use to bookshelf for personal reference.
| Work | Author | Citations | URL | Why bookshelf |
|---|---|---|---|---|
| Probabilistic ML Vol I + II | Kevin Murphy | 7 vault files | https://probml.github.io/pml-book/book1.html (Vol II: book2.html) | Most-cited graduate ML reference; would let Ray ground claims about probability/inference |
| Reinforcement Learning: An Introduction | Sutton & Barto | 18 vault files | http://incompleteideas.net/book/the-book.html | RL canon; relevant to agent architecture work |
| Deep Learning | Goodfellow, Bengio, Courville | 2 vault files (low cite but high relevance) | https://www.deeplearningbook.org | Foundational; explains model behavior |
| Algorithms for Decision Making | Kochenderfer, Wheeler (Tim, not Donald), Wray | 2 vault files | https://algorithmsbook.com | Decision theory under uncertainty; directly relevant to MAC severity tiers + the agent-deployer thesis |
| Foundations of Machine Learning | Mohri et al. | 0 direct cites | http://mlbook.cs.nyu.edu | Theoretical math backbone; bookshelf foundation for any ML claim |
| Understanding Deep Learning | Simon Prince | 0 direct cites | https://udlbook.github.io/udlbook | Clearer pedagogy than Goodfellow for prompt-design implications |
| Multi-Agent RL | Albrecht et al. | 1 cite | https://marl-book.com | Directly relevant to multi-agent RDCO architecture |
| Distributional RL | Bellemare et al. | 1 cite | https://www.distributional-rl.org | Models the full distribution of returns, not just the mean; relevant to tail-risk reasoning |
| Fairness and ML | Barocas, Hardt, Narayanan | 1 cite | https://fairmlbook.org | When NOT to trust model output |
| MLSys Book | Vijay Janapa Reddi (Harvard) + open-source contributors | 0 direct cites | https://mlsysbook.ai | Production deployment; relevant to MAC + agent-deployer ops |
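Total acquisition cost: $0. Time: ~30 min for Ray to wget the PDFs. One exception: Deep Learning (Goodfellow et al.) is published only as HTML at deeplearningbook.org, with no official PDF, so it needs a page-level scrape rather than a single download.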
Bucket 3 — HEAVILY SYNTHESIZED in vault, NOT on disk, REQUIRES PURCHASE
Founder-judgment territory. We cite each of these repeatedly, often with chapter/page references, which suggests we've worked from the source even though we don't hold the raw text. Acquiring them would unlock passage-level retrieval.
| Work | Author | Citations | Approx cost | Why high-priority |
|---|---|---|---|---|
| Out of the Crisis | W. Edwards Deming | 21 vault files (Deming author tag) | $30 paperback / $20 Kindle | Operating-statistics canon; cited heavily via Wheeler + Chin; the source for “operational definitions” + the 14 points (the system of profound knowledge came later, in The New Economics) |
| Thinking, Fast and Slow | Daniel Kahneman | 8 vault files | $15 | Decision theory + cognitive bias canon; relevant to MAC eval design + agent overconfidence calibration |
| The Innovator’s Dilemma | Clayton Christensen | 12 vault files | $15 | Strategic positioning canon; cited in agent-deployer + RDCO positioning |
| High Output Management | Andy Grove | 7 vault files | $20 | Operating discipline canon; directly applicable to RDCO ops + future client engagements |
| Fundamentals of Data Engineering | Reis & Housley | 35 files mention Reis | $45 (O’Reilly) | RDCO domain canon; Joe Reis is one of our most-tracked authors |
| The Mythical Man-Month | Fred Brooks | 12 vault files | $30 | Software ops canon; relevant to RDCO architecture |
| Thinking in Systems | Donella Meadows | 2 vault files (underused — bookshelf would surface more) | $20 | Systems-thinking canon; bookshelf would let Ray ground systemic claims |
| Superforecasting | Tetlock & Gardner | 2 vault files (underused) | $15 | Forecasting calibration; directly relevant to MAC + Ray overconfidence work |
| Principles of Product Development Flow | Don Reinertsen | 0 direct cites today (worth surfacing) | $50 | Operations canon; queueing theory in product orgs |
Total if you acquire all 9: ~$240. Or pick 2-3 high-priority to start.
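The totals can be re-derived from the table's approximate prices. A small sketch, assuming Deming is bought at the $30 paperback price; the dictionary simply copies the "Approx cost" column.

```python
from typing import Iterable, Optional

# Approximate prices (USD) copied from the Bucket 3 table; Deming uses the $30 paperback.
BUCKET3_COSTS = {
    "Out of the Crisis": 30,
    "Thinking, Fast and Slow": 15,
    "The Innovator's Dilemma": 15,
    "High Output Management": 20,
    "Fundamentals of Data Engineering": 45,
    "The Mythical Man-Month": 30,
    "Thinking in Systems": 20,
    "Superforecasting": 15,
    "Principles of Product Development Flow": 50,
}

# The top-3 shortlist from decision C.
TOP_3 = ["Out of the Crisis", "Fundamentals of Data Engineering", "High Output Management"]


def total_cost(titles: Optional[Iterable[str]] = None) -> int:
    """Sum approximate costs for the chosen titles, or for all of Bucket 3 when None.

    Raises KeyError for a title that isn't in the table.
    """
    if titles is None:
        titles = BUCKET3_COSTS.keys()
    return sum(BUCKET3_COSTS[t] for t in titles)


print(total_cost())       # 240
print(total_cost(TOP_3))  # 95
```

Swapping Deming to the $20 Kindle price brings the all-9 total to $230.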
Bucket 4 — WEB-ONLY sources, heavy synthesis, no archive
These are blog series / web articles we cite repeatedly but never archived raw. Bookshelf migration could include a one-time backfill scrape.
| Source | Author | Vault citations | Acquisition path |
|---|---|---|---|
| Jepsen analyses (jepsen.io/analyses) | Kyle Kingsbury (aphyr) | 15 vault files | wget + html-to-markdown the public analyses; ~50 articles, ~100MB |
| Commoncog articles (commoncog.com) | Cedric Chin | 131 vault files (!) | We have heavy synthesis; could consolidate the source HTML for the ~30-40 most-cited posts. Cedric’s ENTIRE Operations triad is publicly readable |
| Garry Tan posts/talks | Garry Tan | 76 vault files | Most via YT transcripts already; the blog posts could be added |
| Buterin / Ethereum.org posts | Vitalik Buterin | 2 vault files | Specific URLs cited, easy archive |
| Stratechery articles | Ben Thompson | 33 source_url cites | PAYWALLED; current process is assessment-only. Bookshelf would only have the public free posts. |
| Newsletter bodies (already discovered set) | Various | 60 (Stratechery) + 32 (PDM) + 28 (Not Boring) + ~150 others | These are Gmail-stored; bookshelf migration extends /process-newsletter to also save raw body alongside assessment note |
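For the Jepsen backfill, the first step is collecting analysis URLs from the index page. A standard-library sketch, assuming individual analyses live under `https://jepsen.io/analyses/<slug>` (check this against the live page before relying on it). The html-to-markdown step is left to a separate tool; pandoc's `-f html -t gfm` is one option.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

INDEX_URL = "https://jepsen.io/analyses"


class AnalysisLinkParser(HTMLParser):
    """Collects unique links whose path sits under /analyses/ (assumed URL layout)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop #fragments so anchors don't create duplicates.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        path = urlparse(url).path.rstrip("/")
        # Keep individual analysis pages; skip the index itself and other sections.
        if path.startswith("/analyses/") and url not in self.links:
            self.links.append(url)


def analysis_links(index_html: str, base_url: str = INDEX_URL) -> list[str]:
    """Return analysis URLs in page order, deduplicated."""
    parser = AnalysisLinkParser(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links


def fetch(url: str) -> str:
    """Download a page as text; add a polite delay between calls when looping."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `links = analysis_links(fetch(INDEX_URL))`, then fetch and convert each link in turn.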
Tagged OUT — not source material
For the record (so the bookshelf doesn’t accidentally absorb these):
- `01-projects/financials/{2023,2024,2025}/*.pdf`: operational ledgers, NOT source
- `01-projects/financials/tax-prep/*.pdf`: W-2, 1099, K-1; operational, NOT source
- `~/Downloads/1Password Emergency Kit*.pdf`: security artifact, skip
- `~/postcard-preview.pdf`: RDCO marketing artifact, skip
- `01-projects/automated-investing/.venv/.../*.pdf`: matplotlib build artifacts, skip
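The tagged-OUT list can double as an exclusion filter for any migration script. A sketch, assuming candidate paths are passed in the same form as the list (vault-relative, or starting with `~/`); `is_excluded` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns mirror the "Tagged OUT" list above. fnmatch's "*" also matches "/",
# so one pattern covers every year folder and the tax-prep folder.
EXCLUDE_PATTERNS = [
    "01-projects/financials/*.pdf",                  # ledgers + W-2/1099/K-1
    "01-projects/automated-investing/.venv/*.pdf",   # matplotlib build artifacts
    "~/Downloads/1Password Emergency Kit*.pdf",      # security artifact
    "~/postcard-preview.pdf",                        # marketing artifact
]


def is_excluded(path: str) -> bool:
    """True if a candidate path matches any tagged-OUT pattern (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PATTERNS)
```

`fnmatchcase` is used instead of `fnmatch` so matching doesn't depend on the OS's case rules.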
Top author citations across vault (file count, deduped) — gap diagnostic
| Author | Files mentioning | On disk? |
|---|---|---|
| Cedric Chin | 131 | Partial (1 article scratch + many newsletter assessments) |
| Garry Tan | 76 | Partial (YT transcripts) |
| Joe Reis | 35 | NO (newsletter assessments only; book not acquired) |
| Deming | 21 | NO |
| Sutton (& Barto) | 18 | NO (FREE) |
| Wheeler | 15 | YES (3 books + 28 masterclass files) |
| Kingsbury (Jepsen) | 15 | NO (web series) |
| Christensen | 12 | NO |
| Brooks | 12 | NO |
| Kahneman | 8 | NO |
| Andy Grove | 7 | NO |
| Murphy | 7 | NO (FREE) |
| Goodfellow | 2 | NO (FREE) |
| Tetlock | 2 | NO (under-cited; bookshelf would surface more) |
| Meadows | 2 | NO (under-cited) |
| Boyd | 2 | YES (Convex Optimization PDF in vault) |
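"File count, deduped" means a note counts once per author no matter how many times it mentions them. A sketch of that counting rule, assuming notes are `.md` files and using a hypothetical alias map; plain substring matching overcounts (for example, "wheeler" also hits Tim Wheeler of Algorithms for Decision Making), so tighten the aliases if exact numbers matter.

```python
from pathlib import Path

# Hypothetical alias map: an author counts as mentioned if any alias appears (case-insensitive).
AUTHOR_ALIASES = {
    "Kingsbury (Jepsen)": ["kingsbury", "aphyr"],
    "Sutton (& Barto)": ["sutton"],
    "Wheeler": ["wheeler"],
}


def files_mentioning(vault: Path, aliases: dict[str, list[str]]) -> dict[str, int]:
    """Count markdown files that mention each author at least once.

    Deduped per file: many mentions (or several aliases) in one note still count as 1.
    """
    counts = {author: 0 for author in aliases}
    for note in vault.rglob("*.md"):
        text = note.read_text(encoding="utf-8", errors="ignore").lower()
        for author, names in aliases.items():
            if any(name.lower() in text for name in names):
                counts[author] += 1
    return counts
```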
Decision queue for founder
The actionable decisions below follow from the buckets above. Founder picks; Ray executes.
A. Bookshelf scaffold first or migration first?
- Option 1: Build the `07-source-material/` structure + `/save-to-bookshelf` skill, THEN migrate. (Cleaner; risk: pre-build over-engineering.)
- Option 2: Move existing on-disk material into a flat `07-source-material/{books,masterclass,transcripts,web-archives}/` first, evolve structure as we add. (Pragmatic; risk: have to refactor later.)
- Recommend Option 1: scaffold takes ~1 hour, migration takes ~30 min. Building first means migration writes into the right shape, no refactor.
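A sketch of just the folder half of the scaffold, assuming the bookshelf lives at `07-source-material/` in the vault root and reusing Option 2's folder names; the `/save-to-bookshelf` skill is out of scope here. `transcripts/` is deliberately not created, because under the Bucket 1 transcripts recommendation that folder arrives by migration, and a pre-existing empty folder would get in the way of the move.

```python
from pathlib import Path

# Hypothetical layout; folder names come from the Option 2 listing.
BOOKSHELF = "07-source-material"
SUBDIRS = ["books", "masterclass", "web-archives"]  # transcripts/ is migrated in, not created


def scaffold(vault: Path) -> list[Path]:
    """Create missing bookshelf folders; return only the ones created (idempotent)."""
    created = []
    for name in SUBDIRS:
        folder = vault / BOOKSHELF / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    return created
```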
B. Bucket 2 — free acquisitions (FREE; ~30 min Ray work):
- Greenlight to download all 12 of Dami-Defi’s curation set (Murphy I+II, Sutton+Barto, Goodfellow, Kochenderfer, Mohri, Prince, MLSysBook, MARL, Distributional RL, Fairness ML)?
- Or pick a smaller subset? My recommendation if cutting: Murphy I, Sutton & Barto, Kochenderfer (all directly relevant to RDCO’s agent + decision-theory work).
C. Bucket 3 — paid acquisitions (~$15-50 each):
- Top-3 priority candidates by my read: Deming (Out of the Crisis) + Reis (Fundamentals of Data Engineering) + Andy Grove (High Output Management). All three are operating-canon directly applicable to RDCO + future client engagements. ~$95 total.
- Lower-priority but worth: Kahneman, Christensen, Tetlock, Meadows, Brooks. Build over time as need surfaces.
D. Bucket 4 — web archive scrapes (free; varies):
- Jepsen series: ~50 articles, ~100MB scrape. Worth doing because Kingsbury is heavily cited in MAC + verification-layer thinking.
- Commoncog source consolidation: low-cost, just consolidate what’s already in newsletter assessments.
- Defer Stratechery (paywalled).
E. Wheeler material logistics:
- Move from `private/` to `07-source-material/books/wheeler-{slug}/` with proper metadata + the same `.gitignore` protection?
- OR keep `private/` as the canonical home for Wheeler (already structured, already gitignored) and have the bookshelf cite into it via path?
- Recommend MOVE: the bookshelf is the single source of truth, and `private/` becomes a transitional artifact. Update `private/README.md` to point at the new location. The existing `.gitignore` line for `private/` won't match the new path on its own, so add `07-source-material/` explicitly.
F. ~/Downloads cleanup:
- The 24 masterclass PDFs in ~/Downloads are duplicates of `~/wheeler-extract/masterclass/source/`. After migration, delete the ~/Downloads dups?
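Before deleting anything, the "duplicates" claim can be checked byte-for-byte. A minimal sketch, assuming SHA-256 content hashes are an acceptable identity check; it only lists candidates, so the delete itself stays a founder call. Function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large PDFs aren't read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_candidates(downloads: Path, canonical: Path, pattern: str = "*.pdf") -> list[Path]:
    """Files directly in `downloads` whose bytes match some file anywhere under `canonical`."""
    known = {sha256_of(p) for p in canonical.rglob("*") if p.is_file()}
    return sorted(p for p in downloads.glob(pattern) if p.is_file() and sha256_of(p) in known)
```

Usage: `duplicate_candidates(Path("~/Downloads").expanduser(), Path("~/wheeler-extract/masterclass/source").expanduser())`; rerun with `pattern="*.xlsx"` or `"*.docx"` for the companion files. Hashing catches renamed copies such as `Rare Events (1).pdf` that a filename comparison would miss.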
G. QMD indexing:
- The existing `private/README.md` says “Don’t index in QMD by default.” Inherit that posture for the bookshelf, OR build a separate `source-material` QMD collection with explicit founder green-light?
- Recommend: build the bookshelf with NO QMD indexing initially, validate the retrieval pattern via `grep` and `Read`, then decide on QMD indexing as a Phase 2 decision once we see whether grep is sufficient at this scale.
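The grep-first retrieval contract is just file + line number + matching line, which `grep -rni` already returns from the shell. If the validation needs to run inside a script, a Python equivalent is short; this sketch assumes the bookshelf holds `.txt`/`.md` text files, and `grep_bookshelf` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def grep_bookshelf(root: Path, pattern: str,
                   suffixes: Tuple[str, ...] = (".txt", ".md")) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, 1-based line number, line) for case-insensitive regex matches."""
    rx = re.compile(pattern, re.IGNORECASE)
    for path in sorted(root.rglob("*")):
        # Skip folders and non-text artifacts such as OCR screenshots.
        if not (path.is_file() and path.suffix in suffixes):
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                if rx.search(line):
                    yield path, line_no, line.rstrip("\n")
```

If hit lists like this stay short and precise at bookshelf scale, that is the evidence for deferring QMD indexing.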
Related
- 2026-04-30-bookshelf-source-material-architecture-gap — the architectural concept this discovery serves
- 2026-04-30-quality-gate-as-brain-org-boundaries-agentic-companies — bookshelf is the input layer the gate evaluates against
- 2026-04-30-dami-defi-12-graduate-ml-textbooks-curation — Bucket 2 starter set
- ../private/README.md — Wheeler material current home (to be deprecated post-migration)
- ../06-reference/textbooks/bv_cvxbook.pdf — Convex Optimization PDF current home (to be migrated)