06-reference

beck tdd by example

Mon May 04 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·reference ·source: Kent Beck, "Test-Driven Development: By Example" (Addison-Wesley, 2002) ·by Kent Beck
tddkent-beckred-green-refactorsimple-designagent-verifiertesting-canonmac-framework

Kent Beck - “Test-Driven Development: By Example” (2002)

Why this is in the vault

This morning’s ~/.claude/skills/verify-action/ build was textbook red-green-refactor (test fixtures written first, implementation iterated until 6/6 fixtures passed), so the canonical TDD doctrine needs to live in the vault as the named ancestor of how we now ship verification code at RDCO.

The core argument / doctrine (from the 2002 text)

Beck reduces TDD to two operating rules and a three-step inner loop. The two rules are stated in the Preface (p. ix) and restated in the Introduction (p. xix):

  1. Write new code only if an automated test has failed.
  2. Eliminate duplication.

These two rules generate a programming cycle Beck calls Red / Green / Refactor (Beck 2002, Preface, p. x). He writes the cycle multiple times across the book; the most cited form (Part I opener, p. 1, and Ch. 17 “Process”) is:

  1. Add a little test.
  2. Run all tests and see the new one fail.
  3. Make a little change.
  4. Run all tests and succeed.
  5. Refactor to remove duplication.

Quick green excuses all sins, but only for a moment (Beck 2002, ch. 2). The point of step 3 is to get the bar green by any means - a constant, a hard-coded answer, copy-paste - because having a known-passing baseline is the safety net that lets the structural change in step 5 be reversible.

The test list

Before writing code, Beck has the developer write a list of every test the feature will need to pass before it can be called done (Beck 2002, ch. 1; pattern formalized in ch. 25 as “Test List”). The list is a TODO that gets crossed off as red turns green. Tests that surface mid-cycle are added to the list, not chased immediately. The to-do list pattern, with bold-when-active and strike-through-when-done, is shown literally in Ch. 1 of the Money Example. This is what protects the loop from blowing up in scope.

The three approaches to going from red to green

In the “Money Retrospective” (Beck 2002, ch. 17, “One Last Review”), Beck names exactly three approaches to making a test pass cleanly. They are detailed in ch. 28 “Green Bar Patterns”:

The three live on a single axis of step size; together with Test List and One Step Test (ch. 26) they form the actual operational core of the book.

What Beck does NOT codify in this book

The frequently cited “four rules of simple design” (passes tests, expresses intent, no duplication, fewest elements) are NOT enumerated as a numbered list anywhere in the 2002 text. Beck states only the two operating rules (above). The four-rules-of-simple-design framing is from his earlier work (Smalltalk Best Practice Patterns, XP Explained 1999) and from Fowler’s bliki. The 2002 book is narrower: two rules, one cycle, three approaches, plus the patterns catalog.

The TDD patterns catalog (Part III index structure only)

Beck organizes the patterns into named groups. The group structure (Beck 2002, Chs. 25-31):

Beck’s stated motivations (Preface)

Beck frames TDD as a way to manage fear during programming (Beck 2002, Preface, “Courage”, p. xi). Fear makes you tentative, makes you communicate less, makes you avoid feedback. Tests are the teeth of a ratchet that lets you rest between bouts of cranking on a hard problem - once a test is green, that piece of progress is locked in. The deeper claim is in Ch. 32 “Mastering TDD”: TDD shortens the feedback loop on design decisions from weeks-to-months down to seconds-to-minutes. Coverage is a downstream effect, not the goal.

Mapping against Ray Data Co

The verify-action build this morning followed Beck’s loop exactly, and we can now name the steps with primary-source vocabulary.

  1. We wrote ~/.claude/skills/verify-action/test-fixtures.json first - 6 cases. This is Beck’s Test List (ch. 25): every test we knew we needed to pass before calling the rule corpus done, written down before any implementation existed. The R001..R005 numbering froze the list into a file.
  2. Each fixture started red. Implementation in ~/.claude/scripts/verify-action.py was Obvious Implementation (ch. 28) for the simple rules: a literal regex for the U+2014 codepoint (R001) is exactly the case where Beck says “just type it in” - we knew what to type and we could do it quickly. Where we hard-coded patterns to get green and then generalized, that was Fake It (ch. 28).
  3. After all fixtures passed, we extracted the RULES registry and the per-tool dispatch. That is the “remove duplication” step of the cycle - rule #2 of Beck’s two rules.

The Scope x Basis matrix in ~/rdco-vault/01-projects/data-quality-framework/testing-matrix-template.md is the data-output analogue of Beck’s Test List. Where Beck says: enumerate every test you’ll need before writing code; the matrix says: enumerate every check (one per scope, one per basis) before writing the model. The “Definition of Done” coverage checklist is the matrix’s version of “the list is empty, so we are done.” Naming the lineage matters because it tells future model authors why the matrix is non-negotiable rather than a tax.

Where verify-action diverges from Beck (the load-bearing distinction). TDD in Beck’s sense is design pressure: writing the test first forces the developer to invent the API they wish they had, which improves the design (Beck 2002, ch. 32, “Why does TDD work?”). The verify-action runtime hook is post-hoc - it does NOT shape the design of the LLM behavior, it polices the output. We are on the COVERAGE side of the cycle (rule #1 of Beck’s two rules: never ship code that doesn’t have a passing test) but not on the DESIGN-PRESSURE side. That distinction is the architectural choice from 2026-05-05-tdd-is-dead-debate-dhh-beck-fowler and is deliberate.

Beck’s Test (noun) framing predicts our hook. Ch. 25 opens by distinguishing test-the-verb (poking buttons, looking at output) from test-the-noun (a procedure that runs automatically and produces a boolean pass/fail). The verify-action hook is Beck’s “Test (noun)” promoted from compile-time-on-the-developer’s-laptop to runtime-on-Ray’s-tool-call. Same primitive, new substrate.

Rule corpus discipline = Beck’s “fewest elements” instinct, even though that rule isn’t enumerated in this book. v1 has five rules and two are warn-only. The temptation will be to grow the corpus to dozens; Beck’s general posture (across this book, ch. 32 in particular) is that you only add a rule when a real failure forced its existence. The skill’s “Adding a new rule” section (“Add at least one fixture, positive + negative”) locks the red-green-refactor discipline into the rule-extension flow.

Where this primary-source upgrade changed the prior reconstruction