"The Wild Story of the Teton Dam Failure" — Practical Engineering

Episode summary

In June 1976, a brand-new 305-ft Bureau of Reclamation earthen dam in southeastern Idaho failed catastrophically on its first reservoir fill. About five hours passed between the moment morning crews spotted muddy water emerging on the embankment face and the moment the dam breached, emptying a near-full reservoir into the Snake River Valley. Eleven people died; thousands lost homes and livestock; the towns of Wilford, Sugar City, and Rexburg were largely destroyed. Grady walks through the failure as a multi-cause geotechnical disaster: fractured volcanic foundation rock; the worst possible core material (windblown silt that erodes into self-supporting tunnels rather than slumping); a key-trench geometry that made the embankment arch over its own erosion voids instead of collapsing them shut; and an accelerated fill schedule that bypassed the one-foot-per-day safety constraint, because the river outlet works tunnel was unfinished and there was no other way to release spring runoff. The investigative consensus: the defenses against every one of these failure mechanisms were well within known engineering practice in 1972. The Bureau simply chose not to spend the money. The episode closes on the catalyzing effect Teton had on US dam-safety regulation, on Grady's personal experience working a "Tetonesque" foundation early in his career, and on why having the failure top-of-mind matters more than the spreadsheet.

Key arguments / segments

[00:00:00-00:01:00] Setup: Teton Dam topped out end of 1975 at 305 ft, $100M, controversial from the start. Bureau of Reclamation flagship of the Teton Basin project — flood control, power, recreation, irrigation.
[00:01:00-00:02:30] The contractor was behind on the river outlet works (the main controlled-release tunnel); only the auxiliary tunnel was operational. The Bureau decided to begin the first fill anyway rather than forgo the economic value of a full year of spring runoff.
[00:02:00-00:03:00] The design called for a maximum fill rate of 1 ft/day with daily inspections and well monitoring — explicit operational protocol so problems could be caught early enough to lower the reservoir back down.
[00:02:30-00:03:30] A snowy winter forced a fill rate that violated the 1 ft/day limit, because the auxiliary outlet couldn't release water fast enough to keep up. The limit was relaxed because "the dam seemed to be holding fine and there really wasn't another choice." The constraint being abandoned was itself the protective measure.
[00:03:00-00:04:00] June 3rd: clear seeps far from the dam — not unusual. June 5th 7am: leaks on the dam itself. By 10am the construction engineer was looking into a tunnel through the embankment "tall enough to stand in."
[00:04:00-00:06:00] Structure: a zoned embankment dam. The Zone 1 core is windblown silt (loess) from the local uplands. The geological problem: the foundation rock was welded tuff from Yellowstone eruptions, riddled with cooling cracks, gas voids, zones of captured rubble, and millions of years of seismic fissures. "Essentially Swiss cheese."
[00:06:00-00:08:00] Failed grouting program: a pilot study showed the rock wouldn't seal — drilled holes took endless grout, post-grout core samples came back as broken as before. Bureau pivoted to key trenches with a triple-row grout curtain at the bottom. Used 600,000 cubic feet of grout (more than 2x estimate) and backfilled the trench with the silt core.
[00:08:00-00:09:00] Two independent investigations concluded the dam was doomed by its design. Water reached the fractured upper rock, flowed freely to the silt-filled key trench, and got past it through one (or all) of three mechanisms: windows in the grout curtain, zones of poor compaction, or hydraulic fracturing where reservoir pressure exceeded the weight of the overlying soil.
[00:09:30-00:11:00] Why silt was the worst possible core material: gravel and sand particles interlock mechanically; clay particles cohere through intermolecular forces; silt sits in the gap, too small to interlock, too large for meaningful cohesion, and lightweight. Worse, silt is strong enough to hold vertical walls and a roof as it erodes, so tunnels (piping) form and stay open instead of self-healing. The material's "strength" was the property that destroyed it.
[00:11:00-00:12:00] The narrow, steep-sided key trench caused the weight of the backfill to arch laterally into the rock walls instead of bearing straight down. So when piping voids formed inside the core, the embankment arched over them rather than collapsing them shut. The Zone 2 filter was bypassed entirely because the seepage path ran beneath it.
[00:12:00-00:13:00] The breach: 10:30am muddy geyser. Bulldozers swallowed by the embankment, operators rescued with ropes. 11:00am sinkhole on the upstream face — water now had a direct path. Witnesses saw a whirlpool form. Embankment failed at noon, five hours after first warning.
[00:13:00-00:14:00] Wilford wiped out, Sugar City and Rexburg "decimated," Hibbard and Salem inundated. Eleven dead. Grady's blunt counterfactual: had it failed at night, hundreds or thousands.
[00:14:00-00:15:00] Investigation finding: "not a freak accident." The state of the art was sufficient; defensive measures (rock-surface sealing, adequate filters) "should have been used." A direct quote from one investigation: "frugality over safety."
[00:15:00-00:16:00] Aftermath: federal dam safety guidelines standardized across all federal agencies, still in force. Spurred research into filter and drainage design and hydraulic fracturing mechanisms.
[00:16:00-00:17:00] Personal coda: Grady's first dam project was "very much Tetonesque" — pores, fractures, caves discovered mid-construction. They re-engineered the foundation with a deep cutoff wall. The point: having Teton at the top of mind was load-bearing for the design team in a way the spreadsheets weren't. "It was grounding in a way that textbooks and Excel spreadsheets are not."
[00:17:00-end] Sponsor segment for Nebula, framed around Joe Scott's geology documentary. (Sponsor disclosed.)
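The key-trench arching (00:11:00) and hydraulic-fracturing (00:08:00) segments rest on a standard soil-mechanics result: in a narrow trench, friction on the side walls carries part of the backfill's weight, so vertical stress grows more slowly with depth than the full overburden. A minimal sketch, assuming vertical rough walls and the classic Janssen/Marston trench formula; Teton's trench had sloped walls, and the unit weight, trench width, earth-pressure coefficient K, wall friction mu, and water head below are hypothetical illustration values, not figures from the investigations. The fracturing check compares water pressure against the arched vertical stress as a crude proxy for confinement.

```python
import math

UNIT_WEIGHT_WATER = 9.81  # kN/m^3


def geostatic_vertical_stress(unit_weight: float, depth: float) -> float:
    """Full overburden stress in kPa: soil weight per unit area, no arching."""
    return unit_weight * depth


def trench_vertical_stress(unit_weight: float, depth: float, width: float,
                           k: float, mu: float) -> float:
    """Vertical stress in kPa for backfill between two vertical, rough walls.

    Janssen/Marston arching: each wall carries shear mu * K * sigma_v, giving
        sigma_v = (gamma * B / (2 * K * mu)) * (1 - exp(-2 * K * mu * z / B))
    """
    decay = 2.0 * k * mu / width  # 1/m; larger for narrow, rough trenches
    return (unit_weight / decay) * (1.0 - math.exp(-decay * depth))


def water_exceeds_confinement(head: float, unit_weight: float, depth: float,
                              width: float, k: float, mu: float) -> bool:
    """Crude hydraulic-fracturing flag: water pressure above the arched stress."""
    water_pressure = UNIT_WEIGHT_WATER * head  # kPa
    return water_pressure > trench_vertical_stress(unit_weight, depth, width, k, mu)


# Hypothetical inputs: 19 kN/m^3 silt, 10 m wide trench, K = 0.5, 30-degree wall friction.
mu = math.tan(math.radians(30))
arched = trench_vertical_stress(19.0, 20.0, 10.0, 0.5, mu)
full = geostatic_vertical_stress(19.0, 20.0)
print(f"arched {arched:.0f} kPa vs geostatic {full:.0f} kPa")  # arched 225 kPa vs geostatic 380 kPa
print(water_exceeds_confinement(30.0, 19.0, 20.0, 10.0, 0.5, mu))  # True: 294 kPa > 225 kPa
```

With these made-up inputs, arching cuts the vertical stress at 20 m to roughly 60 percent of the geostatic value, so a water head that could not lift the full overburden can exceed the reduced stress. That is the hydraulic-fracturing mechanism from the investigations, in miniature.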

Notable claims

The protective constraint was the safeguard the Bureau dropped. The 1 ft/day fill rate wasn't a guideline — it was the explicit mechanism by which problems could be caught early enough to lower the reservoir back down. Once that limit was relaxed because the river outlet works wasn't ready, the entire safety regime was operating without its primary surveillance window.
Five hours from first leak to total breach. This is the speed at which a piping failure in an erodible core dam progresses once it gets going. The runaway feedback loop (more flow → more erosion → more flow) is not survivable on operational timescales.
Silt's strength was its lethal property. The intuition "strong material = good dam material" inverts when the failure mode is internal erosion: weaker materials slump into voids and self-heal, strong materials hold tunnel walls open and accelerate piping. Material selection has to be evaluated against the failure mode, not the intuitive property.
The key trench geometry caused arching. Narrow steep walls converted vertical soil weight into lateral thrust against the rock — the same physics as an arch bridge, applied in the worst possible context. The structure literally helped the failure progress.
"Frugality over safety" — the engineering practice was already known. This is not a story about novel failure modes or pushing the envelope. The Bureau had every signal it needed (pilot grout study failed, foundation was visibly broken rock, silt was a known-bad core choice) and chose cheaper paths anyway.
Honoring victims means keeping the failure stories alive. Grady's framing: textbooks and spreadsheets don't transmit stakes; case studies do. The vivid memory of Teton was what kept his team conservative on his first project.
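The "more flow, more erosion, more flow" loop can be made concrete with the simplest possible model: erosion makes flow grow in proportion to itself, which is exponential growth. A toy sketch under that assumption, not a hydraulic model of Teton; the growth rate and flows are hypothetical. The useful consequence is that time to failure scales with the logarithm of how far the current leak is from failure flow, so a leak a thousand times below failure flow is only about ten doublings from breach.

```python
import math


def simulate_piping(q0: float, rate_per_hour: float, hours: float,
                    steps_per_hour: int = 1000) -> float:
    """Euler-integrate dQ/dt = rate * Q.

    Toy feedback loop: flow erodes the pipe wall, the larger pipe carries
    more flow, so the growth rate is proportional to the current flow.
    """
    dt = 1.0 / steps_per_hour
    q = q0
    for _ in range(int(hours * steps_per_hour)):
        q += rate_per_hour * q * dt
    return q


def hours_to_failure(current_flow: float, failure_flow: float,
                     rate_per_hour: float) -> float:
    """Closed-form time for Q(t) = Q0 * exp(rate * t) to reach failure_flow."""
    return math.log(failure_flow / current_flow) / rate_per_hour


# Hypothetical: flow doubles every hour, failure at 1,024x the first visible leak.
doubling = math.log(2)
print(round(simulate_piping(1.0, doubling, 1.0), 2))        # 2.0
print(round(hours_to_failure(1.0, 1024.0, doubling), 1))    # 10.0
```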

Guests

None — solo Grady Hillhouse episode.

Sponsorship

Single sponsor block at the end of the episode (timestamp ~00:17:00) for Nebula, framed around Joe Scott's geology documentary. Standard Practical Engineering sponsor pattern; clearly disclosed. No editorial entanglement with the Teton failure analysis itself.

Why this is in the vault

Teton is the load-bearing US dam-failure case study — the failure that catalyzed federal dam safety regulation as it exists today, and the closest physical-infrastructure analog to RDCO's process-control failure-mode work. Pairs naturally with the Taum Sauk note (control-system / normal-accidents framing) and the Texas Floods note (binary-threshold-around-continuous-risk in regulatory regimes); together those three episodes form a triptych of the specific failure modes the harness thesis is most worried about. Canonical because Teton anchors the regulatory-history half of the dam-safety canon and because the silt-strength-as-lethal-property finding is a generalizable counter-intuition worth keeping in active memory.

Mapping against Ray Data Co

CA-022 binary-decision-around-continuous-probability: the abandoned 1 ft/day constraint is the canonical "binary collapse of continuous risk" failure. The fill rate limit was a continuous-control protocol: rise slowly, monitor wells, lower if anomalous. The Bureau collapsed it to a binary ("the dam seems fine, so we can fill faster") because the spring runoff schedule made the slow path infeasible. This is the exact pattern Grady's Texas floods episode names at the regulatory layer (NFIP floodplain maps as binary thresholds over continuous flood risk), but Teton does it inside a single project's operational protocol. The lesson generalizes to RDCO autonomous-loop design: when an operational constraint is the only mechanism by which a problem can be caught early, relaxing the constraint is equivalent to disabling the safety system. The CA-022 article should pick up Teton as a source: on the surface the fill rate isn't binary (it's a continuous threshold), but the moment the operator decides "we'll just go faster," the continuous protective surveillance window collapses to "ship it." The later dam Grady worked on solved the same problem differently: the team engineered the foundation failure mode out (deep cutoff wall) rather than instrumenting around it. Same pattern as the Taum Sauk roller-compacted-concrete rebuild: when a binary has catastrophic failure modes, prefer to remove the binary from the loop entirely.
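The difference between the continuous protocol and its binary collapse can be sketched as two decision functions. A minimal sketch: the 1 ft/day limit comes from the episode, while the DailyReading fields, the observation-well alarm threshold, and the action strings are hypothetical. The continuous version has no branch that quietly raises the limit; when releases cannot keep up, its only output is escalation.

```python
from dataclasses import dataclass

MAX_RISE_FT_PER_DAY = 1.0   # design fill-rate limit described in the episode
WELL_ALARM_RISE_FT = 0.5    # hypothetical observation-well trigger


@dataclass
class DailyReading:
    reservoir_rise_ft: float     # rise since the previous day's reading
    max_well_rise_ft: float      # largest rise across monitoring wells
    new_seepage_found: bool      # inspection found a seep not seen before


def continuous_protocol(r: DailyReading) -> str:
    """Rise slowly, monitor, lower if anomalous: every branch keys on readings."""
    if r.new_seepage_found or r.max_well_rise_ft > WELL_ALARM_RISE_FT:
        return "lower reservoir"
    if r.reservoir_rise_ft > MAX_RISE_FT_PER_DAY:
        return "increase releases or escalate"
    return "continue filling"


def binary_protocol(dam_seems_fine: bool) -> str:
    """The collapsed version: one unmeasured judgment gates everything."""
    return "fill faster" if dam_seems_fine else "investigate"


print(continuous_protocol(DailyReading(3.0, 0.1, False)))  # increase releases or escalate
print(binary_protocol(True))                               # fill faster
```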

Operational definitions (CA: concepts/operational-definitions): "the dam seemed to be holding fine" is the canonical operationally undefined safety judgment. What counted as "holding fine"? There were no specified instrument readings, no thresholds, no trigger criteria. Daily inspections on June 3rd found seeps far from the dam that were classified as "not unusual," a judgment call made without an operational definition of which seeps would count as anomalous. By the time leaks appeared on the dam itself two days later, the failure was already only five hours from breach. Donald J. Wheeler's discipline (see [[2026-04-15-commoncog-whats-operational-definition]]), that "safe" and "failing" need operational definitions before you can make decisions in the noise, is exactly what Teton lacked. Field crews had no agreed-upon set of measurements that would have triggered a "lower the reservoir" call. RDCO-side: the autonomous loop's idle-state and failure-state definitions need the same treatment. "The agent seems to be running fine" is a Teton-class operational claim. Wire /check-board to specific readings (last successful task timestamp, queue freshness, channel reply latency) before declaring the loop healthy.
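A minimal sketch of what an operational definition of "healthy" could look like for the loop. The three readings are the ones named above; the LoopSnapshot type, the threshold values, and the function name are hypothetical, and nothing here is /check-board's actual interface. Returning the list of failed checks rather than a bare boolean means "unhealthy" always names the reading that crossed its threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical thresholds: the note names the readings, not the limits.
MAX_SINCE_SUCCESS = timedelta(hours=2)
MAX_QUEUE_AGE = timedelta(hours=6)
MAX_REPLY_LATENCY = timedelta(minutes=15)


@dataclass
class LoopSnapshot:
    last_successful_task: datetime
    oldest_queued_item: Optional[datetime]   # None when the queue is empty
    reply_latency: timedelta                 # e.g. median over the last day


def failed_checks(s: LoopSnapshot, now: datetime) -> List[str]:
    """Return every reading outside its threshold; an empty list means healthy."""
    failures = []
    if now - s.last_successful_task > MAX_SINCE_SUCCESS:
        failures.append("last successful task too old")
    if s.oldest_queued_item is not None and now - s.oldest_queued_item > MAX_QUEUE_AGE:
        failures.append("queue is stale")
    if s.reply_latency > MAX_REPLY_LATENCY:
        failures.append("channel reply latency too high")
    return failures


now = datetime(2026, 4, 20, 12, 0, tzinfo=timezone.utc)
snapshot = LoopSnapshot(
    last_successful_task=now - timedelta(hours=5),
    oldest_queued_item=now - timedelta(hours=1),
    reply_latency=timedelta(minutes=3),
)
print(failed_checks(snapshot, now))  # ['last successful task too old']
```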

CA-016 layered defense: Teton is the cleanest "all five layers shared the same coupling" failure. On paper: (1) a grout curtain to seal the foundation rock; (2) a key trench backfilled with silt to provide a watertight barrier; (3) a Zone 2 filter to catch any erosion from the core; (4) the Zone 1 silt core itself as the watertight barrier; (5) operational fill-rate limits and daily inspections to catch problems early. In practice: layer 1 had unsealable windows. Layer 2 used the same erodible silt as layer 4. Layer 3 was bypassed entirely because the seepage went under it, through the foundation. Layer 4's strength let tunnels stay open instead of collapsing. Layer 5 was abandoned because of the construction schedule. Every layer failed because they all shared the same root coupling: an unsealed foundation that nobody had operational evidence was actually sealed. This is the pattern the layered-defense article warns about: stacked layers that share an unobserved coupling are not five layers, they are one. Direct source addition to CA-016. RDCO version: if the autonomous loop has "five safeguards" (allowlist, rate limit, channel scope, audit log, founder review) and all five depend on the same Notion API state being accurate, that's one safeguard in five disguises.
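The "one safeguard in five disguises" claim is easy to check with a two-case probability model. A sketch, assuming each layer fails independently except for one shared dependency whose failure defeats every layer at once; the probabilities are hypothetical, chosen only to show the shape of the result.

```python
from math import prod
from typing import Sequence


def p_all_fail_independent(layer_failure_probs: Sequence[float]) -> float:
    """All layers must fail, and each fails independently."""
    return prod(layer_failure_probs)


def p_all_fail_common_cause(p_shared_failure: float,
                            layer_failure_probs: Sequence[float]) -> float:
    """If the shared dependency fails, every layer fails with it.

    Otherwise the layers fail independently, as in the function above.
    """
    return p_shared_failure + (1 - p_shared_failure) * prod(layer_failure_probs)


layers = [0.1] * 5  # hypothetical: each layer misses 1 problem in 10
print(p_all_fail_independent(layers))          # roughly 1e-05
print(p_all_fail_common_cause(0.05, layers))   # roughly 0.05
```

Five layers at 10 percent each look like a one-in-100,000 system on paper; a single 5 percent shared dependency makes the stack about as reliable as that dependency alone. Adding more layers that share the same dependency barely moves the second number.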

The MAC framework / "frugality over safety" maps directly to RDCO operational-cost decisions. The investigation's verdict — defensive measures were within state-of-the-art, the Bureau simply didn't pay for them — is the kind of judgment that needs to be applied to RDCO's own operational-spend decisions. When the founder skips an LLM model upgrade, an additional eval run, or a backup compute provider because of monthly budget, that's a Teton-class trade if the failure mode being saved against is catastrophic. The MAC framework (concepts/MAC, if/when written; see [[2026-03-06-stratechery-higher-powers-lower-macs]]) gives a structured way to ask "what's the minimum acceptable cost, and is the saved spend smaller than the externalized cost in expectation." Teton failed that test in 1972; Taum Sauk failed it in 2005; the lesson is the same.
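The "is the saved spend smaller than the externalized cost in expectation" question reduces to one inequality: skip the spend only if it exceeds the increase in failure probability times the cost of failure. A sketch of that inequality alone; it is not the MAC framework (which the note says is not yet written), and the dollar figures and probabilities in the example are hypothetical.

```python
def skip_is_justified(saved_spend: float, p_fail_if_skipped: float,
                      p_fail_if_paid: float, failure_cost: float) -> bool:
    """True when the money saved exceeds the expected loss the saving buys."""
    added_expected_loss = (p_fail_if_skipped - p_fail_if_paid) * failure_cost
    return saved_spend > added_expected_loss


# Hypothetical monthly decision: save $500 at the price of +0.1% chance of a $2M failure.
print(skip_is_justified(500, 0.002, 0.001, 2_000_000))  # False: expected loss is about $2,000
```

A pure expected-value test is risk-neutral. For catastrophic, irreversible failures of the Teton class, it arguably understates the case for paying, since the loss cannot be averaged out over many trials.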

Generalizable failure pattern — "the protective constraint was the first thing dropped under schedule pressure." This is the load-bearing finding worth surfacing across the failure-mode work: when an operational safeguard is the protective measure (not an additional check, but the only mechanism by which a category of problem can be caught), it is also the safeguard most likely to be dropped because schedule pressure is what creates the demand to drop safeguards in the first place. Worth a structured concept article: "Schedule-Pressure Selection of Safety Drops" or similar. The Teton-Taum-Sauk-Texas-Floods triad is the canonical evidence triple.
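One way to encode the finding is to make "sole detector" a property the system checks before any safeguard is relaxed. A hypothetical sketch: the SafeguardRegistry class, the safeguard names, and the failure categories are illustrative assumptions, not an existing RDCO component. Relaxing a safeguard is refused whenever some failure category would be left with no active detector, which turns "we'll just go faster" into an explicit escalation instead of a silent drop.

```python
from typing import Dict, Iterable, Set


class SafeguardRegistry:
    """Tracks which failure categories each active safeguard can catch."""

    def __init__(self) -> None:
        self._detects: Dict[str, Set[str]] = {}
        self._active: Set[str] = set()

    def register(self, name: str, categories: Iterable[str]) -> None:
        self._detects[name] = set(categories)
        self._active.add(name)

    def sole_coverage(self, name: str) -> Set[str]:
        """Categories that no other active safeguard detects."""
        others = set().union(*(self._detects[n] for n in self._active if n != name))
        return self._detects[name] - others

    def relax(self, name: str) -> None:
        """Deactivate a safeguard only if nothing is left undetected."""
        orphaned = self.sole_coverage(name)
        if orphaned:
            raise PermissionError(
                f"relaxing {name!r} leaves {sorted(orphaned)} with no detector")
        self._active.discard(name)


registry = SafeguardRegistry()
registry.register("slow_fill_window", ["foundation_seepage", "embankment_seepage"])
registry.register("daily_inspection", ["embankment_seepage"])
registry.relax("daily_inspection")  # allowed: slow_fill_window still covers it
try:
    registry.relax("slow_fill_window")
except PermissionError as err:
    print(err)
```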

[[concepts/binary-decision-around-continuous-probability]] — direct source addition; the abandoned 1 ft/day fill rate is the "continuous protocol collapsed to binary under schedule pressure" pattern.
[[concepts/operational-definitions]] — direct source addition; "the dam seemed to be holding fine" is the canonical operationally-undefined safety judgment.
[[concepts/layered-defense-architecture]] — direct source addition; Teton's five protective layers all shared the same root coupling (unsealed foundation), the failure mode the article warns about.
[[2026-04-20-practical-engineering-taum-sauk-dam-failure]] — companion failure case; Taum Sauk is the control-system / normal-accidents version, Teton is the geotechnical / material-selection version. Together they cover both halves of the dam-failure canon.
[[2026-04-20-practical-engineering-an-engineers-perspective-on-the-texas-floods]] — Grady's binary-threshold-over-continuous-risk piece at the regulatory layer; Teton is the same anti-pattern inside a single project's operational protocol.
[[2026-04-20-practical-engineering-spillway-failed-on-purpose]] — the inverse case: Oroville is what happens when the passive overflow path operates as designed; Teton is what happens when the only protective measure (fill rate constraint) is abandoned.
[[2026-04-15-commoncog-whats-operational-definition]] — Wheeler / Cedric Chin on why "safe" and "failing" need operational definitions before you can make decisions in the noise; the directly-applicable epistemic discipline Teton lacked.
[[2026-04-15-commoncog-process-behaviour-charts]] — Wheeler-style process control: the methodology that would have given Teton's daily inspections actual decision criteria.
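Wheeler's XmR chart is the concrete form of that decision criterion: compute natural process limits from a baseline of individual readings, then treat any point outside them as a signal that demands action. A minimal sketch using the standard XmR scaling constants (2.66 for the individuals limits, 3.268 for the moving-range limit); charting daily seep flow, the readings, and their units are hypothetical, since the source gives no inspection data.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def xmr_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower, upper) natural process limits and the moving-range limit."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = mean(moving_ranges)
    center = mean(baseline)
    return center - 2.66 * avg_mr, center + 2.66 * avg_mr, 3.268 * avg_mr


def signals(baseline: Sequence[float], new_readings: Sequence[float]) -> List[int]:
    """Indices of new readings outside the limits or with an outsized day-to-day jump."""
    lower, upper, mr_limit = xmr_limits(baseline)
    flagged = []
    previous = baseline[-1]
    for i, x in enumerate(new_readings):
        if x < lower or x > upper or abs(x - previous) > mr_limit:
            flagged.append(i)
        previous = x
    return flagged


# Hypothetical daily seep-flow readings (gal/min) from a quiet baseline period.
baseline = [10, 12, 11, 13, 12, 11, 12, 10]
print(signals(baseline, [12, 13, 20]))  # [2]
```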

practical engineering teton dam failure