Five Myths of Experimentation — Reforge

Summary

Reforge identifies five myths that prevent teams from building effective experimentation programs. Each myth contrasts a common belief with the reality of how strategic experimentation works:

Myth 1: “Experimentation is small random hacks.” The belief: experimentation means testing button colors and copy changes in isolation. Reality: a healthy program runs a series of tests within a product theme that build on each other, deepening insight over time. Big wins usually stem from a trail of earlier iterations.
Myth 2: “Analysis ends at win/loss.” The belief: ship winners, discard losers. Reality: every test should generate deeper questions. A “loss” that reveals user behavior is more valuable than a “win” you can’t explain. The systematic compounding of learning is the point.
Myth 3: “Experimentation is only for small optimizations.” Reality: the best programs build portfolios spanning simplification to full product reinvention. Experimentation is how you de-risk strategic innovations, not just polish edges.
Myth 4: “Experimentation is cheap, do it constantly.” The belief: testing works as an insurance policy (“just test it”) or a way to settle arguments. Reality: experiments are expensive in time, people, and opportunity cost. The goal is to reduce cost over time through infrastructure and process, while increasing impact through better idea generation. Aim for the high-impact, low-cost quadrant.
Myth 5: “Experimentation = testing.” Reality: experimentation is a four-layer pyramid: (1) identify strategic opportunities, (2) develop potential solutions, (3) test and validate, (4) implement. Testing is only one layer. An experimentation program is all four layers run as a repeatable system.
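
A rough way to make the impact/cost quadrant from the fourth myth concrete is to score each idea and sort it into a bucket. This is only a sketch: the 1 to 5 scales, the threshold of 3, the bucket labels, and the example ideas are all assumptions made here for illustration, not something Reforge specifies.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int  # subjective estimate, 1 (low) to 5 (high)
    cost: int    # subjective estimate, 1 (cheap) to 5 (expensive)


def quadrant(idea: Idea, threshold: int = 3) -> str:
    """Place an idea in one of four impact/cost buckets (labels are hypothetical)."""
    high_impact = idea.impact >= threshold
    low_cost = idea.cost < threshold
    if high_impact and low_cost:
        return "run first"
    if high_impact:
        return "worth it if cost can be reduced"
    if low_cost:
        return "only if it supports a theme"
    return "skip"


if __name__ == "__main__":
    # Hypothetical ideas, scored by gut feel.
    ideas = [
        Idea("reply-rate theme for the newsletter", impact=4, cost=2),
        Idea("one-off cover color variation", impact=1, cost=2),
        Idea("new puzzle format", impact=5, cost=5),
    ]
    for idea in ideas:
        print(f"{idea.name}: {quadrant(idea)}")
```

The scores are guesses, so the value is mostly in forcing an explicit impact and cost estimate before anything gets tested.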

Relevance

For 01-projects/squarely-puzzles/index:

At current scale, we probably lack the traffic for A/B tests to reach statistical significance. But we can still run the broader experimentation pyramid: identify opportunities, develop solutions, test qualitatively, implement.
Myth 4 is relevant — we should not “just test” every cover design or title variation. Be strategic about what we test and why.
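
To sanity-check the scale claim above, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two conversion rates. The 5% baseline rate and the lift values are placeholder assumptions, not our real numbers, and `sample_size_per_variant` is a name made up for this note.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Assumed 5% baseline conversion; swap in real figures when we have them.
    for lift in (0.10, 0.20, 0.50):
        n = sample_size_per_variant(baseline=0.05, relative_lift=lift)
        print(f"relative lift {lift:.0%}: about {n} users per variant")
```

Under these assumptions, detecting a 20% relative lift takes on the order of eight thousand users per variant, and smaller lifts need far more. That is the practical reason to lean on the qualitative layers of the pyramid for now.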

For 01-projects/newsletter/index:

Subject line and format testing is myth 1 territory if done in isolation. Better: pick a theme (e.g., “increase reply rate”) and run a series of experiments that build insight.
Myth 2 applies directly — when an issue gets low engagement, dig into why rather than just moving on.

Connects to 06-reference/2026-04-03-growth-loops-new-funnels: experimentation is how you discover and validate which loops actually work.
Connects to 06-reference/2026-04-03-four-fits-framework: experimentation helps validate each “fit” rather than assuming it.

Open Questions

What does a lightweight experimentation system look like at our scale? We don’t have millions of users for A/B testing.
How do we build the “iterate on failures” muscle (myth 2) when the temptation is to move fast?
For Squarely Puzzles, what are the strategic opportunities worth experimenting on vs. the small optimizations we should skip?