“Two Types of Data Analysis” — @CedricChin
Why this is in the vault
Chin’s clearest articulation of the Experimental vs Observational split — the exact epistemic distinction MAC has to carry at every cell of the matrix. Each MAC test is either a one-time comparison (experimental) or a sequential watch (observational), and the tradeoffs (sensitivity vs false-alarm control, finite vs continuing data) determine how we set thresholds and response protocols.
The core argument (paraphrased)
Data analysis happens in two environments, and they demand opposite attitudes:
- Experimental Studies — you deliberately create a comparison (A/B test, holdback). Finite data, one-time analysis, you want sensitivity, so you tolerate a non-trivial false-alarm rate (the classical 5% alpha).
- Observational Studies — data falls out of a running process (traffic, working capital, inventory). You’re not comparing conditions, you’re asking whether an unplanned change has occurred. Because every new datapoint is a fresh act of analysis, the overall false-alarm budget forces each individual check to be conservative (Wheeler: “a trivial risk of a false alarm”, under 1%). The compounding arithmetic is sketched below.
Quoting Wheeler, who Chin leans on: “an Observational Study, of necessity, requires a conservative, sequential analysis technique”, whereas an Experimental Study “use[s] a finite amount of data to perform a one-time analysis”.
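To make Wheeler’s point concrete, a minimal sketch of the false-alarm arithmetic, assuming independent checks (real data streams only approximate this):

```python
# If each check independently false-alarms with probability alpha, the chance
# of at least one false alarm over n checks compounds geometrically.
def p_any_false_alarm(alpha: float, n: int) -> float:
    return 1 - (1 - alpha) ** n

print(p_any_false_alarm(0.05, 1))    # 0.05  -- fine for a one-time experimental analysis
print(p_any_false_alarm(0.05, 90))   # ~0.99 -- the same alpha run nightly for 90 days cries wolf
print(p_any_false_alarm(0.003, 90))  # ~0.24 -- a three-sigma-style per-check alpha stays livable
```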
The tradeoffs:
- Sensitivity. Experimental techniques (confidence intervals, hypothesis tests, regression) squeeze the most signal from limited data. Process behaviour charts (XmR) with fixed three-sigma limits are deliberately less sensitive — that’s the cost of sequential analysis (a minimal XmR sketch follows this list). Tiny effects need experiments; company-level questions (“is our working capital slump real?”) can’t be A/B tested and force you into observational tooling.
- Time. Observational charts need 6 points for usable limits, 10–15 to settle. You wait. Experiments cost setup but return answers now. Chin’s live confession: “sometimes, goddamn, I just want to know what my actions have accomplished … today”.
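For the observational side, a minimal XmR sketch using Wheeler’s standard 2.66 screening constant, which scales the average moving range into three-sigma limits for individual values (the sample numbers are made up):

```python
def xmr_limits(values: list[float]) -> tuple[float, float, float]:
    """Natural process limits for an XmR (individuals) chart.

    Wheeler's screening constant 2.66 converts the average moving range
    into three-sigma limits for individual values. Roughly 6 points give
    usable limits; 10-15 let them settle.
    """
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar

# Hypothetical: six nightly row counts (in thousands) from a MAC check.
lcl, centre, ucl = xmr_limits([102, 98, 105, 97, 101, 99])
# A new point landing outside [lcl, ucl] is a candidate special-cause signal.
```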
Unifying frame: the purpose of both is knowledge — models with predictive power that lead to effective action. Wheeler’s north star: “the purpose of analysis is insight”, and the best analysis is the simplest one that provides it.
Mapping against Ray Data Co
1. Every MAC cell is one or the other — and must know which. The MAC 3×6 matrix silently mixes both modes. A reconciliation check against a source system when we first onboard a client is experimental (one-shot, comparison, higher alpha acceptable). The same check running nightly in production is observational (sequential, must use tight thresholds or it’ll false-alarm constantly). The matrix template at ../01-projects/data-quality-framework/testing-matrix-template should force the operator to tag each cell as Experimental/Observational and pick thresholds accordingly — this is a concrete refinement the framework needs (one possible shape is sketched after this list).
2. The agent-deployer (2026-04-14-levie-agent-deployer-role-jd) runs both modes. Eval-time checks (did this prompt change improve tool-call accuracy?) are experimental — finite data, clear comparison, looking for real lift. Production monitoring of agent outputs is observational — you’re watching a stream for unplanned drift. Conflating them is how eval regimes produce noisy dashboards that everyone ignores.
3. State-ownership architecture encodes the observational worldview. ../04-tooling/rdco-state-ownership-architecture treats the vault and skills as the continuing process being observed; model swaps are the “special cause” events. That framing only works if we import Chin’s conservative-threshold discipline — otherwise every model upgrade looks like a crisis.
4. phData/MG comparison benefits from naming the split. phData’s data quality posture leans observational (continuous Snowflake monitoring). MG-style consulting engagements are more experimental (short discovery sprints with finite data). Neither is wrong, but the client-facing pitch should name which one we’re offering at which phase — “6-week audit” is experimental; “MAC-in-production” is observational.
5. The time-cost of observational analysis is a consulting sales objection to preempt. Chin openly complains about waiting 6 weeks for limit lines to settle. Clients will too. The sales answer: run a short experimental audit (finite, fast, sensitive) to establish baseline truth, then hand off the observational MAC practice for ongoing governance. Two phases, two modes, one engagement.
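One way the Experimental/Observational tagging from point 1 could look in the matrix template (all names here are hypothetical; the point is that the mode drives the per-check threshold, not the other way round):

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    EXPERIMENTAL = "experimental"    # one-shot comparison, finite data
    OBSERVATIONAL = "observational"  # sequential watch, continuing data

@dataclass(frozen=True)
class MacCell:
    """Hypothetical shape for one cell of the MAC testing matrix."""
    check_name: str
    mode: Mode

    @property
    def per_check_alpha(self) -> float:
        # Experimental: the classical 5% alpha is acceptable for a one-time test.
        # Observational: Wheeler-conservative (<1%), because the false-alarm
        # budget compounds across every nightly run.
        return 0.05 if self.mode is Mode.EXPERIMENTAL else 0.003

# The same reconciliation check carries different thresholds in each mode:
onboarding_recon = MacCell("source-system reconciliation", Mode.EXPERIMENTAL)
nightly_recon = MacCell("source-system reconciliation", Mode.OBSERVATIONAL)
```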
Related
- 2026-04-15-commoncog-becoming-data-driven-first-principles — the series cornerstone; this article is the epistemic complement
- ../01-projects/data-quality-framework/testing-matrix-template — MAC matrix needs explicit Experimental/Observational tagging per cell
- ../04-tooling/rdco-state-ownership-architecture — state as the process being observed
- 2026-04-14-levie-agent-deployer-role-jd — agent-deployer runs evals (experimental) and monitoring (observational)
- 2026-04-12-corr-stagnitto-agile-data-warehouse-design-master-synthesis — data profiling is mostly observational; design-time checks are experimental