Data Quality Testing Framework — “Definition of Done” for Analytics
Why this is in the vault
Founder-authored framework developed through a Claude conversation on March 30. This is original intellectual property — a two-dimensional matrix for data quality testing that generalizes beyond the “whack-a-mole” approach of enumerating individual checks. Directly applicable to: (1) the founder’s dbt consulting work at MG/phData, (2) RDCO’s own BiasAudit pattern in autoinv, (3) a potential Sanity Check article, and (4) the SDG “noisy quality checks” thesis from 2026-04-07-seattle-data-guy-noisy-data-quality-checks.
The problem
AI performs well when results are verifiable — when there’s a machine-checkable definition of “correct.” Analytics is full of implicit rules that live in people’s heads but never get formalized. Enumerating individual checks doesn’t scale. What’s needed is a taxonomy of constraint types so any new metric or pipeline can be validated against the full set of categories.
The two-dimensional framework
Axis 1 — Scope of evaluation (where are you looking?):
- Row-level — each record must be internally coherent on its own. If a single row violates a constraint, that row is broken. The fix is surgical. Easy to make exhaustive because the unit is self-contained.
- Aggregate-level — individual rows can all pass their checks and the totals can still be wrong. Joins that fan out, filters that silently drop records, window functions that double-count across partitions. This is where most real analytics bugs live.
- Transformation-level — validates the transition between layers (bronze → silver → gold), not just the state of any single layer. Three questions: Did the right rows come through? (completeness) Did only the right rows come through? (precision) Did the values survive intact? (fidelity)
Axis 2 — Basis of comparison (what are you evaluating against?):
- Absolute — does this satisfy a fixed rule? Churned ARR ≤ starting ARR. Rates between 0 and 1. No nulls in key fields. These are accounting identities and bounding rules — self-contained, no external reference needed.
- Relative — does this match another part of the system? Salesforce ARR ties to finance ledger. Marketing-attributed pipeline is a subset of total pipeline. Row counts between staging and production match.
- Temporal — does this match history or expectations? Current period shouldn’t retroactively change past periods. MoM swings beyond threshold flag for review. Seasonality patterns that break norms.
- Human sanity check (plausibility review) — show the number to someone who knows the business and see if they flinch. Not automatable but belongs in the framework explicitly, because otherwise teams skip it when they think “we have tests now.”
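The three automatable bases can be made concrete. A minimal sketch (all names and figures — `arr_by_month`, `ledger_total`, the 20% threshold — are invented for illustration, not from any real system) showing one check per basis for a monthly ARR series:

```python
# Hedged sketch: one automated check per basis of comparison, on a toy ARR series.
arr_by_month = {"2026-01": 100_000, "2026-02": 104_000, "2026-03": 98_000}
churned_arr = {"2026-01": 5_000, "2026-02": 3_000, "2026-03": 9_000}
ledger_total = 98_000  # finance ledger figure for the latest month

# Absolute: accounting identity — churned ARR can never exceed starting ARR.
def check_absolute():
    return all(churned_arr[m] <= arr_by_month[m] for m in arr_by_month)

# Relative: the warehouse number ties to another system (the finance ledger).
def check_relative():
    return arr_by_month["2026-03"] == ledger_total

# Temporal: flag month-over-month swings beyond a threshold for review.
def check_temporal(threshold=0.20):
    months = sorted(arr_by_month)
    flagged = []
    for prev, cur in zip(months, months[1:]):
        change = abs(arr_by_month[cur] - arr_by_month[prev]) / arr_by_month[prev]
        if change > threshold:
            flagged.append(cur)
    return flagged  # months that need a plausibility review

print(check_absolute())   # True
print(check_relative())   # True
print(check_temporal())   # []
```

The fourth basis, human plausibility review, is deliberately absent from the code: the framework's point is that it stays a named step even though it cannot be automated.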
Every meaningful test lives at the intersection of one scope × one basis. The definition of done for any new model is coverage across the scope dimension (at least one check at each scope level), with the basis chosen based on what’s actually knowable for that metric.
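The coverage rule itself is machine-checkable. A sketch (the test registry below is illustrative, not a real dbt manifest) of the "at least one check at each scope level" definition of done:

```python
# Sketch: definition of done = full coverage of the scope axis.
# The basis attached to each check is whatever is knowable for that metric.
SCOPES = ("row", "aggregate", "transformation")

# Hypothetical registry of (scope, basis) pairs for one model.
tests_for_model = [
    ("row", "absolute"),            # e.g. rates between 0 and 1
    ("aggregate", "relative"),      # e.g. staging vs production row counts
    ("transformation", "absolute"), # e.g. bronze -> silver completeness
]

def definition_of_done(tests):
    covered = {scope for scope, _basis in tests}
    missing = [s for s in SCOPES if s not in covered]
    return (not missing, missing)

done, missing = definition_of_done(tests_for_model)
print(done, missing)  # True []
```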
Key insight: “Delta” is a dimension, not a category
The founder’s initial framing had four levels (row, aggregate, transformation, delta). The refined insight: delta isn’t a peer — it’s a modifier that applies across all scope levels. You can have row-level deltas, aggregate deltas, or transformation deltas. Promoting delta to an axis (temporal basis) makes the framework multiplicative rather than additive — more powerful, not less.
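The modifier claim can be shown directly: the same temporal-delta idea applied at two scope levels. A sketch using invented snapshots of a closed-won table on consecutive days:

```python
# Sketch: "delta" as a modifier across scope levels, not a fourth peer level.
# Toy data — opportunity_id -> amount on two consecutive snapshot days.
yesterday = {1: 1000, 2: 2500, 3: 400}
today     = {1: 1000, 2: 2600, 3: 400}

# Row-level delta: which individual records changed between snapshots?
row_deltas = {k for k in yesterday if yesterday[k] != today.get(k)}

# Aggregate-level delta: how much did the total move?
total_delta = sum(today.values()) - sum(yesterday.values())

print(row_deltas)   # {2}
print(total_delta)  # 100
```

One mechanism (compare against the prior snapshot), two cells of the matrix — which is what makes the framework multiplicative.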
Transformation validation pattern (the one people miss)
For bronze → silver layer transitions in dbt:
-- Completeness: the right rows came through
SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won'
EXCEPT
SELECT opportunity_id FROM silver_closed_won
-- Should return zero rows
-- Precision: only the right rows came through
SELECT opportunity_id FROM silver_closed_won
EXCEPT
SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won'
-- Should return zero rows
-- Fidelity: values survived intact
SELECT b.opportunity_id FROM bronze b
JOIN silver_closed_won s ON b.opportunity_id = s.opportunity_id
WHERE b.amount != s.amount OR b.close_date != s.close_date
-- Should return zero rows (use null-safe comparison if these columns are nullable)
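The three checks (completeness, precision, fidelity) can be exercised end to end. A sketch using Python's stdlib sqlite3 against an in-memory bronze/silver pair — the table and column names follow the SQL above, but the data is invented:

```python
# Sketch: running the completeness / precision / fidelity checks on toy data.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bronze (opportunity_id INTEGER, stage TEXT, amount REAL, close_date TEXT);
CREATE TABLE silver_closed_won (opportunity_id INTEGER, amount REAL, close_date TEXT);
INSERT INTO bronze VALUES (1, 'Closed Won', 100.0, '2026-01-15'),
                          (2, 'Closed Won', 250.0, '2026-02-01'),
                          (3, 'Open',        75.0, NULL);
INSERT INTO silver_closed_won VALUES (1, 100.0, '2026-01-15'),
                                     (2, 250.0, '2026-02-01');
""")

# Completeness: bronze closed-won rows missing from silver (should be none).
missing = con.execute("""
    SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won'
    EXCEPT
    SELECT opportunity_id FROM silver_closed_won
""").fetchall()

# Precision: silver rows with no closed-won source in bronze (should be none).
extra = con.execute("""
    SELECT opportunity_id FROM silver_closed_won
    EXCEPT
    SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won'
""").fetchall()

# Fidelity: matched rows whose values drifted in transit (should be none).
drifted = con.execute("""
    SELECT b.opportunity_id FROM bronze b
    JOIN silver_closed_won s ON b.opportunity_id = s.opportunity_id
    WHERE b.amount != s.amount OR b.close_date != s.close_date
""").fetchall()

print(missing, extra, drifted)  # [] [] []
```

In a real dbt project these would live as singular tests; the sqlite3 harness here is only to make the pattern self-contained.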
Mapping against RDCO
- BiasAudit in autoinv is a row-level + aggregate-level absolute check system. The four gates (optimization, look-ahead, survivorship, cognitive) are accounting identities for strategy correctness. The framework would suggest we’re missing the temporal basis (does the signal drift over time?) and the transformation scope (did the data pipeline preserve fidelity from Polygon → autoinv.data → experiment output?).
- SDG’s “noisy checks” article argues for fewer, higher-impact checks. This framework solves the same problem differently: instead of fewer checks, it gives you the right categories of checks so you get coverage without noise.
- Sanity Check article candidate: “The Definition of Done for Analytics” — the framework is original, practically applicable, and directly addresses the “AI is chewing up tasks” angle that resonates with the current moment.
Related
- 2026-04-07-seattle-data-guy-noisy-data-quality-checks — the “fewer better gates” approach that this framework complements
- 2026-04-11-garry-tan-thin-harness-fat-skills — Tan’s “latent vs deterministic” axis maps onto scope × basis: row-level absolute checks are deterministic, human plausibility review is latent
- ../01-projects/automated-investing/autoinv/README — BiasAudit implements scope axis (row + aggregate) but is missing temporal basis