Data Quality Testing Matrix — Scope × Basis
Every meaningful data test lives at the intersection of one scope and one basis. The definition of done for any new model is coverage across the scope dimension — at least one check at each scope level.
The Two Dimensions
Axis 1 — Scope (where are you evaluating?):
- Column-level — one column in isolation. Uniqueness, null rate, value distribution, accepted values, logical range. The unit is a single field — no cross-column awareness needed.
- Row-level — cross-column rules within a single record. Conditional logic, calculated field validation, source-specific row shape. “Gross amount ≥ sale amount” requires comparing two fields. “New opportunities can’t have negative ACV” is a conditional rule.
- Aggregate-level — totals, counts, and ratios across the full dataset. Individual rows can all pass their checks and the totals can still be wrong. Joins that fan out, filters that silently drop records, window functions that double-count across partitions. This is where most real analytics bugs live.
Axis 2 — Basis (what are you evaluating against?):
- Absolute — does this satisfy a fixed rule? No external reference needed. Accounting identities, bounding rules, type constraints.
- Relative: Source — does this match the upstream data? Completeness (right rows came through), precision (only right rows), fidelity (values survived intact). This is the transformation validation — comparing downstream to upstream.
- Relative: Production — does this match existing reporting or the finance ledger? Cross-system reconciliation against the current state of truth.
- Relative: Reconciliation — does this match an external reconciliation file or third-party data? Ties to Stripe, bank statements, vendor exports.
- Temporal — does this match history or expectations? Current period shouldn’t retroactively change past periods. MoM swings beyond threshold flag for review. Seasonality patterns that break norms.
- Human Sanity Check — show the number to someone who knows the business and see if they flinch. Not automatable but belongs in the framework explicitly, because otherwise teams skip it when they think “we have tests now.”
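The two axes can be made concrete as a typed check record, so every check is forced to declare one scope and one basis. A minimal sketch; the `Check` type and the axis value names are illustrative, not from any particular testing tool:

```python
from dataclasses import dataclass

# Illustrative axis values: every check is one scope crossed with one basis.
SCOPES = {"column", "row", "aggregate"}
BASES = {"absolute", "relative_source", "relative_production",
         "relative_recon", "temporal", "human"}

@dataclass(frozen=True)
class Check:
    name: str
    scope: str   # where: column | row | aggregate
    basis: str   # against what: absolute | relative_* | temporal | human

    def __post_init__(self):
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.basis not in BASES:
            raise ValueError(f"unknown basis: {self.basis}")

# "Gross amount >= sale amount" compares two fields in one record,
# so it is a row-level absolute check:
c = Check("gross_gte_sale", scope="row", basis="absolute")
print(c.scope, c.basis)  # row absolute
```

Making the intersection explicit in the type is what later enables the mechanical coverage check: you can count checks per scope and per basis.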
Prerequisites
Before filling in the matrix, complete these steps:
System of Record (SoR) Identification
For each source table feeding this model, identify the authoritative system:
| Source Table | System of Record | Confidence | Notes |
|---|---|---|---|
| e.g. opportunities | Salesforce | High — single source | No competing systems |
| e.g. customers | Salesforce + HubSpot | Medium — two sources, need merge rules | SF is primary, HS fills gaps |
Why this matters: You can’t write Relative: Source checks without knowing WHICH source is authoritative. If there are competing sources for a conformed dimension, that’s a Stop-severity issue — resolve it before writing tests.
Severity Tiers
Each check in the matrix should be assigned a severity that determines the pipeline response:
| Severity | Response | When to use |
|---|---|---|
| Stop | Block the pipeline. Data cannot be served. | Accounting identities violated, missing conformed dimensions, referential integrity broken, wrong system of record |
| Pause | Flag for review before serving. Pipeline continues but output is quarantined. | Missing mandatory values, row count swings beyond threshold, aggregate reconciliation outside tolerance |
| Go | Log the finding, continue serving. Review in next cycle. | Missing optional attributes, minor distribution shifts, cosmetic data issues |
Default severity mapping by basis:
- Absolute → Stop (hard constraints that should never fail)
- Relative: Source → Stop (transformation broke something)
- Relative: Production → Pause (might be a real change or a bug — needs human judgment)
- Relative: Recon → Pause (external system may have different timing)
- Temporal → Pause (anomaly, not necessarily wrong)
- Human → Go (informational, not blocking)
Layer-Aware Severity
The medallion architecture changes what “bad” means at each layer. The same issue has different severity depending on where it occurs:
| Issue | Bronze | Silver | Gold |
|---|---|---|---|
| Null in key field | Go (source sends it, log and move on) | Pause (transformation should have handled it — possible bug) | Stop (consumer-facing, blocks the report) |
| Row count mismatch vs upstream | Pause (ingestion may have filtered) | Stop (star schema must account for all source rows) | Pause (gold may intentionally aggregate/filter) |
| Value outside expected range | Go (source data quality issue, flag for SoR team) | Stop (silver should have cleansed or rejected) | Stop (invalid data reached the consumer) |
| Schema change detected | Stop (bronze contract with source broke) | Pause (may need model update) | Go (gold schema is independently defined) |
| Duplicate records | Pause (source may send dupes, dedup in silver) | Stop (star schema must not have dupes) | Stop (report double-counts) |
The principle: Bronze is permissive (mirror what the source gives you). Silver is strict (the transformation layer is where quality is enforced). Gold is zero-tolerance for consumer-facing issues but permissive on structure (intentional reshaping is expected).
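The layer-aware table reduces to a lookup keyed on (issue, layer). A sketch transcribed from the table above; the issue keys and `response_for` helper are illustrative:

```python
# (issue, layer) -> pipeline response, transcribed from the table above.
LAYER_SEVERITY = {
    ("null_in_key_field",  "bronze"): "go",
    ("null_in_key_field",  "silver"): "pause",
    ("null_in_key_field",  "gold"):   "stop",
    ("row_count_mismatch", "bronze"): "pause",
    ("row_count_mismatch", "silver"): "stop",
    ("row_count_mismatch", "gold"):   "pause",
    ("value_out_of_range", "bronze"): "go",
    ("value_out_of_range", "silver"): "stop",
    ("value_out_of_range", "gold"):   "stop",
    ("schema_change",      "bronze"): "stop",
    ("schema_change",      "silver"): "pause",
    ("schema_change",      "gold"):   "go",
    ("duplicate_records",  "bronze"): "pause",
    ("duplicate_records",  "silver"): "stop",
    ("duplicate_records",  "gold"):   "stop",
}

def response_for(issue: str, layer: str) -> str:
    """Same issue, different layer, different response."""
    return LAYER_SEVERITY[(issue, layer)]

print(response_for("duplicate_records", "bronze"))  # pause
print(response_for("duplicate_records", "silver"))  # stop
```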
How this maps to dbt:
- Bronze tests: `severity: warn` for most checks (log, don’t block)
- Silver tests: `severity: error` for key integrity and transformation fidelity
- Gold tests: `severity: error` for anything consumer-facing; `severity: warn` for structural checks
Override these defaults based on business impact. A null in an optional description field is a Go even though it’s an Absolute check. A temporal swing of 200% in revenue is a Stop even though temporal checks default to Pause.
The Matrix (with Salesforce Closed-Won Example)
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | opportunity_id not null, unique. amount > 0. stage in accepted values list. close_date is a valid date. | stage values in silver exist in bronze enum. Column types match source schema. | Column definitions match production data dictionary. | — | Null rate for amount stable across runs (not suddenly spiking). Distribution of stage values consistent. | Spot-check: do the top 10 amounts look reasonable for this business? |
| Row | start_date < end_date. “New” opportunities must have positive ACV. Rows from Salesforce must have sf_id populated; rows from HubSpot must have hs_id. | Each opportunity_id in silver exists in bronze with matching stage = 'Closed Won'. Row shape matches source expectations (SF rows have X,Y,Z; HS rows have A,B,C). | Calculated fields (win rate, ACV) use same formula as production dashboards. | Row-level amounts tie to invoice line items in billing system. | No single contract’s close_date changed since last run (unless restatement flagged). | Pull 5-10 random closed-won records. Does the amount look right? Is the close date plausible? |
| Aggregate | Churned ARR ≤ starting ARR. Win rate between 0% and 100%. Total closed-won count ≥ 0. | Silver closed-won count = bronze count WHERE stage = 'Closed Won'. No IDs in silver that aren’t closed-won in bronze. | Salesforce ARR ties to finance ledger ± tolerance. Pipeline totals match executive dashboard. | Ending ARR ≈ Stripe MRR × 12 ± 5%. Total invoiced matches billing system export. | MoM closed-won count doesn’t swing >30% without a flag. Q4 should be higher than Q1 (seasonality). | Show the pipeline waterfall to a sales leader. Does the shape match their mental model? |
How to Read the Matrix
Rows (Scope) answer: where are you evaluating?
- Column: Is each field valid on its own? (isolation)
- Row: Is each record internally coherent? (cross-column)
- Aggregate: Are the totals correct? (dataset-wide)
Columns (Basis) answer: what are you evaluating against?
- Absolute: A fixed logical rule (self-contained, no external reference)
- Relative: Source: The upstream data layer (transformation validation)
- Relative: Production: Existing reporting or dashboards (reconciliation to truth)
- Relative: Recon: External third-party data (billing, bank, vendor)
- Temporal: Historical baseline or expected pattern (change detection)
- Human: Someone who knows the business looks at it and doesn’t flinch
The “Definition of Done” for Any New Model
Before shipping a new dbt model or Snowflake view, confirm:
- At least one column-level check exists (null, unique, range, accepted values)
- At least one row-level check exists (cross-column rules, conditional logic)
- At least one aggregate-level check exists (accounting identities, reconciliation)
- At least one relative: source check exists (upstream comparison — completeness, precision, fidelity)
- The basis for each check is chosen based on what’s actually knowable for this data
- A human plausibility review has been done at least once (even if not automated)
If any scope level has zero checks, there’s a gap — and that’s where the bug will hide.
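The checklist above can be enforced mechanically before shipping. A sketch that assumes each check is tagged with its scope and basis as in the matrix; the `coverage_gaps` helper is illustrative:

```python
REQUIRED_SCOPES = {"column", "row", "aggregate"}

def coverage_gaps(checks: list[dict]) -> list[str]:
    """Return the unmet definition-of-done items for a model's check list."""
    scopes = {c["scope"] for c in checks}
    bases = {c["basis"] for c in checks}
    gaps = [f"no {s}-level check" for s in sorted(REQUIRED_SCOPES - scopes)]
    if "relative_source" not in bases:
        gaps.append("no relative: source check")
    return gaps

checks = [
    {"name": "opportunity_id_unique", "scope": "column", "basis": "absolute"},
    {"name": "new_biz_positive_acv",  "scope": "row",    "basis": "absolute"},
]
print(coverage_gaps(checks))
# ['no aggregate-level check', 'no relative: source check']
```

Run it in CI against the model's declared checks: an empty list means the scope dimension is covered and at least one source-relative check exists.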
Blank Template — Fill In Per Model
Copy this template for each new model. Fill in the cells with your specific constraints.
Model name: _______________
Source table(s): _______________
Target table: _______________
What this model does: _______________
System of Record:
| Source Table | System of Record | Confidence | Notes |
|---|---|---|---|
| table | system | High/Medium/Low | competing sources? merge rules? |
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | Uniqueness? Null rules? Accepted values? Range bounds? | Do column types/values match upstream schema? | Do column definitions match prod data dictionary? | — | Is the distribution stable across runs? | Spot-check top values. Do they look right? |
| Row | What cross-column rules must every row satisfy? Any conditional logic by source? | Does each row exist in the upstream source? Does the row shape match source expectations? | Do calculated fields use the same formula as prod? | Do row-level amounts tie to external records? | What shouldn’t change row-by-row between runs? | Pull 5-10 random rows. Do they look right? |
| Aggregate | What accounting identities or bounds must hold? | Do row counts match upstream? Any IDs in target not in source? | Do totals tie to finance/reporting dashboards? | Do totals tie to external system exports (Stripe, billing, bank)? | What MoM/QoQ patterns should hold? What swing threshold triggers a flag? | Show the key metric to a stakeholder. Do they flinch? |
Coverage check:
- Column-level: at least one check
- Row-level: at least one check
- Aggregate-level: at least one check
- Relative: source: at least one check
- Human review: done at least once
SQL Patterns Reference
Column-level absolute
```sql
-- Uniqueness
SELECT opportunity_id, count(*) FROM model
GROUP BY 1 HAVING count(*) > 1;
-- Should return 0 rows

-- No nulls in key fields
SELECT count(*) FROM model WHERE opportunity_id IS NULL;
-- Should return 0

-- Accepted values
SELECT DISTINCT stage FROM model
WHERE stage NOT IN ('Prospecting', 'Negotiation', 'Closed Won', 'Closed Lost');
-- Should return 0 rows

-- Range bounds
SELECT count(*) FROM model WHERE amount <= 0;
-- Should return 0 for closed-won
```
Row-level absolute (cross-column rules)
```sql
-- Conditional rule: new opportunities must have positive ACV
SELECT count(*) FROM model
WHERE type = 'New Business' AND acv <= 0;
-- Should return 0

-- Cross-column coherence
SELECT count(*) FROM model WHERE start_date >= end_date;
-- Should return 0

-- Source-specific row shape
SELECT count(*) FROM model
WHERE source_system = 'salesforce' AND sf_id IS NULL;
-- Should return 0
```
Aggregate-level absolute (accounting identity)
```sql
-- Components must sum to the whole
-- (column aliases can't be referenced in WHERE, so the expression is repeated)
SELECT
  starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr AS calculated_ending,
  ending_arr AS reported_ending
FROM arr_waterfall
WHERE starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr != ending_arr;
-- Should return 0 rows
```
Relative: source (completeness + precision + fidelity)
```sql
-- Completeness: right rows came through
SELECT count(*) AS expected FROM bronze WHERE stage = 'Closed Won';
SELECT count(*) AS actual FROM silver_closed_won;
-- Must match exactly

-- Precision: only right rows came through
SELECT opportunity_id FROM silver_closed_won
EXCEPT
SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won';
-- Should return 0 rows

-- Fidelity: values survived intact
SELECT count(*) FROM bronze b
JOIN silver_closed_won s ON b.opportunity_id = s.opportunity_id
WHERE b.amount != s.amount OR b.close_date != s.close_date;
-- Should return 0
```
Relative: reconciliation
```sql
-- Aggregate recon: reconcile to Stripe
SELECT
  (SELECT ending_arr FROM silver_arr_waterfall
   ORDER BY month DESC LIMIT 1) AS waterfall_arr,
  (SELECT SUM(mrr) * 12 FROM stripe_mrr_snapshot
   WHERE snapshot_date = CURRENT_DATE) AS stripe_arr;
-- Delta should be < 5%
```
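The "< 5%" tolerance itself is just a relative-delta comparison, which is worth guarding against a zero reference. A small sketch; the `within_tolerance` function name is hypothetical:

```python
def within_tolerance(actual: float, reference: float, pct: float = 5.0) -> bool:
    """True when actual is within pct% of reference.
    A zero reference only matches a zero actual (avoids divide-by-zero)."""
    if reference == 0:
        return actual == 0
    return abs(actual - reference) / abs(reference) * 100 <= pct

# Waterfall ARR vs Stripe MRR x 12:
print(within_tolerance(1_040_000, 1_000_000))  # True  (4% delta)
print(within_tolerance(1_080_000, 1_000_000))  # False (8% delta)
```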
Temporal (change detection with lookback)
```sql
-- Column temporal: null rate stability
WITH run_stats AS (
  SELECT run_date,
         COUNT(*) AS total,
         SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS nulls,
         ROUND(100.0 * SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_pct
  FROM model_with_run_tracking
  GROUP BY 1
)
SELECT * FROM run_stats
WHERE null_pct > 5;
-- Flag if null rate exceeds threshold

-- Aggregate temporal: MoM swing check
-- (the windowed pct_change alias can't be filtered in the same SELECT,
--  so it is computed in a CTE first)
WITH monthly AS (
  SELECT DATE_TRUNC('month', close_date) AS month, count(*) AS cnt
  FROM silver_closed_won
  GROUP BY 1
),
with_change AS (
  SELECT month,
         cnt,
         LAG(cnt) OVER (ORDER BY month) AS prev_cnt,
         ROUND(100.0 * (cnt - LAG(cnt) OVER (ORDER BY month))
               / NULLIF(LAG(cnt) OVER (ORDER BY month), 0), 1) AS pct_change
  FROM monthly
)
SELECT month, cnt, prev_cnt, pct_change
FROM with_change
WHERE ABS(pct_change) > 30;
-- Flag months with >30% swing for review
```
Populated Example 2 — ARR Waterfall
Model name: arr_waterfall
Source table(s): bronze_subscriptions, bronze_invoices, bronze_cancellations
Target table: silver_arr_waterfall
What this model does: Aggregates subscription events into monthly ARR cohorts — Starting ARR + New + Expansion - Contraction - Churn = Ending ARR. Each row is one month.
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | No nulls in month, ending_arr. All ARR columns ≥ 0. month is a valid first-of-month date. | Column types match bronze schema. month values exist in the source date spine. | Column names match production dashboard definitions. | — | Distribution of ARR values stable across runs. No sudden zero-months appearing. | Spot-check: are the ARR magnitudes reasonable for this company’s stage? |
| Row | ending_arr = starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr. churned_arr ≤ starting_arr per row. | Each subscription in new_arr has a matching created event in bronze. Each churned_arr entry has a matching cancellation. | Calculated fields use same categorization logic as prod reporting. | Row-level subscription amounts tie to invoice line items. | No single month’s ending_arr retroactively changed since last run (unless restatement flagged). | Pull 3 recent months. Does the ratio of new vs churn feel right for the business stage? |
| Aggregate | Total ending ARR is monotonically explainable. Sum of all components = ending for every month. | Every active subscription in bronze appears in the waterfall exactly once per month. No phantom subscriptions. | Sum of all new_arr ≈ total subscription revenue in finance ledger ± tolerance. | Ending ARR for latest month ≈ Stripe MRR × 12 ± 5%. | QoQ growth rate stays within ±20% unless a known event explains the swing. Q1 dip is normal for B2B. | Show the waterfall chart to the founder. Does the shape match the narrative? |
ARR Waterfall — SQL Patterns
```sql
-- Column-level absolute: no nulls, no negatives
SELECT count(*) FROM silver_arr_waterfall WHERE month IS NULL OR ending_arr IS NULL;
SELECT count(*) FROM silver_arr_waterfall WHERE new_arr < 0 OR expansion_arr < 0
   OR contraction_arr < 0 OR churned_arr < 0;
-- Both should return 0

-- Row-level absolute: accounting identity per row
SELECT month FROM silver_arr_waterfall
WHERE ending_arr != starting_arr + new_arr + expansion_arr
                    - contraction_arr - churned_arr;
-- Should return 0 rows

-- Row-level absolute: churn cannot exceed starting (per row)
SELECT month FROM silver_arr_waterfall
WHERE churned_arr > starting_arr;
-- Should return 0 rows

-- Relative: source — completeness
-- (EXCEPT avoids the NOT IN trap, which returns no rows if any
--  subscription_id is NULL)
SELECT id FROM bronze_subscriptions
WHERE status = 'active'
EXCEPT
SELECT subscription_id FROM silver_arr_waterfall;
-- Should return 0 rows

-- Relative: recon — reconcile to Stripe
SELECT
  (SELECT ending_arr FROM silver_arr_waterfall
   ORDER BY month DESC LIMIT 1) AS waterfall_arr,
  (SELECT SUM(mrr) * 12 FROM stripe_mrr_snapshot
   WHERE snapshot_date = CURRENT_DATE) AS stripe_arr;
-- Delta should be < 5%

-- Temporal: QoQ swing check
-- (windowed pct_change computed in a CTE so it can be filtered)
WITH quarterly AS (
  SELECT DATE_TRUNC('quarter', month) AS qtr,
         SUM(ending_arr) / COUNT(*) AS avg_ending_arr
  FROM silver_arr_waterfall
  GROUP BY 1
),
with_change AS (
  SELECT qtr,
         avg_ending_arr,
         LAG(avg_ending_arr) OVER (ORDER BY qtr) AS prev,
         ROUND(100.0 * (avg_ending_arr - LAG(avg_ending_arr) OVER (ORDER BY qtr))
               / NULLIF(LAG(avg_ending_arr) OVER (ORDER BY qtr), 0), 1) AS pct_change
  FROM quarterly
)
SELECT qtr, avg_ending_arr, prev, pct_change
FROM with_change
WHERE ABS(pct_change) > 20;
-- Flag quarters with >20% swing
```
Coverage check:
- Column-level: no nulls + no negatives
- Row-level: accounting identity + churn bound
- Aggregate-level: completeness + Stripe reconciliation
- Relative: source: subscription existence check
- Human review: founder waterfall chart review
Related
- ../../06-reference/2026-03-30-founder-data-quality-framework — the original framework conversation
- ../../06-reference/2026-04-07-seattle-data-guy-noisy-data-quality-checks — SDG’s complementary “fewer better checks” thesis
- ../../06-reference/2026-04-11-garry-tan-thin-harness-fat-skills — Tan’s latent vs deterministic maps onto this: absolute checks are deterministic, human checks are latent