Data Quality Testing Matrix — Scope × Basis
Every meaningful data test lives at the intersection of one scope and one basis. The definition of done for any new model is coverage across the scope dimension — at least one check at each scope level.
The Two Dimensions
Axis 1 — Scope (where are you evaluating?):
- Column-level — one column in isolation. Uniqueness, null rate, value distribution, accepted values, logical range. The unit is a single field — no cross-column awareness needed.
- Row-level — cross-column rules within a single record. Conditional logic, calculated field validation, source-specific row shape. “Gross amount ≥ sale amount” requires comparing two fields. “New opportunities can’t have negative ACV” is a conditional rule.
- Aggregate-level — totals, counts, and ratios across the full dataset. Individual rows can all pass their checks and the totals can still be wrong. Joins that fan out, filters that silently drop records, window functions that double-count across partitions. This is where most real analytics bugs live.
Axis 2 — Basis (what are you evaluating against?):
- Absolute — does this satisfy a fixed rule? No external reference needed. Accounting identities, bounding rules, type constraints.
- Relative: Source — does this match the upstream data? Completeness (right rows came through), precision (only right rows), fidelity (values survived intact). This is the transformation validation — comparing downstream to upstream.
- Relative: Production — does this match existing reporting or the finance ledger? Cross-system reconciliation against the current state of truth.
- Relative: Reconciliation — does this match an external reconciliation file or third-party data? Ties to Stripe, bank statements, vendor exports.
- Temporal — does this match history or expectations? Current period shouldn’t retroactively change past periods. MoM swings beyond threshold flag for review. Seasonality patterns that break norms.
- Human Sanity Check — show the number to someone who knows the business and see if they flinch. Not automatable but belongs in the framework explicitly, because otherwise teams skip it when they think “we have tests now.”
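The two axes can be made concrete as a typed check record, so every check is forced to declare one scope and one basis. A minimal sketch; the `Check` type and the axis value names are illustrative, not from any particular testing tool:

```python
from dataclasses import dataclass

# Illustrative axis values: every check is one scope crossed with one basis.
SCOPES = {"column", "row", "aggregate"}
BASES = {"absolute", "relative_source", "relative_production",
         "relative_recon", "temporal", "human"}

@dataclass(frozen=True)
class Check:
    name: str
    scope: str   # where: column | row | aggregate
    basis: str   # against what: absolute | relative_* | temporal | human

    def __post_init__(self):
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.basis not in BASES:
            raise ValueError(f"unknown basis: {self.basis}")

# "Gross amount >= sale amount" compares two fields in one record,
# so it is a row-level absolute check:
c = Check("gross_gte_sale", scope="row", basis="absolute")
print(c.scope, c.basis)  # row absolute
```

Making the intersection explicit in the type is what later enables the mechanical coverage check: you can count checks per scope and per basis.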
Prerequisites
Before filling in the matrix, complete these steps:
System of Record (SoR) Identification
For each source table feeding this model, identify the authoritative system:
| Source Table | System of Record | Confidence | Notes |
|---|---|---|---|
| e.g. opportunities | Salesforce | High — single source | No competing systems |
| e.g. customers | Salesforce + HubSpot | Medium — two sources, need merge rules | SF is primary, HS fills gaps |
Why this matters: You can’t write Relative: Source checks without knowing WHICH source is authoritative. If there are competing sources for a conformed dimension, that’s a Stop-severity issue — resolve it before writing tests.
Severity Tiers
Each check in the matrix should be assigned a severity that determines the pipeline response:
| Severity | Response | When to use |
|---|---|---|
| Stop | Block the pipeline. Data cannot be served. | Accounting identities violated, missing conformed dimensions, referential integrity broken, wrong system of record |
| Pause | Flag for review before serving. Pipeline continues but output is quarantined. | Missing mandatory values, row count swings beyond threshold, aggregate reconciliation outside tolerance |
| Go | Log the finding, continue serving. Review in next cycle. | Missing optional attributes, minor distribution shifts, cosmetic data issues |
Default severity mapping by basis:
- Absolute → Stop (hard constraints that should never fail)
- Relative: Source → Stop (transformation broke something)
- Relative: Production → Pause (might be a real change or a bug — needs human judgment)
- Relative: Recon → Pause (external system may have different timing)
- Temporal → Pause (anomaly, not necessarily wrong)
- Human → Go (informational, not blocking)
Layer-Aware Severity
The medallion architecture changes what “bad” means at each layer. The same issue has different severity depending on where it occurs:
| Issue | Bronze | Silver | Gold |
|---|---|---|---|
| Null in key field | Go (source sends it, log and move on) | Pause (transformation should have handled it — possible bug) | Stop (consumer-facing, blocks the report) |
| Row count mismatch vs upstream | Pause (ingestion may have filtered) | Stop (star schema must account for all source rows) | Pause (gold may intentionally aggregate/filter) |
| Value outside expected range | Go (source data quality issue, flag for SoR team) | Stop (silver should have cleansed or rejected) | Stop (invalid data reached the consumer) |
| Schema change detected | Stop (bronze contract with source broke) | Pause (may need model update) | Go (gold schema is independently defined) |
| Duplicate records | Pause (source may send dupes, dedup in silver) | Stop (star schema must not have dupes) | Stop (report double-counts) |
The principle: Bronze is permissive (mirror what the source gives you). Silver is strict (the transformation layer is where quality is enforced). Gold is zero-tolerance for consumer-facing issues but permissive on structure (intentional reshaping is expected).
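The layer-aware table reduces to a lookup keyed on (issue, layer). A sketch transcribed from the table above; the issue keys and `response_for` helper are illustrative:

```python
# (issue, layer) -> pipeline response, transcribed from the table above.
LAYER_SEVERITY = {
    ("null_in_key_field",  "bronze"): "go",
    ("null_in_key_field",  "silver"): "pause",
    ("null_in_key_field",  "gold"):   "stop",
    ("row_count_mismatch", "bronze"): "pause",
    ("row_count_mismatch", "silver"): "stop",
    ("row_count_mismatch", "gold"):   "pause",
    ("value_out_of_range", "bronze"): "go",
    ("value_out_of_range", "silver"): "stop",
    ("value_out_of_range", "gold"):   "stop",
    ("schema_change",      "bronze"): "stop",
    ("schema_change",      "silver"): "pause",
    ("schema_change",      "gold"):   "go",
    ("duplicate_records",  "bronze"): "pause",
    ("duplicate_records",  "silver"): "stop",
    ("duplicate_records",  "gold"):   "stop",
}

def response_for(issue: str, layer: str) -> str:
    """Same issue, different layer, different response."""
    return LAYER_SEVERITY[(issue, layer)]

print(response_for("duplicate_records", "bronze"))  # pause
print(response_for("duplicate_records", "silver"))  # stop
```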
How this maps to dbt:
- Bronze tests: `severity: warn` for most checks (log, don’t block)
- Silver tests: `severity: error` for key integrity and transformation fidelity
- Gold tests: `severity: error` for anything consumer-facing; `severity: warn` for structural checks
Override these defaults based on business impact. A null in an optional description field is a Go even though it’s an Absolute check. A temporal swing of 200% in revenue is a Stop even though temporal checks default to Pause.
The Matrix (with Salesforce Closed-Won Example)
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | opportunity_id not null, unique. amount > 0. stage in accepted values list. close_date is a valid date. | stage values in silver exist in bronze enum. Column types match source schema. | Column definitions match production data dictionary. | — | Null rate for amount stable across runs (not suddenly spiking). Distribution of stage values consistent. | Spot-check: do the top 10 amounts look reasonable for this business? |
| Row | start_date < end_date. “New” opportunities must have positive ACV. Rows from Salesforce must have sf_id populated; rows from HubSpot must have hs_id. | Each opportunity_id in silver exists in bronze with matching stage = 'Closed Won'. Row shape matches source expectations (SF rows have X,Y,Z; HS rows have A,B,C). | Calculated fields (win rate, ACV) use same formula as production dashboards. | Row-level amounts tie to invoice line items in billing system. | No single contract’s close_date changed since last run (unless restatement flagged). | Pull 5-10 random closed-won records. Does the amount look right? Is the close date plausible? |
| Aggregate | Churned ARR ≤ starting ARR. Win rate between 0% and 100%. Total closed-won count ≥ 0. | Silver closed-won count = bronze count WHERE stage = 'Closed Won'. No IDs in silver that aren’t closed-won in bronze. | Salesforce ARR ties to finance ledger ± tolerance. Pipeline totals match executive dashboard. | Ending ARR ≈ Stripe MRR × 12 ± 5%. Total invoiced matches billing system export. | MoM closed-won count doesn’t swing >30% without a flag. Q4 should be higher than Q1 (seasonality). | Show the pipeline waterfall to a sales leader. Does the shape match their mental model? |
How to Read the Matrix
Rows (Scope) answer: where are you evaluating?
- Column: Is each field valid on its own? (isolation)
- Row: Is each record internally coherent? (cross-column)
- Aggregate: Are the totals correct? (dataset-wide)
Columns (Basis) answer: what are you evaluating against?
- Absolute: A fixed logical rule (self-contained, no external reference)
- Relative: Source: The upstream data layer (transformation validation)
- Relative: Production: Existing reporting or dashboards (reconciliation to truth)
- Relative: Recon: External third-party data (billing, bank, vendor)
- Temporal: Historical baseline or expected pattern (change detection)
- Human: Someone who knows the business looks at it and doesn’t flinch
The “Definition of Done” for Any New Model
Before shipping a new dbt model or Snowflake view, confirm:
- At least one column-level check exists (null, unique, range, accepted values)
- At least one row-level check exists (cross-column rules, conditional logic)
- At least one aggregate-level check exists (accounting identities, reconciliation)
- At least one relative: source check exists (upstream comparison — completeness, precision, fidelity)
- The basis for each check is chosen based on what’s actually knowable for this data
- A human plausibility review has been done at least once (even if not automated)
If any scope level has zero checks, there’s a gap — and that’s where the bug will hide.
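The checklist above can be enforced mechanically before shipping. A sketch that assumes each check is tagged with its scope and basis as in the matrix; the `coverage_gaps` helper is illustrative:

```python
REQUIRED_SCOPES = {"column", "row", "aggregate"}

def coverage_gaps(checks: list[dict]) -> list[str]:
    """Return the unmet definition-of-done items for a model's check list."""
    scopes = {c["scope"] for c in checks}
    bases = {c["basis"] for c in checks}
    gaps = [f"no {s}-level check" for s in sorted(REQUIRED_SCOPES - scopes)]
    if "relative_source" not in bases:
        gaps.append("no relative: source check")
    return gaps

checks = [
    {"name": "opportunity_id_unique", "scope": "column", "basis": "absolute"},
    {"name": "new_biz_positive_acv",  "scope": "row",    "basis": "absolute"},
]
print(coverage_gaps(checks))
# ['no aggregate-level check', 'no relative: source check']
```

Run it in CI against the model's declared checks: an empty list means the scope dimension is covered and at least one source-relative check exists.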
Blank Template — Fill In Per Model
Copy this template for each new model. Fill in the cells with your specific constraints.
Model name: _______________
Source table(s): _______________
Target table: _______________
What this model does: _______________
System of Record:
| Source Table | System of Record | Confidence | Notes |
|---|---|---|---|
| table | system | High/Medium/Low | competing sources? merge rules? |
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | Uniqueness? Null rules? Accepted values? Range bounds? | Do column types/values match upstream schema? | Do column definitions match prod data dictionary? | — | Is the distribution stable across runs? | Spot-check top values. Do they look right? |
| Row | What cross-column rules must every row satisfy? Any conditional logic by source? | Does each row exist in the upstream source? Does the row shape match source expectations? | Do calculated fields use the same formula as prod? | Do row-level amounts tie to external records? | What shouldn’t change row-by-row between runs? | Pull 5-10 random rows. Do they look right? |
| Aggregate | What accounting identities or bounds must hold? | Do row counts match upstream? Any IDs in target not in source? | Do totals tie to finance/reporting dashboards? | Do totals tie to external system exports (Stripe, billing, bank)? | What MoM/QoQ patterns should hold? What swing threshold triggers a flag? | Show the key metric to a stakeholder. Do they flinch? |
Coverage check:
- Column-level: at least one check
- Row-level: at least one check
- Aggregate-level: at least one check
- Relative: source: at least one check
- Human review: done at least once
SQL Patterns Reference
Column-level absolute
```sql
-- Uniqueness
SELECT opportunity_id, count(*) FROM model
GROUP BY 1 HAVING count(*) > 1;
-- Should return 0 rows

-- No nulls in key fields
SELECT count(*) FROM model WHERE opportunity_id IS NULL;
-- Should return 0

-- Accepted values
SELECT DISTINCT stage FROM model
WHERE stage NOT IN ('Prospecting', 'Negotiation', 'Closed Won', 'Closed Lost');
-- Should return 0 rows

-- Range bounds
SELECT count(*) FROM model WHERE amount <= 0;
-- Should return 0 for closed-won
```
Row-level absolute (cross-column rules)
```sql
-- Conditional rule: new opportunities must have positive ACV
SELECT count(*) FROM model
WHERE type = 'New Business' AND acv <= 0;
-- Should return 0

-- Cross-column coherence
SELECT count(*) FROM model WHERE start_date >= end_date;
-- Should return 0

-- Source-specific row shape
SELECT count(*) FROM model
WHERE source_system = 'salesforce' AND sf_id IS NULL;
-- Should return 0
```
Aggregate-level absolute (accounting identity)
```sql
-- Components must sum to the whole
-- (column aliases can't be referenced in WHERE, so the expression is repeated)
SELECT
  starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr AS calculated_ending,
  ending_arr AS reported_ending
FROM arr_waterfall
WHERE starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr != ending_arr;
-- Should return 0 rows
```
Relative: source (completeness + precision + fidelity)
```sql
-- Completeness: right rows came through
SELECT count(*) AS expected FROM bronze WHERE stage = 'Closed Won';
SELECT count(*) AS actual FROM silver_closed_won;
-- Must match exactly

-- Precision: only right rows came through
SELECT opportunity_id FROM silver_closed_won
EXCEPT
SELECT opportunity_id FROM bronze WHERE stage = 'Closed Won';
-- Should return 0 rows

-- Fidelity: values survived intact
SELECT count(*) FROM bronze b
JOIN silver_closed_won s ON b.opportunity_id = s.opportunity_id
WHERE b.amount != s.amount OR b.close_date != s.close_date;
-- Should return 0
```
Relative: reconciliation
```sql
-- Aggregate recon: reconcile to Stripe
SELECT
  (SELECT ending_arr FROM silver_arr_waterfall
   ORDER BY month DESC LIMIT 1) AS waterfall_arr,
  (SELECT SUM(mrr) * 12 FROM stripe_mrr_snapshot
   WHERE snapshot_date = CURRENT_DATE) AS stripe_arr;
-- Delta should be < 5%
```
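The "< 5%" tolerance itself is just a relative-delta comparison, which is worth guarding against a zero reference. A small sketch; the `within_tolerance` function name is hypothetical:

```python
def within_tolerance(actual: float, reference: float, pct: float = 5.0) -> bool:
    """True when actual is within pct% of reference.
    A zero reference only matches a zero actual (avoids divide-by-zero)."""
    if reference == 0:
        return actual == 0
    return abs(actual - reference) / abs(reference) * 100 <= pct

# Waterfall ARR vs Stripe MRR x 12:
print(within_tolerance(1_040_000, 1_000_000))  # True  (4% delta)
print(within_tolerance(1_080_000, 1_000_000))  # False (8% delta)
```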
Temporal (change detection with lookback)
```sql
-- Column temporal: null rate stability
WITH run_stats AS (
  SELECT run_date,
         COUNT(*) AS total,
         SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS nulls,
         ROUND(100.0 * SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_pct
  FROM model_with_run_tracking
  GROUP BY 1
)
SELECT * FROM run_stats
WHERE null_pct > 5;
-- Flag if null rate exceeds threshold

-- Aggregate temporal: MoM swing check
-- (the windowed pct_change alias can't be filtered in the same SELECT,
--  so it is computed in a CTE first)
WITH monthly AS (
  SELECT DATE_TRUNC('month', close_date) AS month, count(*) AS cnt
  FROM silver_closed_won
  GROUP BY 1
),
with_change AS (
  SELECT month,
         cnt,
         LAG(cnt) OVER (ORDER BY month) AS prev_cnt,
         ROUND(100.0 * (cnt - LAG(cnt) OVER (ORDER BY month))
               / NULLIF(LAG(cnt) OVER (ORDER BY month), 0), 1) AS pct_change
  FROM monthly
)
SELECT month, cnt, prev_cnt, pct_change
FROM with_change
WHERE ABS(pct_change) > 30;
-- Flag months with >30% swing for review
```
Populated Example 2 — ARR Waterfall
Model name: arr_waterfall
Source table(s): bronze_subscriptions, bronze_invoices, bronze_cancellations
Target table: silver_arr_waterfall
What this model does: Aggregates subscription events into monthly ARR cohorts — Starting ARR + New + Expansion - Contraction - Churn = Ending ARR. Each row is one month.
| | Absolute | Relative: Source | Relative: Production | Relative: Recon | Temporal | Human |
|---|---|---|---|---|---|---|
| Column | No nulls in month, ending_arr. All ARR columns ≥ 0. month is a valid first-of-month date. | Column types match bronze schema. month values exist in the source date spine. | Column names match production dashboard definitions. | — | Distribution of ARR values stable across runs. No sudden zero-months appearing. | Spot-check: are the ARR magnitudes reasonable for this company’s stage? |
| Row | ending_arr = starting_arr + new_arr + expansion_arr - contraction_arr - churned_arr. churned_arr ≤ starting_arr per row. | Each subscription in new_arr has a matching created event in bronze. Each churned_arr entry has a matching cancellation. | Calculated fields use same categorization logic as prod reporting. | Row-level subscription amounts tie to invoice line items. | No single month’s ending_arr retroactively changed since last run (unless restatement flagged). | Pull 3 recent months. Does the ratio of new vs churn feel right for the business stage? |
| Aggregate | Total ending ARR is monotonically explainable. Sum of all components = ending for every month. | Every active subscription in bronze appears in the waterfall exactly once per month. No phantom subscriptions. | Sum of all new_arr ≈ total subscription revenue in finance ledger ± tolerance. | Ending ARR for latest month ≈ Stripe MRR × 12 ± 5%. | QoQ growth rate stays within ±20% unless a known event explains the swing. Q1 dip is normal for B2B. | Show the waterfall chart to the founder. Does the shape match the narrative? |
ARR Waterfall — SQL Patterns
```sql
-- Column-level absolute: no nulls, no negatives
SELECT count(*) FROM silver_arr_waterfall WHERE month IS NULL OR ending_arr IS NULL;
SELECT count(*) FROM silver_arr_waterfall WHERE new_arr < 0 OR expansion_arr < 0
   OR contraction_arr < 0 OR churned_arr < 0;
-- Both should return 0

-- Row-level absolute: accounting identity per row
SELECT month FROM silver_arr_waterfall
WHERE ending_arr != starting_arr + new_arr + expansion_arr
                    - contraction_arr - churned_arr;
-- Should return 0 rows

-- Row-level absolute: churn cannot exceed starting (per row)
SELECT month FROM silver_arr_waterfall
WHERE churned_arr > starting_arr;
-- Should return 0 rows

-- Relative: source — completeness
-- (EXCEPT avoids the NOT IN trap, which returns no rows if any
--  subscription_id is NULL)
SELECT id FROM bronze_subscriptions
WHERE status = 'active'
EXCEPT
SELECT subscription_id FROM silver_arr_waterfall;
-- Should return 0 rows

-- Relative: recon — reconcile to Stripe
SELECT
  (SELECT ending_arr FROM silver_arr_waterfall
   ORDER BY month DESC LIMIT 1) AS waterfall_arr,
  (SELECT SUM(mrr) * 12 FROM stripe_mrr_snapshot
   WHERE snapshot_date = CURRENT_DATE) AS stripe_arr;
-- Delta should be < 5%

-- Temporal: QoQ swing check
-- (windowed pct_change computed in a CTE so it can be filtered)
WITH quarterly AS (
  SELECT DATE_TRUNC('quarter', month) AS qtr,
         SUM(ending_arr) / COUNT(*) AS avg_ending_arr
  FROM silver_arr_waterfall
  GROUP BY 1
),
with_change AS (
  SELECT qtr,
         avg_ending_arr,
         LAG(avg_ending_arr) OVER (ORDER BY qtr) AS prev,
         ROUND(100.0 * (avg_ending_arr - LAG(avg_ending_arr) OVER (ORDER BY qtr))
               / NULLIF(LAG(avg_ending_arr) OVER (ORDER BY qtr), 0), 1) AS pct_change
  FROM quarterly
)
SELECT qtr, avg_ending_arr, prev, pct_change
FROM with_change
WHERE ABS(pct_change) > 20;
-- Flag quarters with >20% swing
```
Coverage check:
- Column-level: no nulls + no negatives
- Row-level: accounting identity + churn bound
- Aggregate-level: completeness + Stripe reconciliation
- Relative: source: subscription existence check
- Human review: founder waterfall chart review
Related
- ../../06-reference/2026-03-30-founder-data-quality-framework — the original framework conversation
- ../../06-reference/2026-04-07-seattle-data-guy-noisy-data-quality-checks — SDG’s complementary “fewer better checks” thesis
- ../../06-reference/2026-04-11-garry-tan-thin-harness-fat-skills — Tan’s latent vs deterministic maps onto this: absolute checks are deterministic, human checks are latent