04-tooling / xmr-charts

mrr bridge and annotation layer

2026-04-21 · tool-doc · status: v1
xmr · spc · mrr · input-metrics · annotation-layer · mac-framework · decision-context · wheeler · chin

XmR for SaaS: The MRR Bridge Decomposition + The Analyst Annotation Layer

The one-sentence claim

XmR’s value as a monitoring tool is in two halves — choosing the right input metric to chart (not the output it eventually rolls up to), and capturing the analyst’s cause attribution on every signal so the chart accumulates into a learning system rather than a museum exhibit.

Part 1 — Input metrics, not output metrics

The first instinct when an executive asks “is the business OK?” is to chart MRR; MRR is, after all, the answer to the question. It is still the wrong thing to chart, and the chart at /tmp/xmr-output-mrr.png shows why. MRR is the running sum of every sales, expansion, churn, and downgrade event in the period. Its noise structure is multiplicative (a $1k swing means different things at $50k and at $500k MRR), its drift is structural (any growing SaaS company will trip Rule 2 every couple of quarters just by trending), and the lag from cause to chart is one accounting close. By the time MRR’s XmR fires, the cause is six weeks in the past and one bookkeeping cycle removed from the operator who could have acted on it. The chart functions as a museum exhibit of a decision someone else already made.

Wheeler and Chin point at the same fix from different sides. Wheeler’s transformation chapter says that when a series is multiplicative or heteroskedastic the linear-additive XmR machinery is the wrong tool — log-transform first, or decompose into the additive inputs that compose it. Chin, channelling Amazon’s WBR discipline in ../../06-reference/2026-04-15-commoncog-amazon-weekly-business-review, makes the operational version of the same point: output metrics are forbidden in the WBR review until the controllable inputs that produced them have been discussed. The output is the score; the inputs are the game.

The MRR bridge decomposition is the canonical input-side breakdown:

MRR_t = MRR_{t-1} + New + Expansion - Downgraded - Churned

(Downgraded and Churned are positive magnitudes entering with a minus sign; customers retained at an unchanged price contribute zero to the bridge and need no term.)

Each delta term on the right is a separately chartable input. New MRR has clean additive noise (it’s a sum of deal sizes), Closed Deals is a count (Poisson-ish, well-behaved), Churn count is a count of identifiable customer events with known root causes, and Expansion is structurally separate from acquisition, with separate causes. The chart at /tmp/xmr-input-new-mrr.png shows New MRR carrying signal weeks before that signal aggregates into output MRR. /tmp/xmr-input-closed-deals.png and /tmp/xmr-input-churn-count.png are siblings: each bounded, additive, controllable, and each an earlier warning than the rolled-up output.
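
A minimal sketch of the bridge computation, assuming a per-customer monthly MRR snapshot; the column names (customer_id, month, mrr) and the function itself are illustrative, not part of any existing pipeline:

import pandas as pd

def mrr_bridge(snap: pd.DataFrame, prev: str, curr: str) -> dict:
    # snap: one row per customer per month, columns customer_id, month, mrr
    a = snap.loc[snap["month"] == prev].set_index("customer_id")["mrr"]
    b = snap.loc[snap["month"] == curr].set_index("customer_id")["mrr"]
    new = b[~b.index.isin(a.index)].sum()        # present now, absent last month
    churned = a[~a.index.isin(b.index)].sum()    # present last month, absent now
    both = a.index.intersection(b.index)
    delta = b[both] - a[both]
    expansion = delta[delta > 0].sum()           # retained customers who upgraded
    downgraded = -delta[delta < 0].sum()         # contractions, as a positive magnitude
    return {"new": new, "expansion": expansion,
            "downgraded": downgraded, "churned": churned}

Each returned term feeds its own XmR chart, and the bridge identity holds by construction: last month’s MRR plus new plus expansion, minus downgraded and churned, reproduces this month’s MRR exactly.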

One layer further upstream sits the funnel:

Closed Deals = Lead count × MQL→SQL rate × SQL→Close rate

Each of those three inputs is more chartable still. Lead count is driven by marketing levers (campaign spend, content cadence, partnerships). MQL→SQL is a rate controlled by SDR process and qualification criteria. SQL→Close is a rate controlled by AE motion and pricing. When Closed Deals dips, the question is which of those three inputs moved — a question the funnel-input charts answer by inspection and the Closed Deals chart cannot.
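
Because the funnel composes multiplicatively, a period-over-period move in Closed Deals decomposes additively in log space (the same log-transform move Wheeler prescribes for multiplicative series in Part 1). A sketch with made-up numbers:

import math

# Two hypothetical periods of funnel inputs.
prev = {"leads": 400, "mql_sql": 0.30, "sql_close": 0.25}  # 400 × 0.30 × 0.25 = 30 deals
curr = {"leads": 400, "mql_sql": 0.30, "sql_close": 0.20}  # 400 × 0.30 × 0.20 = 24 deals

# log(ClosedDeals_t / ClosedDeals_{t-1}) = sum of the per-input log deltas,
# so each input's log delta is its exact share of the output move.
for k in prev:
    print(k, round(math.log(curr[k] / prev[k]), 3))
# leads 0.0, mql_sql 0.0, sql_close -0.223 → the close rate explains the whole dip

The charts answer the same question by inspection; the decomposition is what makes “which input moved” a well-posed, additive question in the first place.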

The general rule: each upstream layer is more chartable than the layer below it, because (a) the noise gets more additive as you decompose multiplicative compositions, (b) the variance gets bounded by physical limits (you can’t generate negative leads), (c) the levers get more controllable, and (d) the lag from cause to chart shrinks with each step upstream. The right XmR layer to chart is the most-upstream layer at which the operator can pull a lever — and never the output.

Part 2 — The annotation layer (the load-bearing missing piece)

A clean XmR chart on the right input metric tells you that the system shifted. It does not tell you why. Without a captured “why,” the next signal that fires has no prior to weigh against, the cause-pattern catalog never accumulates, and every new signal investigation starts from zero. The chart becomes a museum exhibit even when the math is right — every signal is admired, none of them compound into a learning system.

The annotation layer is the missing piece. Three things it must capture, every time a signal fires:

  1. Cause attribution. What happened in the world that caused this point to land outside the limits or this run to land on one side of the central line? Pricing change, marketing campaign, competitive event, methodology change in the metric itself, data-pipeline incident, seasonality the model didn’t capture? The cause is a hypothesis when the signal first fires, becomes a confirmed cause when evidence supports it, and stays “unexplained” honestly when evidence does not.
  2. Decision context. What did the team believe at the time the signal fired, what was the state of the business when it fired, and what did the team do in response? Six months later when a similar signal fires, the prior is “last time we thought it was X, did Y, and Z happened.” Without decision context the next investigation has no precedent.
  3. Lever implication. If the confirmed cause is something the team can control — a campaign, a pricing decision, a hiring choice — what is the inverse intervention worth? Quantify the controllable lever. This is what turns a signal from a curiosity into a planning input.

BI tools and dbt resist this work because none of them are shaped like it. BI is a display surface — Tableau, Looker, Mode, and Sigma render dashboards but capture no judgment beyond a comment thread nobody can query. dbt is a transformation surface — every semantic layer assumes data flows one direction, raw to modeled. Annotations flow the other way: written back against the chart by a human after investigation. The closest public tool is xmrit.com, which gets the chart math right but stops at the chart — it does not persist the analyst write-up that gives each signal its meaning.

This gap is RDCO’s positioning surface. The MAC framework at ../../01-projects/data-quality-framework/testing-matrix-template.md tests that the model is right; the annotation layer captures why the world moved when a signal fires on top of a known-good model. Both are operator-judgment workflows that resist naive automation, and both suit a thin-harness fat-skill split better than any dashboard.

Part 3 — The annotation schema (concrete proposal)

One annotation file per signal, stored alongside the chart. The schema is YAML so it’s grep-able, diff-able in git, and trivially rendered into briefs and pattern queries. Copy this template; fill the nullable fields as investigation proceeds.

signal_id: new-mrr-2026-w14-r1                # unique stable id; metric::date::rule
metric_name: new_mrr                          # the operationally-defined input metric
chart_path: charts/new-mrr/2026-w14.png       # absolute or repo-relative chart link
signal_type: rule_1                           # rule_1 | rule_2 | rule_3 | manual
fired_at: 2026-04-08                          # date of the point that tripped the rule
investigated_by: ben                          # null until an analyst owns the signal
investigated_at: null                         # ISO date when investigation closed
attribution_hypotheses:                       # list of candidates considered, in order
  - "April pricing test (15% list increase)"
  - "Loss of inbound from partner X churn"
  - "Seasonal Q2 procurement freeze"
confirmed_cause: null                         # the single hypothesis evidence supports
cause_class: null                             # internal_process_change | external_shock
                                              # | methodology | data_quality | unexplained
evidence:                                     # links / refs that justify confirmed_cause
  - notion://task/abc-123-pricing-test-launch
  - vault://04-finance/2026-04-pulse.md#partner-x
  - slack://archive/sales-2026-04-08
lever_implication:
  controllable: null                          # bool — is the cause something we can move?
  estimated_effect: null                      # one-line description of inverse intervention
  estimated_effect_$: null                    # quantified $ impact if controllable
  confidence: null                            # low | medium | high
tags: [pricing, q2-2026, expansion-cohort]    # free-form for pattern queries
related_signals:                              # other annotations this one rhymes with
  - new-mrr-2025-w27-r1
  - expansion-2026-w12-r2

Field purposes, in one line each:

  signal_id: stable unique key (metric::date::rule) so references survive file moves and renames.
  metric_name: the operationally defined input metric the chart tracks.
  chart_path: the exact chart render that fired, so annotation and picture stay linked.
  signal_type: which Wheeler rule (or manual flag) tripped.
  fired_at / investigated_by / investigated_at: the signal’s lifecycle (fired when, owned by whom, closed when).
  attribution_hypotheses: every candidate cause considered, in the order considered.
  confirmed_cause / cause_class: the one hypothesis the evidence supports, and what kind of cause it is.
  evidence: the links that let the confirmed cause survive an analyst handoff.
  lever_implication: whether the cause is controllable and what the inverse intervention is worth.
  tags / related_signals: the hooks that pattern queries and precedent lookups run on.
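
For concreteness, a hypothetical closed-out version of the same annotation (unchanged fields elided; every filled value is invented for illustration):

signal_id: new-mrr-2026-w14-r1
investigated_by: ben
investigated_at: 2026-04-15
confirmed_cause: "April pricing test (15% list increase)"
cause_class: internal_process_change
evidence:
  - notion://task/abc-123-pricing-test-launch
lever_implication:
  controllable: true
  estimated_effect: "roll back the list increase for the mid-market segment"
  estimated_effect_$: 8000
  confidence: medium

The stub-to-closed diff is the investigation’s whole output, and it is exactly what git history preserves.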

Part 4 — Connections to existing vault scaffolding

The annotation layer is not a new framework — it sits at the intersection of three existing ones in the vault.

../../06-reference/concepts/operational-definitions is the prerequisite. An annotation that says confirmed_cause: "churn spike" is meaningless without an operational definition of churn that names the criterion, test, and decision rule. Wheeler’s three-part definition is what makes the annotation reproducible across analysts and across time. The annotation layer is operational definitions applied not to the metric but to the causes of metric movement — every confirmed_cause should be traceable to a definition that survives an analyst handoff.

The Cedric Chin Masterclass PDSA tracker pattern (from ../../06-reference/2026-04-15-commoncog-becoming-data-driven-first-principles) is a sibling template: same shape, applied to active experiments. PDSA captures plan-do-study-act on a deliberate intervention. The annotation layer captures the same study-act discipline on passive monitoring — when the world moves and you didn’t do it deliberately. Together they cover the full causal surface: PDSA for what you tried, annotations for what happened to you.

The MAC framework at ../../01-projects/data-quality-framework/testing-matrix-template.md tests that the model is right. The annotation layer captures why the world moved when a signal fires on top of a model the MAC tests have certified. They are complementary not redundant: MAC is the prerequisite (a signal on a broken model is data-quality noise, not business signal), and annotations are the next layer up (a signal on a clean model is business signal, and the annotation captures what kind).

The harness thesis is the architectural fit. Annotation work is compound — each signal investigated makes the next one cheaper because the cause-pattern catalog grows. It resists naive automation because the judgment step is irreducibly human. But it suits the thin-orchestrator + fat-skill split exactly: the agent maintains the schema, prompts the analyst for missing fields, surfaces stale stubs, runs the pattern queries, and links new signals to related ones. The analyst supplies the judgment. The agent supplies the workflow.

Part 5 — Why this resists BI/dbt automation

The annotation loop has four stages and none of them are “transform a column”:

  1. Signal fires — the chart-rendering pipeline detects a Wheeler rule violation and emits a signal record.
  2. Review queued — a task is created in the analyst’s workflow surface (Notion) with signal id and chart link.
  3. Investigation — the analyst reads context, interviews, looks at correlated data, generates and prunes hypotheses.
  4. Annotation captured — the analyst writes schema fields back to the annotation file; the next chart render links the annotation to the point that triggered it.

BI assumes the workflow ends at display, dbt assumes it ends at transformation, and xmrit assumes it ends at the chart. None of them carry the four stages. This is exactly the operator-judgment compound workflow the harness/skills architecture was built for. The thin orchestrator detects, queues, prompts, and links; the fat skill is the analyst’s investigation and judgment; the durable storage is the annotation file in the vault.

Part 6 — Operationalization sketch

Storage. Annotations live alongside charts in 04-tooling/xmr-charts/annotations/<metric>/<signal_id>.md. One annotation per file. Easy to grep, trivial to diff in git, links cleanly back to the chart’s per-segment data. The chart-rendering pipeline writes a stub on signal detection; the analyst fills it in.

Workflow. When xmr.py detects a signal it (a) writes the chart PNG, (b) creates a stub annotation file with investigated_by: null and confirmed_cause: null, and (c) queues a Notion task “Annotate signal <metric>::<date>” for the analyst. The Notion task carries the signal_id so the agent can close the loop when the annotation file fills in.
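
A sketch of the detection step, assuming xmr.py computes standard XmR natural process limits (mean ± 2.66 × mean moving range, Wheeler’s constant for two-point moving ranges) and this doc’s rule numbering, with rule 1 a point outside the limits and rule 2 a run of eight on one side of the central line. The function is illustrative, not the actual xmr.py:

import statistics

def detect_signals(xs: list[float]) -> list[tuple[int, str]]:
    xbar = statistics.mean(xs)
    mr_bar = statistics.mean(abs(b - a) for a, b in zip(xs, xs[1:]))
    unpl, lnpl = xbar + 2.66 * mr_bar, xbar - 2.66 * mr_bar  # natural process limits
    signals = []
    for i, x in enumerate(xs):
        if x > unpl or x < lnpl:              # rule 1: point outside the limits
            signals.append((i, "rule_1"))
    for i in range(len(xs) - 7):              # rule 2: run of 8 on one side of center
        run = xs[i:i + 8]
        if all(x > xbar for x in run) or all(x < xbar for x in run):
            signals.append((i + 7, "rule_2"))
    return signals

Each (index, rule) pair becomes a signal record; the stub file in (b) and the Notion task in (c) hang off this list.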

Pattern queries. A small script analyze-annotations.py walks the annotations directory and answers questions like “show me all churn-related root causes in the past 12 months,” “every confirmed_cause = ‘pricing change’ across all metrics,” or “every lever_implication where controllable: true and confidence: high.” Output is a markdown table the analyst can paste into a brief.
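
A sketch of that walk, assuming PyYAML, the layout from the Storage paragraph, and annotation files whose body is the raw YAML schema above; the third example question is shown:

from pathlib import Path
import yaml  # PyYAML

def annotations(root: str = "04-tooling/xmr-charts/annotations"):
    # Yields one parsed annotation per file under the metric subdirectories.
    for path in Path(root).rglob("*.md"):
        yield path, yaml.safe_load(path.read_text())

# "every lever_implication where controllable: true and confidence: high"
hits = [a["signal_id"] for _, a in annotations()
        if (a.get("lever_implication") or {}).get("controllable")
        and (a.get("lever_implication") or {}).get("confidence") == "high"]

Rendering hits into the markdown table for a brief is a formatting pass on top of this loop.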

Morning brief integration. The /morning-prep skill reads stale annotation stubs (more than 14 days unannotated) and surfaces them in the DECISIONS AWAITING YOU block. A signal that fires and never gets investigated decays into noise; the morning brief is the staleness check.
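
The staleness check is a few lines on top of the same walk; a sketch, using the 14-day threshold above and the schema’s null-until-investigated convention:

from datetime import date, timedelta

def stale_stubs(parsed, today: date | None = None):
    # A stub is stale when no cause is confirmed 14+ days after it fired.
    today = today or date.today()
    cutoff = today - timedelta(days=14)
    return [a["signal_id"] for _, a in parsed
            if a.get("confirmed_cause") is None
            and date.fromisoformat(str(a["fired_at"])) < cutoff]

The returned ids are what /morning-prep drops into the DECISIONS AWAITING YOU block.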

Skill alongside /audit-model and /generate-tests. The third skill in the data-quality bundle is /annotate-signal <chart> <signal_id> — an interactive prompt that walks the analyst through the schema, suggests hypotheses based on related_signals, drafts evidence links from the vault index, and writes the file back. This is the operator-judgment surface the harness exposes.

Part 7 — What ships in v1 vs what’s deferred

v1 (now):

v2 (when founder approves):

v3: