
Thu Apr 02 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · case-study · source: notion · by Mr. Ben / ConnectWise era
analytics-engineering · data-governance · dbt · source-of-truth · change-data-capture

Hardening Reporting Pipelines

This is the most complete and operationally rich document from the ConnectWise era: an internal manifesto arguing for where the “source of truth” should live in a growing SaaS company’s data process, and for the specific engineering and governance practices needed to harden it. It was written using the “Islands and Bridges” strategy, a writing technique in which you identify isolated concepts (islands) and then write transitions (bridges) between them.

The Core Argument

Every month, leadership asks: What changed? What caused it? Why is it different? These are symptoms of an immature data process. Instead of treating symptoms monthly, cure the cause by investing in the transformation layer as the source of truth.

The Three-Layer Data Process

A reusable framework for any data organization:

  1. Sources: where data is first recorded (Salesforce, NetSuite, Zuora, etc.)
  2. Transform: where data is cleaned, standardized, and shaped (the dbt layer)
  3. Report: where data is surfaced for viewing (Power BI, Excel, FP&A models)

Two transitions connect the layers: a load from sources to transform, and a push from transform to reports. The key insight is that nothing special happens in the transitions; they create carbon copies. The only question to ask of a transition is “when was the last time this ran?”
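The claim that transitions only make carbon copies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the `Layer` class, the `run_transition` helper, and the table names are hypothetical, and the only state a transition adds is a last-run timestamp. The actual cleaning and modeling would happen inside the transform layer, between the two transitions, and is omitted here.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Layer:
    """A hypothetical layer: named tables, each a list of row dicts."""
    name: str
    tables: dict = field(default_factory=dict)
    last_loaded_at: Optional[datetime] = None


def run_transition(src: Layer, dst: Layer,
                   now: Optional[datetime] = None) -> None:
    """Copy every table from src into dst unchanged and stamp the run time.

    Nothing is cleaned or reshaped: a transition is a carbon copy, so the
    only thing worth recording about it is when it last ran.
    """
    dst.tables = copy.deepcopy(src.tables)
    dst.last_loaded_at = now or datetime.now(timezone.utc)


sources = Layer("sources", {"salesforce_opportunity": [{"id": 1, "amount": 100}]})
transform = Layer("transform")
report = Layer("report")

run_transition(sources, transform)  # load: sources -> transform
# (cleaning and modeling would happen here, inside the transform layer)
run_transition(transform, report)   # push: transform -> report
print(report.tables == sources.tables, report.last_loaded_at)
```

Because the copy is faithful, any question about a report's freshness reduces to reading `last_loaded_at`, which matches the point that the timestamp is the only interesting property of a transition.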

Why Sources Cannot Be the Source of Truth

Two fatal weaknesses:

This multi-source complexity is a direct parallel to 06-reference/2026-04-03-snowflake-rapid-growth-doordash — rapid growth through acquisition creates data fragmentation.

Why Reports Cannot Be the Source of Truth

The Transformation Layer as Source of Truth

The transformation layer checks the boxes:

The problem: it is a black box. The solution is to shine a light into it with dbt’s documentation, data lineage, and testing capabilities. This is the 06-reference/concepts/systems-over-goals philosophy: build a system that produces trust, rather than chasing the accuracy of individual reports.
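Lineage is what makes the black box inspectable: from any report-facing model you can walk upstream to the raw sources it depends on. A minimal sketch, assuming a dict shaped like the `parent_map` in dbt's `manifest.json` artifact (each node's unique ID mapped to a list of its parents' unique IDs); the project, model, and source names are hypothetical.

```python
from collections import deque


def upstream_lineage(parent_map: dict[str, list[str]], node: str) -> list[str]:
    """Return every ancestor of `node`, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(parent_map.get(node, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # shared ancestors are listed once
        seen.add(parent)
        order.append(parent)
        queue.extend(parent_map.get(parent, []))
    return order


# Hypothetical project: one reporting model built from two staged sources.
parent_map = {
    "model.analytics.fct_arr": ["model.analytics.stg_zuora_subscriptions",
                                "model.analytics.stg_salesforce_accounts"],
    "model.analytics.stg_zuora_subscriptions": ["source.analytics.zuora.subscription"],
    "model.analytics.stg_salesforce_accounts": ["source.analytics.salesforce.account"],
    "source.analytics.zuora.subscription": [],
    "source.analytics.salesforce.account": [],
}

for ancestor in upstream_lineage(parent_map, "model.analytics.fct_arr"):
    print(ancestor)
```

Filtering the result for IDs that start with `source.` answers the common question of which systems a given report number ultimately comes from.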

Four Root Causes of Data Variance

When numbers change between reporting periods, there are exactly four root causes:

  1. Business process changes in source systems (new commission plans, new SKUs)
  2. New source systems (M&A, system migrations like Project Bedrock)
  3. Modeling changes for reporting requirements (isolating overages, reclassifications)
  4. Errors in the reporting layer (hardcoded values, missed formula drag-downs)
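Before a change can be assigned to one of these causes, it helps to know which records moved. A minimal sketch, assuming two runs of the same metric can be keyed by a stable record ID (the IDs and amounts are made up): the total variance splits exactly into added, removed, and changed records, which narrows the search. For example, records that appear only after a migration point toward a new source system rather than a reporting-layer error.

```python
def diff_snapshots(before: dict[str, float],
                   after: dict[str, float]) -> dict[str, dict[str, float]]:
    """Split the change between two runs of a metric by record.

    `before` and `after` map a stable record key (e.g. an invoice ID) to
    the value that record contributed in each run.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: after[k] - before[k]
               for k in after.keys() & before.keys()
               if after[k] != before[k]}
    return {"added": added, "removed": removed, "changed": changed}


# Made-up figures for two monthly runs of the same revenue report.
january = {"INV-1": 100.0, "INV-2": 250.0, "INV-3": 75.0}
february = {"INV-1": 100.0, "INV-2": 300.0, "INV-4": 40.0}

parts = diff_snapshots(january, february)
total_change = sum(february.values()) - sum(january.values())
explained = (sum(parts["added"].values())
             - sum(parts["removed"].values())
             + sum(parts["changed"].values()))
print(parts)
print(total_change, explained)  # the buckets account for the whole change
```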

Hardening Strategies

Change Data Capture (CDC) for Existing Sources
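As a generic illustration of CDC for an existing source (a sketch, not the document's own design), the function below keeps a type 2 history of a source table, closing the previous version of a record whenever a tracked column changes. It is similar in spirit to dbt snapshots with the `check` strategy, but the `apply_snapshot` helper, the `valid_from`/`valid_to` column names, and the sample SKU are hypothetical.

```python
from datetime import datetime


def apply_snapshot(history: list[dict], current: list[dict], key: str,
                   tracked: list[str], as_of: datetime) -> list[dict]:
    """Fold the current state of a source table into a type 2 history.

    Each history row carries `valid_from` and `valid_to`; the open (latest)
    version of a record has valid_to=None. A new version is written only
    when one of the `tracked` columns differs from the open version.
    """
    open_rows = {r[key]: r for r in history if r["valid_to"] is None}
    seen_keys = set()
    for row in current:
        seen_keys.add(row[key])
        prior = open_rows.get(row[key])
        if prior is not None and all(prior[c] == row[c] for c in tracked):
            continue  # unchanged since the last run: nothing to record
        if prior is not None:
            prior["valid_to"] = as_of  # close the superseded version in place
        history.append({**row, "valid_from": as_of, "valid_to": None})
    # Design choice: a key missing from the source is treated as a hard delete.
    for k, prior in open_rows.items():
        if k not in seen_keys:
            prior["valid_to"] = as_of
    return history


# Hypothetical source: a SKU whose commission plan changes between runs.
history: list[dict] = []
apply_snapshot(history, [{"sku": "A1", "commission_plan": "FY25"}],
               key="sku", tracked=["commission_plan"], as_of=datetime(2026, 1, 1))
apply_snapshot(history, [{"sku": "A1", "commission_plan": "FY26"}],
               key="sku", tracked=["commission_plan"], as_of=datetime(2026, 2, 1))
for version in history:
    print(version)
```

Because both versions survive, “what changed?” between two reporting periods becomes a query over the history rather than a manual reconstruction.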

Mapping Tables for New Sources (M&A)
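A mapping table lets an acquired system's codes be translated into the company's canonical values inside the transformation layer, without touching the acquired source. A minimal sketch with made-up source names and codes; in a dbt project such a mapping would more likely live in a seed or model and be applied with a join, but a Python dict keeps the example self-contained.

```python
# Hypothetical mapping table: (source system, source code) -> canonical SKU.
SKU_MAP = {
    ("acquired_co", "BKP-PRO"): "SKU-BACKUP-PRO",
    ("acquired_co", "BKP-STD"): "SKU-BACKUP-STD",
    ("core", "SKU-BACKUP-PRO"): "SKU-BACKUP-PRO",
}


def conform(rows: list[dict],
            mapping: dict[tuple[str, str], str]) -> tuple[list[dict], list[dict]]:
    """Translate source-specific product codes to canonical SKUs.

    Rows with no mapping are returned separately instead of being dropped,
    so an unmapped code surfaces as a visible gap rather than silent variance.
    """
    mapped, unmapped = [], []
    for row in rows:
        canonical = mapping.get((row["source"], row["product_code"]))
        if canonical is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "canonical_sku": canonical})
    return mapped, unmapped


rows = [
    {"source": "core", "product_code": "SKU-BACKUP-PRO", "amount": 120.0},
    {"source": "acquired_co", "product_code": "BKP-PRO", "amount": 80.0},
    {"source": "acquired_co", "product_code": "BKP-LEGACY", "amount": 15.0},
]
mapped, unmapped = conform(rows, SKU_MAP)
print([r["canonical_sku"] for r in mapped])   # ['SKU-BACKUP-PRO', 'SKU-BACKUP-PRO']
print([r["product_code"] for r in unmapped])  # ['BKP-LEGACY']
```

Returning the unmapped rows, rather than discarding them, turns an incomplete mapping into a to-do list instead of an unexplained drop in the numbers.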

Code Reviews via Pull Requests for Business Logic Changes

Data Tests for Reporting Requirements
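Data tests encode a reporting requirement as a check that should find no failing rows. dbt ships generic tests such as `unique`, `not_null`, and `accepted_values`; the Python below is only a stand-in that loosely mimics their pass/fail idea on in-memory rows so it can run without a warehouse. The `check_*` helpers, the invoice-line table, and the allowed values are hypothetical, echoing the “isolating overages” requirement mentioned earlier.

```python
from collections import Counter


def check_unique(rows: list[dict], column: str) -> list[dict]:
    """Failing rows: those whose `column` value occurs more than once."""
    counts = Counter(r[column] for r in rows)
    return [r for r in rows if counts[r[column]] > 1]


def check_not_null(rows: list[dict], column: str) -> list[dict]:
    """Failing rows: those where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def check_accepted_values(rows: list[dict], column: str,
                          values: list[str]) -> list[dict]:
    """Failing rows: non-null values outside `values` (nulls are left to not_null)."""
    allowed = set(values)
    return [r for r in rows if r[column] is not None and r[column] not in allowed]


# Hypothetical invoice-line model where overages must be isolated.
rows = [
    {"invoice_line_id": 1, "revenue_type": "subscription"},
    {"invoice_line_id": 2, "revenue_type": "overage"},
    {"invoice_line_id": 2, "revenue_type": "Overage"},
    {"invoice_line_id": 3, "revenue_type": None},
]

checks = {
    "unique(invoice_line_id)": check_unique(rows, "invoice_line_id"),
    "not_null(revenue_type)": check_not_null(rows, "revenue_type"),
    "accepted_values(revenue_type)": check_accepted_values(
        rows, "revenue_type", ["subscription", "overage", "services"]),
}
for name, failures in checks.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} failing rows)"
    print(f"{name}: {status}")
```

Each check returns the offending rows rather than a bare boolean, which is what makes a failure actionable: the miscapitalized `"Overage"` and the duplicate ID are named directly.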

Data Services SLA (Proposed)

A reusable template for analytics team boundaries:

This SLA framework applies directly to 06-reference/concepts/analytics-as-craft, since it defines the boundaries of the practice.

The Decentralized Analytics Vision

The long-term goal: embedded analysts in Finance, Partner Success, Sales, Marketing, and Product all contribute to the centralized transformation layer. “If Product’s work can be leveraged by Partner Success, the organization is getting more out of the same amount of work. Data becomes a point of collaboration, instead of silos.”

This is working ON the business (06-reference/2026-04-03-the-e-myth-revisited): building a system where analytics compounds instead of staying siloed.

Team and Budget Reality

The document includes a raw, honest assessment of team strain:

Consulting Credibility

This document is a complete demonstration of 01-projects/phdata/career-transition capabilities: scoping data problems, proposing governance frameworks, defining SLAs, building team processes, and communicating technical architecture to business stakeholders. The frameworks here (three-layer process, four root causes of variance, CDC strategy, SLA template) are directly deployable in consulting engagements.

Modern Data Stack (2020 vintage)

The full stack documented in this era: