Hardening Reporting Pipelines

The most complete and operationally rich document from the ConnectWise era. This is an internal manifesto arguing for where the “source of truth” should live in a growing SaaS company’s data process, and the specific engineering and governance practices needed to harden it. Written using the “Islands and Bridges” strategy — a writing technique where you identify isolated concepts (islands) and then write transitions (bridges) between them.

The Core Argument

Every month, leadership asks: What changed? What caused it? Why is it different? These are symptoms of an immature data process. Instead of treating symptoms monthly, cure the cause by investing in the transformation layer as the source of truth.

The Three-Layer Data Process

A reusable framework for any data organization:

Sources: Where data is first recorded (Salesforce, NetSuite, Zuora, etc.)
Transform: Where data is cleaned, standardized, and shaped (the dbt layer)
Report: Where data is surfaced for viewing (Power BI, Excel, FP&A models)

Two transitions connect the layers (load from source to transform, push from transform to reports). The key insight: nothing special happens in transitions — they create carbon copies. The only question is “when was the last time this ran?”
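Because transitions only copy data, monitoring them reduces to a freshness check. A minimal sketch, assuming run timestamps are available from the loader and scheduler; the transition names, timestamps, and 24-hour threshold are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical transition names and last-run timestamps; in practice these
# would come from the loader's and scheduler's run metadata.
LAST_RUN = {
    "load_sources_to_transform": datetime(2020, 11, 2, 6, 0, tzinfo=timezone.utc),
    "push_transform_to_reports": datetime(2020, 11, 1, 6, 0, tzinfo=timezone.utc),
}


def stale_transitions(last_run, now, max_age):
    """Return the names of transitions whose last run is older than max_age."""
    return sorted(name for name, ran_at in last_run.items() if now - ran_at > max_age)


now = datetime(2020, 11, 2, 12, 0, tzinfo=timezone.utc)
print(stale_transitions(LAST_RUN, now, max_age=timedelta(hours=24)))
# prints ['push_transform_to_reports']
```

The check deliberately inspects nothing about the data itself, which mirrors the claim that transitions carry no logic of their own.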

Why Sources Cannot Be the Source of Truth

Two fatal weaknesses:

Narrow focus: Each system only sees its own context. ConnectWise had 11+ billing sources, including Bedrock Salesforce, Bedrock Zuora, Legacy Continuum SF, Legacy CW Manage, Legacy Great Plains, Control’s custom transactions, BrightGauge’s Chargify, ITBoost’s Stripe, and HTG’s QuickBooks. No single system has the full picture.
Immediate focus: Source systems serve operators with current-state data. They cannot answer “what did this look like last month?”

This multi-source complexity is a direct parallel to 06-reference/2026-04-03-snowflake-rapid-growth-doordash — rapid growth through acquisition creates data fragmentation.

Why Reports Cannot Be the Source of Truth

Reports are endpoints — they cannot port logic to other reports
Reports go stale the moment they are produced
“Once reported, always reported” is an unreasonable constraint that kills flexibility
Hidden transformation logic inside reports creates opacity

The Transformation Layer as Source of Truth

The transformation layer checks the boxes:

Full context from all source systems
Historical tracking
Business logic portability for consistent reporting
Up-to-date data

The problem: it is a black box. The solution is shining a light into it through dbt’s documentation, data lineage, and testing capabilities. This is the 06-reference/concepts/systems-over-goals philosophy — build a system that produces trust, rather than chasing individual report accuracy.

Four Root Causes of Data Variance

When numbers change between reporting periods, there are exactly four root causes:

Business process changes in source systems (new commission plans, new SKUs)
New source systems (M&A, system migrations like Project Bedrock)
Modeling changes for reporting requirements (isolating overages, reclassifications)
Errors in the reporting layer (hardcoded values, missed formula drag-downs)

Hardening Strategies

Change Data Capture (CDC) for Existing Sources

Type II Slowly Changing Dimensions via dbt snapshots — the most robust and easiest to implement
Requires (for dbt's timestamp strategy): a unique ID and an updated_at timestamp on source tables
For Fivetran-loaded sources, these fields are reliably available
Key insight: start snapshotting early. The earlier you start, the more history you capture.
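The Type II idea can be illustrated outside dbt. The sketch below is not dbt's implementation; it is a simplified in-memory version of a timestamp-based snapshot, and the column names valid_from and valid_to, the plan field, and the sample rows are all hypothetical:

```python
from datetime import datetime


def snapshot(history, source_rows, key="id", updated_at="updated_at"):
    """Apply one snapshot run: close changed rows and append new versions.

    history: list of dicts holding source columns plus valid_from / valid_to.
    source_rows: current state of the source table (one row per key).
    """
    current = {row[key]: row for row in history if row["valid_to"] is None}
    for src in source_rows:
        live = current.get(src[key])
        if live is not None and src[updated_at] <= live[updated_at]:
            continue  # unchanged since the last snapshot
        if live is not None:
            live["valid_to"] = src[updated_at]  # close the old version
        history.append({**src, "valid_from": src[updated_at], "valid_to": None})
    return history


def as_of(history, when):
    """Return the row versions that were valid at the given moment."""
    return [
        r for r in history
        if r["valid_from"] <= when and (r["valid_to"] is None or when < r["valid_to"])
    ]


history = []
snapshot(history, [{"id": 1, "plan": "basic", "updated_at": datetime(2020, 1, 1)}])
snapshot(history, [{"id": 1, "plan": "pro", "updated_at": datetime(2020, 2, 1)}])
snapshot(history, [{"id": 1, "plan": "pro", "updated_at": datetime(2020, 2, 1)}])  # no change

print(as_of(history, datetime(2020, 1, 15))[0]["plan"])  # prints basic
```

History only exists from the first snapshot run onward, which is why starting early matters: changes made before that run cannot be reconstructed.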

Mapping Tables for New Sources (M&A)

Simple two-column tables: Legacy ID to New ID, New SKU to Reporting Category
Technically simple, but difficult to execute
Governance and the data steering committee must review and approve mappings
Same pattern repeats for every acquisition — build the process once
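A rough sketch of how two-column mapping tables might be applied, with unmapped rows surfaced for review rather than silently dropped. All IDs, SKUs, and category names below are made up for illustration:

```python
# Hypothetical mapping tables; real ones would be reviewed and approved
# through governance before use.
LEGACY_TO_NEW_ID = {"CONT-001": "ACC-1001", "CONT-002": "ACC-1002"}
SKU_TO_CATEGORY = {"SKU-RMM": "Recurring", "SKU-ONBOARD": "Services"}


def apply_mappings(rows):
    """Split rows into mapped rows and rows that still need a mapping decision."""
    mapped, unmapped = [], []
    for row in rows:
        new_id = LEGACY_TO_NEW_ID.get(row["legacy_id"])
        category = SKU_TO_CATEGORY.get(row["sku"])
        if new_id is None or category is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "account_id": new_id, "reporting_category": category})
    return mapped, unmapped


rows = [
    {"legacy_id": "CONT-001", "sku": "SKU-RMM", "amount": 100},
    {"legacy_id": "CONT-003", "sku": "SKU-RMM", "amount": 40},  # no ID mapping yet
]
mapped, unmapped = apply_mappings(rows)
print(len(mapped), len(unmapped))  # prints 1 1
```

The unmapped list is the part that needs people: each entry is a decision for whoever owns the mapping, which is where the execution difficulty lies.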

Code Reviews via Pull Requests for Business Logic Changes

All SQL logic under version control
Development branches materialize in separate Snowflake schemas
Impact analysis = point reports at both production and development schemas, diff the results
“No one should approve their own pull request” as a collaboration forcing function
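The impact analysis step amounts to running the same report query against both schemas and comparing the results. A minimal sketch of the comparison, assuming both result sets have already been fetched into lists of rows; the month and mrr fields and their values are hypothetical:

```python
def diff_results(prod, dev, key, metric):
    """Compare one metric between two result sets that share a key column."""
    prod_by_key = {r[key]: r[metric] for r in prod}
    dev_by_key = {r[key]: r[metric] for r in dev}
    changes = {}
    for k in sorted(prod_by_key.keys() | dev_by_key.keys()):
        before, after = prod_by_key.get(k), dev_by_key.get(k)
        if before != after:
            changes[k] = (before, after)  # None means the key is absent on that side
    return changes


# Hypothetical output of the same report query run against each schema.
prod = [{"month": "2020-09", "mrr": 100}, {"month": "2020-10", "mrr": 120}]
dev = [
    {"month": "2020-09", "mrr": 100},
    {"month": "2020-10", "mrr": 115},
    {"month": "2020-11", "mrr": 130},
]
print(diff_results(prod, dev, key="month", metric="mrr"))
# prints {'2020-10': (120, 115), '2020-11': (None, 130)}
```

Attaching a diff like this to the pull request gives the reviewer the consequences of the change as well as the SQL.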

Data Tests for Reporting Requirements

Codify field definitions, acceptable categories, allowed value ranges
Run tests on schedule to catch issues before they reach stakeholders
Unwritten expectations must be made explicit
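To show what making an expectation explicit looks like, here is a small sketch of accepted-value and range checks over rows. It is a stand-in for the idea, not dbt's test syntax; the column names, categories, and sample rows are hypothetical:

```python
def failing_rows(rows, column, accepted=None, min_value=None, max_value=None):
    """Return the rows that violate the stated expectations for one column."""
    failures = []
    for row in rows:
        value = row.get(column)
        if value is None:
            failures.append(row)  # missing values count as failures
        elif accepted is not None and value not in accepted:
            failures.append(row)
        elif min_value is not None and value < min_value:
            failures.append(row)
        elif max_value is not None and value > max_value:
            failures.append(row)
    return failures


rows = [
    {"id": 1, "category": "Recurring", "amount": 250.0},
    {"id": 2, "category": "recurring", "amount": 90.0},  # wrong casing
    {"id": 3, "category": "Services", "amount": -10.0},  # negative amount
]
bad_category = failing_rows(rows, "category", accepted={"Recurring", "Services"})
bad_amount = failing_rows(rows, "amount", min_value=0)
print([r["id"] for r in bad_category], [r["id"] for r in bad_amount])
# prints [2] [3]
```

A scheduled job that alerts whenever either list is non-empty catches the problem before a stakeholder does.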

Data Services SLA (Proposed)

A reusable template for analytics team boundaries:

Scope: Loading data, transforming per business requirements, pushing to reports
Not in scope: Additional analysis on top of the reporting layer; undocumented data assets
Incident response: 1 business day for response, 3 business days for fix
Logic changes: Must go through Data Steering Committee with impact analysis for corporate metrics
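The incident response windows can be turned into concrete due dates. A minimal sketch, assuming both windows are counted from the report date, business days are Monday through Friday, and holidays are ignored; none of these details are specified in the SLA itself:

```python
from datetime import date, timedelta

RESPONSE_DAYS = 1  # business days allowed for a first response
FIX_DAYS = 3       # business days allowed for a fix


def add_business_days(start, days):
    """Return the date that falls `days` business days after `start` (Mon-Fri only)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days -= 1
    return current


reported = date(2020, 11, 6)  # a Friday
print(add_business_days(reported, RESPONSE_DAYS))  # prints 2020-11-09
print(add_business_days(reported, FIX_DAYS))       # prints 2020-11-11
```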

This SLA framework is directly applicable to 06-reference/concepts/analytics-as-craft — defining the boundaries of the practice.

The Decentralized Analytics Vision

The long-term goal: embedded analysts in Finance, Partner Success, Sales, Marketing, and Product all contribute to the centralized transformation layer. “If Product’s work can be leveraged by Partner Success, the organization is getting more out of the same amount of work. Data becomes a point of collaboration, instead of silos.”

This is working ON the business (06-reference/2026-04-03-the-e-myth-revisited) — building a system where analytics compounds rather than stays siloed.

Team and Budget Reality

The document includes a raw, honest assessment of team strain:

Data Services started 2020 with 9 people, ended with 5 (3 voluntary departures, 1 internal transfer)
Total tooling budget: ~$1,600/year (1 dbt Cloud license + 5 DataGrip licenses + free GitHub)
The team supported board decks, Gainsight integrations, renewal reporting, and Project Bedrock
“This team accomplished so much with so little” — a pattern familiar in data teams everywhere

Consulting Credibility

This document is a complete demonstration of 01-projects/phdata/career-transition capabilities: scoping data problems, proposing governance frameworks, defining SLAs, building team processes, and communicating technical architecture to business stakeholders. The frameworks here (three-layer process, four root causes of variance, CDC strategy, SLA template) are directly deployable in consulting engagements.

Modern Data Stack (2020 vintage)

The full stack documented in this era:

Fivetran (loading) / Snowflake (storage) / dbt (transformation) / dbt Cloud (scheduling, docs, testing)
DataGrip (SQL authoring) / VS Code (dbt development) / Git + GitHub (version control)
Power BI (dashboards) / Power BI Gateway (refresh) / Excel + ODBC (end-user analysis)