Hardening Reporting Pipelines
The most complete and operationally rich document from the ConnectWise era: an internal manifesto arguing where the “source of truth” should live in a growing SaaS company’s data process, and the specific engineering and governance practices needed to harden it. Written using the “Islands and Bridges” strategy, a writing technique in which you identify isolated concepts (islands) and then write transitions (bridges) between them.
The Core Argument
Every month, leadership asks: What changed? What caused it? Why is it different? These recurring questions are symptoms of an immature data process. Instead of treating the symptoms every month, cure the cause: invest in the transformation layer as the source of truth.
The Three-Layer Data Process
A reusable framework for any data organization:
- Sources — Where data is first recorded (Salesforce, NetSuite, Zuora, etc.)
- Transform — Where data is cleaned, standardized, and shaped (the dbt layer)
- Report — Where data is surfaced for viewing (Power BI, Excel, FP&A models)
Two transitions connect the layers (load from source to transform, push from transform to reports). The key insight: nothing special happens in the transitions; they create carbon copies of the data. The only question worth asking about a transition is “when was the last time this ran?”
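Because transitions are just scheduled copies, verifying one reduces to checking a timestamp. A minimal Snowflake SQL sketch, assuming a Fivetran-loaded table (the table path is illustrative; Fivetran stamps every row with a _fivetran_synced column):

```sql
-- Answers "when was the last time this ran?" for one loaded source.
select
    max(_fivetran_synced) as last_loaded_at,
    datediff('hour', max(_fivetran_synced), current_timestamp()) as hours_stale
from raw.salesforce.account;
```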
Why Sources Cannot Be the Source of Truth
Two fatal weaknesses:
- Narrow focus — Each system only sees its own context. ConnectWise had 11+ billing sources, including Bedrock Salesforce, Bedrock Zuora, Legacy Continuum SF, Legacy CW Manage, Legacy Great Plains, Control’s custom transactions, BrightGauge’s Chargify, ITBoost’s Stripe, and HTG’s QuickBooks. No single system has the full picture.
- Immediate focus — Source systems serve operators with current-state data. They cannot answer “what did this look like last month?”
This multi-source complexity is a direct parallel to 06-reference/2026-04-03-snowflake-rapid-growth-doordash — rapid growth through acquisition creates data fragmentation.
Why Reports Cannot Be the Source of Truth
- Reports are endpoints: logic built into one report cannot be ported to another
- Reports go stale the moment they are produced
- “Once reported, always reported” is an unreasonable constraint that kills flexibility
- Hidden transformation logic inside reports creates opacity
The Transformation Layer as Source of Truth
The transformation layer checks the boxes:
- Full context from all source systems
- Historical tracking
- Business logic portability for consistent reporting
- Up-to-date data
The problem: by default, it is a black box. The solution is to shine a light into it with dbt’s documentation, data lineage, and testing capabilities. This is the 06-reference/concepts/systems-over-goals philosophy: build a system that produces trust, rather than chasing the accuracy of any individual report.
Four Root Causes of Data Variance
When numbers change between reporting periods, there are exactly four root causes:
- Business process changes in source systems (new commission plans, new SKUs)
- New source systems (M&A, system migrations like Project Bedrock)
- Modeling changes for reporting requirements (isolating overages, reclassifications)
- Errors in the reporting layer (hardcoded values, missed formula drag-downs)
Hardening Strategies
Change Data Capture (CDC) for Existing Sources
- Type II Slowly Changing Dimensions via dbt snapshots: the most robust approach, and the easiest to implement (a sketch follows this list)
- Requires a unique ID and an updated_at timestamp on the source tables
- For Fivetran-loaded sources, these fields are reliably available
- Key insight: start snapshotting early. The earlier you start, the more history you capture.
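A minimal dbt snapshot sketch. The source and column names are illustrative, but the strategy / unique_key / updated_at configuration is standard dbt:

```sql
{% snapshot zuora_accounts_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='account_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- On each run, dbt compares updated_at per account_id and writes a new row
-- when it has changed, maintaining dbt_valid_from / dbt_valid_to columns.
select * from {{ source('zuora', 'accounts') }}

{% endsnapshot %}
```

Filter on `dbt_valid_to is null` for the current state, or on the validity window to answer “what did this look like last month?”, which is exactly the question source systems cannot answer.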
Mapping Tables for New Sources (M&A)
- Simple two-column tables: Legacy ID to New ID, New SKU to Reporting Category (a join sketch follows this list)
- The tables are technically simple; agreeing on and maintaining the mappings is the difficult part
- Governance and the data steering committee must review and approve mappings
- Same pattern repeats for every acquisition — build the process once
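In dbt, these mapping tables fit naturally as seeds: version-controlled CSVs, so the steering committee’s review happens in the same pull request as the code. A sketch with hypothetical model and seed names:

```sql
-- models/int_legacy_billing_mapped.sql (all names illustrative)
-- Roll a legacy billing source up to the unified reporting categories.
select
    inv.invoice_id,
    inv.amount,
    map.reporting_category
from {{ ref('stg_legacy_invoices') }} as inv
left join {{ ref('seed_sku_reporting_map') }} as map
    on inv.legacy_sku = map.legacy_sku
```

The left join is deliberate: paired with a not_null test on reporting_category, it surfaces unmapped SKUs on every run instead of silently dropping them.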
Code Reviews via Pull Requests for Business Logic Changes
- All SQL logic under version control
- Development branches materialize in separate Snowflake schemas
- Impact analysis: point reports at both the production and development schemas and diff the results (a minimal diff query follows this list)
- “No one should approve their own pull request” as a collaboration forcing function
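Because each development branch materializes into its own Snowflake schema, impact analysis can be a symmetric-difference query. Schema and model names here are illustrative:

```sql
-- Zero rows returned means the branch changes nothing for this model.
(
    select 'only_in_prod' as side, *
    from analytics.fct_monthly_revenue
    except
    select 'only_in_prod' as side, *
    from dev_pr_142.fct_monthly_revenue
)
union all
(
    select 'only_in_dev' as side, *
    from dev_pr_142.fct_monthly_revenue
    except
    select 'only_in_dev' as side, *
    from analytics.fct_monthly_revenue
);
```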
Data Tests for Reporting Requirements
- Codify field definitions, acceptable categories, allowed value ranges
- Run tests on schedule to catch issues before they reach stakeholders
- Unwritten expectations must be made explicit (a singular-test sketch follows this list)
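One way to write an expectation down is a dbt singular test: a SQL file under tests/ that fails when it returns rows. The model name and category list are hypothetical:

```sql
-- tests/assert_known_reporting_categories.sql
-- Fails (returns rows) if any revenue lands outside the agreed categories.
select
    reporting_category,
    count(*) as offending_rows
from {{ ref('fct_monthly_revenue') }}
where reporting_category is null
   or reporting_category not in ('Recurring', 'Services', 'Hardware')
group by 1
```

Scheduled through dbt Cloud, `dbt test` turns these expectations into alarms that fire before a stakeholder ever opens the report.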
Data Services SLA (Proposed)
A reusable template for analytics team boundaries:
- Scope: Loading data, transforming per business requirements, pushing to reports
- Not in scope: Additional analysis on top of the reporting layer; undocumented data assets
- Incident response: respond within 1 business day, fix within 3 business days
- Logic changes: Must go through Data Steering Committee with impact analysis for corporate metrics
This SLA framework is directly applicable to 06-reference/concepts/analytics-as-craft — defining the boundaries of the practice.
The Decentralized Analytics Vision
The long-term goal: embedded analysts in Finance, Partner Success, Sales, Marketing, and Product all contribute to the centralized transformation layer. “If Product’s work can be leveraged by Partner Success, the organization is getting more out of the same amount of work. Data becomes a point of collaboration, instead of silos.”
This is working ON the business (06-reference/2026-04-03-the-e-myth-revisited) — building a system where analytics compounds rather than stays siloed.
Team and Budget Reality
The document includes a raw, honest assessment of team strain:
- Data Services started 2020 with 9 people, ended with 5 (3 voluntary departures, 1 internal transfer)
- Total tooling budget: ~$1,600/year (1 dbt Cloud license + 5 DataGrip licenses + free GitHub)
- The team supported board decks, Gainsight integrations, renewal reporting, and Project Bedrock
- “This team accomplished so much with so little” — a pattern familiar in data teams everywhere
Consulting Credibility
This document is a complete demonstration of 01-projects/phdata/career-transition capabilities: scoping data problems, proposing governance frameworks, defining SLAs, building team processes, and communicating technical architecture to business stakeholders. The frameworks here (three-layer process, four root causes of variance, CDC strategy, SLA template) are directly deployable in consulting engagements.
Modern Data Stack (2020 vintage)
The full stack documented in this era:
- Fivetran (loading) / Snowflake (storage) / dbt (transformation) / dbt Cloud (scheduling, docs, testing)
- DataGrip (SQL authoring) / VS Code (dbt development) / Git + GitHub (version control)
- Power BI (dashboards) / Power BI Gateway (refresh) / Excel + ODBC (end-user analysis)