DBT Onboarding Checklist
A technical onboarding guide and curated resource collection for the ConnectWise Data Services team. Beyond the setup checklist, this document captures deeply practical dbt best practices, testing strategies, project structure recommendations, and tool evaluation notes drawn from the dbt community circa 2020. Many of these patterns remain canonical.
Getting Started Checklist
Software Stack
- VS Code with extensions (dbt formatter, Better Jinja, Rainbow CSV, YAML)
- DataGrip (SQL client, JetBrains license)
- GitHub account on CW org
- Windows Subsystem for Linux (WSL) with Oh My ZSH
- Python virtual environment for dbt CLI
- dbt profiles.yml configured for Snowflake
dbt Profile Template
```yaml
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: continuum.us-east-1
      user: bwilson
      role: BA_ADM_RL
      database: BA_SNDB
      warehouse: QUERY_WH
      schema: dbt_baw
      threads: 1
      client_session_keep_alive: False
```
Testing Strategy
A structured approach to when and what to test:
When to Test
- During development — Source uniqueness/not_null, staging join duplication, dim/fact relationship and recency tests
- During pull requests — Ensure `dbt test` runs as part of CI
- In production — Periodic test suite runs to catch source system data entry errors
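In dbt, the development-time tests above are declared in a model's `schema.yml`. A minimal sketch (source, model, and column names are illustrative, not from the CW project):

```yaml
# models/staging/schema.yml -- names below are hypothetical examples
version: 2

sources:
  - name: billing
    tables:
      - name: invoices
        columns:
          - name: invoice_id
            tests:
              - unique
              - not_null

models:
  - name: fct_invoices
    columns:
      - name: invoice_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          # relationship test: every fact row must point at a real dimension row
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

Running `dbt test` executes all of these, which is what the CI step above enforces on every pull request.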
What to Document
- Sources — How loaded, known caveats, poorly named field definitions
- Final models — Grain, column definitions (especially business logic columns)
- Intermediate models — As needed
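This documentation lives alongside the tests in the same YAML files. A sketch, with illustrative model and column names:

```yaml
# models/marts/schema.yml -- names below are hypothetical examples
models:
  - name: fct_billings
    description: >
      One row per billing line item (grain: invoice line).
      Loaded nightly from the corporate billing system; see source
      docs for known caveats.
    columns:
      - name: net_revenue
        description: Gross amount less credits and adjustments (business logic lives in staging).
```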
This testing philosophy connects to 06-reference/2026-04-03-cw-hardening-reporting-pipelines where data tests are positioned as the final guard against erroneous reports.
Four Data Modeling Activities
A clean taxonomy (attributed to dbt community discourse on dbt vs. Looker):
- Cleansing — Fix problems or inconsistencies (dbt sweet spot)
- Re-shaping/pre-aggregating — Reliability, convenience, performance (dbt sweet spot)
- Creating metadata — Structures and relationships (shared between dbt and BI; leave final metric calculations for the report to preserve interactivity)
- Applying algorithms — Classification or prediction (ML layer, not dbt)
dbt Project Design: Three Interfaces
- The Graph — Data lineage (WHERE does data come from?)
- The Code — Good structure enables collaboration and onboarding
- The Warehouse — Naming and organization of the materialized data
Seven Prescriptive Actions for dbt Projects
- Choose nomenclature standards and stick to them
- Public vs. Private models — Prefix internal/intermediate assets with a double underscore (`__stg_billings`)
- Modular model files — Descriptive CTEs, cap at 6 per model; break into ephemeral models beyond that
- Macros in moderation
- Reusable assets — One table should answer multiple questions; ad hoc queries that are run regularly should be refactored into models
- Documentation — Document models and lint SQL
- Be your own best advocate — Defining models is foundational work that benefits many others
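The "modular model files" action above follows dbt's import-CTE convention. A sketch, assuming hypothetical refs (`__stg_billings`, `stg_customers`) and columns:

```sql
-- models/fct_billings.sql -- table and column names are illustrative
with billings as (
    select * from {{ ref('__stg_billings') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

-- one logical step per descriptively named CTE; beyond ~6 CTEs,
-- split steps out into ephemeral models instead
joined as (
    select
        billings.billing_id,
        billings.amount,
        customers.customer_name
    from billings
    left join customers
        on billings.customer_id = customers.customer_id
)

select * from joined
```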
These prescriptive actions are the 06-reference/concepts/analytics-as-craft philosophy distilled into daily practice.
Key dbt Techniques
Snapshots (Change Data Capture)
- Requires a unique ID + `updated_at` timestamp
- Alternative: field-level change detection
- Challenge at CW: Corporate Billings and Bookings tables lacked unique IDs
- Community pattern: version numbering on top of snapshots
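A timestamp-strategy snapshot matching the requirements above might look like this sketch (snapshot, source, and key names are hypothetical); the `check` strategy covers the field-level alternative:

```sql
{% snapshot billings_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='billing_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- for tables without a reliable updated_at, use strategy='check'
-- with check_cols=['amount', 'status'] for field-level change detection
select * from {{ source('corporate', 'billings') }}

{% endsnapshot %}
```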
Blue/Green Deployments
- Build new schema fully before swapping to production
- Reduces risk if something goes wrong — Snowflake schema swap automation
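Snowflake supports the swap step directly with an atomic schema swap; the schema names here are illustrative:

```sql
-- build and test everything in the staging schema first
-- (e.g. a dbt run + dbt test against analytics_staging),
-- then promote it atomically:
alter schema analytics swap with analytics_staging;
```

Because the swap is a single atomic statement, consumers never see a half-built schema, and rolling back is just swapping again.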
Seed Files for Reference Data
- Replace long CASE statements with CSV seed files (e.g., country mappings)
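For example, a `data/country_mapping.csv` seed with `country_code,country_name` columns (names illustrative) turns the CASE statement into a join:

```sql
-- instead of: case when country_code = 'US' then 'United States' ... end
select
    orders.*,
    country_mapping.country_name
from {{ ref('stg_orders') }} as orders
left join {{ ref('country_mapping') }} as country_mapping
    on orders.country_code = country_mapping.country_code
```

`dbt seed` loads the CSV into the warehouse, and the mapping is version-controlled alongside the models.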
Tags for Scheduling
- Tag models as `hourly` vs. `nightly` for differentiated refresh cadence (`dbt run -m tag:hourly`)
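Tags can be applied per-folder in `dbt_project.yml`; a sketch with a hypothetical project name and layout:

```yaml
# dbt_project.yml -- project and folder names are illustrative
models:
  my_project:
    staging:
      +tags: ['hourly']
    marts:
      +tags: ['nightly']
```

Scheduled jobs then run `dbt run -m tag:hourly` and `dbt run -m tag:nightly` on separate cadences.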
SQL Deduplication Patterns
Window function approach (Snowflake):
```sql
select
    *,
    row_number() over (
        partition by user_id
        order by created_at desc
    ) = 1 as is_most_recent_record
from users
qualify is_most_recent_record
```
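On warehouses without `qualify` support, the same deduplication pattern needs a CTE and an explicit filter:

```sql
with ranked as (
    select
        *,
        row_number() over (
            partition by user_id
            order by created_at desc
        ) as row_num
    from users
)

-- keep only the most recent record per user
select * from ranked
where row_num = 1
```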
Analytics Maturity Stages (from dbt’s Startup Founder’s Guide)
Applied to ConnectWise, which was implementing several stages simultaneously:
| Stage | Key Actions |
|---|---|
| Early | Data warehouse + ETL + BI tool selection |
| Mid | Version-controlled SQL modeling, event tracking, churn modeling |
| Growth | Data testing, PR/code reviews, documentation for knowledge scaling |
The Airbnb “Scaling Knowledge” reference is notable — documentation as a scaling strategy, not an afterthought.
dbt Viewpoint (Philosophy)
Three principles that guided the ConnectWise implementation:
- Analytics is collaborative — Version control, QA, documentation, modularity
- Analytic code is an asset — Environments, SLAs (“make analytics calm again”), design for maintainability
- Analytics workflows require automated tools
Reusable Patterns
- Testing pyramid for analytics — Dev/PR/Production mirrors software engineering testing strategy
- Four modeling activities taxonomy — Clear boundaries for what dbt should and should not do
- Three interfaces framework — Graph/Code/Warehouse as design dimensions
- Seven prescriptive actions — A ready-made team standards document
- Maturity stage mapping — Assess where a company is and what to implement next
Consulting Credibility
This document demonstrates deep operational knowledge of 01-projects/phdata/index-relevant skills: dbt implementation, team onboarding, testing strategy, and project structure design. The curated resource collection shows someone who was actively engaged with the dbt community and applying those patterns in production. Relevant to 01-projects/phdata/career-transition as evidence of hands-on analytics engineering leadership.