
seattle data guy data pipeline patterns

Sun Jan 04 2026 19:00:00 GMT-0500 (Eastern Standard Time) · reference · source: SeattleDataGuy's Newsletter (Substack) · by SeattleDataGuy (Ben Rogojan)

“Common Data Pipeline Patterns You’ll See in the Real World” — @SeattleDataGuy

Why this is in the vault

Part of the SDG backfill. First article in the SDG 2026 pipelines series. Useful as a taxonomic foundation note — future SDG pipeline articles in this series will reference these five categories, so having the shared vocabulary filed makes later pieces cheaper to integrate.

The core argument

When data teams say “data pipeline,” they actually mean any of several structurally different things. SDG groups them into five real-world patterns that show up across industries, and names a few more quickly at the end.

The five patterns

  1. Source Standardization Pipelines: ingest data from multiple partners in different formats (CSV, XML, positional files, APIs) and map it into a shared core model. Mapping is the hard part: gender codes, category labels, date formats, and time zones all have to be standardized. The output powers marketplaces, industry-level reports, and cross-partner analytics.
  2. Amalgamation Pipelines: merge multiple sources into a single flow or 360-view, for example a sales funnel stitched across HubSpot, Google Ads, Salesforce, and Stripe. The hard parts are finding a reliable join ID and handling late-landing data.
  3. Excel “Data Pipelines”: semi-automated extract-transform-load driven by VBA and VLOOKUP. SDG argues these functionally solve the same problem even if they’re not “real” pipelines. They tend to get productionized later.
  4. Enrichment Pipelines: a separate pipeline that adds columns to core tables, such as lead scores, ML-derived features, and external data joins. Built after the core model is stable.
  5. Operational Pipelines (reverse ETL): push data back into operational systems (Salesforce segmentation, NetSuite updates, HubSpot lists). Hard because target systems often require single-record updates and have idiosyncratic APIs; the work straddles the boundary between software and data engineering.
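The mapping step in pattern 1 can be sketched as a per-partner config that drives one shared function. This is an illustration and not code from SDG's article: the partner names, field names, code tables, and date formats are all invented, and a real core model would carry more fields plus time zone handling.

```python
from datetime import datetime

# Hypothetical per-partner mapping config (names and codes are assumptions).
PARTNER_CONFIG = {
    "partner_a": {
        "fields": {"sex": "gender", "dob": "birth_date"},
        "gender_codes": {"M": "male", "F": "female"},
        "date_format": "%m/%d/%Y",
    },
    "partner_b": {
        "fields": {"gender_cd": "gender", "birth_dt": "birth_date"},
        "gender_codes": {"1": "male", "2": "female"},
        "date_format": "%Y%m%d",
    },
}


def standardize(partner, record):
    """Map one partner record into the shared core model."""
    cfg = PARTNER_CONFIG[partner]
    out = {"source": partner}

    # Rename partner-specific fields to core model field names.
    for src_field, core_field in cfg["fields"].items():
        out[core_field] = record.get(src_field)

    # Translate partner gender codes; unmapped or missing codes become "unknown".
    out["gender"] = cfg["gender_codes"].get(out["gender"], "unknown")

    # Normalize dates to ISO 8601 (YYYY-MM-DD); leave missing dates as None.
    if out["birth_date"] is not None:
        parsed = datetime.strptime(out["birth_date"], cfg["date_format"])
        out["birth_date"] = parsed.date().isoformat()

    return out


row_a = standardize("partner_a", {"sex": "F", "dob": "03/15/1990"})
row_b = standardize("partner_b", {"gender_cd": "1", "birth_dt": "19851201"})
```

Keeping the mapping in data rather than in code means onboarding a new partner is mostly a config change, which fits the note that mapping is where the effort goes.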

Honorable mentions: ML pipelines, integration pipelines, migration pipelines, metadata/lineage pipelines.
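For pattern 2 (amalgamation), the join-ID and late-landing problems can be made concrete with a small stitch function. This is a sketch under stated assumptions, not SDG's method: it assumes a normalized email address is the join ID, and the source names (`crm`, `billing`) and the `pending_sources` field are hypothetical.

```python
def normalize_key(email):
    """Normalize the assumed join ID so ' Ana@Example.com' matches 'ana@example.com'."""
    return email.strip().lower()


def stitch(sources):
    """Merge records from several sources into one row per join ID.

    sources: dict of source name -> list of record dicts, each with an 'email' key.
    Each merged row lists the sources that have not landed yet for that ID.
    """
    merged = {}
    seen = {}  # join ID -> set of source names that contributed a record

    for name, records in sources.items():
        for rec in records:
            key = normalize_key(rec["email"])
            row = merged.setdefault(key, {"email": key})
            seen.setdefault(key, set()).add(name)
            # Prefix each field with its source name to avoid column collisions.
            for field, value in rec.items():
                if field != "email":
                    row[f"{name}_{field}"] = value

    # Flag late-landing data: sources with no record yet for this ID.
    for key, row in merged.items():
        row["pending_sources"] = sorted(set(sources) - seen[key])

    return merged


merged = stitch({
    "crm": [{"email": " Ana@Example.com", "stage": "lead"}],
    "billing": [
        {"email": "ana@example.com", "mrr": 50},
        {"email": "bo@example.com", "mrr": 20},
    ],
})
```

Here `bo@example.com` has billing data but no CRM record, so its row carries `pending_sources` of `["crm"]` and can be re-stitched when the late record arrives.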

Mapping against Ray Data Co

This is a taxonomy article, not an operational one. The value for us is a shared vocabulary across future SDG pieces in the series. Light RDCO mapping:

SDG’s “Articles Worth Reading” pointed at two pieces:

Sponsorships / bias notes