06-reference

snowflake rapid growth doordash

2026-04-02 · article · source: https://doordash.engineering/2021/06/22/overcoming-rapid-growth-challenges-for-datasets-in-snowflake/ · by DoorDash Engineering

Overcoming Rapid Growth Challenges for Datasets in Snowflake — DoorDash

Summary

DoorDash outlines a prioritized optimization checklist for Snowflake ETL pipelines under rapid dataset growth. The key mental model is least-effort-first optimization:

1. Ask if the pipeline can be eliminated entirely.
2. Reduce DAG dependencies.
3. Convert full reloads to incremental loads.
4. Shrink column counts.
5. Fix data spillage.
6. Add clustering.
7. Leverage Snowflake-native functions.

The ordering matters. Most teams jump to clustering or warehouse sizing when the highest-ROI move is often decommissioning unused pipelines or breaking unnecessary DAG edges. The hierarchy mirrors a general principle: subtraction before optimization.
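The incremental-load step is the one that most often needs code. The article doesn't ship an implementation, so the sketch below is my own minimal take on the high-water-mark pattern it implies: instead of truncating and reloading the target, copy only source rows newer than the target's latest `updated_at`. It uses `sqlite3` as a stand-in for Snowflake, and the table and column names (`source_orders`, `target_orders`, `updated_at`) are invented for illustration:

```python
import sqlite3

# Stand-in for a Snowflake connection; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER PRIMARY KEY, updated_at TEXT, total REAL);
    CREATE TABLE target_orders (id INTEGER PRIMARY KEY, updated_at TEXT, total REAL);
    INSERT INTO source_orders VALUES
        (1, '2021-06-01', 10.0),
        (2, '2021-06-02', 20.0),
        (3, '2021-06-03', 30.0);
    -- Target already holds the first day's load.
    INSERT INTO target_orders VALUES (1, '2021-06-01', 10.0);
""")

def incremental_load(conn: sqlite3.Connection) -> None:
    """Copy only rows newer than the target's high-water mark,
    rather than reloading the whole source table."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target_orders"
    ).fetchone()
    conn.execute(
        "INSERT INTO target_orders "
        "SELECT id, updated_at, total FROM source_orders WHERE updated_at > ?",
        (watermark,),
    )
    conn.commit()

incremental_load(conn)
rows = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
print(rows)  # 3 — only the two new rows were copied over
```

In real Snowflake this would typically be a `MERGE` keyed on `id` (to handle late-arriving updates, which a pure append misses), but the cost logic is the same: the scan is bounded by new data, not total table size.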

Relevance

Directly useful for 01-projects/phdata/index Snowflake consulting engagements — this is a ready-made playbook for cost optimization conversations. Also good 01-projects/phdata/career-transition interview material: “walk me through how you’d optimize a Snowflake environment under cost pressure.”

The least-effort-first ordering connects to 06-reference/2026-03-31-block-hierarchy-to-intelligence — capability layers need a similar triage. Fix the foundation (do we even need this?) before tuning the top.

See also 06-reference/2026-04-03-data-maturity-processes-tools — optimization maturity follows the same pattern as data maturity: eliminate waste before adding sophistication.

Open Questions