Uber’s Journey Toward Better Data Culture From First Principles
Summary
Uber’s engineering team lays out the five principles they adopted to fix data culture at scale: data as code, data is owned, data quality is known, accelerate data productivity, and data education. Even with world-class talent and industry-leading tooling, they hit the same problems every data org hits — duplication, poor discovery, disconnected tools, inconsistent processes, and missing ownership/SLAs.
The core mental model: treat data artifacts with the same rigor you treat service APIs. Schema changes get mandatory reviewers. Datasets have owners, SLAs for quality, and incident management. Documentation and testing are non-negotiable. This is not a tools problem — it is a culture and process problem that tools can support but never solve on their own.
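To make the "data as code" mental model concrete, here is a minimal sketch of a dataset contract, with all names hypothetical and not Uber's actual tooling: the owner, mandatory reviewers, and SLA live next to the schema, so a CI check can reject schema changes that lack the required approvals, the same way a service API change would be blocked without review.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetContract:
    """Dataset metadata reviewed with the same rigor as a service API."""
    name: str
    owner: str                   # accountable team, not an individual
    schema: dict                 # column name -> type
    mandatory_reviewers: tuple   # approvals required for any schema change
    freshness_sla_hours: int     # max tolerated staleness

def schema_change_approved(contract: DatasetContract,
                           old_schema: dict,
                           new_schema: dict,
                           approvals: set) -> bool:
    """A schema change ships only if every mandatory reviewer approved it."""
    if old_schema == new_schema:
        return True  # no schema change, no review required
    return set(contract.mandatory_reviewers) <= approvals

trips = DatasetContract(
    name="trips_daily",
    owner="marketplace-data",
    schema={"trip_id": "string", "fare_usd": "decimal"},
    mandatory_reviewers=("marketplace-data", "finance-data"),
    freshness_sla_hours=24,
)

# Adding a column without finance-data's sign-off is rejected.
new_schema = {**trips.schema, "tip_usd": "decimal"}
print(schema_change_approved(trips, trips.schema, new_schema,
                             approvals={"marketplace-data"}))  # False
```

The point of the sketch is that ownership and review rules are data the pipeline can enforce, not prose in a wiki.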
The five failure modes they identified are a near-universal diagnostic checklist:
- Data duplication — no source of truth for critical metrics, leading to confusion at consumption time
- Discovery issues — hundreds of thousands of datasets with no rich metadata or faceted search
- Disconnected tools — copy-pasting documentation across systems, no downstream visibility for schema changes
- Lack of process — inconsistent maturity levels across teams
- Lack of ownership and SLAs — no quality guarantees, no on-call for data
This diagnostic checklist is directly usable in consulting engagements (01-projects/phdata/index). When a client says "our data is a mess," these five categories give you a structured intake framework. The "data as code" principle also maps cleanly to 06-reference/2026-04-03-data-maturity-processes-tools — it is the mindset shift that separates Stage 1 from Stage 2 maturity.
The ownership and SLA framing connects to 06-reference/2026-04-03-data-products-taxonomy — if data is a product, it needs a product owner and a service contract. This is also the kind of organizational design work discussed in 06-reference/2026-03-31-block-hierarchy-to-intelligence, where the hierarchy exists to route accountability, not just information.
For 01-projects/data-marketplace/index, the “data quality is known” principle is table stakes for any data-as-a-service offering. Consumers will not pay for data they cannot trust, and trust requires visible SLAs.
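A "quality is known" SLA can start as small as a single freshness check that consumers can see. A minimal sketch, with hypothetical names and no ties to any specific platform:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def freshness_breach(last_loaded: datetime,
                     sla_hours: int,
                     now: Optional[datetime] = None) -> bool:
    """True if the dataset's last successful load is older than its SLA."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded > timedelta(hours=sla_hours)

# Example: a 24-hour freshness SLA checked at a fixed point in time.
check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)   # 12h old
stale = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)   # 36h old

print(freshness_breach(fresh, sla_hours=24, now=check_time))  # False
print(freshness_breach(stale, sla_hours=24, now=check_time))  # True
```

Publishing the result of a check like this next to the dataset is the lightweight starting point: a visible pass/fail is worth more to a paying consumer than an internal dashboard nobody can see.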
The Snowflake-specific angle (06-reference/2026-04-03-snowflake-rapid-growth-doordash, 06-reference/2026-04-03-netlify-databricks-to-snowflake): Uber’s problems are platform-agnostic, but the solution patterns (mandatory reviewers, staging environments, monitoring integration) map well to Snowflake’s access controls, clone-based dev/staging, and partner-connect monitoring tools.
Open Questions
- What is the best guidance for establishing data SLAs from scratch? Is there a lightweight starting point for teams that have never done this?
- How do you enforce “data as code” in organizations where the data team does not own the schema — e.g., the application engineering team controls the source tables?
- Does the ownership model break down for shared/cross-functional datasets, or does it just require more sophisticated RACI?