DEDP 4.3 — Data Contracts, Schema Evolution, NoSQL

Another convergent evolution chapter. Three approaches to the same survival problem: how do you change your data structures without breaking everything downstream? Schema evolution does it through migrations. NoSQL does it by avoiding rigid schemas. Data contracts do it through formal agreements between producers and consumers.

Schema Evolution (1970s+)

The systematic process of modifying database structure while preserving existing data.

Timeline:

1970s-1980s: Complete rebuilds required for schema changes (Oracle, System R)
1990s: DWH growth drove demand; Kimball's dimensional modeling and Slowly Changing Dimensions emerged
2000s: ORM tools (Hibernate) introduced automated schema updates
2000s-2010s: Liquibase (2006) and, later, the Schema Registry built around Kafka (Kafka itself dates from 2011) enabled "migrations as code"

Core principles:

Backward and forward compatibility
Additive-only changes to avoid breaking pipelines
Versioned schemas with rollback capability
Transaction logs enabling time travel
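
To make the versioning and rollback principles concrete, here is a minimal sketch of "migrations as code" using Python's built-in sqlite3 module. The `customers` table, the `MIGRATIONS` mapping, and the `migrate` helper are all hypothetical names chosen for illustration. This is not how Liquibase or any specific tool works internally. It only shows the general idea of stepping a schema version up and down while keeping existing rows.

```python
import sqlite3

# Hypothetical migrations: version -> (upgrade SQL, downgrade SQL).
MIGRATIONS = {
    1: (
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);",
        "DROP TABLE customers;",
    ),
    2: (
        # Additive change: a nullable column keeps existing rows valid.
        "ALTER TABLE customers ADD COLUMN email TEXT;",
        # Rebuild the table instead of DROP COLUMN so this also runs on
        # older SQLite builds.
        """
        CREATE TABLE customers_v1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customers_v1 SELECT id, name FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_v1 RENAME TO customers;
        """,
    ),
}


def current_version(conn):
    # SQLite stores an integer schema version in the database header.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Move the schema up or down to `target`, one version at a time."""
    version = current_version(conn)
    while version < target:
        version += 1
        conn.executescript(MIGRATIONS[version][0])
        conn.execute(f"PRAGMA user_version = {int(version)}")
    while version > target:
        conn.executescript(MIGRATIONS[version][1])
        version -= 1
        conn.execute(f"PRAGMA user_version = {int(version)}")
    conn.commit()


def column_names(conn, table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
migrate(conn, 1)
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")
conn.commit()

migrate(conn, 2)                      # roll forward: adds the email column
cols_v2 = column_names(conn, "customers")

migrate(conn, 1)                      # roll back: email column is gone again
cols_v1 = column_names(conn, "customers")
rows = conn.execute("SELECT id, name FROM customers").fetchall()
```

The existing row survives both directions because the upgrade is additive and the downgrade copies the surviving columns before swapping tables.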

Schema Registry functions as an early form of data contract: it enforces compatibility rules at the message level, so an incompatible schema change is rejected before it reaches consumers. This connects to the governance patterns in [[06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp]].
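
A rough sketch of what a compatibility rule might look like, assuming a deliberately simplified, Avro-inspired model where a schema is a dict of field name to `{"type": ..., "default": ...}`. The functions `can_read`, `backward_compatible`, and `forward_compatible` are hypothetical and do not reproduce the actual checks of any registry product. They only illustrate the reader/writer idea behind backward and forward compatibility.

```python
def can_read(reader, writer):
    """True if data written with `writer` can be read using `reader`.

    Simplified rule: every field the reader expects must either exist in
    the writer schema with the same type, or carry a default value.
    """
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


def backward_compatible(new, old):
    # Consumers on the new schema can still read data produced with the old one.
    return can_read(reader=new, writer=old)


def forward_compatible(new, old):
    # Consumers still on the old schema can read data produced with the new one.
    return can_read(reader=old, writer=new)


v1 = {"id": {"type": "long"}, "name": {"type": "string"}}

# Additive change with a default: safe in both directions.
v2_additive = {**v1, "email": {"type": "string", "default": ""}}

# New field without a default: new readers cannot fill it for old data.
v2_required = {**v1, "email": {"type": "string"}}

# Changed type on an existing field: breaks both directions.
v2_retyped = {"id": {"type": "string"}, "name": {"type": "string"}}

results = {
    "additive": (backward_compatible(v2_additive, v1), forward_compatible(v2_additive, v1)),
    "required": (backward_compatible(v2_required, v1), forward_compatible(v2_required, v1)),
    "retyped": (backward_compatible(v2_retyped, v1), forward_compatible(v2_retyped, v1)),
}
```

Under these assumptions, only the additive change with a default passes both checks, which is why additive-only changes are the usual recommendation.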

NoSQL (1998+)

Handles data without explicitly defining schemas upfront. The schema is implicit in each document (typically JSON), so structures are fluid and can differ from record to record.

Timeline:

1998: Carlo Strozzi coins "NoSQL" for his lightweight relational DB
2006-2007: Google Bigtable and Amazon Dynamo papers published
2009: Modern movement gains momentum (MongoDB, Cassandra, Redis)
2010+: Graph databases (Neo4j)

Philosophy:

Speed and availability over strict consistency
BASE principle (Basically Available, Soft state, Eventually consistent) vs. ACID
CAP theorem tradeoffs: many NoSQL systems prioritize Availability and Partition tolerance over strict Consistency
Horizontal scalability across machines
"Not Only SQL" — not a rejection of SQL, but an expansion beyond it

NoSQL's schema-on-read approach is the opposite end of the spectrum from schema evolution's schema-on-write. Both solve change management — one through discipline, the other through flexibility.
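
The contrast can be sketched in a few lines of Python. This is an illustrative toy, not the behavior of any particular database: `SCHEMA`, `write_strict`, and `read_flexible` are hypothetical names, and the `full_name` fallback is an invented example of a field that drifted over time.

```python
import json

# Hypothetical fixed schema enforced at write time.
SCHEMA = {"id": int, "name": str}


def write_strict(table, record):
    """Schema-on-write: validate before storing, reject anything off-schema."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(set(record) ^ set(SCHEMA))}")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)


def read_flexible(raw_docs):
    """Schema-on-read: store raw documents, impose a shape only when reading."""
    shaped = []
    for raw in raw_docs:
        doc = json.loads(raw)
        shaped.append({
            "id": doc.get("id"),
            # Older documents used "full_name"; the reader reconciles both.
            "name": doc.get("name", doc.get("full_name", "unknown")),
        })
    return shaped


# Schema-on-write: the second record is rejected at the door.
table = []
write_strict(table, {"id": 1, "name": "Ada"})
try:
    write_strict(table, {"id": 2, "full_name": "Grace"})
    rejected = False
except ValueError:
    rejected = True

# Schema-on-read: all three documents are stored; the reader copes with drift.
raw_docs = [
    '{"id": 1, "name": "Ada"}',
    '{"id": 2, "full_name": "Grace", "tags": ["x"]}',
    '{"id": 3}',
]
shaped = read_flexible(raw_docs)
```

The cost moves rather than disappears: the strict writer pushes the burden onto producers, while the flexible reader pushes it onto every consumer that has to reconcile old and new shapes.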

Data Contracts (2019+)

A formal agreement between data producers and consumers defining format, structure, semantics, validation rules, and metadata. Broader than schemas — they establish API-like interfaces between teams.

Timeline:

2001: Protobuf developed at Google (early enabler)
2009: Apache Avro created for schema management
2019: Andrew Jones coins the term at GoCardless
2021-2022: Public popularization by Chad Sanderson and others
2023+: Movement gains real traction

What makes contracts different from schemas:

Introduces an intermediary interface (the contract itself) between producer and consumer
Is defined declaratively in YAML/JSON
Enables automated testing, versioning, and notifications
Enforces data quality through validation rules
Establishes accountability — producers own the contract
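
As a sketch of what "declaratively defined" plus "validation rules" could look like in practice, the example below parses a small JSON contract and checks records against it. The contract layout (`name`, `version`, `owner`, `fields`, `min`, `allowed`) and the `validate` function are assumptions made up for this note. They do not follow any published data contract specification.

```python
import json

# Hypothetical contract; the producing team is named as the owner.
CONTRACT_JSON = """
{
  "name": "orders",
  "version": "1.0.0",
  "owner": "checkout-team",
  "fields": {
    "order_id": {"type": "string", "required": true},
    "amount":   {"type": "number", "required": true, "min": 0},
    "currency": {"type": "string", "required": true, "allowed": ["EUR", "USD"]},
    "note":     {"type": "string", "required": false}
  }
}
"""

# Mapping from contract type names to Python types (illustrative subset).
TYPES = {"string": str, "number": (int, float)}


def validate(record, contract):
    """Return a list of human-readable violations; empty means the record passes."""
    violations = []
    for name, rule in contract["fields"].items():
        if name not in record:
            if rule.get("required"):
                violations.append(f"{name}: missing required field")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPES[rule["type"]]):
            violations.append(f"{name}: expected {rule['type']}")
            continue
        if "min" in rule and value < rule["min"]:
            violations.append(f"{name}: below minimum {rule['min']}")
        if "allowed" in rule and value not in rule["allowed"]:
            violations.append(f"{name}: not in allowed values")
    return violations


contract = json.loads(CONTRACT_JSON)
good = {"order_id": "o-1", "amount": 12.5, "currency": "EUR"}
bad = {"order_id": "o-2", "amount": -3, "currency": "GBP"}

good_violations = validate(good, contract)
bad_violations = validate(bad, contract)
```

Because the contract is plain data, the same file could drive CI tests on the producer side and alerts on the consumer side, which is where the accountability aspect comes in.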

This is the most organizationally relevant pattern. Schemas are technical. Contracts are sociotechnical — they encode agreements between teams, not just column definitions. For [[01-projects/phdata/index]] clients, the conversation about data contracts is usually a conversation about organizational boundaries and ownership.

Comparative Analysis

Aspect	Schema Evolution	Data Contracts	NoSQL
Use case	Rigidly defined structures	Producer-consumer agreements	Dynamic, flexible schemas
Granularity	Table-level	Fine-grained with contract interface	Document-level
Implementation	Migrations and registries	Declarative YAML/JSON	Embedded per-document schemas
Scope	Data structure focus	Structure + semantics + validation	Flexible, runtime-determined

Four Shared Patterns

All three approaches implement:

Change Management — handle modifications without system disruption
Data Versioning — track evolution with rollback and time travel
Data Lineage — maintain source-to-destination relationships
Data Asset — decoupled producer-consumer relationships through stateful entities (connects to [[06-reference/2026-04-04-dedp-data-asset-reusability-pattern]])

Practical Implications

Modern data systems often use hybrid approaches:

Schema registries provide contract-like functionality
Open table formats (Delta Lake, Iceberg) enforce schemas and track schema changes in table metadata, giving tables contract-like guarantees
Data Mesh architectures explicitly leverage data contracts as domain boundaries
Protobuf, Avro, and Recap support multiple approaches simultaneously

The underlying challenge is constant: reliable data exchange while maintaining flexibility as organizations evolve. The tool changes; the pattern persists.

Connections

DWH/MDM governance patterns: [[06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp]]
Reusability through asset decoupling: [[06-reference/2026-04-04-dedp-data-asset-reusability-pattern]]
Semantic layer as contract enforcement: [[06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization]]
ETL evolution context: [[06-reference/2026-04-04-dedp-etl-tool-comparisons]]
Design pattern framing: [[06-reference/2026-04-04-dedp-design-patterns-intro]]
Craft and quality framing: [[06-reference/concepts/analytics-as-craft]]