DEDP 4.3 — Data Contracts, Schema Evolution, NoSQL
Another convergent evolution chapter. Three approaches to the same survival problem: how do you change your data structures without breaking everything downstream? Schema evolution does it through migrations. NoSQL does it by avoiding rigid schemas. Data contracts do it through formal agreements between producers and consumers.
Schema Evolution (1970s+)
The systematic process of modifying database structure while preserving existing data.
Timeline:
- 1970s-1980s: Complete rebuilds required for schema changes (Oracle, System R)
- 1990s: DWH growth drove demand; Kimball’s dimensional modeling and Slowly Changing Dimensions emerged
- 2000s: ORM tools (Hibernate) introduced automated schema updates
- 2006 onward: Liquibase (2006) introduced “migrations as code”; Kafka (open-sourced 2011) later gained a Schema Registry for versioned message schemas
Core principles:
- Backward and forward compatibility
- Additive-only changes to avoid breaking pipelines
- Versioned schemas with rollback capability
- Transaction logs enabling time travel
Schema Registry functions as an early form of data contract: it enforces compatibility rules at the message level, so a new schema version that would break existing readers or writers is rejected before it reaches the pipeline. This connects to the governance patterns in 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp.
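The compatibility and additive-only principles above can be made concrete with a small check. This is a minimal sketch under simplified rules, not Avro's full schema resolution and not the Schema Registry API; `Field` and `backward_violations` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor: a type name plus whether a default exists."""
    type: str
    has_default: bool = False


def backward_violations(old: dict, new: dict) -> list:
    """Return reasons why a reader using `new` could not read data written with `old`."""
    problems = []
    for name, field in new.items():
        if name not in old:
            # An added field is only safe if old records can fall back to a default.
            if not field.has_default:
                problems.append(f"added field '{name}' has no default")
        elif old[name].type != field.type:
            problems.append(
                f"field '{name}' changed type {old[name].type} -> {field.type}"
            )
    return problems


v1 = {"id": Field("long"), "email": Field("string")}
# Additive change with a default: old data stays readable.
v2_ok = {**v1, "country": Field("string", has_default=True)}
# Type change plus a new field without a default: breaks old data.
v2_bad = {"id": Field("string"), "email": Field("string"), "plan": Field("string")}

print(backward_violations(v1, v2_ok))
print(backward_violations(v1, v2_bad))
```

A registry-style gate would run a check like this before accepting a new version and refuse it when the list is non-empty.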
NoSQL (1998+)
Handles data without explicitly defining schemas upfront. In document stores the schema is implicit in each JSON document, so structures are fluid and dynamic.
Timeline:
- 1998: Carlo Strozzi coins “NoSQL” for his lightweight relational DB
- 2006-2007: Google Bigtable and Amazon Dynamo papers published
- 2009: Modern movement gains momentum (MongoDB, Cassandra, Redis)
- 2010+: Graph databases (Neo4j)
Philosophy:
- Speed and availability over strict consistency
- BASE principle (Basically Available, Soft state, Eventually consistent) vs. ACID
- CAP theorem tradeoffs: many systems favor Availability and Partition tolerance (AP), though some choose Consistency over Availability (CP)
- Horizontal scalability across machines
- “Not Only SQL” — not a rejection of SQL, but an expansion beyond it
NoSQL’s schema-on-read approach is the opposite end of the spectrum from schema evolution’s schema-on-write. Both solve change management — one through discipline, the other through flexibility.
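The contrast between schema-on-write and schema-on-read can be sketched in a few lines. This is an illustrative toy, assuming an in-memory list as the "store"; the `REQUIRED` schema, the field names, and both functions are hypothetical.

```python
import json

# Hypothetical schema used by the schema-on-write path.
REQUIRED = {"id": int, "email": str}


def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: reject records that do not match the schema before storing."""
    for name, expected in REQUIRED.items():
        if not isinstance(record.get(name), expected):
            raise ValueError(f"field '{name}' must be {expected.__name__}")
    store.append(record)


def read_interpreted(raw_docs: list) -> list:
    """Schema-on-read: accept any document, impose structure only when reading."""
    rows = []
    for raw in raw_docs:
        doc = json.loads(raw)
        # The reader decides how to map drifting field names onto one shape.
        rows.append({"id": doc.get("id"), "email": doc.get("email", doc.get("mail"))})
    return rows


store = []
write_validated(store, {"id": 1, "email": "a@example.com"})
try:
    write_validated(store, {"id": "2", "mail": "b@example.com"})
except ValueError as err:
    print("rejected on write:", err)

raw_docs = [
    '{"id": 1, "email": "a@example.com"}',
    '{"id": 2, "mail": "b@example.com", "tags": ["x"]}',
]
rows = read_interpreted(raw_docs)
print(rows)
```

The write path pushes the cost of change onto producers (discipline), while the read path pushes it onto every consumer that has to interpret the documents (flexibility).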
Data Contracts (2019+)
A formal agreement between data producers and consumers defining format, structure, semantics, validation rules, and metadata. Broader than schemas — they establish API-like interfaces between teams.
Timeline:
- 2001: Protobuf developed at Google (early enabler)
- 2009: Apache Avro created for schema management
- 2019: Andrew Jones coins the term at GoCardless
- 2021-2022: Public popularization by Chad Sanderson and others
- 2023+: Movement gains real traction
What makes contracts different from schemas:
- Introduces an explicit intermediary interface (the contract itself) between producer and consumer
- Declaratively defined in YAML/JSON
- Enables automated testing, versioning, and notifications
- Enforces data quality through validation rules
- Establishes accountability — producers own the contract
This is the most organizationally relevant pattern. Schemas are technical. Contracts are sociotechnical — they encode agreements between teams, not just column definitions. For 01-projects/phdata/index clients, the conversation about data contracts is usually a conversation about organizational boundaries and ownership.
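The contract properties listed above (declarative definition, versioning, validation rules, producer ownership) can be sketched as follows. This is a minimal illustration, assuming a made-up JSON layout rather than any published contract specification; the `orders` contract, its keys (`owner`, `min`, `allowed`), and the `validate` function are all hypothetical.

```python
import json

# Hypothetical contract, declared as data rather than code.
CONTRACT_JSON = """
{
  "name": "orders",
  "version": "1.0.0",
  "owner": "checkout-team",
  "fields": {
    "order_id": {"type": "str", "required": true},
    "amount":   {"type": "float", "required": true, "min": 0},
    "currency": {"type": "str", "required": true, "allowed": ["EUR", "USD"]}
  }
}
"""

# Map contract type names to Python types (ints are accepted as floats).
TYPES = {"str": str, "float": (int, float)}


def validate(record: dict, contract: dict) -> list:
    """Return a list of contract violations for one record (empty means valid)."""
    errors = []
    for name, spec in contract["fields"].items():
        if name not in record:
            if spec.get("required", False):
                errors.append(f"{name}: missing required field")
            continue
        value = record[name]
        if not isinstance(value, TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: below minimum {spec['min']}")
        if "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{name}: value not in allowed set")
    return errors


contract = json.loads(CONTRACT_JSON)
good = {"order_id": "o-1", "amount": 19.99, "currency": "EUR"}
bad = {"order_id": "o-2", "amount": -5, "currency": "GBP"}

print(validate(good, contract))
print(validate(bad, contract))
```

Because the contract is plain data, the same file could be versioned, reviewed by both teams, and run in the producer's CI; the `owner` key is where the accountability point shows up.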
Comparative Analysis
| Aspect | Schema Evolution | Data Contracts | NoSQL |
|---|---|---|---|
| Use case | Rigidly defined structures | Producer-consumer agreements | Dynamic, flexible schemas |
| Granularity | Table-level | Fine-grained with contract interface | Document-level |
| Implementation | Migrations and registries | Declarative YAML/JSON | Embedded per-document schemas |
| Scope | Data structure focus | Structure + semantics + validation | Flexible, runtime-determined |
Four Shared Patterns
All three approaches implement:
- Change Management — handle modifications without system disruption
- Data Versioning — track evolution with rollback and time travel
- Data Lineage — maintain source-to-destination relationships
- Data Asset — decoupled producer-consumer relationships through stateful entities (connects to 06-reference/2026-04-04-dedp-data-asset-reusability-pattern)
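The versioning pattern from the list above (track evolution, roll back, read an older state) can be sketched with an append-only history. This is a toy in-memory illustration, not how any particular registry or table format stores versions; `SchemaHistory` and its methods are hypothetical.

```python
import copy


class SchemaHistory:
    """Append-only list of schema versions with a movable 'current' pointer."""

    def __init__(self):
        self._versions = []
        self._current = 0  # 0 means nothing has been registered yet

    def register(self, schema: dict) -> int:
        """Store a new version and make it current; versions are numbered from 1."""
        self._versions.append(copy.deepcopy(schema))
        self._current = len(self._versions)
        return self._current

    def as_of(self, version: int) -> dict:
        """Time travel: return the schema exactly as it was at `version`."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown schema version {version}")
        return copy.deepcopy(self._versions[version - 1])

    def rollback(self, version: int) -> None:
        """Point 'current' at an earlier version without deleting later ones."""
        self.as_of(version)  # raises KeyError if the version does not exist
        self._current = version

    @property
    def current(self) -> dict:
        return self.as_of(self._current)


history = SchemaHistory()
history.register({"id": "long"})
history.register({"id": "long", "email": "string"})

history.rollback(1)
print(history.current)
print(history.as_of(2))
```

Keeping history append-only means a rollback is itself reversible, which is the same property transaction logs give to table-level time travel.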
Practical Implications
Modern data systems often use hybrid approaches:
- Schema registries provide contract-like functionality
- Open table formats (Delta Lake, Iceberg) embed contract-like guarantees within tables through schema enforcement and schema evolution tracked in table metadata
- Data Mesh architectures explicitly leverage data contracts as domain boundaries
- Protobuf, Avro, and Recap can serve more than one approach at once: the same schema definition can drive evolution rules and act as part of a contract
The underlying challenge is constant: reliable data exchange while maintaining flexibility as organizations evolve. The tool changes; the pattern persists.
Connections
- DWH/MDM governance patterns: 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp
- Reusability through asset decoupling: 06-reference/2026-04-04-dedp-data-asset-reusability-pattern
- Semantic layer as contract enforcement: 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization
- ETL evolution context: 06-reference/2026-04-04-dedp-etl-tool-comparisons
- Design pattern framing: 06-reference/2026-04-04-dedp-design-patterns-intro
- Craft and quality framing: 06-reference/concepts/analytics-as-craft