
dedp data contracts schema evolution

Fri Apr 03 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · book-chapter · source: https://www.dedp.online/part-2/4-ce/data-contracts-schema-evolution-nosql.html · by DEDP / Simon Späti

DEDP 4.3 — Data Contracts, Schema Evolution, NoSQL

Another convergent evolution chapter. Three approaches to the same survival problem: how do you change your data structures without breaking everything downstream? Schema evolution does it through migrations. NoSQL does it by avoiding rigid schemas. Data contracts do it through formal agreements between producers and consumers.

Schema Evolution (1970s+)

The systematic process of modifying database structure while preserving existing data.
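A minimal sketch of what "modifying structure while preserving existing data" can look like in practice, using Python's built-in `sqlite3` module. The table, column names, and the `migrate_v1_to_v2` helper are hypothetical, chosen only for illustration; the chapter does not prescribe a specific tool.

```python
import sqlite3

# Hypothetical v1 schema with two existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])


def migrate_v1_to_v2(conn: sqlite3.Connection) -> None:
    """Additive migration: the new column has a default, so old rows stay valid."""
    conn.execute(
        "ALTER TABLE customers ADD COLUMN country TEXT NOT NULL DEFAULT 'unknown'"
    )


migrate_v1_to_v2(conn)

# Existing rows survive the change and pick up the default value.
rows = conn.execute("SELECT id, name, country FROM customers ORDER BY id").fetchall()
print(rows)
```

The change is additive and carries a default, which is why rows written under the old structure remain readable afterwards. Destructive changes (dropping or retyping a column) usually need a multi-step migration instead.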

Timeline:

Core principles:

Schema Registry is interesting — it functions as an early form of data contract, enforcing compatibility rules at the message level. This connects to the governance patterns in 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp.
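To make "enforcing compatibility rules" concrete, here is a simplified sketch of a backward-compatibility check between two schema versions. It is not the API of any real schema registry: the dict-based schema format and the `backward_violations` function are assumptions for illustration, and real registries support more modes and type promotions than this.

```python
def backward_violations(old: dict, new: dict) -> list[str]:
    """Return reasons why a reader on `new` could not read data written with `old`.

    Simplified rules (assumed for this sketch):
    - a field added in `new` must declare a default,
    - a field present in both must keep its type,
    - removing a field is allowed.
    """
    problems = []
    for name, spec in new.items():
        if name not in old:
            if "default" not in spec:
                problems.append(f"added field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type {old[name]['type']} -> {spec['type']}"
            )
    return problems


v1 = {"id": {"type": "int"}, "email": {"type": "string"}}
v2_ok = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_bad = {"id": {"type": "string"}, "email": {"type": "string"},
          "signup_ts": {"type": "long"}}

ok = backward_violations(v1, v2_ok)    # no violations: change would be accepted
bad = backward_violations(v1, v2_bad)  # two violations: change would be rejected
print(ok, bad)
```

A registry applying a rule like this at publish time rejects the incompatible version before any consumer sees it, which is what makes it resemble a contract.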

NoSQL (1998+)

Handles data without explicitly defining schemas upfront. The schema is embedded in each JSON document, so structures are fluid and dynamic.

Timeline:

Philosophy:

NoSQL’s schema-on-read approach is the opposite end of the spectrum from schema evolution’s schema-on-write. Both solve change management — one through discipline, the other through flexibility.
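The contrast between the two ends of the spectrum can be sketched in a few lines. Everything here is hypothetical (the `REQUIRED` schema, the field names, both helper functions); it only illustrates where the structural check happens, at write time or at read time.

```python
import json

REQUIRED = {"id": int, "name": str}  # hypothetical fixed schema


def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: reject non-conforming records before they are stored."""
    for field, typ in REQUIRED.items():
        if not isinstance(record.get(field), typ):
            raise ValueError(f"field '{field}' must be {typ.__name__}")
    store.append(record)


def read_flexible(raw: str) -> dict:
    """Schema-on-read: accept any document, impose structure when reading."""
    doc = json.loads(raw)
    return {"id": int(doc["id"]), "name": doc.get("name") or doc.get("full_name", "")}


# Schema-on-write: the second record is refused at the door.
table: list = []
write_validated(table, {"id": 1, "name": "Ada"})
try:
    write_validated(table, {"id": "2", "full_name": "Grace"})
    rejected = False
except ValueError:
    rejected = True

# Schema-on-read: both shapes are stored as-is and reconciled by the reader.
documents = ['{"id": 1, "name": "Ada"}', '{"id": "2", "full_name": "Grace"}']
views = [read_flexible(d) for d in documents]
print(rejected, views)
```

The discipline does not disappear in the second case. It moves from the writer into every reader, which then has to know about each historical document shape.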

Data Contracts (2019+)

A formal agreement between data producers and consumers defining format, structure, semantics, validation rules, and metadata. Broader than schemas — they establish API-like interfaces between teams.
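A rough sketch of how such an agreement might be expressed declaratively and checked in code. The contract layout, the `orders` dataset, the `checkout-team` owner, and the `check` function are all invented for illustration. Real contract specifications are typically written in YAML or JSON and carry far more detail.

```python
# Hypothetical contract: metadata, format, structure, semantics, validation rules.
contract = {
    "metadata": {"name": "orders", "version": "1.0.0", "owner": "checkout-team"},
    "format": "json",
    "fields": {
        "order_id": {"type": str, "description": "Unique order identifier"},
        "amount": {"type": float, "description": "Order total in EUR", "min": 0.0},
    },
}


def check(record: dict, contract: dict) -> list[str]:
    """Return the contract violations found in a single record."""
    errors = []
    for name, spec in contract["fields"].items():
        if name not in record:
            errors.append(f"missing field '{name}'")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"'{name}' should be {spec['type'].__name__}")
        elif "min" in spec and value < spec["min"]:
            errors.append(f"'{name}' is below minimum {spec['min']}")
    return errors


good = check({"order_id": "A-1", "amount": 19.99}, contract)
bad = check({"order_id": "A-2", "amount": -5.0}, contract)
print(good, bad)
```

The negative amount passes a pure type check but fails the contract. That is the sense in which a contract is broader than a schema: it also carries ownership, meaning, and validation rules.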

Timeline:

What makes contracts different from schemas:

This is the most organizationally relevant pattern. Schemas are technical. Contracts are sociotechnical — they encode agreements between teams, not just column definitions. For 01-projects/phdata/index clients, the conversation about data contracts is usually a conversation about organizational boundaries and ownership.

Comparative Analysis

| Aspect | Schema Evolution | Data Contracts | NoSQL |
|---|---|---|---|
| Use case | Rigidly defined structures | Producer-consumer agreements | Dynamic, flexible schemas |
| Granularity | Table-level | Fine-grained with contract interface | Document-level |
| Implementation | Migrations and registries | Declarative YAML/JSON | Embedded per-document schemas |
| Scope | Data structure focus | Structure + semantics + validation | Flexible, runtime-determined |

Four Shared Patterns

All three approaches implement:

  1. Change Management — handle modifications without system disruption
  2. Data Versioning — track evolution with rollback and time travel
  3. Data Lineage — maintain source-to-destination relationships
  4. Data Asset — decoupled producer-consumer relationships through stateful entities (connects to 06-reference/2026-04-04-dedp-data-asset-reusability-pattern)

Practical Implications

Modern data systems often use hybrid approaches:

The underlying challenge is constant: reliable data exchange while maintaining flexibility as organizations evolve. The tool changes; the pattern persists.

Connections