Apache Iceberg and the Catalog Layer (w/ Russell Spitzer)
Podcast episode. Russell Spitzer (Iceberg/Polaris PMC, principal engineer at Snowflake) goes deep on open table format evolution and the catalog layer.
Curated Topics
- Iceberg version history: v1 (ACID transactions on analytic tables), v2 (row-level deletes, motivated by GDPR), v3 (geospatial types and a Variant type standardized across vendors), v4 (in development: lower streaming commit latency and AI use cases)
- Polaris catalog: Apache incubator project implementing the Iceberg REST catalog spec; aims to be a broad, interoperable lakehouse catalog with pluggable identity providers
- Apache governance model: PMC-driven, consensus-based; no single company controls roadmap or licensing; contributors earn influence through community work
- Migration pragmatics: Iceberg replaces the bespoke compaction and locking toil of Hive/HDFS setups; smaller companies should offload running the runtime to SaaS offerings
- Identity and vended credentials: Catalog vends short-lived credentials for object store access, solving the “two keys” problem but not global authorization
- Decoupling compute and storage: Defaults tuned for HDFS don’t hold for S3; elastic scaling requires rethinking file sizes and commit patterns
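The vended-credentials flow from the Polaris and identity bullets can be sketched against the Iceberg REST catalog spec. This is a minimal sketch, assuming the spec's `loadTable` route (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`) and its `X-Iceberg-Access-Delegation: vended-credentials` request header. The base URL, prefix, namespace, and table names are hypothetical, and the helpers `load_table_request` and `extract_s3_credentials` are illustrative names, not part of any client library. The code only builds the request and parses a response dict; it does not call a real catalog.

```python
from urllib.parse import quote

# Assumption: header name and route shape follow the Iceberg REST catalog spec.
ACCESS_DELEGATION_HEADER = "X-Iceberg-Access-Delegation"


def load_table_request(
    base_url: str, prefix: str, namespace: list[str], table: str
) -> tuple[str, dict[str, str]]:
    """Build URL and headers for a loadTable call that asks the catalog to vend credentials."""
    # Multi-level namespaces are joined with the 0x1F unit separator, then percent-encoded.
    ns = quote("\x1f".join(namespace), safe="")
    # The {prefix} segment is optional; it is omitted when the catalog config defines none.
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    headers = {ACCESS_DELEGATION_HEADER: "vended-credentials"}
    return "/".join(parts), headers


def extract_s3_credentials(load_table_response: dict) -> dict[str, str]:
    """Pick short-lived S3 credential properties out of a loadTable response's config map."""
    config = load_table_response.get("config", {})
    keys = ("s3.access-key-id", "s3.secret-access-key", "s3.session-token")
    return {k: config[k] for k in keys if k in config}


if __name__ == "__main__":
    # Hypothetical catalog endpoint and table identifiers.
    url, headers = load_table_request(
        "https://catalog.example.com/api/catalog", "analytics", ["sales", "eu"], "orders"
    )
    print(url)
    print(headers)
```

The catalog decides which storage the returned credentials cover, which is how vending removes the need to hand engines separate long-lived storage keys; whether a given principal may read the table at all remains a separate authorization decision. Newer revisions of the spec may also return credentials in a `storage-credentials` field, which this sketch ignores.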
RDCO-Relevant Links
- Open table format adoption trajectory — relevant for client advisory on lakehouse architecture
- Iceberg v4 streaming + AI direction — connects to SDG pipeline modernization discussions