The Iceberg Ecosystem Today (Anders Swanson)
Podcast episode. Anders Swanson (dbt Labs developer experience) provides a practitioner-level overview of where the Iceberg ecosystem actually stands for production use.
Curated Topics
- Three integration phases: Phase 1 (naive Parquet read from object store), Phase 2 (REST catalog for versioned table access), Phase 3 (schema-scale discovery, multi-table transactions)
- Producer vs consumer model: Consumer-led integration requires each downstream platform to maintain DDL pointers to the data (operationally messy); producer-led is cleaner: the producer writes to the catalog and the table is immediately queryable downstream
- Metadata performance and resiliency: The devil is in information_schema listing speed; federated catalogs can be slow; Snowflake mirrors metadata for native-feeling performance
- Four-part namespace emergence: catalog.database.schema.identifier becoming standard across platforms (Spark, Databricks Unity, Snowflake catalog links)
- Vended credentials vs global identity: Vended credentials solve object-store access but don’t solve cross-platform grants; enterprises still configure permissions separately per platform
- Three things to watch: Push-based catalog updates (subscribe instead of poll), mitigations for the small-files problem, and platforms writing directly to external catalogs
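The producer-led point above can be illustrated with a toy in-memory model (all class and table names here are hypothetical, not any real catalog API): the producer registers its table in a shared catalog, and consumers can resolve it immediately, with no per-platform DDL pointers.

```python
class Catalog:
    """Minimal stand-in for a shared Iceberg-style catalog (toy model)."""

    def __init__(self):
        self._tables = {}

    def register(self, identifier, metadata_location):
        # Producer-led flow: the writer registers the table itself,
        # so it is visible to every consumer as soon as this returns.
        self._tables[identifier] = metadata_location

    def load(self, identifier):
        # Consumers resolve the table through the catalog, not through
        # platform-local pointers they must maintain themselves.
        return self._tables[identifier]


catalog = Catalog()
catalog.register("sales.orders", "s3://lake/sales/orders/metadata/v3.json")

# Immediately queryable: no downstream DDL step in between.
assert catalog.load("sales.orders").endswith("v3.json")
```

In the consumer-led style, each downstream engine would instead hold its own pointer to the metadata location and have to update it on every commit, which is the operational mess the episode calls out.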
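The emerging four-part namespace can be sketched as a simple parser; the function name and the example identifier below are illustrative, not taken from any specific platform.

```python
def parse_identifier(fq_name: str) -> dict:
    """Split a four-part name, catalog.database.schema.identifier,
    into its components (hypothetical helper for illustration)."""
    parts = fq_name.split(".")
    if len(parts) != 4:
        raise ValueError(
            f"expected catalog.database.schema.identifier, got {fq_name!r}"
        )
    catalog, database, schema, identifier = parts
    return {
        "catalog": catalog,
        "database": database,
        "schema": schema,
        "identifier": identifier,
    }


print(parse_identifier("polaris.analytics.finance.orders"))
```

The point of the four-part form is that the leading `catalog` segment makes cross-catalog references explicit, which is what lets Spark, Databricks Unity, and Snowflake catalog links converge on the same addressing scheme.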
RDCO-Relevant Links
- dbt Labs moving to all-Iceberg lake internally — signals market direction for client advisory
- Cross-platform mesh and dbt Mesh alignment — relevant to SDG pipeline architecture consulting