The Iceberg Ecosystem Today (Anders Swanson)
Podcast episode. Anders Swanson (dbt Labs developer experience) provides a practitioner-level overview of where the Iceberg ecosystem actually stands for production use.
Curated Topics
- Three integration phases: Phase 1 (naive Parquet read from object store), Phase 2 (REST catalog for versioned table access), Phase 3 (schema-scale discovery, multi-table transactions)
- Producer vs consumer model: Consumer-led integration requires each downstream platform to maintain DDL pointers to the data (operationally messy); producer-led is cleaner: the producer writes to the catalog and the table is immediately queryable downstream
- Metadata performance and resiliency: The devil is in information_schema listing speed; federated catalogs can be slow; Snowflake mirrors metadata for native-feeling performance
- Four-part namespace emergence: catalog.database.schema.identifier becoming standard across platforms (Spark, Databricks Unity, Snowflake catalog links)
- Vended credentials vs global identity: Vended credentials solve object-store access but don’t solve cross-platform grants; enterprises still configure permissions separately per platform
- Three things to watch: Push-based catalog updates (subscribe instead of poll), mitigations for the small-files problem, and platforms writing directly to external catalogs
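The producer-led point above can be illustrated with a toy in-memory model (all class and table names here are hypothetical, not any real catalog API): the producer registers its table in a shared catalog, and consumers can resolve it immediately, with no per-platform DDL pointers.

```python
class Catalog:
    """Minimal stand-in for a shared Iceberg-style catalog (toy model)."""

    def __init__(self):
        self._tables = {}

    def register(self, identifier, metadata_location):
        # Producer-led flow: the writer registers the table itself,
        # so it is visible to every consumer as soon as this returns.
        self._tables[identifier] = metadata_location

    def load(self, identifier):
        # Consumers resolve the table through the catalog, not through
        # platform-local pointers they must maintain themselves.
        return self._tables[identifier]


catalog = Catalog()
catalog.register("sales.orders", "s3://lake/sales/orders/metadata/v3.json")

# Immediately queryable: no downstream DDL step in between.
assert catalog.load("sales.orders").endswith("v3.json")
```

In the consumer-led style, each downstream engine would instead hold its own pointer to the metadata location and have to update it on every commit, which is the operational mess the episode calls out.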
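The emerging four-part namespace can be sketched as a simple parser; the function name and the example identifier below are illustrative, not taken from any specific platform.

```python
def parse_identifier(fq_name: str) -> dict:
    """Split a four-part name, catalog.database.schema.identifier,
    into its components (hypothetical helper for illustration)."""
    parts = fq_name.split(".")
    if len(parts) != 4:
        raise ValueError(
            f"expected catalog.database.schema.identifier, got {fq_name!r}"
        )
    catalog, database, schema, identifier = parts
    return {
        "catalog": catalog,
        "database": database,
        "schema": schema,
        "identifier": identifier,
    }


print(parse_identifier("polaris.analytics.finance.orders"))
```

The point of the four-part form is that the leading `catalog` segment makes cross-catalog references explicit, which is what lets Spark, Databricks Unity, and Snowflake catalog links converge on the same addressing scheme.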
RDCO-Relevant Links
- dbt Labs moving to all-Iceberg lake internally — signals market direction for client advisory
- Cross-platform mesh and dbt Mesh alignment — relevant to SDG pipeline architecture consulting