“Why I’m replacing Polars with DuckDB” — Daniel Beach (Apr 9 2026)
Why this is in the vault
Direct external validation of the tool we already chose. RDCO’s graph-db-eval prototype landed on DuckDB on 2026-04-14 (../01-projects/graph-db-eval/prototype-results.md). Five days earlier, a longtime Polars advocate publicly switched a production AWS Lambda workload off Polars onto DuckDB. The substantive lesson is not “Polars vs DuckDB benchmarks” — it’s about maintainer culture and breaking-change discipline as production-stability inputs, which is exactly the lens we’d use when betting infra on any embedded engine.
The core argument
Beach was an early Polars adopter (2022-2023), wrote a popular “replace Pandas with Polars” post, and put Polars in production “when others were still watching from the sidelines.” The break point: an update that changed only the Lambda’s logic, with polars==1.31.0 still pinned on the AWS base Python 3.13 image, silently broke the next morning. Compounding the technical surprise was an earlier GitHub interaction in which a memory issue he hit was closed unkindly as “not our problem.”
His thesis (paraphrased): tools split into two camps — ones that obsess over developer ease, non-breaking changes, and maintainer kindness, and ones that don’t. He wants the first kind. He swapped Polars for DuckDB and “sleeps better already.”
Quoted phrasing he uses: “the proof is in the pudding”; the maintainer split is between projects that take “ownership to another level” and ones that don’t.
Mapping against Ray Data Co
Strong relevance — direct overlap with active tooling decision.
- Reinforces the DuckDB pick for graph-db-eval. Our prototype chose DuckDB for local-first, embeddable, SQL-native graph traversal. Beach’s switch is unrelated to graph use, but it’s a second independent data point that DuckDB’s stability and maintainer posture hold up under production AWS Lambda load. Worth folding into ../01-projects/graph-db-eval/prototype-results.md as external validation when we move to phase 2.
- Maintainer culture as a tool-selection criterion. This is a vault-worthy framework lift. Today our tool-selection rubric leans on technical fit (local-first, typed edges, query expressiveness). Beach’s experience says: add “maintainer responsiveness and breaking-change discipline” as an explicit row in the rubric. Especially load-bearing for embedded engines we’re going to wrap a skill around (like graph-ingest.py).
- Cautionary pattern: pinning a version doesn’t save you when the surrounding image moves. Beach pinned polars==1.31.0 and still got bit because his Lambda base image’s Python toolchain shifted underneath. Translation for RDCO: anything we put on Mac Mini cron with pinned package versions is still exposed to OS-level Python upgrades. Worth a one-line note in ../04-tooling/ when we set up the autonomous loop’s Python deps.
- Style observation for newsletter remix. Beach uses first-person confession framing (“burn me at the stake,” “ripped Polars from its Lambda throne”) to make a tool-selection essay readable. That’s a Sanity Check-adjacent voice pattern: practitioner reckoning rather than vendor analysis. File mentally for the ../skills/draft-review.md voice corpus.
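The pinning caution above can be sketched as a small drift check for the cron-side Python deps. This is a minimal sketch, not Beach's method and not a diagnosis of what broke his Lambda: it assumes we record an environment fingerprint once when deps are set up and compare it at the start of each run. The function names (`environment_fingerprint`, `fingerprint_digest`, `drift`) are hypothetical; only standard-library calls are used.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Collect interpreter, OS, and installed-package versions into one dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


def fingerprint_digest(fingerprint):
    """Short, stable hash of a fingerprint, handy for log lines."""
    payload = json.dumps(fingerprint, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def drift(expected, actual):
    """Return {key: (expected, actual)} for every value that differs."""
    changes = {}
    # Interpreter and OS can move even when every package pin holds.
    for key in ("python", "implementation", "platform"):
        if expected.get(key) != actual.get(key):
            changes[key] = (expected.get(key), actual.get(key))
    exp_pkgs = expected.get("packages", {})
    act_pkgs = actual.get("packages", {})
    for name in sorted(set(exp_pkgs) | set(act_pkgs)):
        if exp_pkgs.get(name) != act_pkgs.get(name):
            changes[f"packages.{name}"] = (exp_pkgs.get(name), act_pkgs.get(name))
    return changes
```

One way to use it: dump `environment_fingerprint([...])` to a JSON file next to the requirements file at setup time, then have the cron job load that file and log or abort when `drift(saved, current)` is non-empty. The point of including `python` and `platform` alongside package versions is that a requirements pin alone says nothing about the interpreter or OS underneath it.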
Sponsorship
None detected. No sponsor block, no self-consulting CTA in the body, no curation section. This is straight thought-leadership. Beach does link to his “other blog” (his old replace-Pandas-with-Polars post) — that’s self-citation, not sponsorship.
Caveats
- Sample size of one production failure. Beach explicitly spreads the blame across “Python environments” and “life in software,” not Polars alone.
- He doesn’t benchmark DuckDB vs Polars on his actual Lambda workload — the swap is driven by maintainer-culture and stability, not measured perf.
- Polars is a fast-moving Rust project; the GitHub interaction he describes is anecdotal, not a systemic claim.
Related
- ../01-projects/graph-db-eval/prototype-results.md — RDCO’s DuckDB prototype, completed 5 days after this article published
- ../01-projects/graph-db-eval/vertex-edge-dictionary.md — schema we built on DuckDB
- 2026-04-13-data-engineering-central-lambda-kappa.md — prior Daniel Beach Lambda-architecture piece
- 2026-04-15-data-engineering-central-robert-pack-basf-delta-lake.md — most recent DEC entry, same author/host
- ../decisions.md — candidate addition: “tool selection rubric must include maintainer-culture row”
Copyright note
All quoted phrasing is ≤15 words and attributed to Daniel Beach / Data Engineering Central. Full article at the source_url above; this note paraphrases for vault assessment only.