06-reference

dedp etl tool comparisons

2026-04-03 · book-chapter · source: https://www.dedp.online/part-2/4-ce/bash-stored-procedure-etl-python-script.html · by DEDP / Simon Späti

DEDP 4.2 — Bash vs. Stored Procedure vs. ETL Tool vs. Python Script

A convergent evolution chapter. Four orchestration paradigms evolved independently across four decades, each solving the same fundamental problem: move data from A to B, transform it, make it useful. The chapter traces how they converge on shared patterns despite radically different implementations.

The Four Paradigms

1. Bash Scripts + Cron (1979+)

The original. Shell scripts automate tasks; Cron provides time-based scheduling. Minimal overhead, portable across Unix systems, dead simple to start.

2. Stored Procedures (1980s+)

Database-native code (PL/SQL, T-SQL) executing inside the database engine. No network latency. Leverages database optimization, parallelization, and transaction management.

3. Traditional ETL Tools (1998+)

GUI-heavy applications: Informatica, SSIS, Oracle Warehouse Builder. Drag-and-drop interfaces for extraction, transformation, loading.

4. Python Scripts + Frameworks (2015+)

From simple .py files to Airflow, Dagster, Kestra, Mage.ai. Modern frameworks add dependency modeling, execution history, triggers, and backfill.
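
The dependency modeling these frameworks add can be sketched framework-agnostically. This is not any specific tool's API — the task names and graph below are invented for illustration, and Python's stdlib `graphlib` stands in for a real scheduler:

```python
# Minimal sketch of the core idea shared by Airflow, Dagster, Kestra, etc.:
# tasks declare what they depend on, and the engine derives execution order
# instead of the author hard-coding a sequence.
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

order = list(TopologicalSorter(pipeline).static_order())
print(order)  # "extract" first, "load" last; transform/validate in between
```

A real framework layers execution history, retries, triggers, and backfill on top of exactly this dependency graph.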

The Convergence

All four paradigms converge on the same underlying needs:

  1. Data Engineering Lifecycle Support — each abstraction hides complexity while enabling ingestion, processing, and analytics
  2. Abstraction and Reusability — progressive encapsulation from sequential scripts to reusable components (connects to 06-reference/2026-04-04-dedp-data-asset-reusability-pattern)
  3. Integration and Extensibility — from basic system integration to API-driven extensibility

The Paradigm Shifts

Three evolutionary arcs run through all four approaches:

| From | To | Example |
| --- | --- | --- |
| Sequential / procedural | Object-oriented / asset-based | Bash scripts → Dagster software-defined assets |
| Imperative (“do this, then this”) | Declarative (“I want this result”) | Stored procs → dbt YAML configs, Kestra |
| Database-native | Tool-agnostic / platform-independent | PL/SQL → Airflow → event-driven triggers |
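
The imperative-to-declarative shift can be made concrete with a toy contrast. Both functions and the `spec` dict are invented for illustration; the "engine" here is trivial, standing in for what dbt or Kestra does with a YAML/SQL spec:

```python
# Imperative: the author spells out each step and its sequence.
def run_imperative(rows):
    cleaned = [r for r in rows if r is not None]
    return sorted(cleaned)

# Declarative (sketch): the author states the desired result; an engine
# interprets the spec and decides how to satisfy it.
spec = {"filter": "not_null", "order": "ascending"}

def run_declarative(rows, spec):
    if spec.get("filter") == "not_null":
        rows = [r for r in rows if r is not None]
    if spec.get("order") == "ascending":
        rows = sorted(rows)
    return rows

data = [3, None, 1, 2]
# Same result either way — the difference is who owns the "how".
assert run_imperative(data) == run_declarative(data, spec) == [1, 2, 3]
```

The migration question for a client is which of their pipelines still encode the "how" in code that an engine could own instead.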

Communication evolved from in-code handling → gRPC-based inter-service communication. Orchestration evolved from tool-native scheduling → event-based triggers on data assets.

Four Extracted Patterns

  1. Data-Flow Modeling — modular architecture where components focus on specific tasks while integrating seamlessly
  2. Business Transformation — converting raw data into actionable insight through structured processes
  3. Reusability — declarative focus on “what” not “how,” optimizing for data dependencies and asset lifecycles
  4. Implicit Orchestration — event-driven architecture replacing centralized orchestrators with “webhooks, pub/sub systems, work queues, message buses”
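
The quoted mechanisms (pub/sub, message buses) can be sketched with a toy in-process bus. The `Bus` class and the topic name are hypothetical, not any real broker's API — the point is that no central orchestrator wires the steps together:

```python
# Toy pub/sub bus: producers publish events about data assets; downstream
# steps subscribe to those events instead of being scheduled centrally.
from collections import defaultdict

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = Bus()
loaded = []
# A downstream consumer reacts whenever the (hypothetical) "orders" asset
# is refreshed — orchestration is implicit in the event flow.
bus.subscribe("asset.orders.updated", lambda payload: loaded.append(payload))
bus.publish("asset.orders.updated", {"rows": 42})
print(loaded)  # [{'rows': 42}]
```

Swap the in-process bus for a work queue or webhook and the shape is the same: the asset update itself is the trigger.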

What Matters for Consulting

The convergent evolution framing is gold for 01-projects/phdata/index conversations. When a client debates “should we use Airflow or Dagster?” the real question is: where are you on the imperative-to-declarative spectrum, and which paradigm shift are you ready for?

Most clients are somewhere in the stored-procedure-to-Python transition. The pattern vocabulary helps frame migrations as evolutionary steps rather than rip-and-replace projects.

The shift from task-based to asset-based orchestration (Dagster’s model) mirrors the 06-reference/2026-04-04-dedp-data-asset-reusability-pattern — data assets as first-class citizens, not side effects of task execution.

Connections