DEDP 4.2 — Bash vs. Stored Procedure vs. ETL Tool vs. Python Script
A convergent evolution chapter. Four orchestration paradigms evolved independently across four decades, each solving the same fundamental problem: move data from A to B, transform it, make it useful. The chapter traces how they converge on shared patterns despite radically different implementations.
The Four Paradigms
1. Bash Scripts + Cron (1979+)
The original. Shell scripts automate tasks; Cron provides time-based scheduling. Minimal overhead, portable across Unix systems, dead simple to start.
- Cron emerged with Unix Version 7 (1979); Bash became standard in 1989
- Unix philosophy: small tools piped together
- Frictionless scheduling, zero dependencies
- Still the right answer for simple, well-understood ETL tasks
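The shape of this paradigm — small steps run top to bottom, triggered on a schedule — can be sketched in Python rather than shell (names and data here are illustrative, standing in for the `curl`/`awk`/redirect chain a cron job would invoke):

```python
# Sequential, cron-script-style ETL: extract, transform, load, run in order
# with no framework. Every name and record below is a made-up stand-in.

def extract():
    # Stand-in for `curl` or `cat` pulling raw lines from a source
    return ["alice,3", "bob,5"]

def transform(rows):
    # Stand-in for `awk`/`sed`: parse and reshape each record
    return [{"user": u, "count": int(n)} for u, n in (r.split(",") for r in rows)]

def load(records, sink):
    # Stand-in for redirecting output to a file or piping into `psql`
    sink.extend(records)

if __name__ == "__main__":
    sink = []
    load(transform(extract()), sink)
    print(sink)
```

The whole pipeline is the call order — which is exactly the simplicity (and the limitation: no retries, no dependency graph) the later paradigms address.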
2. Stored Procedures (1980s+)
Database-native code (PL/SQL, T-SQL) executing inside the database engine. The data never crosses the network to be transformed. Leverages database optimization, parallelization, and transaction management.
- Oracle PL/SQL (introduced with Oracle 6, late 1980s), SQL Server T-SQL
- Co-location with data = performance
- Risk: complexity spirals into “spaghetti code” as procedures chain into procedures calling procedures
- Still common in enterprise environments; not going away
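The co-location idea can be sketched with Python's stdlib `sqlite3` — SQLite has no PL/SQL-style procedure language, so a single set-based SQL statement stands in for the procedure body; the point is that the transform runs inside the engine and no rows cross to the client:

```python
import sqlite3

# Co-location sketch: the aggregation runs as one set-based statement inside
# the engine, the way a stored procedure would. Table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (customer TEXT, amount REAL);
    CREATE TABLE customer_totals (customer TEXT, total REAL);
    INSERT INTO raw_orders VALUES ('alice', 10.0), ('alice', 5.0), ('bob', 7.5);
""")

# The "procedure body": aggregate raw_orders into customer_totals in-engine
conn.execute("""
    INSERT INTO customer_totals
    SELECT customer, SUM(amount) FROM raw_orders GROUP BY customer
""")

totals = dict(conn.execute("SELECT * FROM customer_totals ORDER BY customer"))
print(totals)  # {'alice': 15.0, 'bob': 7.5}
```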
3. Traditional ETL Tools (1998+)
GUI-heavy applications: Informatica, SSIS, Oracle Warehouse Builder. Drag-and-drop interfaces for extraction, transformation, loading.
- Data Transformation Services (1998) kicked it off
- Human-friendly interfaces, consolidated batch processing
- The ELT shift: cloud warehouses (BigQuery, Snowflake) make it cheaper to load first, transform after
- 01-projects/phdata/index clients still run legacy Informatica; migration path matters
4. Python Scripts + Frameworks (2015+)
From simple .py files to Airflow, Dagster, Kestra, Mage.ai. Modern frameworks add dependency modeling, execution history, triggers, and backfill.
- Airflow (2015): Programmatic workflow authoring via Python DAGs
- Dagster (2018): Data-aware orchestration — understands data assets, not just tasks
- Kestra (2019): YAML-based declarative configuration
- Mage.ai (2022): Notebook-style pipeline development
- Described as “microservices on steroids” — Python flexibility + orchestration features
The Convergence
All four paradigms converge on the same underlying needs:
- Data Engineering Lifecycle Support — each abstraction hides complexity while enabling ingestion, processing, analytics
- Abstraction and Reusability — progressive encapsulation from sequential scripts to reusable components (connects to 06-reference/2026-04-04-dedp-data-asset-reusability-pattern)
- Integration and Extensibility — from basic system integration to API-driven extensibility
The Paradigm Shifts
Three evolutionary arcs run through all four approaches:
| From | To | Example |
|---|---|---|
| Sequential / procedural | Object-oriented / asset-based | Bash scripts → Dagster software-defined assets |
| Imperative (“do this, then this”) | Declarative (“I want this result”) | Stored procs → dbt YAML configs, Kestra |
| Database-native | Tool-agnostic / platform-independent | PL/SQL → Airflow → event-driven triggers |
Communication evolved from in-code handling → gRPC-based inter-service communication. Orchestration evolved from tool-native scheduling → event-based triggers on data assets.
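The imperative-to-declarative arc can be made concrete with a toy sketch: the pipeline is described as data (a dict standing in for the YAML that tools like Kestra or dbt configs use), and a generic engine decides the “how”:

```python
# Declarative sketch: the spec says *what* the result should be; the engine
# owns the execution. The spec schema and operations here are invented.
spec = {
    "source": [3, 1, 2],
    "steps": [
        {"op": "sort"},
        {"op": "map", "fn": lambda x: x * 10},
    ],
}

def run(spec):
    data = list(spec["source"])
    for step in spec["steps"]:
        if step["op"] == "sort":
            data = sorted(data)
        elif step["op"] == "map":
            data = [step["fn"](x) for x in data]
    return data

print(run(spec))  # [10, 20, 30]
```

Because the spec is data, the engine can be swapped, the steps reordered, or the whole thing validated before anything runs — none of which is easy when the pipeline is a sequence of imperative calls.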
Four Extracted Patterns
- Data-Flow Modeling — modular architecture where components focus on specific tasks while integrating seamlessly
- Business Transformation — converting raw data into actionable insight through structured processes
- Reusability — declarative focus on “what” not “how,” optimizing for data dependencies and asset lifecycles
- Implicit Orchestration — event-driven architecture replacing centralized orchestrators with “webhooks, pub/sub systems, work queues, message buses”
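The implicit-orchestration pattern — pub/sub replacing a central scheduler — reduces to a small sketch: each step subscribes to the event signaling that its upstream data asset changed, and no component knows the full pipeline (all names below are illustrative):

```python
from collections import defaultdict

# Minimal in-process pub/sub: no central orchestrator, just handlers
# reacting to "asset updated" events and publishing their own.
subscribers = defaultdict(list)

def subscribe(event, handler):
    subscribers[event].append(handler)

def publish(event, payload):
    for handler in subscribers[event]:
        handler(payload)

log = []
# "transform" reacts to new raw data, then announces its own output asset
subscribe("raw.updated", lambda rows: (
    log.append(("transform", rows)),
    publish("clean.updated", [r.upper() for r in rows]),
))
# "load" reacts to the cleaned asset; neither step references the other
subscribe("clean.updated", lambda rows: log.append(("load", rows)))

publish("raw.updated", ["a", "b"])
print(log)  # [('transform', ['a', 'b']), ('load', ['A', 'B'])]
```

A real system swaps the in-process dict for a message bus or work queue, but the inversion is the same: the pipeline emerges from subscriptions rather than a DAG definition.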
What Matters for Consulting
The convergent evolution framing is gold for 01-projects/phdata/index conversations. When a client debates “should we use Airflow or Dagster?” the real question is: where are you on the imperative-to-declarative spectrum, and which paradigm shift are you ready for?
Most clients are somewhere in the stored-procedure-to-Python transition. The pattern vocabulary helps frame migrations as evolutionary steps rather than rip-and-replace projects.
The shift from task-based to asset-based orchestration (Dagster’s model) mirrors the 06-reference/2026-04-04-dedp-data-asset-reusability-pattern — data assets as first-class citizens, not side effects of task execution.
Connections
- Reusability patterns: 06-reference/2026-04-04-dedp-data-asset-reusability-pattern
- Cache as materialization: 06-reference/2026-04-04-dedp-cache-pattern
- DWH evolution context: 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp
- Dynamic queries as the consumer-side pattern: 06-reference/2026-04-04-dedp-dynamic-queries
- Design pattern framing: 06-reference/2026-04-04-dedp-design-patterns-intro
- Systems thinking: 06-reference/concepts/systems-over-goals