DEDP 4.2 — Bash vs. Stored Procedure vs. ETL Tool vs. Python Script
A convergent evolution chapter. Four orchestration paradigms evolved independently across four decades, each solving the same fundamental problem: move data from A to B, transform it, make it useful. The chapter traces how they converge on shared patterns despite radically different implementations.
The Four Paradigms
1. Bash Scripts + Cron (1979+)
The original. Shell scripts automate tasks; Cron provides time-based scheduling. Minimal overhead, portable across Unix systems, dead simple to start.
- Cron emerged with Unix Version 7 (1979); Bash became standard in 1989
- Unix philosophy: small tools piped together
- Frictionless scheduling, zero dependencies
- Still the right answer for simple, well-understood ETL tasks
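The shape of this paradigm — small steps run top to bottom, triggered on a schedule — can be sketched in Python rather than shell (names and data here are illustrative, standing in for the `curl`/`awk`/redirect chain a cron job would invoke):

```python
# Sequential, cron-script-style ETL: extract, transform, load, run in order
# with no framework. Every name and record below is a made-up stand-in.

def extract():
    # Stand-in for `curl` or `cat` pulling raw lines from a source
    return ["alice,3", "bob,5"]

def transform(rows):
    # Stand-in for `awk`/`sed`: parse and reshape each record
    return [{"user": u, "count": int(n)} for u, n in (r.split(",") for r in rows)]

def load(records, sink):
    # Stand-in for redirecting output to a file or piping into `psql`
    sink.extend(records)

if __name__ == "__main__":
    sink = []
    load(transform(extract()), sink)
    print(sink)
```

The whole pipeline is the call order — which is exactly the simplicity (and the limitation: no retries, no dependency graph) the later paradigms address.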
2. Stored Procedures (1980s+)
Database-native code (PL/SQL, T-SQL) executing inside the database engine. The data never crosses the network to be transformed. Leverages database optimization, parallelization, and transaction management.
- Oracle PL/SQL (introduced with Oracle 6, late 1980s), SQL Server T-SQL
- Co-location with data = performance
- Risk: complexity spirals into “spaghetti code” as procedures chain into procedures calling procedures
- Still common in enterprise environments; not going away
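The co-location idea can be sketched with Python's stdlib `sqlite3` — SQLite has no PL/SQL-style procedure language, so a single set-based SQL statement stands in for the procedure body; the point is that the transform runs inside the engine and no rows cross to the client:

```python
import sqlite3

# Co-location sketch: the aggregation runs as one set-based statement inside
# the engine, the way a stored procedure would. Table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (customer TEXT, amount REAL);
    CREATE TABLE customer_totals (customer TEXT, total REAL);
    INSERT INTO raw_orders VALUES ('alice', 10.0), ('alice', 5.0), ('bob', 7.5);
""")

# The "procedure body": aggregate raw_orders into customer_totals in-engine
conn.execute("""
    INSERT INTO customer_totals
    SELECT customer, SUM(amount) FROM raw_orders GROUP BY customer
""")

totals = dict(conn.execute("SELECT * FROM customer_totals ORDER BY customer"))
print(totals)  # {'alice': 15.0, 'bob': 7.5}
```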
3. Traditional ETL Tools (1998+)
GUI-heavy applications: Informatica, SSIS, Oracle Warehouse Builder. Drag-and-drop interfaces for extraction, transformation, loading.
- Data Transformation Services (1998) kicked it off
- Human-friendly interfaces, consolidated batch processing
- The ELT shift: cloud warehouses (BigQuery, Snowflake) make it cheaper to load first, transform after
- 01-projects/phdata/index clients still run legacy Informatica; migration path matters
4. Python Scripts + Frameworks (2015+)
From simple .py files to Airflow, Dagster, Kestra, Mage.ai. Modern frameworks add dependency modeling, execution history, triggers, and backfill.
- Airflow (2015): Programmatic workflow authoring via Python DAGs
- Dagster (2018): Data-aware orchestration — understands data assets, not just tasks
- Kestra (2019): YAML-based declarative configuration
- Mage.ai (2022): Notebook-style pipeline development
- Described as “microservices on steroids” — Python flexibility + orchestration features
The Convergence
All four paradigms converge on the same underlying needs:
- Data Engineering Lifecycle Support — each abstraction hides complexity while enabling ingestion, processing, analytics
- Abstraction and Reusability — progressive encapsulation from sequential scripts to reusable components (connects to 06-reference/2026-04-04-dedp-data-asset-reusability-pattern)
- Integration and Extensibility — from basic system integration to API-driven extensibility
The Paradigm Shifts
Three evolutionary arcs run through all four approaches:
| From | To | Example |
|---|---|---|
| Sequential / procedural | Object-oriented / asset-based | Bash scripts → Dagster software-defined assets |
| Imperative (“do this, then this”) | Declarative (“I want this result”) | Stored procs → dbt YAML configs, Kestra |
| Database-native | Tool-agnostic / platform-independent | PL/SQL → Airflow → event-driven triggers |
Communication evolved from in-code handling → gRPC-based inter-service communication. Orchestration evolved from tool-native scheduling → event-based triggers on data assets.
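The imperative-to-declarative arc can be made concrete with a toy sketch: the pipeline is described as data (a dict standing in for the YAML that tools like Kestra or dbt configs use), and a generic engine decides the “how”:

```python
# Declarative sketch: the spec says *what* the result should be; the engine
# owns the execution. The spec schema and operations here are invented.
spec = {
    "source": [3, 1, 2],
    "steps": [
        {"op": "sort"},
        {"op": "map", "fn": lambda x: x * 10},
    ],
}

def run(spec):
    data = list(spec["source"])
    for step in spec["steps"]:
        if step["op"] == "sort":
            data = sorted(data)
        elif step["op"] == "map":
            data = [step["fn"](x) for x in data]
    return data

print(run(spec))  # [10, 20, 30]
```

Because the spec is data, the engine can be swapped, the steps reordered, or the whole thing validated before anything runs — none of which is easy when the pipeline is a sequence of imperative calls.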
Four Extracted Patterns
- Data-Flow Modeling — modular architecture where components focus on specific tasks while integrating seamlessly
- Business Transformation — converting raw data into actionable insight through structured processes
- Reusability — declarative focus on “what” not “how,” optimizing for data dependencies and asset lifecycles
- Implicit Orchestration — event-driven architecture replacing centralized orchestrators with “webhooks, pub/sub systems, work queues, message buses”
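The implicit-orchestration pattern — pub/sub replacing a central scheduler — reduces to a small sketch: each step subscribes to the event signaling that its upstream data asset changed, and no component knows the full pipeline (all names below are illustrative):

```python
from collections import defaultdict

# Minimal in-process pub/sub: no central orchestrator, just handlers
# reacting to "asset updated" events and publishing their own.
subscribers = defaultdict(list)

def subscribe(event, handler):
    subscribers[event].append(handler)

def publish(event, payload):
    for handler in subscribers[event]:
        handler(payload)

log = []
# "transform" reacts to new raw data, then announces its own output asset
subscribe("raw.updated", lambda rows: (
    log.append(("transform", rows)),
    publish("clean.updated", [r.upper() for r in rows]),
))
# "load" reacts to the cleaned asset; neither step references the other
subscribe("clean.updated", lambda rows: log.append(("load", rows)))

publish("raw.updated", ["a", "b"])
print(log)  # [('transform', ['a', 'b']), ('load', ['A', 'B'])]
```

A real system swaps the in-process dict for a message bus or work queue, but the inversion is the same: the pipeline emerges from subscriptions rather than a DAG definition.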
What Matters for Consulting
The convergent evolution framing is gold for 01-projects/phdata/index conversations. When a client debates “should we use Airflow or Dagster?” the real question is: where are you on the imperative-to-declarative spectrum, and which paradigm shift are you ready for?
Most clients are somewhere in the stored-procedure-to-Python transition. The pattern vocabulary helps frame migrations as evolutionary steps rather than rip-and-replace projects.
The shift from task-based to asset-based orchestration (Dagster’s model) mirrors the 06-reference/2026-04-04-dedp-data-asset-reusability-pattern — data assets as first-class citizens, not side effects of task execution.
Connections
- Reusability patterns: 06-reference/2026-04-04-dedp-data-asset-reusability-pattern
- Cache as materialization: 06-reference/2026-04-04-dedp-cache-pattern
- DWH evolution context: 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp
- Dynamic queries as the consumer-side pattern: 06-reference/2026-04-04-dedp-dynamic-queries
- Design pattern framing: 06-reference/2026-04-04-dedp-design-patterns-intro
- Systems thinking: 06-reference/concepts/systems-over-goals