A cron job that loads data has no idea whether the job before it succeeded. A DAG does — that dependency is the whole point.
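```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
# extract_data, transform_data, load_data: your own functions, defined elsewhere.
# start_date is a placeholder; catchup=False stops Airflow from scheduling every missed day at once.
with DAG('daily_load', start_date=datetime(2024, 1, 1), schedule='@daily', catchup=False) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_data)
    transform = PythonOperator(task_id='transform', python_callable=transform_data)
    load = PythonOperator(task_id='load', python_callable=load_data)
    extract >> transform >> load  # each task waits for the previous one to succeed
```

If `extract` fails, `transform` and `load` never run (Airflow marks them `upstream_failed`), and any failure alert you have configured names the step that actually failed, not just "something broke."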
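The failure behavior described above can be modeled in a few lines of plain Python. This is a toy sketch, not Airflow's scheduler: `run_dag` is a hypothetical helper, and only the state names are borrowed from Airflow. It walks the tasks in dependency order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and marks anything downstream of a failure without running it.

```python
from graphlib import TopologicalSorter


def run_dag(deps, callables):
    """Toy executor (not Airflow): run tasks in dependency order.

    deps maps each task to the set of tasks it depends on (its upstream).
    A task whose upstream did not succeed is marked "upstream_failed"
    and is never called. Returns a dict of task -> final state.
    """
    state = {}
    for task in TopologicalSorter(deps).static_order():
        upstream = deps.get(task, set())
        if any(state[u] != "success" for u in upstream):
            state[task] = "upstream_failed"  # skipped, never executed
            continue
        try:
            callables[task]()
            state[task] = "success"
        except Exception as exc:
            state[task] = "failed"
            # The "alert" names the exact task that broke.
            print(f"ALERT: task {task!r} failed: {exc}")
    return state


# Usage: extract fails, so transform and load are never called.
def extract():
    raise ConnectionError("source database unreachable")


def transform():
    print("transforming")


def load():
    print("loading")


deps = {"transform": {"extract"}, "load": {"transform"}}
states = run_dag(deps, {"extract": extract, "transform": transform, "load": load})
print(states)
# ALERT: task 'extract' failed: source database unreachable
# {'extract': 'failed', 'transform': 'upstream_failed', 'load': 'upstream_failed'}
```

Because each task's state is recorded by name, the report points at `extract` specifically, whereas a chain of cron scripts only sees that something exited nonzero.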
What this buys you
- Retries on just the failed task, not the whole pipeline
- A visual graph of what depends on what
- Backfills that replay past schedule intervals, each as its own run that follows the same dependencies
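Two of those benefits, per-task retries and backfills, can be illustrated with a plain-Python toy model. `run_with_retries` and `backfill` are hypothetical helpers, not Airflow APIs; in Airflow itself you would set `retries` on each operator and let the scheduler or a backfill create one DAG run per schedule interval. The sketch assumes daily runs and passes each run its date as a string.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter


def run_with_retries(deps, callables, ds, retries=1):
    """Toy executor (not Airflow): run tasks in dependency order.

    A failing task gets up to `retries` extra attempts; only that task is
    rerun, never the tasks upstream of it. `ds` is the run's date string,
    named after Airflow's `ds` template variable.
    """
    state = {}
    for task in TopologicalSorter(deps).static_order():
        if any(state[u] != "success" for u in deps.get(task, set())):
            state[task] = "upstream_failed"
            continue
        for _ in range(retries + 1):
            try:
                callables[task](ds)
                state[task] = "success"
                break
            except Exception:
                state[task] = "failed"
    return state


def backfill(deps, callables, start, end, retries=1):
    """Replay one run per day from start to end (inclusive), oldest first."""
    results = {}
    day = start
    while day <= end:
        ds = day.isoformat()
        results[ds] = run_with_retries(deps, callables, ds, retries)
        day += timedelta(days=1)
    return results


# Usage: transform fails once with a transient error, then succeeds.
calls = []
failures_left = [1]


def extract(ds):
    calls.append(("extract", ds))


def transform(ds):
    calls.append(("transform", ds))
    if failures_left[0]:
        failures_left[0] -= 1
        raise TimeoutError("transient failure")


def load(ds):
    calls.append(("load", ds))


deps = {"transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}
results = backfill(deps, tasks, date(2024, 1, 1), date(2024, 1, 2))
print(calls[:4])
# [('extract', '2024-01-01'), ('transform', '2024-01-01'), ('transform', '2024-01-01'), ('load', '2024-01-01')]
```

On the first day, `transform` runs twice but `extract` runs only once: the retry reruns the failed task alone. The backfill then produces one independent run per date.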
None of this matters for a single nightly script. It matters a lot once you have thirty of them.