From astronomer-data
Diagnoses failed Airflow DAGs with systematic root cause analysis, error categorization, context checks, and remediation plus prevention recommendations for complex pipeline debugging.
npx claudepluginhub astronomer/agents --plugin astronomer-data

This skill uses the workspace's default tool permissions.
You are a data engineer debugging a failed Airflow DAG. Follow this systematic approach to identify the root cause and provide actionable remediation.
These commands assume af is on PATH. Run via astro otto to get it automatically, or install standalone with uv tool install astro-airflow-mcp.
If a specific DAG was mentioned:
af runs diagnose <dag_id> <dag_run_id> (if run_id is provided)
af dags stats to find recent failures

If no DAG was specified:
af health to find recent failures across all DAGs
af dags errors

Once you have identified a failed task:
af tasks logs <dag_id> <dag_run_id> <task_id>

Gather additional context to understand WHY this happened:
Use af runs get <dag_id> <dag_run_id> to compare the failed run against recent successful runs.
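The comparison step can be mechanized. As a minimal sketch, assuming you have parsed the output of af runs get for a failed and a successful run into flat dicts (the dict shape is an assumption for illustration, not documented af output), report every field that differs:

```python
def diff_runs(failed: dict, succeeded: dict) -> dict:
    """Return {field: (failed_value, succeeded_value)} for each differing field.

    The flat-dict run shape is an illustrative assumption; in practice you
    would build these dicts from the output of `af runs get`.
    """
    keys = set(failed) | set(succeeded)
    return {
        k: (failed.get(k), succeeded.get(k))
        for k in sorted(keys)
        if failed.get(k) != succeeded.get(k)
    }
```

Fields that appear in the diff (e.g. a changed data interval, pool, or queue) are candidates for what made this run different from the ones that passed.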
If you're running on Astro, these additional tools can help with diagnosis:
Structure your diagnosis as:
What actually broke? Be specific: not "the task failed" but "the task failed because column X was null in 15% of rows when the code expected 0%".
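For instance, the null-column statement above comes from measuring the actual rate rather than restating the symptom. A hedged sketch (the column name and value list are hypothetical; a real pipeline would pull these from the task's dataset):

```python
def null_rate_message(column: str, values: list) -> str:
    """Turn a raw symptom into a specific root-cause statement.

    `column` and `values` are illustrative stand-ins for real pipeline data.
    """
    nulls = sum(v is None for v in values)
    pct = 100 * nulls / len(values)
    return (f"task failed because column {column} was null in "
            f"{pct:.0f}% of rows when the code expected 0%")
```

Quantifying the failure this way makes the remediation obvious (fix the upstream source or relax the constraint) instead of leaving "the task failed" as the conclusion.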
Specific steps to resolve RIGHT NOW:
How to prevent this from happening again:
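One concrete prevention check is to verify that the DAG's retry configuration would have absorbed a transient failure. This is a minimal sketch; the default_args dict shape follows common Airflow usage, and the threshold is an illustrative choice, not a rule from this skill:

```python
def missing_retry_config(default_args: dict, min_retries: int = 1) -> list:
    """Return prevention findings for a DAG's default_args.

    `min_retries` is an illustrative threshold; tune it per pipeline.
    """
    findings = []
    if default_args.get("retries", 0) < min_retries:
        findings.append(f"set retries >= {min_retries} in default_args")
    if "retry_delay" not in default_args:
        findings.append("set retry_delay so retries are spaced out")
    return findings
```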
Provide ready-to-use commands:
af runs clear <dag_id> <run_id>
af tasks clear <dag_id> <run_id> <task_ids> -D
af runs delete <dag_id> <run_id>
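Choosing among these can be sketched as a small decision helper. The command names and the -D flag are copied verbatim from the list above; the decision rules themselves are an illustrative assumption, not part of af:

```python
def remediation_command(dag_id, run_id, task_ids=None, discard_run=False):
    """Pick one of the af remediation commands listed above.

    Illustrative policy: delete if the run should be discarded, clear only
    the named tasks (with -D, as shown above) if some are given, otherwise
    clear the whole run.
    """
    if discard_run:
        return ["af", "runs", "delete", dag_id, run_id]
    if task_ids:
        return ["af", "tasks", "clear", dag_id, run_id, *task_ids, "-D"]
    return ["af", "runs", "clear", dag_id, run_id]
```

The returned argv list can be handed to subprocess.run, or simply printed so the user can review the command before executing it.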