Skill

troubleshooting-astro-deployments

Troubleshoots Astronomer production deployments using Astro CLI: lists/inspects deployments, views component/error logs, searches failures.

Bash

devops

deployment

npx claudepluginhub astronomer/agents --plugin astronomer-data

Tool Access

This skill uses the workspace's default tool permissions.



Stats

Stars: 361

Forks: 44

Last commit: Feb 18, 2026


Astro Deployment Troubleshooting

This skill helps you diagnose and troubleshoot production Astronomer deployments using the Astro CLI.

For deployment management, see the managing-astro-deployments skill. For local development, see the managing-astro-local-env skill.

Quick Health Check

Start with these commands to get an overview:

# 1. List deployments to find target
astro deployment list

# 2. Get deployment overview
astro deployment inspect <DEPLOYMENT_ID>

# 3. Check for errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50
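The three checks above can be wrapped in one shell function so every triage starts the same way. This is a minimal sketch. The function name `astro_quick_check` is hypothetical, and it only calls the commands and flags shown in this section.

```shell
# Hypothetical helper: run the quick health check for one deployment.
# Usage: astro_quick_check <DEPLOYMENT_ID> [ERROR_LOG_COUNT]
astro_quick_check() {
  if [ -z "$1" ]; then
    echo "usage: astro_quick_check <DEPLOYMENT_ID> [ERROR_LOG_COUNT]" >&2
    return 2
  fi
  local deployment_id="$1"
  local count="${2:-50}"   # same count as the manual check above

  echo "== deployment list =="
  astro deployment list
  echo "== inspect $deployment_id =="
  astro deployment inspect "$deployment_id"
  echo "== error logs (-c $count) =="
  astro deployment logs "$deployment_id" --error -c "$count"
}
```

For example, `astro_quick_check <DEPLOYMENT_ID> 100` runs the same three steps with a larger error log count.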

Viewing Deployment Logs

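Use -c to control how many log entries are returned (default: 500). Component flags (--scheduler, --workers, --webserver, --triggerer) and level flags (--error, --warn, --info) cannot be combined with each other, so use one per command. You can add --keyword and -c alongside that flag, as in the examples below.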

Component-Specific Logs

View logs from specific Airflow components:

# Scheduler logs (DAG processing, task scheduling)
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 50

# Worker logs (task execution)
astro deployment logs <DEPLOYMENT_ID> --workers -c 30

# Webserver logs (UI access, health checks)
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30

# Triggerer logs (deferrable operators)
astro deployment logs <DEPLOYMENT_ID> --triggerer -c 30
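Because each command accepts only one component flag, collecting logs from every component takes one call per flag. The loop below is a minimal sketch that assumes a POSIX-style shell. The function name `astro_component_logs` is hypothetical.

```shell
# Hypothetical helper: pull logs from each Airflow component in turn,
# issuing one `astro deployment logs` call per component flag.
# Usage: astro_component_logs <DEPLOYMENT_ID> [COUNT]
astro_component_logs() {
  local deployment_id="$1"
  local count="${2:-30}"
  local flag
  for flag in --scheduler --workers --webserver --triggerer; do
    # ${flag#--} strips the leading dashes for a readable section header
    echo "== ${flag#--} (-c $count) =="
    astro deployment logs "$deployment_id" "$flag" -c "$count"
  done
}
```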

Log Level Filtering

Filter by severity:

# Error logs only (most useful for troubleshooting)
astro deployment logs <DEPLOYMENT_ID> --error -c 30

# Warning logs
astro deployment logs <DEPLOYMENT_ID> --warn -c 50

# Info-level logs
astro deployment logs <DEPLOYMENT_ID> --info -c 50
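If you plan to search the same logs several times, saving each severity level to a file once can be quicker than re-querying the deployment. This is a minimal sketch. The function name and the one-file-per-level layout are assumptions, not an Astro convention.

```shell
# Hypothetical helper: save error, warning, and info logs to separate files
# so they can be searched repeatedly without re-querying the deployment.
# Usage: astro_save_level_logs <DEPLOYMENT_ID> <OUTPUT_DIR> [COUNT]
astro_save_level_logs() {
  local deployment_id="$1"
  local out_dir="$2"
  local count="${3:-500}"   # matches the CLI default log count
  local level
  mkdir -p "$out_dir" || return 1
  for level in error warn info; do
    # One level flag per command, as the CLI requires.
    astro deployment logs "$deployment_id" "--$level" -c "$count" \
      > "$out_dir/$level.log" || return 1
  done
}
```

After saving, ordinary tools work on the files, for example `grep -i timeout <OUTPUT_DIR>/error.log`.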

Search Logs

Search for specific keywords:

# Search for specific error
astro deployment logs <DEPLOYMENT_ID> --keyword "ConnectionError"

# Search for specific DAG
astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100

# Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"

# Find task failures
astro deployment logs <DEPLOYMENT_ID> --error --keyword "Task failed"
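To check several suspected causes in one pass, you can loop over keywords and run a separate `--keyword` search for each, on top of the `--error` filter. This is a minimal sketch. `astro_search_errors` is a hypothetical name, and the keywords in the usage line are only examples.

```shell
# Hypothetical helper: run one error-level keyword search per argument.
# Usage: astro_search_errors <DEPLOYMENT_ID> <KEYWORD>...
astro_search_errors() {
  if [ "$#" -lt 2 ]; then
    echo "usage: astro_search_errors <DEPLOYMENT_ID> <KEYWORD>..." >&2
    return 2
  fi
  local deployment_id="$1"
  local keyword
  shift
  for keyword in "$@"; do
    echo "== --error --keyword \"$keyword\" =="
    astro deployment logs "$deployment_id" --error --keyword "$keyword"
  done
}
```

Example: `astro_search_errors <DEPLOYMENT_ID> ImportError ConnectionError "Task failed"`.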

Complete Investigation Workflow

Step 1: Identify the Problem

# List deployments with status
astro deployment list

# Get deployment details
astro deployment inspect <DEPLOYMENT_ID>

Look for:

Status: HEALTHY vs UNHEALTHY
Runtime version compatibility
Resource limits (CPU, memory)
Recent deployment timestamp

Step 2: Check Error Logs

# Start with errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50

Look for:

Recurring error patterns
Specific DAGs failing repeatedly
Import errors or syntax errors
Connection or credential errors
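To spot recurring patterns quickly, you can count exception class names in the error output. This is a heuristic sketch, not a log parser. It assumes error messages contain Python exception names ending in `Error` or `Exception`, and the function name `top_error_types` is hypothetical.

```shell
# Hypothetical helper: count exception names (e.g. ImportError) in log text
# read from stdin and print "<count> <name>" lines, most frequent first.
top_error_types() {
  grep -oE '[A-Za-z]+(Error|Exception)' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}
```

Usage: `astro deployment logs <DEPLOYMENT_ID> --error -c 200 | top_error_types | head -n 10`.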

Step 3: Review Scheduler Logs

# Check DAG processing
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 30

Look for:

DAG parse errors
Scheduling delays
Task queueing issues

Step 4: Check Worker Logs

# Check task execution
astro deployment logs <DEPLOYMENT_ID> --workers -c 30

Look for:

Task execution failures
Resource exhaustion
Timeout errors

Step 5: Verify Configuration

# Check environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>

Look for:

Missing or incorrect environment variables
Secrets configuration (AIRFLOW__SECRETS__BACKEND)
Connection configuration

Common Investigation Patterns

Recurring DAG Failures

Follow the complete investigation workflow above, then narrow to the specific DAG:

astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100

Resource Issues

# 1. Check deployment resource allocation
astro deployment inspect <DEPLOYMENT_ID>
# Look for: resource_quota_cpu, resource_quota_memory
# Worker queue: max_worker_count, worker_type

# 2. Check for worker scaling issues
astro deployment logs <DEPLOYMENT_ID> --workers -c 50

# 3. Look for out-of-memory errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "memory"

Configuration Problems

# 1. Review environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# 2. In the variable list output, check the secrets backend configuration
# Look for: AIRFLOW__SECRETS__BACKEND, AIRFLOW__SECRETS__BACKEND_KWARGS

# 3. Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>

# 4. Check webserver logs for auth issues
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30

Import Errors

# 1. Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"

# 2. Check scheduler for parse failures
astro deployment logs <DEPLOYMENT_ID> --scheduler --keyword "Failed to import" -c 50

# 3. Verify dependencies were deployed
astro deployment inspect <DEPLOYMENT_ID>
# Check: current_tag, last deployment timestamp

Environment Variables Management

List Variables

# List all variables for deployment
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# Find specific variable
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --key AWS_REGION

# Export variables to file
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --save --env .env.backup
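After exporting variables with `--save --env`, you can run a short check to confirm that the keys your DAGs depend on are present, such as the secrets backend settings from Step 5. This is a minimal sketch. It assumes the exported file has one `KEY=VALUE` per line, and `missing_env_keys` is a hypothetical name.

```shell
# Hypothetical helper: print each requested key that has no "KEY=" line in an
# exported env file. Returns 1 if any key is missing, 0 otherwise.
# Usage: missing_env_keys <ENV_FILE> <KEY>...
missing_env_keys() {
  local env_file="$1"
  local key
  local rc=0
  shift
  for key in "$@"; do
    # Match the key at the start of a line, immediately followed by '='
    if ! grep -q "^${key}=" "$env_file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}
```

Example: `missing_env_keys .env.backup AIRFLOW__SECRETS__BACKEND AIRFLOW__SECRETS__BACKEND_KWARGS`.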

Create Variables

# Create regular variable
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
  --key API_ENDPOINT \
  --value https://api.example.com

# Create secret (masked in UI and logs)
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
  --key API_KEY \
  --value secret123 \
  --secret
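To create several non-secret variables at once, for example from a file exported with `--save --env`, you can loop over its lines and call `variable create` once per line. This is a minimal sketch. It assumes simple unquoted `KEY=VALUE` lines and keys that do not already exist, and the function name is hypothetical. It never passes `--secret`, so create secrets one at a time as shown above.

```shell
# Hypothetical helper: create one deployment variable per KEY=VALUE line.
# Blank lines and lines starting with '#' are skipped. No --secret flag is
# passed, so do not use this for sensitive values.
# Usage: create_vars_from_file <DEPLOYMENT_ID> <ENV_FILE>
create_vars_from_file() {
  local deployment_id="$1"
  local env_file="$2"
  local key value
  # "|| [ -n "$key" ]" also processes a final line with no trailing newline
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      ''|'#'*) continue ;;
    esac
    # stdin from /dev/null so the CLI cannot consume lines of the file
    astro deployment variable create --deployment-id "$deployment_id" \
      --key "$key" \
      --value "$value" < /dev/null || return 1
  done < "$env_file"
}
```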

Update Variables

# Update existing variable
astro deployment variable update --deployment-id <DEPLOYMENT_ID> \
  --key API_KEY \
  --value newsecret

Delete Variables

# Delete variable
astro deployment variable delete --deployment-id <DEPLOYMENT_ID> --key OLD_KEY

Note: Variables are exposed to DAGs as environment variables, and changes take effect without redeploying code.

Key Metrics from `deployment inspect`

Focus on these fields when troubleshooting:

status: HEALTHY vs UNHEALTHY
runtime_version: Airflow version compatibility
scheduler_size/scheduler_count: Scheduler capacity
executor: CELERY, KUBERNETES, or LOCAL
worker_queues: Worker scaling limits and types
- min_worker_count, max_worker_count
- worker_concurrency
- worker_type (resource class)
resource_quota_cpu/memory: Overall resource limits
dag_deploy_enabled: Whether DAG-only deploys are enabled
current_tag: Last deployment version
is_high_availability: Redundancy enabled
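When scripting these checks, you can pull a single field out of `inspect` output. This is a minimal sketch. It assumes the output is YAML-like text with one `field: value` pair per line, as the field names above suggest. `inspect_field` is a hypothetical helper, not a YAML parser: it ignores nesting and returns the first match.

```shell
# Hypothetical helper: print the value of the first "<field>: <value>" line
# in inspect output read from stdin. Nesting is ignored; the first match wins.
# Usage: astro deployment inspect <DEPLOYMENT_ID> | inspect_field status
inspect_field() {
  awk -v field="$1" '
    {
      line = $0
      sub(/^[ \t]*/, "", line)                 # drop YAML indentation
      if (index(line, field ":") == 1) {
        value = substr(line, length(field) + 2)
        sub(/^[ \t]*/, "", value)              # drop space after the colon
        print value
        exit
      }
    }
  '
}
```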

Investigation Best Practices

Check error logs early - Most obvious failures appear here
Check error logs for patterns - Same DAG failing repeatedly? Timing patterns?
Component-specific troubleshooting:
- Worker logs → task execution details
- Scheduler logs → DAG processing and scheduling
- Webserver logs → UI issues and health checks
- Triggerer logs → deferrable operator issues
Use --keyword for targeted searches - More efficient than reading all logs
The inspect command is your health dashboard - Check it first
Environment variables in inspect output - May reveal configuration issues
Log count default is 500 - Adjust with -c based on needs
Check the last deploy time - A recent deploy might have introduced the issue

Troubleshooting Quick Reference

Symptom	Command
Deployment shows UNHEALTHY	`astro deployment inspect <ID>` + `--error` logs
DAG not appearing	`--error` logs for import errors, check `--scheduler` logs
Tasks failing	`--workers` logs + search for DAG with `--keyword`
Slow scheduling	`--scheduler` logs + check `inspect` for scheduler resources
UI not responding	`--webserver` logs
Connection issues	Check variables, search logs for connection name
Import errors	`--error --keyword "ImportError"` + `--scheduler` logs
Out of memory	`inspect` for resources + `--workers --keyword "memory"`
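You can also keep the table as a small lookup for scripts or runbooks. This is a minimal sketch. The function name and the short symptom keys (`unhealthy`, `dag-missing`, and so on) are hypothetical, and each branch prints the commands from the table without running them. The `-c` values come from the earlier sections of this guide.

```shell
# Hypothetical helper: print the first commands to run for a symptom,
# following the quick reference table. Nothing is executed.
# Usage: suggest_commands <SYMPTOM> <DEPLOYMENT_ID>
suggest_commands() {
  local symptom="$1"
  local id="$2"
  local logs="astro deployment logs $id"
  case "$symptom" in
    unhealthy)
      echo "astro deployment inspect $id"
      echo "$logs --error -c 50" ;;
    dag-missing)
      echo "$logs --error --keyword \"ImportError\""
      echo "$logs --scheduler -c 50" ;;
    tasks-failing)
      echo "$logs --workers -c 30"
      echo "$logs --keyword \"<DAG_ID>\" -c 100" ;;
    slow-scheduling)
      echo "$logs --scheduler -c 50"
      echo "astro deployment inspect $id" ;;
    ui-down)
      echo "$logs --webserver -c 30" ;;
    connection)
      echo "astro deployment variable list --deployment-id $id"
      echo "$logs --keyword \"<CONNECTION_ID>\"" ;;
    import-errors)
      echo "$logs --error --keyword \"ImportError\""
      echo "$logs --scheduler --keyword \"Failed to import\" -c 50" ;;
    oom)
      echo "astro deployment inspect $id"
      echo "$logs --workers --keyword \"memory\"" ;;
    *)
      echo "unknown symptom: $symptom" >&2
      return 2 ;;
  esac
}
```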

Related Skills

managing-astro-deployments: Create, update, delete deployments, deploy code
managing-astro-local-env: Manage local Airflow development environment
setting-up-astro-project: Initialize and configure Astro projects
