Skill

troubleshooting-astro-deployments

Troubleshoot Astronomer production deployments with Astro CLI. Use when investigating deployment issues, viewing production logs, analyzing failures, or managing deployment environment variables.

From data-engineering
Install
1
Run in your terminal
$
npx claudepluginhub astronomer/agents --plugin data
Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

Astro Deployment Troubleshooting

This skill helps you diagnose and troubleshoot production Astronomer deployments using the Astro CLI.

For deployment management, see the managing-astro-deployments skill. For local development, see the managing-astro-local-env skill.


Quick Health Check

Start with these commands to get an overview:

# 1. List deployments to find target
astro deployment list

# 2. Get deployment overview
astro deployment inspect <DEPLOYMENT_ID>

# 3. Check for errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50

Viewing Deployment Logs

Use -c to control log count (default: 500). Log flags cannot be combined — use one component or level flag per command.

Component-Specific Logs

View logs from specific Airflow components:

# Scheduler logs (DAG processing, task scheduling)
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 50

# Worker logs (task execution)
astro deployment logs <DEPLOYMENT_ID> --workers -c 30

# Webserver logs (UI access, health checks)
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30

# Triggerer logs (deferrable operators)
astro deployment logs <DEPLOYMENT_ID> --triggerer -c 30

Log Level Filtering

Filter by severity:

# Error logs only (most useful for troubleshooting)
astro deployment logs <DEPLOYMENT_ID> --error -c 30

# Warning logs
astro deployment logs <DEPLOYMENT_ID> --warn -c 50

# Info-level logs
astro deployment logs <DEPLOYMENT_ID> --info -c 50

Search Logs

Search for specific keywords:

# Search for specific error
astro deployment logs <DEPLOYMENT_ID> --keyword "ConnectionError"

# Search for specific DAG
astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100

# Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"

# Find task failures
astro deployment logs <DEPLOYMENT_ID> --error --keyword "Task failed"

Complete Investigation Workflow

Step 1: Identify the Problem

# List deployments with status
astro deployment list

# Get deployment details
astro deployment inspect <DEPLOYMENT_ID>

Look for:

  • Status: HEALTHY vs UNHEALTHY
  • Runtime version compatibility
  • Resource limits (CPU, memory)
  • Recent deployment timestamp

Step 2: Check Error Logs

# Start with errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50

Look for:

  • Recurring error patterns
  • Specific DAGs failing repeatedly
  • Import errors or syntax errors
  • Connection or credential errors

Step 3: Review Scheduler Logs

# Check DAG processing
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 30

Look for:

  • DAG parse errors
  • Scheduling delays
  • Task queueing issues

Step 4: Check Worker Logs

# Check task execution
astro deployment logs <DEPLOYMENT_ID> --workers -c 30

Look for:

  • Task execution failures
  • Resource exhaustion
  • Timeout errors

Step 5: Verify Configuration

# Check environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>

Look for:

  • Missing or incorrect environment variables
  • Secrets configuration (AIRFLOW__SECRETS__BACKEND)
  • Connection configuration

Common Investigation Patterns

Recurring DAG Failures

Follow the complete investigation workflow above, then narrow to the specific DAG:

astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100

Resource Issues

# 1. Check deployment resource allocation
astro deployment inspect <DEPLOYMENT_ID>
# Look for: resource_quota_cpu, resource_quota_memory
# Worker queue: max_worker_count, worker_type

# 2. Check for worker scaling issues
astro deployment logs <DEPLOYMENT_ID> --workers -c 50

# 3. Look for out-of-memory errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "memory"

Configuration Problems

# 1. Review environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# 2. Check for secrets backend configuration
# Look for: AIRFLOW__SECRETS__BACKEND, AIRFLOW__SECRETS__BACKEND_KWARGS

# 3. Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>

# 4. Check webserver logs for auth issues
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30

Import Errors

# 1. Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"

# 2. Check scheduler for parse failures
astro deployment logs <DEPLOYMENT_ID> --scheduler --keyword "Failed to import" -c 50

# 3. Verify dependencies were deployed
astro deployment inspect <DEPLOYMENT_ID>
# Check: current_tag, last deployment timestamp

Environment Variables Management

List Variables

# List all variables for deployment
astro deployment variable list --deployment-id <DEPLOYMENT_ID>

# Find specific variable
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --key AWS_REGION

# Export variables to file
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --save --env .env.backup

Create Variables

# Create regular variable
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
  --key API_ENDPOINT \
  --value https://api.example.com

# Create secret (masked in UI and logs)
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
  --key API_KEY \
  --value secret123 \
  --secret

Update Variables

# Update existing variable
astro deployment variable update --deployment-id <DEPLOYMENT_ID> \
  --key API_KEY \
  --value newsecret

Delete Variables

# Delete variable
astro deployment variable delete --deployment-id <DEPLOYMENT_ID> --key OLD_KEY

Note: Variables are available to DAGs as environment variables. Changes require no redeployment.


Key Metrics from deployment inspect

Focus on these fields when troubleshooting:

  • status: HEALTHY vs UNHEALTHY
  • runtime_version: Airflow version compatibility
  • scheduler_size/scheduler_count: Scheduler capacity
  • executor: CELERY, KUBERNETES, or LOCAL
  • worker_queues: Worker scaling limits and types
    • min_worker_count, max_worker_count
    • worker_concurrency
    • worker_type (resource class)
  • resource_quota_cpu/memory: Overall resource limits
  • dag_deploy_enabled: Whether DAG-only deploys work
  • current_tag: Last deployment version
  • is_high_availability: Redundancy enabled

Investigation Best Practices

  1. Always start with error logs - Most obvious failures appear here
  2. Check error logs for patterns - Same DAG failing repeatedly? Timing patterns?
  3. Component-specific troubleshooting:
    • Worker logs → task execution details
    • Scheduler logs → DAG processing and scheduling
    • Webserver logs → UI issues and health checks
    • Triggerer logs → deferrable operator issues
  4. Use --keyword for targeted searches - More efficient than reading all logs
  5. The inspect command is your health dashboard - Check it first
  6. Environment variables in inspect output - May reveal configuration issues
  7. Log count default is 500 - Adjust with -c based on needs
  8. Don't forget to check deployment time - Recent deploy might have introduced issue

Troubleshooting Quick Reference

SymptomCommand
Deployment shows UNHEALTHYastro deployment inspect <ID> + --error logs
DAG not appearing--error logs for import errors, check --scheduler logs
Tasks failing--workers logs + search for DAG with --keyword
Slow scheduling--scheduler logs + check inspect for scheduler resources
UI not responding--webserver logs
Connection issuesCheck variables, search logs for connection name
Import errors--error --keyword "ImportError" + --scheduler logs
Out of memoryinspect for resources + --workers --keyword "memory"

Related Skills

  • managing-astro-deployments: Create, update, delete deployments, deploy code
  • managing-astro-local-env: Manage local Airflow development environment
  • setting-up-astro-project: Initialize and configure Astro projects
Stats
Stars290
Forks31
Last CommitFeb 18, 2026