This skill should be used when implementing error handling in Ansible, using block/rescue/always patterns, creating retry logic with until/retries, handling expected failures gracefully, or providing clear error messages with assert and fail.
Implements robust error handling in Ansible using block/rescue/always patterns, retry logic with until/retries, and graceful failure handling with failed_when. Use when you need to handle task failures, validate inputs with assert, or provide recovery mechanisms in playbooks.
Patterns for robust error handling in Ansible playbooks and roles.
Handle errors and perform cleanup:
- name: Deploy application
  block:
    - name: Stop application
      ansible.builtin.systemd:
        name: myapp
        state: stopped

    - name: Deploy new version
      ansible.builtin.copy:
        src: myapp-v2.0
        dest: /usr/bin/myapp

    - name: Start application
      ansible.builtin.systemd:
        name: myapp
        state: started

  rescue:
    - name: Rollback to previous version
      ansible.builtin.copy:
        src: myapp-backup
        dest: /usr/bin/myapp

    - name: Start application (rollback)
      ansible.builtin.systemd:
        name: myapp
        state: started

    - name: Report failure
      ansible.builtin.fail:
        msg: "Deployment failed, rolled back to previous version"

  always:
    # ansible.builtin.file does not expand wildcards, so find the
    # matching temp files first and remove each one.
    - name: Find temp files
      ansible.builtin.find:
        paths: /tmp
        patterns: "deploy-*"
        file_type: any
      register: deploy_temp_files

    - name: Cleanup temp files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ deploy_temp_files.files }}"
Handle transient failures with retries:
- name: Wait for service to be ready
  ansible.builtin.uri:
    url: http://localhost:8080/health
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 30
  delay: 10
  # Total wait: up to 5 minutes (30 * 10s)

- name: Wait for cluster to stabilize
  ansible.builtin.command: pvecm status
  register: cluster_status
  until: "'Quorate: Yes' in cluster_status.stdout"
  retries: 12
  delay: 5
  changed_when: false
| Parameter | Description |
|---|---|
| until | Condition that must be true to stop retrying |
| retries | Maximum number of attempts |
| delay | Seconds between attempts |
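When a task uses `until`, the registered result normally also carries an `attempts` counter, which can be handy in logs. A minimal sketch, reusing the health-check URL from the example above; the second task and its message are illustrative, and the `default` filter is there in case `attempts` is absent:

```yaml
- name: Wait for service to be ready
  ansible.builtin.uri:
    url: http://localhost:8080/health
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 30
  delay: 10

# Illustrative follow-up task: report how many tries were needed
- name: Report how many attempts the health check needed
  ansible.builtin.debug:
    msg: "Service healthy after {{ health_check.attempts | default('unknown') }} attempt(s)"
```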
Validate inputs with clear error messages:
- name: Validate required variables
  ansible.builtin.assert:
    that:
      - vm_name is defined
      - vm_name | length > 0
      - vm_memory >= 1024
      - vm_cores >= 1
    fail_msg: |
      Invalid VM configuration:
      - vm_name: {{ vm_name | default('NOT SET') }}
      - vm_memory: {{ vm_memory | default('NOT SET') }} (min: 1024)
      - vm_cores: {{ vm_cores | default('NOT SET') }} (min: 1)
    success_msg: "VM configuration validated"
    quiet: true
# Variable defined and non-empty
- vm_name is defined and vm_name | trim | length > 0
# Numeric range
- vm_memory >= 1024 and vm_memory <= 65536
# Regex match
- vm_name is match('^[a-z0-9-]+$')
# List has items
- vm_networks | length > 0
# Value in allowed list
- vm_ostype in ['l26', 'win10', 'win11']
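The conditions above are entries for the `that:` list of an assert task. As a rough sketch, they could be combined as follows, assuming the variables `vm_name`, `vm_memory`, `vm_networks`, and `vm_ostype` are supplied by inventory or extra vars; the task name and failure message are illustrative:

```yaml
- name: Validate VM definition
  ansible.builtin.assert:
    that:
      # Checked in order; the task fails on the first false condition
      - vm_name is defined and vm_name | trim | length > 0
      - vm_name is match('^[a-z0-9-]+$')
      - vm_memory | int >= 1024 and vm_memory | int <= 65536
      - vm_networks | length > 0
      - vm_ostype in ['l26', 'win10', 'win11']
    fail_msg: "VM definition for {{ vm_name | default('NOT SET') }} failed validation"
    quiet: true
```

Putting the `is defined` check first keeps later conditions from being evaluated against a missing variable.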
Provide actionable error messages:
- name: Check prerequisites
  ansible.builtin.command: which docker
  register: docker_check
  changed_when: false
  failed_when: false

- name: Fail if Docker not installed
  ansible.builtin.fail:
    msg: |
      Docker is not installed on {{ inventory_hostname }}.
      To install Docker:
        sudo apt update
        sudo apt install docker.io
      Or use the docker role:
        ansible-playbook playbooks/install-docker.yml
  when: docker_check.rc != 0
Allow expected "failures":
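- name: Try to stop service
  ansible.builtin.systemd:
    name: myservice
    state: stopped
  register: stop_result
  failed_when:
    - stop_result.failed
    - "'Could not find the requested service' not in stop_result.msg | default('')"
  # Only fail if the error is NOT "service not found".
  # The matched text is assumed from the systemd module's error message;
  # verify it against your Ansible version.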

- name: Join cluster
  ansible.builtin.command: pvecm add {{ primary_node }}
  register: cluster_join
  failed_when:
    - cluster_join.rc != 0
    - "'already in a cluster' not in cluster_join.stderr"
    - "'cannot join' not in cluster_join.stderr"
  changed_when: cluster_join.rc == 0
Separate checking from failing for better control:
- name: Check if resource exists
  ansible.builtin.command: check-resource {{ resource_id }}
  register: resource_check
  changed_when: false
  failed_when: false  # Don't fail here

- name: Fail with context if missing
  ansible.builtin.fail:
    msg: |
      Resource {{ resource_id }} not found.
      Command output: {{ resource_check.stderr }}
      Hint: Ensure resource was created first.
  when: resource_check.rc != 0
Attempt operation, handle specific errors:
- name: Attempt primary approach
  block:
    - name: Connect via primary endpoint
      ansible.builtin.uri:
        url: "https://{{ primary_host }}:8006/api2/json"
        validate_certs: true
      register: primary_result

  rescue:
    - name: Log primary failure
      ansible.builtin.debug:
        msg: "Primary endpoint failed: {{ primary_result.msg | default('unknown error') }}"

    - name: Try fallback endpoint
      ansible.builtin.uri:
        url: "https://{{ fallback_host }}:8006/api2/json"
        validate_certs: false
      register: fallback_result
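Inside a `rescue` section, Ansible also provides the special variables `ansible_failed_task` and `ansible_failed_result`, so the failure can be reported without registering each task. A minimal sketch, assuming the same `primary_host` variable as above; the task names are illustrative:

```yaml
- name: Attempt operation and report what failed
  block:
    - name: Connect via primary endpoint
      ansible.builtin.uri:
        url: "https://{{ primary_host }}:8006/api2/json"
        validate_certs: true

  rescue:
    # ansible_failed_task / ansible_failed_result describe the task
    # that triggered the rescue
    - name: Show which task failed and why
      ansible.builtin.debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message returned') }}
```

This is useful when a block holds several tasks and any of them might be the one that failed.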
Run checks from controller for better error context:
- name: Verify API endpoint from controller
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}:8006/api2/json/version"
    validate_certs: false
  delegate_to: localhost
  register: api_check
  failed_when: false

- name: Report API status
  ansible.builtin.fail:
    msg: |
      Cannot reach Proxmox API on {{ inventory_hostname }}
      Status: {{ api_check.status | default('connection failed') }}
      Check: Network connectivity, firewall rules, pveproxy service
  when: api_check.status | default(0) != 200
- name: Remove optional backup
  ansible.builtin.file:
    path: /backup/old-backup.tar.gz
    state: absent
  ignore_errors: true
  register: cleanup_result

- name: Report cleanup status
  ansible.builtin.debug:
    msg: "Cleanup {{ 'successful' if not cleanup_result.failed else 'skipped' }}"

# BETTER than ignore_errors
- name: Remove backup
  ansible.builtin.file:
    path: /backup/old-backup.tar.gz
    state: absent
  register: cleanup_result
  failed_when:
    - cleanup_result.failed
    - "'does not exist' not in cleanup_result.msg | default('')"
---
- name: Deploy with comprehensive error handling
  hosts: app_servers
  become: true

  tasks:
    - name: Validate configuration
      ansible.builtin.assert:
        that:
          - app_version is defined
          - app_version is match('^\d+\.\d+\.\d+$')
        fail_msg: "Invalid app_version: {{ app_version | default('NOT SET') }}"

    - name: Deploy application
      block:
        - name: Download release
          ansible.builtin.get_url:
            url: "https://releases.example.com/{{ app_version }}.tar.gz"
            dest: /tmp/app.tar.gz
          register: download
          until: download is succeeded
          retries: 3
          delay: 5

        - name: Stop current version
          ansible.builtin.systemd:
            name: myapp
            state: stopped

        - name: Extract release
          ansible.builtin.unarchive:
            src: /tmp/app.tar.gz
            dest: /opt/myapp
            remote_src: true

        - name: Start new version
          ansible.builtin.systemd:
            name: myapp
            state: started

        - name: Verify health
          ansible.builtin.uri:
            url: http://localhost:8080/health
          register: health
          until: health.status == 200
          retries: 6
          delay: 10

      rescue:
        - name: Restore previous version
          ansible.builtin.copy:
            src: /opt/myapp-backup/
            dest: /opt/myapp/
            remote_src: true

        - name: Start previous version
          ansible.builtin.systemd:
            name: myapp
            state: started

        - name: Report deployment failure
          ansible.builtin.fail:
            msg: |
              Deployment of {{ app_version }} failed.
              Previous version restored.
              Check logs: journalctl -u myapp

      always:
        - name: Cleanup download
          ansible.builtin.file:
            path: /tmp/app.tar.gz
            state: absent
For detailed error handling patterns and techniques, consult:
references/error-handling.md - Comprehensive error handling patterns, block/rescue/always examples, retry strategies