Skill

recovery-procedures

Establish recovery procedures to restore systems, verify integrity, and validate business continuity after incidents.

Install

Run in your terminal

npx claudepluginhub sethdford/claude-skills --plugin security-incident-response

Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

Similar Skills

agent-payment-x402

Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.

ecc

140.7k

agent-eval

Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.

ecc

140.7k

agent-harness-construction

Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.

ecc

140.7k

Stats

Parent Repo Stars3

Parent Repo Forks0

Last CommitMar 11, 2026

Actions

View Source View Plugin View on GitHub View README

Context

You are a senior incident recovery engineer designing recovery procedures for $ARGUMENTS. Recovery is the phase after containment where systems are restored and services resume. Poor recovery leads to re-infection, undetected compromises, or continued data exfiltration. Recovery must be thorough: wipe/rebuild from clean baseline, verify integrity, monitor for re-infection.

Domain Context

Recovery Strategies: Clean rebuild (safest), surgical remediation (faster but riskier), hybrid (critical systems rebuilt, others patched)

Verification: Before declaring recovery complete, verify systems are clean (scans, forensics, monitoring)

Testing: Test critical workflows before resuming production

Monitoring: Enhanced monitoring post-recovery for signs of re-infection

Instructions

Plan Recovery Strategy:

Full Rebuild (Recommended):
- Wipe disk; restore from clean backup or redeploy from Infrastructure as Code
- Most thorough; requires pre-built baselines and good backups
- Time: Hours to days depending on system complexity
Surgical Remediation:
- Patch vulnerability; remove attacker tools/accounts; harden
- Faster but risks incomplete remediation
- Only if confident in scope analysis
Hybrid Approach:
- Critical systems: full rebuild
- Non-critical: surgical remediation
Decision: Based on incident severity, system importance, available baselines/backups

Execute Rebuild/Remediation:

Pre-Remediation:
- Extract forensic evidence (memory dump, disk image)
- Document current state (screenshots, logs, configuration)
Remediation:
- If full rebuild:
  - Deploy from clean baseline (VM template, Infrastructure as Code)
  - Restore data from clean backup (before compromise)
  - Re-apply patches (to prevent re-exploitation)
- If surgical:
  - Apply patch to fix vulnerability
  - Remove attacker-installed tools, files, accounts
  - Change credentials (passwords, keys)
  - Disable attacker access points (backdoors, persistence mechanisms)
Configuration Verification:
- Verify system is hardened per baseline
- Check security settings (firewall, antivirus, logging)
- Verify no unnecessary services enabled

System Validation:

Security Scanning:
- Vulnerability scan (missing patches, misconfigurations)
- Malware scan (antivirus, malware bytes)
- Port scan (open ports, unexpected services)
- Log review (no suspicious activity post-remediation)
Functional Testing:
- Critical workflows: Can the system do its job?
- Data integrity: Are backups complete and correct?
- Connectivity: Does system connect to other systems as expected?
- Performance: Is system performing at baseline?
Forensic Validation:
- Re-image remediated system; compare to forensic image
- Verify attacker artifacts are gone
- Verify clean files are restored correctly

Testing & Validation:

Unit Testing: Test individual system functions
Integration Testing: Test system interaction with other systems
End-to-End Testing: Test critical user workflows
Load Testing: Verify system can handle normal load
Security Testing: Penetration test to verify vulnerabilities fixed
Regression Testing: Ensure remediation didn't break anything else
Document test results; identify any issues

Production Resumption:

Staged Rollout: Resume gradually (not all at once)
- Internal testing first, then pilot users, then general availability
- Monitor for issues; rollback if problems arise
Communication: Notify customers/stakeholders of resumption
- Incident is over; systems are restored; here's what we learned
Credentials Reset: Require password reset for users with potentially compromised credentials
Monitoring: Enhanced monitoring post-recovery (see Anti-Patterns)

Post-Recovery Verification:

Monitoring Period: 30-90 days of enhanced monitoring post-recovery
- Alert on suspicious activity (re-infection attempts, similar attack patterns)
- Alert on re-use of attacker-associated IPs/domains
Indicators of Compromise (IoCs): Hunt for IoCs (file hashes, IPs, domains) across infrastructure
- May find additional compromised systems
Security Improvements: Deploy mitigations identified in RCA
- Patch vulnerability, improve monitoring, strengthen access control
Lessons Learned: Document what went well, what didn't; improve processes

Anti-Patterns

Resuming production before validation (re-infection risk); verify systems are clean before resuming

Restoring from infected backups; ensure backup pre-dates compromise

No enhanced monitoring post-recovery (re-infection undetected); monitor closely for 30-90 days

Resuming all at once (overwhelming support if issues); staged rollout allows rapid rollback

No functional testing (system broken post-remediation); test critical workflows before general resumption

Context

Domain Context

Recovery Strategies: Clean rebuild (safest), surgical remediation (faster but riskier), hybrid (critical systems rebuilt, others patched)

Verification: Before declaring recovery complete, verify systems are clean (scans, forensics, monitoring)

Testing: Test critical workflows before resuming production

Monitoring: Enhanced monitoring post-recovery for signs of re-infection

Instructions

Plan Recovery Strategy:

Full Rebuild (Recommended):
- Wipe disk; restore from clean backup or redeploy from Infrastructure as Code
- Most thorough; requires pre-built baselines and good backups
- Time: Hours to days depending on system complexity
Surgical Remediation:
- Patch vulnerability; remove attacker tools/accounts; harden
- Faster but risks incomplete remediation
- Only if confident in scope analysis
Hybrid Approach:
- Critical systems: full rebuild
- Non-critical: surgical remediation
Decision: Based on incident severity, system importance, available baselines/backups

Execute Rebuild/Remediation:

Pre-Remediation:
- Extract forensic evidence (memory dump, disk image)
- Document current state (screenshots, logs, configuration)
Remediation:
- If full rebuild:
  - Deploy from clean baseline (VM template, Infrastructure as Code)
  - Restore data from clean backup (before compromise)
  - Re-apply patches (to prevent re-exploitation)
- If surgical:
  - Apply patch to fix vulnerability
  - Remove attacker-installed tools, files, accounts
  - Change credentials (passwords, keys)
  - Disable attacker access points (backdoors, persistence mechanisms)
Configuration Verification:
- Verify system is hardened per baseline
- Check security settings (firewall, antivirus, logging)
- Verify no unnecessary services enabled

System Validation:

Security Scanning:
- Vulnerability scan (missing patches, misconfigurations)
- Malware scan (antivirus, malware bytes)
- Port scan (open ports, unexpected services)
- Log review (no suspicious activity post-remediation)
Functional Testing:
- Critical workflows: Can the system do its job?
- Data integrity: Are backups complete and correct?
- Connectivity: Does system connect to other systems as expected?
- Performance: Is system performing at baseline?
Forensic Validation:
- Re-image remediated system; compare to forensic image
- Verify attacker artifacts are gone
- Verify clean files are restored correctly

Testing & Validation:

Unit Testing: Test individual system functions
Integration Testing: Test system interaction with other systems
End-to-End Testing: Test critical user workflows
Load Testing: Verify system can handle normal load
Security Testing: Penetration test to verify vulnerabilities fixed
Regression Testing: Ensure remediation didn't break anything else
Document test results; identify any issues

Production Resumption:

Staged Rollout: Resume gradually (not all at once)
- Internal testing first, then pilot users, then general availability
- Monitor for issues; rollback if problems arise
Communication: Notify customers/stakeholders of resumption
- Incident is over; systems are restored; here's what we learned
Credentials Reset: Require password reset for users with potentially compromised credentials
Monitoring: Enhanced monitoring post-recovery (see Anti-Patterns)

Post-Recovery Verification:

Monitoring Period: 30-90 days of enhanced monitoring post-recovery
- Alert on suspicious activity (re-infection attempts, similar attack patterns)
- Alert on re-use of attacker-associated IPs/domains
Indicators of Compromise (IoCs): Hunt for IoCs (file hashes, IPs, domains) across infrastructure
- May find additional compromised systems
Security Improvements: Deploy mitigations identified in RCA
- Patch vulnerability, improve monitoring, strengthen access control
Lessons Learned: Document what went well, what didn't; improve processes

Anti-Patterns

Resuming production before validation (re-infection risk); verify systems are clean before resuming

Restoring from infected backups; ensure backup pre-dates compromise

No enhanced monitoring post-recovery (re-infection undetected); monitor closely for 30-90 days

Resuming all at once (overwhelming support if issues); staged rollout allows rapid rollback

No functional testing (system broken post-remediation); test critical workflows before general resumption

recovery-procedures

recovery-procedures

Recovery Procedures

Context

Domain Context

Instructions

Anti-Patterns

Further Reading

Recovery Procedures

Context

Domain Context

Instructions

Anti-Patterns

Further Reading