disaster-recovery-plan | infrastructure-design | ClaudePluginHub

Skill

disaster-recovery-plan

From infrastructure-design

Define recovery objectives (RTO/RPO), backup strategies, failover procedures, and testing protocols. Use when planning disaster recovery or establishing continuity practices.

$ npx claudepluginhub sethdford/claude-skills --plugin architect-infrastructure-design

Popularity

Parent stars: 13

Parent forks: 2

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/infrastructure-design:disaster-recovery-plan

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Design recovery strategies with defined objectives, tested procedures, and regular validation.

SKILL.md

48 lines · ~769 tokens



Maintenance: Fair

Last commit: Mar 11, 2026


Disaster Recovery Plan

Design recovery strategies with defined objectives, tested procedures, and regular validation.

Context

You are planning disaster recovery. Define RTO/RPO requirements, design backup and failover strategies, and plan testing. Start by reading the business impact analysis, the current backup setup, and any applicable regulatory requirements.

Domain Context

Based on IT disaster recovery best practices (NIST, ISO 27031):

RTO (Recovery Time Objective): The maximum time a system can be down, for example 1 hour or 1 day. Determines the failover strategy.
RPO (Recovery Point Objective): The maximum acceptable data loss, measured in time, for example 1 hour or 1 day. Determines the backup frequency.
Backup Strategies: Full (complete copy), incremental (only changes since the last backup), or continuous replication (changes streamed to a replica as they occur).
Failover: Automatic (heartbeat-driven) vs. manual (operator-triggered); planned vs. unplanned.
Testing: Regular drills validate procedures, so practice before disaster strikes.
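The link between RPO and backup frequency can be made concrete with a small check. The following is a minimal sketch, not a definitive method: the function names and the worst-case model (a failure just before the next backup finishes, so the loss is one full interval plus the backup's own duration) are assumptions for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Assumed worst case: the system fails just before the next backup
    completes, so everything since the previous backup started is lost."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """Return True if the backup schedule can satisfy the stated RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)
    for label, interval in [("daily full", timedelta(hours=24)),
                            ("hourly incremental", timedelta(hours=1))]:
        print(label, "meets RPO:", meets_rpo(interval, rpo))
```

Under this model, a schedule whose interval equals the RPO only passes if the backup itself is effectively instantaneous, which is one reason stricter RPOs tend to push toward continuous replication.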

Instructions

Define Business Requirements: For each critical system, state the RTO (maximum downtime) and RPO (maximum data loss). Quantify the business impact of an outage: lost revenue, SLA violations, and loss of customer trust.
Design Backup Strategy: Use a full daily backup plus hourly incrementals, or continuous replication when the RPO is stricter. Test recovery from backups monthly and document the recovery steps.
Plan Failover: For an RTO under 1 hour, set up active-passive (a standby system). For an RTO under 5 minutes, use active-active (both systems live). Implement health checks and automatic failover.
Document Procedures: Who decides to fail over? What are the manual steps? How do you know the failover succeeded? Test the documentation with dry runs and update it after each test.
Schedule Regular Testing: Run monthly failover drills for critical systems. Test both planned (maintenance window) and unplanned (kill a production server) scenarios. Document findings and improvements.
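The RTO thresholds in the failover step can be sketched as a simple decision function, paired with a heartbeat-style trigger. This is an illustrative sketch only: the names are hypothetical, the restore-from-backup fallback for an RTO of 1 hour or more is an assumption not stated in the instructions, and the three-consecutive-failures threshold is an arbitrary example value.

```python
from datetime import timedelta
from enum import Enum


class FailoverStrategy(Enum):
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_PASSIVE = "active-passive"
    RESTORE_FROM_BACKUP = "restore-from-backup"  # assumed fallback, not from the source


def choose_failover_strategy(rto: timedelta) -> FailoverStrategy:
    """Map an RTO to a strategy using the thresholds from the instructions."""
    if rto < timedelta(minutes=5):
        return FailoverStrategy.ACTIVE_ACTIVE
    if rto < timedelta(hours=1):
        return FailoverStrategy.ACTIVE_PASSIVE
    return FailoverStrategy.RESTORE_FROM_BACKUP


def should_fail_over(health_checks: list[bool], threshold: int = 3) -> bool:
    """Trigger failover only when the most recent `threshold` health checks
    all failed, so a single dropped heartbeat does not cause a flip."""
    recent = health_checks[-threshold:]
    return len(recent) == threshold and not any(recent)


if __name__ == "__main__":
    print(choose_failover_strategy(timedelta(minutes=30)).value)
    print(should_fail_over([True, False, False, False]))
```

Requiring several consecutive failures trades a slightly longer detection time for fewer false failovers; the detection time still has to fit inside the RTO.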

Anti-Patterns

RTO/RPO Undefined: Assuming every system needs a sub-minute RTO. Result: over-engineering and unnecessary cost. Guard: Quantify business impact and set RTO/RPO per system accordingly.
Backups Never Tested: Assuming backups work. Result: failures are discovered during an actual disaster. Guard: Run regular restore drills and track recovery metrics.
Manual Failover for Low RTO: Planning a manual process for a 5-minute RTO. Result: missed SLA. Guard: Automate failover when the RTO is tight, and test the automation regularly.
Ignoring Data Consistency in Failover: Assuming data is identical between systems. Result: data loss or corruption. Guard: Validate data integrity after failover and keep a reconciliation procedure ready.
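The guard for the last anti-pattern, validating data integrity after failover, could look roughly like the sketch below. It assumes records can be exported from both systems as dictionaries keyed by a record ID; the function names and report shape are hypothetical, and a real system would more likely compare per-table or per-partition checksums than load everything into memory.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Hash a record in a canonical form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(primary: dict[str, dict], standby: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two snapshots and report the record IDs that need attention."""
    missing = sorted(k for k in primary if k not in standby)
    extra = sorted(k for k in standby if k not in primary)
    mismatched = sorted(
        k for k in primary
        if k in standby and record_digest(primary[k]) != record_digest(standby[k])
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


if __name__ == "__main__":
    primary = {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}}
    standby = {"a": {"v": 1}, "b": {"v": 9}, "d": {"v": 4}}
    print(reconcile(primary, standby))
```

An empty report in all three lists is the signal that the standby can be trusted; anything else feeds the reconciliation procedure.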

Further Reading

Site Reliability Engineering by Google — production disaster recovery practices
Disaster Recovery Planning by Salvatore D. Ficara — comprehensive DR framework
Resilience Engineering by Erik Hollnagel — thinking about recovery and resilience