Help us improve
Share bugs, ideas, or general feedback.
Define recovery objectives (RTO/RPO), backup strategies, failover procedures, and testing protocols. Use when planning disaster recovery or establishing continuity practices.
npx claudepluginhub sethdford/claude-skills --plugin architect-infrastructure-designHow this skill is triggered — by the user, by Claude, or both
Slash command
/infrastructure-design:disaster-recovery-planThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Design recovery strategies with defined objectives, tested procedures, and regular validation.
Plan backups, define RPO/RTO targets, design backup architecture, and run disaster recovery drills. Triggers on backup/restore, DR planning, or gaps discovered during incidents.
Writes a complete disaster recovery plan for a service or system covering RPO/RTO targets, failure runbooks, backup/restore procedures, testing schedule, and communication templates.
Designs disaster recovery plans for cloud infrastructure with RTO/RPO targets, multi-region failover, Terraform standby resources, database replication, failover scripts, and runbooks.
Share bugs, ideas, or general feedback.
Design recovery strategies with defined objectives, tested procedures, and regular validation.
You are planning disaster recovery. Define RTO/RPO requirements, design backup and failover strategies, plan testing. Read business impact analysis, current backups, and regulatory requirements.
Based on IT disaster recovery best practices (NIST, ISO 27031):
Define Business Requirements: For each critical system, what's RTO (max downtime) and RPO (max data loss)? Business impact: lost revenue, SLA violations, customer trust?
Design Backup Strategy: Full daily backup + hourly incremental. Or continuous replication for stricter RPO. Test recovery from backups monthly; document recovery steps.
Plan Failover: For RTO < 1 hour, set up active-passive (standby system). For RTO < 5 minutes, active-active (both systems live). Implement health checks and automatic failover.
Document Procedures: Who decides to failover? What are manual steps? How do you know failover succeeded? Test documentation with dry runs; update after each test.
Schedule Regular Testing: Monthly failover drills for critical systems. Test both planned (maintenance window) and unplanned (kill production server) scenarios. Document findings and improvements.