Backup and disaster recovery skill. Activates when user needs to design backup strategies, define RPO/RTO targets, test recovery procedures, verify data integrity, or generate disaster recovery runbooks. Produces comprehensive backup plans with automated verification and tested recovery procedures. Triggers on: /godmode:backup, "backup strategy", "disaster recovery", "what's our RPO?", "can we recover from", or when designing critical data infrastructure.
From `arbazkhan971/godmode` on the Claude plugin hub. This skill uses the workspace's default tool permissions.
Runs on `/godmode:backup`, or after `/godmode:infra` provisions stateful services (databases, storage, queues).

### Step 1: Data Asset Inventory

Identify all data that needs protection:
DATA ASSET INVENTORY:
| Asset | Type | Size | Growth | Critical |
|---|---|---|---|---|
### Step 2: Recovery Objectives (RPO/RTO)

Set recovery objectives for each data tier:
RECOVERY OBJECTIVES:
| Data Tier | RPO | RTO | Justification |
|---|---|---|---|
### Step 3: Backup Strategy per Tier

Design the backup approach for each data tier:
TIER 1 BACKUP STRATEGY:
Primary database:
TIER 2 BACKUP STRATEGY:
File uploads (S3/GCS):
TIER 3 BACKUP STRATEGY:
Application logs:
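A Tier 1 job for the primary database can be sketched as a shell function. Everything here, the `$DATABASE_URL` connection string, the backup directory, the file naming, is an assumption for illustration, not the skill's prescribed implementation:

```bash
# Sketch: Tier 1 logical backup with checksum recording (URL and paths are assumptions)
backup_postgres() {
  local url="$1" dir="$2"
  local stamp out
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  out="${dir}/postgres-prod-${stamp}.dump"
  mkdir -p "$dir"
  # Custom format is compressed and restorable with pg_restore
  pg_dump --format=custom --file="$out" "$url" || return 1
  # Record a SHA-256 sidecar so later integrity checks have a reference value
  sha256sum "$out" > "${out}.sha256"
  echo "$out"
}
```

A cross-region copy (for example, to object storage in another region) would follow the dump, per the "never the same failure domain" constraint below.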
### Step 4: Automated Backup Verification

Automated checks that backups are actually working. Remediation rules: IF backup age > 24h, trigger an immediate backup; IF a restore test fails, alert and re-run it.
BACKUP VERIFICATION SCHEDULE:
| Check | Frequency | Method | Alert |
|---|---|---|---|
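The backup-age rule (age > 24h triggers an immediate backup) is easy to automate. A sketch, assuming all dumps land in one directory and GNU `find` is available:

```bash
# Sketch: alert when the newest *.dump is older than 24h (directory layout is an assumption)
backup_age_ok() {
  local dir="$1" max_age_s=$((24 * 3600))
  local newest now age
  # GNU find prints each file's mtime as epoch seconds
  newest=$(find "$dir" -type f -name '*.dump' -printf '%T@\n' 2>/dev/null | sort -n | tail -1)
  [ -n "$newest" ] || { echo "ALERT: no backups found in $dir"; return 1; }
  now=$(date +%s)
  age=$(( now - ${newest%.*} ))
  if [ "$age" -gt "$max_age_s" ]; then
    echo "ALERT: newest backup is ${age}s old; trigger an immediate backup"
    return 1
  fi
  echo "OK: newest backup is ${age}s old"
}
```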
### Step 5: Data Integrity Verification
Verify backed-up data is consistent and usable:
DATA INTEGRITY CHECKS:
| Check | Method | Frequency |
|---|---|---|
| Row count consistency | Compare source vs restored backup | After each restore test |
| Checksum verification | SHA-256 of backup file | Every backup |
| Foreign key integrity | Run FK constraint check on restored DB | Weekly restore |
| Application smoke test | Run app against restored DB | Monthly restore |
| Point-in-time accuracy | Restore to specific time, verify records | Quarterly |
| Cross-region consistency | Compare checksums across regions | Weekly |
Integrity verification queries:
```sql
-- Row count comparison (run against both source and restored database)
-- n_live_tup is an estimate; use SELECT count(*) per table for an exact check
SELECT relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY relname;
```
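The SHA-256 row in the table above can be enforced with a sidecar file written at backup time. A minimal sketch (the `.sha256` sidecar convention is an assumption):

```bash
# Sketch: verify a backup against its recorded SHA-256 sidecar (layout is an assumption)
verify_backup_checksum() {
  local dump="$1"
  # The sidecar holds "HASH  /path/to/dump" as written by `sha256sum` at backup time
  sha256sum --check --status "${dump}.sha256" && echo "checksum OK: $dump"
}
```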
### Step 6: Recovery Procedures
Document step-by-step recovery for each failure scenario:
#### Scenario 1: Primary Database Failure

RECOVERY: Primary Database Failure
Severity: CRITICAL
RPO: < 1 minute (via streaming replication)
RTO: < 15 minutes (via automated failover)
AUTOMATED RESPONSE:
MANUAL RESPONSE (if automated failover fails):
POST-RECOVERY:
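The manual response usually comes down to promoting a replica by hand. A hedged sketch for Postgres streaming replication (the host argument and use of `psql` from a bastion are assumptions, not the skill's mandated procedure):

```bash
# Sketch: promote a streaming replica when automated failover fails (host is a placeholder)
promote_replica() {
  local replica="$1"
  # Check replay lag before promoting; a large lag means data loss beyond the RPO
  psql -h "$replica" -Atc "SELECT now() - pg_last_xact_replay_timestamp();" || return 1
  # pg_promote() exists in Postgres 12+; older versions run `pg_ctl promote` on the host
  psql -h "$replica" -Atc "SELECT pg_promote();" || return 1
  # After promotion the node must report it is no longer in recovery
  [ "$(psql -h "$replica" -Atc 'SELECT pg_is_in_recovery();')" = "f" ]
}
```

Repointing the application (DNS or connection-string update) and the POST-RECOVERY checks follow once `pg_is_in_recovery()` reports `f`.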
#### Scenario 2: Data Corruption
RECOVERY: Data Corruption
Severity: HIGH
RTO: 15 minutes - 2 hours
#### Scenario 3: Complete Region Failure
RECOVERY: Region Failure
Severity: CRITICAL
RPO: < 1 minute
RTO: < 30 minutes
### Step 7: Disaster Recovery Runbook
Generate a comprehensive runbook document:
DISASTER RECOVERY RUNBOOK
Last tested: <date>
Next test scheduled: <date>
Owner: <team/person>
On-call escalation: <contact chain>

Recovery objectives:
| Tier | RPO | RTO |
|---|---|---|
| Tier 1 | < 1 min | < 15 min |
| Tier 2 | < 1 hour | < 1 hour |
| Tier 3 | < 24 hours | < 4 hours |

Scenarios covered:
### Step 8: Backup & DR Report
BACKUP & DISASTER RECOVERY REPORT

Data assets inventoried: <N>
Backup strategies defined: <N>
Recovery procedures documented: <N> scenarios

Coverage:
- Tier 1 (critical): <PROTECTED | GAPS | UNPROTECTED>
- Tier 2 (important): <PROTECTED | GAPS | UNPROTECTED>
- Tier 3 (operational): <PROTECTED | GAPS | UNPROTECTED>

Verification:
- Automated checks: <CONFIGURED | PARTIAL | MISSING>
- Last restore test: <date | NEVER>
- Last DR test: <date | NEVER>

Gaps identified: <N>
Verdict: <PROTECTED | PARTIAL | AT RISK>
### Step 9: Commit and Transition
Save runbook as `docs/dr/<date>-disaster-recovery-runbook.md`
```bash
# Test backup and restore procedures
pg_dump -Fc mydb > backup_test.dump
createdb mydb_test                       # restore target must exist first
pg_restore -d mydb_test backup_test.dump
psql mydb_test -c "SELECT count(*) FROM users;"
# Verify backup and test restore
pg_dump --format=custom -f backup.dump $DATABASE_URL
pg_restore --list backup.dump
curl -s http://localhost:8080/health
```
Detection heuristics:

1. Data stores: grep for postgres, mysql, mongodb, redis connection strings
2. Object storage: grep for S3, GCS, Azure Blob configs
3. Existing backups: check crontab, CI jobs, WAL archiving configs
4. No backups → CRITICAL gap. No verification → HIGH gap.
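The detection steps above can be sketched as a single scan function; the grep patterns, file layout, and cron heuristic are all assumptions:

```bash
# Sketch: scan a repo for data stores and existing backup jobs (patterns are assumptions)
scan_backup_gaps() {
  local dir="${1:-.}"
  local hits
  hits=$(grep -rlE "postgres(ql)?://|mysql://|mongodb://|redis://" "$dir" 2>/dev/null | wc -l | tr -d ' ')
  echo "files with connection strings: ${hits}"
  # Per the heuristics: no backup jobs at all is a CRITICAL gap
  if ! crontab -l 2>/dev/null | grep -qiE "pg_dump|mysqldump|backup"; then
    echo "CRITICAL gap: no backup cron jobs found"
  fi
}
```

Object-storage configs (S3, GCS, Azure Blob) and WAL-archiving settings would be scanned the same way with their own patterns.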
BACKUP VERIFICATION LOOP:

    current_iteration = 0
    max_iterations = 10
    gaps_remaining = total_gaps_found

    WHILE gaps_remaining > 0 AND current_iteration < max_iterations:
        current_iteration += 1
        1. SELECT highest-severity backup gap
        2. IMPLEMENT fix:
           - Missing backup → create backup job/config
           - No verification → add automated integrity check
           - No cross-region → configure replication
           - No restore test → create and run restore test
        3. git commit: "backup: fix <gap> (iter {current_iteration})"
        4. VERIFY the fix:
           - Backup job runs successfully
           - Backup file is valid (checksum, header check)
           - Restore test passes (if applicable)
        5. IF verification fails:
           - Debug configuration
           - Retry with adjusted parameters
        6. UPDATE gaps_remaining

        IF current_iteration % 3 == 0:
            PRINT STATUS:
                "Iteration {current_iteration}/{max_iterations}"
                "Gaps fixed: {total_gaps - gaps_remaining}/{total_gaps}"
                "Tier 1 coverage: {tier1_status}"
                "Tier 2 coverage: {tier2_status}"
                "Last restore test: {last_restore_result}"
Never ask to continue. Loop autonomously until all backup gaps are resolved or budget exhausted.
MECHANICAL CONSTRAINTS — NON-NEGOTIABLE:
1. NEVER treat a backup as valid until a restore test has succeeded.
2. NEVER store backups in the same failure domain as production (same region, same account).
3. ENCRYPT EVERY backup at rest — no exceptions for any data tier.
4. EVERY backup job MUST alert on failure — silent backup failures are the worst kind.
5. EVERY backup MUST have a TTL/retention policy — no infinite storage growth.
6. DEFINE RPO and RTO BEFORE designing backup strategy — business drives engineering.
7. git commit backup configurations BEFORE testing — baseline for debugging.
8. Automatic revert on regression: if backup config change causes production issues, revert immediately.
9. NEVER skip quarterly DR tests — schedule them and treat them as P1 obligations.
10. Log all backup operations in TSV:
TIMESTAMP\tASSET\tOPERATION\tSIZE\tDURATION\tSTATUS\tCHECKSUM
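Constraint 10's TSV format maps to a one-line logger; the `BACKUP_LOG` destination variable is an assumption:

```bash
# Sketch: append one TSV row per backup operation (log path is an assumption)
log_backup_op() {
  # Args: asset operation size duration_s status checksum
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "$4" "$5" "$6" \
    >> "${BACKUP_LOG:-backup_ops.tsv}"
}
```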
Print on completion: Backup: {asset_count} assets covered. RPO: {rpo}. RTO: {rto}. Last restore test: {last_test_date}. Encryption: {encryption_status}. Cross-region: {cross_region}. Verdict: {verdict}.
timestamp asset operation size duration_s status checksum
2024-01-15T03:00:00Z postgres-prod backup 12GB 180 success sha256:abc123
2024-01-15T03:05:00Z redis-prod backup 2GB 30 success sha256:def456
2024-01-15T04:00:00Z postgres-prod restore-test 12GB 300 success verified
Columns: timestamp, asset, operation(backup/restore-test/dr-drill), size, duration_s, status(success/failed/partial), checksum.
## Keep/Discard
KEEP if: improvement verified. DISCARD if: regression or no change. Revert discards immediately.
## Stop Conditions
Stop when: target reached, budget exhausted, or >5 consecutive discards.