From godmode
Designs backup and disaster recovery plans with data asset inventory, RPO/RTO targets, tiered strategies for databases/storage/logs, integrity checks via SQL, verification schedules, and recovery runbooks.
npx claudepluginhub arbazkhan971/godmode
How this skill is triggered (by the user, by Claude, or both):
Slash command: /godmode:backup
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
- User invokes `/godmode:backup`
Plan backups, define RPO/RTO targets, design backup architecture, and run disaster recovery drills. Triggers on backup/restore, DR planning, or gaps discovered during incidents.
/godmode:backup
- `/godmode:infra` provisions stateful services (databases, storage, queues)

Identify all data that needs protection:
DATA ASSET INVENTORY:
| Asset | Type | Size | Growth | Critical |
Set recovery objectives for each data tier:
RECOVERY OBJECTIVES:
| Data Tier | RPO | RTO | Justification |
Design backup approach for each data tier:
TIER 1 BACKUP STRATEGY:
Primary database:
TIER 2 BACKUP STRATEGY:
File uploads (S3/GCS):
TIER 3 BACKUP STRATEGY:
Application logs:
IF backup age >24h: trigger immediate backup. IF restore test fails: alert and re-run.
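The two rules above can be sketched as a small decision function. This is a minimal illustration, not part of the skill itself: the function name `next_actions` and the action strings are hypothetical, and treating a missing backup the same as a stale one is an assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(hours=24)  # threshold taken from the rule above


def next_actions(last_backup_at, restore_test_passed, now=None):
    """Return the actions implied by the two rules, in order.

    last_backup_at: timezone-aware datetime of the last successful backup,
                    or None if no backup exists (assumed to count as stale).
    restore_test_passed: result of the most recent restore test.
    """
    now = now or datetime.now(timezone.utc)
    actions = []
    # Rule 1: a backup older than 24h triggers an immediate backup.
    if last_backup_at is None or now - last_backup_at > MAX_BACKUP_AGE:
        actions.append("trigger_backup")
    # Rule 2: a failed restore test raises an alert and is re-run.
    if not restore_test_passed:
        actions.extend(["alert", "rerun_restore_test"])
    return actions
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to exercise from a scheduler or a unit test.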
Automated checks that backups are actually working:
BACKUP VERIFICATION SCHEDULE:
| Check | Frequency | Method | Alert |
### Step 5: Data Integrity Verification
Verify backed-up data is consistent and usable:
DATA INTEGRITY CHECKS:
| Check | Method | Frequency |
|---|---|---|
| Row count consistency | Compare source vs restored backup | After each restore test |
| Checksum verification | SHA-256 of backup file | Every backup |
| Foreign key integrity | Run FK constraint check on restored DB | Weekly restore |
| Application smoke test | Run app against restored DB | Monthly restore |
| Point-in-time accuracy | Restore to specific time, verify records | Quarterly |
| Cross-region consistency | Compare checksums across regions | Weekly |
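As an illustration of the checksum rows in the table, the following sketch computes a SHA-256 digest of a backup file and compares it with the digest recorded when the backup was taken. The function names are hypothetical, and how the expected digest is stored is left open; the same comparison could be used for copies held in different regions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, expected_hex):
    """Compare the file's digest with the one recorded at backup time."""
    return sha256_of_file(path) == expected_hex.lower()
```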
Integrity verification queries:
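-- Row count comparison (run against source and restored).
-- n_live_tup is an estimate; run ANALYZE on the restored database first,
-- and use SELECT count(*) where an exact match is required.
SELECT relname AS table_name, n_live_tup
FROM pg_stat_user_tables
ORDER BY relname;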
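Once the query has been run against both the source and the restored database, the two result sets can be compared programmatically. The sketch below assumes each result has already been loaded into a dict of table name to row count; the function name and the `tolerance` parameter are hypothetical, the latter being useful when the counts are estimates.

```python
def compare_row_counts(source, restored, tolerance=0):
    """Return {table: (source_count, restored_count)} for tables that differ.

    source / restored: dicts mapping table name to row count.
    tolerance: allowed absolute difference between the two counts.
    A table missing from one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source) | set(restored)):
        src, dst = source.get(table), restored.get(table)
        if src is None or dst is None or abs(src - dst) > tolerance:
            mismatches[table] = (src, dst)
    return mismatches
```

An empty result means every table is present on both sides and within tolerance.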
### Step 6: Recovery Procedures
Document step-by-step recovery for each failure scenario:
#### Scenario 1: Primary Database Failure
RECOVERY: Primary Database Failure
Severity: CRITICAL
RPO: < 1 minute (via streaming replication)
RTO: < 15 minutes (via automated failover)
AUTOMATED RESPONSE:
MANUAL RESPONSE (if automated failover fails):
POST-RECOVERY:
#### Scenario 2: Data Corruption
RECOVERY: Data Corruption — Severity: HIGH, RTO: 15min-2h
#### Scenario 3: Complete Region Failure
RECOVERY: Region Failure — Severity: CRITICAL, RPO: <1min, RTO: <30min
### Step 7: Disaster Recovery Runbook
Generate a comprehensive runbook document:
DISASTER RECOVERY RUNBOOK
Last tested:
Next test scheduled:
Owner: <team/person>
On-call escalation:
Recovery objectives:
| Tier | RPO | RTO |
|---|---|---|
| Tier 1 | < 1 min | < 15 min |
| Tier 2 | < 1 hour | < 1 hour |
| Tier 3 | < 24 hours | < 4 hours |
Scenarios covered:
### Step 8: Backup & DR Report
BACKUP & DISASTER RECOVERY REPORT
Data assets inventoried:
Backup strategies defined:
Recovery procedures documented: scenarios
Coverage:
  Tier 1 (critical): <PROTECTED | GAPS | UNPROTECTED>
  Tier 2 (important): <PROTECTED | GAPS | UNPROTECTED>
  Tier 3 (operational): <PROTECTED | GAPS | UNPROTECTED>
Verification:
  Automated checks: <CONFIGURED | PARTIAL | MISSING>
  Last restore test: <date | NEVER>
  Last DR test: <date | NEVER>
Gaps identified:
Verdict: <PROTECTED | PARTIAL | AT RISK>
### Step 9: Commit and Transition
Save runbook as `docs/dr/<date>-disaster-recovery-runbook.md`
```bash
# Test backup and restore procedures
pg_dump -Fc mydb > backup_test.dump
createdb mydb_test   # pg_restore -d needs an existing target database
pg_restore -d mydb_test backup_test.dump
psql mydb_test -c "SELECT count(*) FROM users;"

# Verify backup and test restore
pg_dump --format=custom -f backup.dump "$DATABASE_URL"
pg_restore --list backup.dump   # lists the archive contents without restoring
curl -fsS http://localhost:8080/health   # -f makes curl fail on HTTP errors
```
1. Data stores: grep for postgres, mysql, mongodb, redis connection strings
2. Object storage: grep for S3, GCS, Azure Blob configs
3. Existing backups: check crontab, CI jobs, WAL archiving configs
4. No backups → CRITICAL gap. No verification → HIGH gap.
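The detection steps above could be approximated by scanning configuration text for connection-string URL schemes and then applying the severity rule from step 4. This is a rough sketch: the patterns, function names, and the choice to match on URL schemes only are assumptions, and real projects may configure data stores in ways these patterns miss.

```python
import re

# Hypothetical patterns: URL schemes commonly used in connection strings.
STORE_PATTERNS = {
    "postgres": re.compile(r"\bpostgres(?:ql)?://", re.IGNORECASE),
    "mysql": re.compile(r"\bmysql://", re.IGNORECASE),
    "mongodb": re.compile(r"\bmongodb(?:\+srv)?://", re.IGNORECASE),
    "redis": re.compile(r"\brediss?://", re.IGNORECASE),
}


def detect_data_stores(text):
    """Return the sorted names of data stores whose URL scheme appears in text."""
    return sorted(name for name, pat in STORE_PATTERNS.items() if pat.search(text))


def classify_gap(has_backup, has_verification):
    """Apply step 4: no backups is CRITICAL, no verification is HIGH."""
    if not has_backup:
        return "CRITICAL"
    if not has_verification:
        return "HIGH"
    return None
```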
BACKUP VERIFICATION LOOP:
current_iteration = 0
max_iterations = 10
gaps_remaining = total_gaps_found
WHILE gaps_remaining > 0 AND current_iteration < max_iterations:
  current_iteration += 1
  1. SELECT highest-severity backup gap
  2. IMPLEMENT fix:
     - Missing backup → create backup job/config
     - No verification → add automated integrity check
     - No cross-region → configure replication
     - No restore test → create and run restore test
  3. git commit: "backup: fix <gap> (iter {current_iteration})"
  4. VERIFY the fix:
     - Backup job runs successfully
     - Backup file is valid (checksum, header check)
     - Restore test passes (if applicable)
  5. IF verification fails:
     - Debug configuration
     - Retry with adjusted parameters
  6. UPDATE gaps_remaining
  IF current_iteration % 3 == 0:
    PRINT STATUS:
      "Iteration {current_iteration}/{max_iterations}"
      "Gaps fixed: {total_gaps_found - gaps_remaining}/{total_gaps_found}"
      "Tier 1 coverage: {tier1_status}"
      "Tier 2 coverage: {tier2_status}"
      "Last restore test: {last_restore_result}"
Never ask to continue. Loop autonomously until all backup gaps are resolved or budget exhausted.
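The control flow of the loop can be sketched in Python as follows. This is only an outline under stated assumptions: `fix_and_verify` is a hypothetical callable standing in for steps 2 to 5, the `MEDIUM` and `LOW` severities are invented for illustration (the text names only CRITICAL and HIGH), and the status report is reduced to its first two lines.

```python
from dataclasses import dataclass

# CRITICAL and HIGH come from the text; MEDIUM and LOW are assumed.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass
class Gap:
    name: str
    severity: str


def run_gap_loop(gaps, fix_and_verify, max_iterations=10, report=print):
    """Fix gaps highest severity first until none remain or the budget runs out.

    fix_and_verify: hypothetical callable(Gap) -> bool, True when the fix
    was implemented and its verification passed.
    Returns the list of gaps still open.
    """
    remaining = sorted(gaps, key=lambda g: SEVERITY_ORDER[g.severity])
    total = len(remaining)
    iteration = 0
    while remaining and iteration < max_iterations:
        iteration += 1
        gap = remaining[0]
        if fix_and_verify(gap):
            remaining.pop(0)
        # A gap whose fix failed stays at the front and is retried next iteration.
        if iteration % 3 == 0:
            report(f"Iteration {iteration}/{max_iterations}")
            report(f"Gaps fixed: {total - len(remaining)}/{total}")
    return remaining
```

Returning the still-open gaps lets the caller decide the final verdict once the iteration budget is spent.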
MECHANICAL CONSTRAINTS — NON-NEGOTIABLE:
1. NEVER treat a backup as valid until a restore test has succeeded.
2. NEVER store backups in the same failure domain as production (same region, same account).
3. ENCRYPT EVERY backup at rest — no exceptions for any data tier.
4. EVERY backup job MUST alert on failure — silent backup failures are the worst kind.
5. EVERY backup MUST have a TTL/retention policy — no infinite storage growth.
6. DEFINE RPO and RTO BEFORE designing backup strategy — business drives engineering.
7. git commit backup configurations BEFORE testing — baseline for debugging.
8. Automatic revert on regression: if backup config change causes production issues, revert immediately.
9. NEVER skip quarterly DR tests — schedule them and treat them as P1 obligations.
10. Log all backup operations in TSV:
TIMESTAMP\tASSET\tOPERATION\tSIZE\tDURATION\tSTATUS\tCHECKSUM
Print on completion: Backup: {asset_count} assets covered. RPO: {rpo}. RTO: {rto}. Last restore test: {last_test_date}. Encryption: {encryption_status}. Cross-region: {cross_region}. Verdict: {verdict}.
timestamp asset operation size duration_s status checksum
2024-01-15T03:00:00Z postgres-prod backup 12GB 180 success sha256:abc123
2024-01-15T03:05:00Z redis-prod backup 2GB 30 success sha256:def456
2024-01-15T04:00:00Z postgres-prod restore-test 12GB 300 success verified
Columns: timestamp, asset, operation(backup/restore-test/dr-drill), size, duration_s, status(success/failed/partial), checksum.
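A log in this format could be written with Python's standard `csv` module using a tab delimiter. The sketch below is one possible shape, not the skill's own logger: the function name `append_log_row` is hypothetical, and rejecting unknown `operation` or `status` values is an added assumption based on the allowed values listed above.

```python
import csv

COLUMNS = ["timestamp", "asset", "operation", "size", "duration_s", "status", "checksum"]
OPERATIONS = {"backup", "restore-test", "dr-drill"}
STATUSES = {"success", "failed", "partial"}


def append_log_row(stream, row, write_header=False):
    """Write one backup-log record as a tab-separated line.

    stream: any text file object opened for writing or appending.
    row: dict keyed by the names in COLUMNS.
    """
    if row["operation"] not in OPERATIONS:
        raise ValueError(f"unknown operation: {row['operation']}")
    if row["status"] not in STATUSES:
        raise ValueError(f"unknown status: {row['status']}")
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerow(row)
```

When appending to a real file, opening it with `newline=""` avoids the `csv` module's line endings being translated a second time.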
## Keep/Discard
KEEP if: improvement verified. DISCARD if: regression or no change. Revert discards immediately.
## Stop Conditions
Stop when: target reached, budget exhausted, or >5 consecutive discards.
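The stop conditions can be expressed as a single check run after each iteration. In this sketch the function name and reason strings are hypothetical, and "budget" is assumed to mean an iteration budget; the threshold follows the text, so the loop stops only once more than five discards occur in a row.

```python
def should_stop(target_reached, iterations_used, max_iterations, decisions, max_discards=5):
    """Return the reason to stop, or None to keep going.

    decisions: chronological list of "keep" / "discard" outcomes.
    """
    if target_reached:
        return "target reached"
    if iterations_used >= max_iterations:
        return "budget exhausted"
    # Count the trailing run of discards.
    streak = 0
    for decision in reversed(decisions):
        if decision != "discard":
            break
        streak += 1
    if streak > max_discards:
        return "too many consecutive discards"
    return None
```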