- Users report incorrect data
Provides step-by-step guidance for diagnosing and recovering from database corruption. Covers immediate preservation of evidence, restoration from backups, point-in-time recovery, and root cause analysis for disk failures, application bugs, and migration errors.
/plugin marketplace add anton-abyzov/specweave/plugin install sw-infra@specweaveDatabase Integrity Check (PostgreSQL):
-- Check for corruption
SELECT * FROM pg_catalog.pg_database WHERE datname = 'your_database';
-- Verify checksums (if enabled)
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = 'your_database';
-- Check for bloat
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Database Integrity Check (MySQL):
-- Check table for corruption
CHECK TABLE users;
-- Repair table (if corrupted)
REPAIR TABLE users;
-- Optimize table (defragment)
OPTIMIZE TABLE users;
Questions to answer:
Check Database Logs:
# PostgreSQL
grep "ERROR\|FATAL\|PANIC" /var/log/postgresql/postgresql.log
# MySQL
grep "ERROR" /var/log/mysql/error.log
# Look for:
# - Constraint violations
# - Foreign key errors
# - Checksum errors
# - Disk I/O errors
Common causes:
| Cause | Symptoms |
|---|---|
| Disk corruption | I/O errors in dmesg, checksum failures |
| Application bug | Logical corruption (wrong data, not random) |
| Failed migration | Schema mismatch, foreign key violations |
| Concurrent writes | Race condition, duplicate records |
| Hardware failure | Random corruption, unrelated records |
| Malicious attack | Deliberate data modification |
Check for Disk Errors:
# Check disk errors
dmesg | grep -i "I/O error\|disk error"
# Check SMART status
smartctl -a /dev/sda
# Look for: Reallocated_Sector_Ct, Current_Pending_Sector
CRITICAL: Preserve Evidence
# 1. STOP ALL WRITES (prevent further corruption)
# Put application in read-only mode OR
# Take application offline
# 2. Snapshot/backup current state (even if corrupted)
# PostgreSQL:
pg_dump your_database > /backup/corrupted-$(date +%Y%m%d-%H%M%S).sql
# MySQL:
mysqldump your_database > /backup/corrupted-$(date +%Y%m%d-%H%M%S).sql
# 3. Snapshot disk (cloud)
# AWS:
aws ec2 create-snapshot --volume-id vol-1234567890abcdef0 --description "Corruption snapshot"
# Impact: Preserves evidence for forensics
# Risk: None (read-only operations)
CRITICAL: DO NOT:
Option A: Restore from Backup (if recent clean backup)
# 1. Identify last known good backup
ls -lh /backup/ | grep pg_dump
# Example:
# backup-20251026-0200.sql ← Clean backup (before corruption)
# backup-20251026-0800.sql ← Corrupted
# 2. Restore from clean backup
# PostgreSQL:
psql your_database < /backup/backup-20251026-0200.sql
# MySQL:
mysql your_database < /backup/backup-20251026-0200.sql
# 3. Verify data integrity
# Run application tests
# Check user-reported issues
# Impact: Data restored to clean state
# Risk: Medium (lose data after backup time)
Option B: Repair Corrupted Records (if isolated corruption)
-- Identify corrupted records
SELECT * FROM users WHERE email IS NULL; -- Should not be null
-- Fix corrupted records
UPDATE users SET email = 'unknown@example.com' WHERE email IS NULL;
-- Verify fix
SELECT count(*) FROM users WHERE email IS NULL; -- Should be 0
-- Impact: Corruption fixed
-- Risk: Low (if corruption is known and fixable)
Option C: Point-in-Time Recovery (PostgreSQL)
# If WAL (Write-Ahead Logging) enabled:
# 1. Determine recovery point (before corruption)
# 2025-10-26 07:00:00 (corruption detected at 08:00)
# 2. Restore from base backup + WAL
pg_basebackup -D /var/lib/postgresql/data-recovery
# 3. Configure recovery.conf
# recovery_target_time = '2025-10-26 07:00:00'
# 4. Start PostgreSQL (will replay WAL until target time)
systemctl start postgresql
# Impact: Restore to exact point before corruption
# Risk: Low (if WAL available)
Root Cause Analysis:
If disk corruption:
If application bug:
If failed migration:
If concurrent writes:
Backups:
Monitoring:
Data Validation:
Redundancy:
Escalate to DBA if:
Escalate to developer if:
Escalate to security team if:
Escalate to management if:
If critical data corrupted:
After resolving:
# PostgreSQL integrity check
psql -c "SELECT * FROM pg_catalog.pg_database"
# MySQL table check
mysqlcheck -c your_database
# Backup
pg_dump your_database > backup.sql
mysqldump your_database > backup.sql
# Restore
psql your_database < backup.sql
mysql your_database < backup.sql
# Disk check
dmesg | grep -i "I/O error"
smartctl -a /dev/sda
fsck /dev/sda1
# Snapshot (AWS)
aws ec2 create-snapshot --volume-id vol-1234567890abcdef0
Designs feature architectures by analyzing existing codebase patterns and conventions, then providing comprehensive implementation blueprints with specific files to create/modify, component designs, data flows, and build sequences