Database design — schema patterns, indexing, partitioning, replication, migration, performance tuning. Use when the user asks to "design the database schema", "plan indexing strategy", "set up replication", "partition large tables", "migrate database schema", "tune query performance", or mentions normalization, sharding, B-tree indexes, zero-downtime migration, or connection pooling.
Bundled files:

- examples/README.md
- examples/sample-output.html
- examples/sample-output.md
- prompts/metaprompts.md
- prompts/use-case-prompts.md
- references/body-of-knowledge.md
- references/database-patterns.md
- references/knowledge-graph.mmd
- references/state-of-the-art.md
Scope: This skill applies to {TIPO_SERVICIO}=SDA and {TIPO_SERVICIO}=Data-AI. Database design is relevant both to software development and to data strategy. For other data contexts, consult data-engineering, data-governance, or bi-architecture.
Database architecture defines how data is structured, stored, accessed, replicated, and evolved over time. This skill produces comprehensive database design documentation covering schema modeling, indexing, partitioning, high availability, migration strategy, and performance tuning.
A schema without an evolution strategy is a schema doomed to break. Schema evolution is planned BEFORE the first migration. Indexes are designed from measured access patterns, not from intuition. Migrations either run with zero downtime or they do not run. Every partitioning, replication, and indexing decision carries a quantitative justification and a rollback plan.
The user provides a system or database name as $ARGUMENTS. Parse $1 as the system/database name used throughout all output artifacts.
Parameters:
{MODO}: piloto-auto (auto-pilot, default) | desatendido (unattended) | supervisado (supervised) | paso-a-paso (step-by-step)
{FORMATO}: markdown (default) | html | dual

{VARIANTE}: ejecutiva (executive, ~40% -- S1 schema design + S2 indexing + S5 migration plan) | técnica (technical -- full 6 sections, default)

Before generating architecture, detect the data context:
!find . -name "*.sql" -o -name "*.prisma" -o -name "*.schema" -o -name "migrations" -type d | head -20
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/database-patterns.md
Match workload to engine. Default to PostgreSQL unless a specialized need is clear.
| Workload | Engine | Why | When NOT to use |
|---|---|---|---|
| OLTP (transactions, CRUD) | PostgreSQL, MySQL | ACID, mature tooling, JSON support | Write throughput >100K TPS single-node |
| Cache / Session | Redis, Memcached | Sub-ms latency, TTL, pub/sub | Data >RAM, durability required |
| Full-Text Search | Elasticsearch, Meilisearch | Inverted index, relevance scoring | Primary datastore, strong consistency |
| Graph (relationships) | Neo4j, Amazon Neptune | Traversal queries, path finding | Tabular data, simple joins |
| Time-Series | TimescaleDB, ClickHouse | Compression, continuous aggregates | Random-access CRUD |
| Document (flexible schema) | MongoDB, DynamoDB | Schema flexibility, horizontal scale | Complex joins, strict consistency |
| Wide-Column (high write) | Cassandra, ScyllaDB | Linear write scaling, tunable consistency | Ad-hoc queries, strong consistency |
| Vector (embeddings/AI) | pgvector, Pinecone, Qdrant | ANN search, similarity queries | Exact match queries |
HTAP consideration: For mixed OLTP+analytics, evaluate AlloyDB, TiDB, or CockroachDB before splitting into separate engines. Fewer moving parts outweigh theoretical optimization.
Defines the logical and physical data model.
Includes:
Key decisions:
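For illustration, a minimal DDL sketch of the constraint-rich, normalized schema S1 is expected to produce; the customers/orders entities and their columns are assumptions for the sketch, not part of the target system:

```sql
-- Illustrative entities only; real names come from the domain model.
CREATE TABLE customers (
    customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE orders (
    order_id     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  BIGINT NOT NULL REFERENCES customers (customer_id),
    status       TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'cancelled')),
    total_amount NUMERIC(12,2) NOT NULL CHECK (total_amount >= 0),
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

Constraints (NOT NULL, UNIQUE, CHECK, foreign keys) encode the integrity rules that normalization is meant to protect.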
Plans indexes to optimize query patterns without degrading write performance.
Index types and use cases:
Composite indexes: Column order matters -- most selective first for equality, range column last.
Covering indexes: Include non-key columns (INCLUDE clause) to enable index-only scans.
Partial indexes: Filter rows at index creation (WHERE clause) -- smaller index for subset queries.
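Hedged PostgreSQL sketches of the three patterns above; the orders table and its columns are illustrative assumptions:

```sql
-- Composite: equality column (customer_id) first, range column (created_at) last.
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);

-- Covering: INCLUDE (PG 11+) enables index-only scans for the listed columns.
CREATE INDEX idx_orders_customer_incl ON orders (customer_id)
    INCLUDE (status, total_amount);

-- Partial: index only the subset most queries filter on.
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';
```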
Index maintenance:
- Unused indexes: pg_stat_user_indexes.idx_scan = 0 over 30+ days -- drop to reduce write overhead
- Bloat: measure with pgstatindex or pgstattuple; REINDEX when bloat exceeds 30%
- Use REINDEX CONCURRENTLY (PG 12+) to rebuild without locking reads/writes
- Raise maintenance_work_mem to 512MB-2GB for faster rebuilds during low-traffic windows

Key decisions:
- EXPLAIN ANALYZE: sequential scan vs index scan vs bitmap scan (see the sketch below)
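A minimal way to check which scan type the planner chose; the query and index names are illustrative assumptions:

```sql
-- BUFFERS adds block hit/read counts to the timing data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 42
  AND created_at >= now() - interval '30 days';
```

An index scan on a composite index such as idx_orders_customer_created would be the expected plan here; a sequential scan typically signals a missing or insufficiently selective index.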
Strategies for splitting data across partitions or physical nodes.

Partitioning (single node):
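A hedged sketch of declarative range partitioning (PostgreSQL 10+); the events table, its columns, and the monthly boundaries are illustrative assumptions:

```sql
-- Parent table; rows are routed to child partitions by occurred_at.
CREATE TABLE events (
    device_id   BIGINT      NOT NULL,
    occurred_at TIMESTAMPTZ NOT NULL,
    payload     JSONB
) PARTITION BY RANGE (occurred_at);

-- Monthly partitions; queries filtering on occurred_at prune to matching children.
CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
```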
Sharding (multi-node):
Cross-shard considerations:
Replication topology, failover strategy, MVCC tuning, and recovery objectives.
Replication models:
Failover strategy:
Recovery objectives:
MVCC tuning (PostgreSQL):
- autovacuum_vacuum_scale_factor = 0.01 (1% vs the 20% default) for tables >1M rows (applied in the sketch below)
- autovacuum_vacuum_threshold = 50 (the default); combined with the lower scale_factor, vacuum triggers earlier
- idle_in_transaction_session_timeout = 30s to prevent long transactions from blocking vacuum
- Monitor pg_replication_slots.confirmed_flush_lsn lag
- Measure bloat with pgstattuple or pgmetrics
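A hedged sketch applying these settings; the orders table name is an illustrative assumption:

```sql
-- Per-table autovacuum override for a large, frequently updated table.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 50
);

-- Cluster-wide guard against idle transactions holding back vacuum.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s';
SELECT pg_reload_conf();
```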
Read replica management:

- Monitor lag via pg_stat_replication.replay_lag
- Check pg_last_xact_replay_timestamp() on the replica before routing reads

Schema versioning, zero-downtime migrations, and rollback strategy.
Zero-downtime migration patterns:
| Pattern | Mechanism | Use When |
|---|---|---|
| Expand-contract | Add new column/table, backfill, switch reads, drop old | Column renames, type changes |
| Dual-write | Write to old and new schema simultaneously during transition | Table restructuring, cross-service migrations |
| Shadow columns | Add new column, populate via trigger/backfill, swap at cutover | Non-nullable column additions |
| Ghost tables | Shadow table + trigger sync + atomic rename (gh-ost, pt-osc) | Large table ALTERs in MySQL |
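A hedged expand-contract sketch for a column rename; the customers table, the name/full_name columns, and the single-statement backfill are illustrative assumptions (batch the backfill on large tables, as described below):

```sql
-- Expand: add the new column; a plain nullable ADD COLUMN does not rewrite the table.
ALTER TABLE customers ADD COLUMN full_name TEXT;

-- Backfill (shown unbatched for brevity), keeping old and new columns in sync
-- via dual-write or a trigger until all readers use full_name.
UPDATE customers SET full_name = name WHERE full_name IS NULL;

-- Contract: once nothing reads or writes the old column, drop it.
ALTER TABLE customers DROP COLUMN name;
```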
Migration framework:
Data backfill:
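A hedged sketch of a batched backfill (5K rows per pass, matching the batch sizes recommended in the edge-case table below); table and column names are illustrative:

```sql
-- Run repeatedly until it updates 0 rows; each pass commits separately
-- so locks stay short and autovacuum can keep up.
WITH batch AS (
    SELECT customer_id
    FROM customers
    WHERE full_name IS NULL
    ORDER BY customer_id
    LIMIT 5000
    FOR UPDATE SKIP LOCKED
)
UPDATE customers c
SET full_name = c.name
FROM batch b
WHERE c.customer_id = b.customer_id;
```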
Rollback strategy:
Query optimization, connection management, caching, and monitoring.
Connection pool sizing:
- max_connections = (core_count * 2) + effective_spindle_count (typically 20-50 for SSD)
- default_pool_size = application instances * threads per instance, capped at PostgreSQL max_connections minus 10% overhead
- Diagnose contention via pg_stat_activity wait_event analysis (see the query sketch below)
- idle_in_transaction_session_timeout = 30s; alert on connections idle >5 min
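A hedged query for the wait_event analysis above, using standard pg_stat_activity columns:

```sql
-- Group active backends by what they are currently waiting on.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;
```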
Query optimization:

- EXPLAIN ANALYZE for execution plans (sequential scan, nested loop, hash join)

Memory configuration:
- shared_buffers: 25-40% of total system memory (applied in the sketch below)
- work_mem: 4-64MB per sort/hash operation (multiply by max_connections for total impact)
- effective_cache_size: 50-75% of total memory (planner hint, not allocation)
- maintenance_work_mem: 512MB-2GB for VACUUM, CREATE INDEX, REINDEX
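A hedged sketch applying these ranges on an assumed 64 GB dedicated server:

```sql
ALTER SYSTEM SET shared_buffers = '16GB';        -- ~25% of RAM; requires a restart
ALTER SYSTEM SET work_mem = '32MB';              -- per sort/hash node, per backend
ALTER SYSTEM SET effective_cache_size = '48GB';  -- planner hint only, not an allocation
ALTER SYSTEM SET maintenance_work_mem = '1GB';   -- VACUUM, CREATE INDEX, REINDEX
SELECT pg_reload_conf();                         -- shared_buffers still needs a restart
```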
Caching strategy:

Monitoring thresholds:
| Decision | Enables | Constrains | When to Use |
|---|---|---|---|
| 3NF Normalization | Data integrity, no anomalies | More joins, slower reads | Transactional systems, accuracy critical |
| Strategic Denormalization | Faster reads, simpler queries | Update anomalies, storage | Read-heavy, reporting, materialized views |
| Composite Indexes | Multi-column query optimization | Write overhead, storage | Frequent multi-column WHERE/ORDER BY |
| Range Partitioning | Fast time-range queries, pruning | Cross-partition queries slower | Time-series, log data, append-heavy |
| Horizontal Sharding | Linear write scaling | Cross-shard complexity | Beyond single-node write capacity |
| Sync Replication | Zero data loss (RPO=0) | Higher write latency | Financial, compliance-critical |
| Async Replication | Low write latency, read scaling | Potential data loss on failover | Most web apps, analytics replicas |
| Case | Handling Strategy |
|---|---|
| Legacy database with no documentation (hidden triggers, unversioned stored procedures) | Reverse-engineer from information_schema and pg_stat_statements. Document the as-is state before proposing changes. Add 30-50% overhead to migration estimates. |
| Multi-tenant with >1000 tenants and tables >1TB | Evaluate shard-by-tenant with consistent hashing. Implement RLS for isolation in a shared schema. Monitor noisy neighbors with per-tenant metrics. |
| Zero-downtime migration on tables with >100M rows | Use expand-contract with backfills in batches of 5K-10K rows. Shadow columns with sync triggers. Validate checksums before cutover. Define a 48h rollback window. |
| Cross-cutting regulatory requirements (GDPR + HIPAA + data residency) | Restrict replication by geography. Soft-delete with an audit trail for right-to-erasure. TDE + TLS mandatory. Isolate PII in dedicated schemas with audited access. |
| Decision | Discarded Alternative | Rationale |
|---|---|---|
| Backward-compatible migrations with expand-contract | Direct ALTER on a live table | ALTER blocks reads/writes on large tables. Expand-contract enables zero downtime with a safe rollback. |
| BRIN indexes for time-series tables instead of B-tree | Conventional B-tree | BRIN uses up to 1000x less space for naturally ordered data. A B-tree is unnecessary when queries always filter by time range. |
| PgBouncer transaction-mode instead of session-mode | Session pooling | Transaction mode reuses connections across requests, supporting more concurrent users with fewer backend connections. Session mode wastes connections on idle clients. |
```mermaid
graph TD
subgraph Core
DB[database-architecture]
end
subgraph Inputs
REQ[Access Patterns & Query Workloads] --> DB
VOL[Data Volume & Growth Projections] --> DB
SLA[RPO/RTO Requirements] --> DB
end
subgraph Outputs
DB --> SCH[Schema Design & ER Model]
DB --> IDX[Indexing Strategy Report]
DB --> MIG[Migration Plan & Scripts]
DB --> REP[Replication Topology]
end
subgraph Related Skills
DB -.-> DE[data-engineering]
DB -.-> DG[data-governance]
DB -.-> SA[software-architecture]
DB -.-> IA[infrastructure-architecture]
end
```
MD format (default):
# Database Architecture: {system_name}
## S1: Schema Design & Modeling
- ER diagram (Mermaid)
- Normalization analysis
- Constraint catalog
## S2: Indexing Strategy
- Index recommendations table
- EXPLAIN ANALYZE evidence
## S3-S6: [remaining sections]
## Appendices: DDL scripts, migration files
XLSX format (secondary):
HTML format (on demand):

- A-01_Database_Architecture_{system}_{WIP}.html

DOCX format (formal circulation):

- {fase}_{entregable}_{cliente}_{WIP}.docx

PPTX format (on demand):

- {fase}_{entregable}_{cliente}_{WIP}.pptx

| Dimension | Weight | Criterion | Minimum Threshold |
|---|---|---|---|
| Trigger Accuracy | 10% | The skill activates correctly on mentions of schema, indexing, partitioning, replication, migration, tuning | 7/10 |
| Completeness | 25% | The 6 sections cover the full domain with depth proportional to the context | 7/10 |
| Clarity | 20% | Decisions documented with quantitative justification, no ambiguity in recommendations | 7/10 |
| Robustness | 20% | Edge cases covered, rollback strategy for every migration, monitoring thresholds defined | 7/10 |
| Efficiency | 10% | Output generated without redundancy, reuses context from the detected codebase | 7/10 |
| Value Density | 15% | Every section delivers actionable insights (scripts, configs, thresholds), not just theory | 7/10 |
Global minimum threshold: 7/10. Deliverables scoring below it require rework before delivery.
Greenfield with Unknown Query Patterns: Start normalized, add indexes reactively from slow query logs. Avoid premature optimization. Use EXPLAIN ANALYZE early.
Legacy Database with No Documentation: Reverse-engineer from information_schema. Profile active queries via pg_stat_statements to understand actual access patterns. Document as-is before proposing changes.
Multi-Tenant Database: Schema-per-tenant (isolation, migration complexity) vs shared schema with tenant_id (simpler, noisy neighbor risk). Row-level security (RLS) for shared schema. Shard by tenant at scale (>1000 tenants or >1TB).
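A hedged sketch of shared-schema isolation with RLS; the orders table, the tenant_id column, and the app.tenant_id setting name are illustrative assumptions:

```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- The application sets its tenant per session/transaction, e.g. SET app.tenant_id = '42';
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::bigint);
```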
High-Write Throughput (IoT, Logging): Append-only tables, time-based partitioning, bulk inserts, minimal indexes. Evaluate TimescaleDB or ClickHouse. WAL tuning: wal_buffers = 64MB, checkpoint_completion_target = 0.9.
Regulatory Compliance (GDPR, HIPAA): Data residency constraints affect replication topology. Right-to-erasure requires soft-delete audit or cryptographic erasure. Encryption at rest (TDE) and in transit (TLS) mandatory.
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
| markdown | Yes | Markdown with embedded Mermaid (ER diagrams, replication topology). |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter.
Primary: A-01_Database_Architecture.html -- ER diagram, schema design, indexing strategy, partitioning plan, replication topology, migration roadmap, performance tuning checklist.
Secondary: Migration scripts (.sql), index recommendation report, slow query analysis, replication monitoring dashboard config.
Author: Javier Montaño | Last updated: March 12, 2026