Skill

database-design

Designs relational and NoSQL database schemas with patterns for normalization/denormalization, entity relationships, indexing, migrations, audit trails, partitioning, and access pattern optimization.

PostgreSQL

MySQL

SQLite

MongoDB

database

npx claudepluginhub krzysztofsurdy/code-virtuoso --plugin playbooks-virtuoso

Tool Access

This skill is limited to using the following tools:

Read Grep Glob Bash

Preview

Good database design determines the long-term maintainability, performance, and correctness of any data-driven application. Schema decisions made early are expensive to reverse later. Every table, column, index, and constraint should exist for a reason backed by access patterns and business rules.

Supporting Assets

references/indexing-strategies.mdreferences/migration-patterns.mdreferences/modeling-patterns.md

SKILL.md

Similar Skills

database-schema-designer

955

Designs scalable SQL and NoSQL database schemas from entity descriptions, with normalization, indexing, constraints, migrations, and performance best practices.

3 files

database-schema-designer

database-design

1.3k

Designs schemas, indexes, query optimizations, and migrations for SQL/NoSQL databases. For table design, N+1 fixes, normalization, ORMs, performance tuning.

project-starter

database-design-patterns

Provides schema design patterns and optimization strategies for relational and NoSQL databases. Use for schema design, query optimization, partitioning, sharding, and scalable persistence layers.

6 files

nickcrew-claude-ctx-plugin

Stats

Stars16

Forks1

Last CommitMar 9, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Database Design

Data Modeling Principles

Start from Access Patterns

Design tables around how the application reads and writes data, not around how entities look in a domain model. Two questions drive every schema decision:

What queries will run most frequently? -- these determine table structure, indexes, and denormalization choices
What consistency guarantees does the data need? -- these determine normalization level, constraints, and transaction boundaries

Entity Relationships

Relationship	Implementation	When to Use
One-to-one	Foreign key with UNIQUE constraint on the child table	Splitting rarely-accessed columns into a separate table, or enforcing exactly-one semantics
One-to-many	Foreign key on the child table referencing the parent	Orders to order items, users to addresses
Many-to-many	Join table with composite primary key	Tags to articles, students to courses
Many-to-many with attributes	Join table with its own columns beyond the two foreign keys	Enrollment with grade, membership with role
Self-referential	Foreign key referencing the same table	Org charts, category trees, threaded comments

Naming Conventions

Consistent naming prevents confusion across teams and tools:

Tables: plural nouns in snake_case (order_items, user_addresses)
Columns: singular snake_case describing the value (created_at, total_amount)
Foreign keys: <referenced_table_singular>_id (user_id, order_id)
Indexes: idx_<table>_<columns> (idx_orders_user_id_created_at)
Constraints: chk_<table>_<rule>, uq_<table>_<columns>, fk_<table>_<referenced>

Normalization vs Denormalization

Normal Forms

Form	Rule	Violation Example
1NF	Every column holds atomic values; no repeating groups	Storing comma-separated tags in a single column
2NF	Every non-key column depends on the entire primary key	In a composite-key table, a column depending on only part of the key
3NF	No non-key column depends on another non-key column	Storing both `city` and `zip_code` when zip determines city
BCNF	Every determinant is a candidate key	A scheduling table where room determines building but room is not a key

When to Denormalize

Normalization prevents anomalies but adds JOINs. Denormalize selectively when:

Read-heavy workloads dominate and JOIN cost is measurable in profiling
Reporting tables need pre-aggregated data that would otherwise require expensive queries
Caching a computed value avoids recalculating on every read (e.g., order_total stored on the order row)
Document-oriented access retrieves an entire aggregate in one read

Rules for safe denormalization:

Always keep the normalized source of truth -- denormalized data is a derived cache
Define how and when the denormalized copy is updated (trigger, application event, batch job)
Monitor for drift between the source and the copy
Document why the denormalization exists and what access pattern it serves

Choosing a Database Type

Type	Strengths	Fits When
Relational (PostgreSQL, MySQL)	ACID transactions, complex queries, mature tooling, JOINs	Structured data with relationships, transactional workloads, most CRUD applications
Document (MongoDB, DynamoDB)	Flexible schema, nested data, horizontal scaling	Aggregates accessed as a unit, rapidly evolving schemas, per-tenant isolation
Key-value (Redis, Memcached)	Sub-millisecond reads, simple data model	Session storage, caching, counters, rate limiting
Column-family (Cassandra, ScyllaDB)	High write throughput, wide rows, linear scaling	Time-series, IoT telemetry, append-heavy workloads
Graph (Neo4j, Neptune)	Traversal queries, relationship-centric data	Social networks, recommendation engines, fraud detection
Time-series (TimescaleDB, InfluxDB)	Optimized for time-stamped data, automatic partitioning	Metrics, monitoring, financial tick data

Polyglot persistence -- using different databases for different parts of the same system -- is valid when access patterns genuinely differ. It is not valid as a way to avoid learning one database well.

Indexing Fundamentals

Indexes accelerate reads at the cost of slower writes and additional storage. Every index must justify its existence through query patterns.

Index Types

Type	Structure	Best For
B-tree	Balanced tree, sorted data	Equality and range queries, ORDER BY, most general-purpose indexing
Hash	Hash table	Exact equality lookups only; no range support
GiST	Generalized search tree	Spatial data, geometric queries, range types, nearest-neighbor
GIN	Generalized inverted index	Full-text search, JSONB containment, array membership
BRIN	Block range index	Large tables with naturally ordered data (timestamps, sequential IDs)

Composite Index Design

The order of columns in a composite index matters. The leftmost prefix rule means a composite index on (a, b, c) supports queries filtering on (a), (a, b), or (a, b, c), but not (b, c) alone.

Column ordering guidelines:

Equality conditions first -- columns compared with =
Range conditions last -- columns compared with >, <, BETWEEN
Most selective column first among equals

Covering and Partial Indexes

Covering index: includes all columns the query needs, so the database reads only the index. Use INCLUDE (PostgreSQL) or just add columns to the index key.
Partial index: indexes only rows matching a condition, reducing size and write overhead. Ideal for querying a small subset of a large table (e.g., WHERE status = 'pending').

See Indexing Strategies Reference for detailed index types, EXPLAIN analysis, and anti-patterns.

Schema Evolution

Schema changes are inevitable. The question is whether they break running applications.

Backward-Compatible Changes (Safe)

Adding a new nullable column
Adding a new table
Adding a new index (may lock briefly on some engines)
Widening a column type (e.g., VARCHAR(50) to VARCHAR(100))

Breaking Changes (Require Migration Strategy)

Renaming or removing a column
Changing a column type in incompatible ways
Adding a NOT NULL constraint to an existing column with null data
Splitting or merging tables

The Expand-Contract Pattern

For breaking changes in production with zero downtime:

Expand -- add the new structure alongside the old one
Migrate -- backfill data from old to new, dual-write during transition
Switch -- update application code to use the new structure
Contract -- remove the old structure once nothing references it

See Migration Patterns Reference for zero-downtime strategies, rollback techniques, and multi-tool examples.

Partitioning and Sharding

Table Partitioning (Single Database)

Strategy	How It Works	Use Case
Range	Rows split by value ranges (e.g., by month)	Time-series data, log tables, archival
List	Rows split by discrete values (e.g., by region)	Multi-tenant data, geographic segmentation
Hash	Rows distributed by hash of a column	Even distribution when no natural range exists

Sharding (Multiple Databases)

Sharding distributes data across separate database instances. Use it only after single-instance optimizations (indexing, caching, read replicas) are exhausted.

Shard key selection criteria:

High cardinality -- many distinct values to distribute evenly
Present in most queries -- avoids scatter-gather across all shards
Stable -- values that do not change after creation
Avoid hotspots -- do not shard by a value that concentrates writes (e.g., current date)

Quick Reference: Common Design Mistakes

Mistake	Consequence	Fix
No foreign key constraints	Orphaned rows, inconsistent data	Always define foreign keys unless there is a documented reason not to
Over-indexing	Slow writes, wasted storage	Index only columns used in WHERE, JOIN, ORDER BY of actual queries
Storing computed values without a refresh strategy	Stale data, silent bugs	Define update triggers, events, or batch jobs alongside any denormalization
Using ENUM types for values that change	Schema migration for every new value	Use a lookup table with a foreign key instead
Storing money as floating-point	Rounding errors	Use DECIMAL/NUMERIC or store as integer cents
Missing created_at / updated_at timestamps	No auditability, difficult debugging	Add timestamp columns to every table by default
Generic `type` + `type_id` polymorphism everywhere	No referential integrity, complex queries	Evaluate STI, CTI, or separate tables first

Reference Files

Reference	Contents
Modeling Patterns	Polymorphic associations (STI/CTI/TPT), soft deletes, audit trails, temporal data, self-referential trees, JSON columns
Indexing Strategies	B-tree/hash/GiST/GIN details, composite index design, covering and partial indexes, EXPLAIN analysis, anti-patterns
Migration Patterns	Version-based vs state-based migrations, expand-contract, data migrations, rollback strategies, multi-tool examples

Integration with Other Skills

Situation	Recommended Skill
Optimizing query performance and caching	Install `knowledge-virtuoso` from `krzysztofsurdy/code-virtuoso` for performance optimization patterns
Designing domain models and aggregates	Install `knowledge-virtuoso` from `krzysztofsurdy/code-virtuoso` for clean architecture and DDD guidance
Building APIs that expose database-backed resources	Install `knowledge-virtuoso` from `krzysztofsurdy/code-virtuoso` for API design principles
Testing database interactions	Install `knowledge-virtuoso` from `krzysztofsurdy/code-virtuoso` for testing strategies