Schema design, normalization levels, indexing strategy matched to access patterns. Use when designing database schemas, choosing between embed vs reference, or optimizing slow queries.
From sde-system-designnpx claudepluginhub chavangorakh1999/sde-skills --plugin sde-system-designThis skill uses the workspace's default tool permissions.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Design schemas that serve your access patterns efficiently. The right schema makes queries fast and simple. The wrong schema makes every query a workaround.
Domain to model: $ARGUMENTS
Schema follows access patterns, not the other way around. List every query before writing a single table definition.
Access Pattern Analysis Template:
Query Frequency Latency Target
----------------------------------------------------------------------
Get user by email (login) Very High < 10ms
Get all posts by user, newest first High < 50ms
Get post with author info High < 50ms
Get comments for a post, paginated Medium < 100ms
Search posts by keyword Medium < 200ms
Get user follower count High < 10ms
Get users a user follows (feed) High < 50ms
This list drives every schema decision. If a query isn't on the list, it's not shaping the schema.
Normalization levels in practice:
-- Well-normalized blog schema
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) NOT NULL UNIQUE,
display_name VARCHAR(50) NOT NULL,
password_hash CHAR(60) NOT NULL, -- bcrypt output is always 60 chars
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE posts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title VARCHAR(255) NOT NULL,
body TEXT NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'draft'
CHECK (status IN ('draft', 'published', 'archived')),
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE comments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
body TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Tags: many-to-many requires junction table
CREATE TABLE tags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE post_tags (
post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, tag_id)
);
Indexes are fast reads at the cost of slower writes and storage. Create indexes for columns that appear in WHERE, JOIN ON, and ORDER BY.
-- Primary key indexes are automatic
-- Index for login query (email lookup)
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Index for "posts by author, newest first"
CREATE INDEX idx_posts_author_created ON posts(author_id, created_at DESC);
-- Index for "published posts only" (partial index — smaller and faster)
CREATE INDEX idx_posts_published ON posts(published_at DESC)
WHERE status = 'published';
-- Index for comments by post (paginated)
CREATE INDEX idx_comments_post_created ON comments(post_id, created_at ASC);
-- Full-text search index
CREATE INDEX idx_posts_fts ON posts USING GIN(
to_tsvector('english', title || ' ' || body)
);
Index selection rules:
Composite index column order: put equality predicates first, range predicates last.
-- Query: WHERE author_id = $1 AND status = 'published' ORDER BY created_at DESC
-- Good: (author_id, status, created_at DESC) — filters by both equality first
-- Bad: (created_at, author_id) — sorts before filtering
Decide between embedding and referencing based on access patterns and cardinality.
Embed when:
Reference when:
// Example: Blog (Mongoose)
// EMBED: post with its tags (few, always accessed with post, not queried standalone)
const postSchema = new mongoose.Schema({
authorId: { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true, index: true },
title: { type: String, required: true, maxlength: 255 },
body: { type: String, required: true },
status: { type: String, enum: ['draft', 'published', 'archived'], default: 'draft' },
tags: [{ type: String, maxlength: 50 }], // embedded — tags are few and simple
publishedAt: { type: Date },
createdAt: { type: Date, default: Date.now },
updatedAt: { type: Date, default: Date.now }
});
// Compound index for "posts by author, newest first"
postSchema.index({ authorId: 1, createdAt: -1 });
// Partial index for published posts
postSchema.index({ publishedAt: -1 }, { partialFilterExpression: { status: 'published' } });
// Text search index
postSchema.index({ title: 'text', body: 'text' });
// REFERENCE: comments (potentially many, queried independently)
const commentSchema = new mongoose.Schema({
postId: { type: mongoose.Schema.Types.ObjectId, ref: 'Post', required: true, index: true },
authorId: { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true },
body: { type: String, required: true, maxlength: 2000 },
createdAt:{ type: Date, default: Date.now }
});
commentSchema.index({ postId: 1, createdAt: 1 }); // paginate comments for a post
Extended Reference Pattern (denormalize a subset of frequently-needed fields):
// Instead of joining to get author's displayName every time:
const postSchema = new mongoose.Schema({
author: {
_id: mongoose.Schema.Types.ObjectId, // for real-time refresh if needed
displayName: String, // denormalized — accept stale risk
avatarUrl: String
},
// ...rest of schema
});
// Trade-off: stale data if user updates displayName. Acceptable for non-critical display.
// Mitigation: background job to sync, or invalidate on update.
N+1 queries: Loading N posts then issuing N queries to get each author.
// Bad: N+1
const posts = await Post.find({ status: 'published' }).limit(20);
for (const post of posts) {
post.author = await User.findById(post.authorId); // N extra queries!
}
// Good: batch with populate (Mongoose) or JOIN (SQL)
const posts = await Post.find({ status: 'published' })
.limit(20)
.populate('authorId', 'displayName avatarUrl'); // single $in query
Missing updated_at: Always add updatedAt timestamps — invaluable for debugging, cache invalidation, and change data capture.
Storing passwords in plain text: Use bcrypt with work factor 12. Never store plaintext, never log passwords.
Unbounded arrays (MongoDB): An array that grows without limit will hit the 16MB BSON document limit. Use references instead of embedding when array size is unbounded.
Missing soft delete: Use deletedAt TIMESTAMPTZ instead of hard delete for auditable entities. Filter with WHERE deleted_at IS NULL.
## Data Model: [Domain]
### Access Patterns
| Query | Frequency | Latency Target |
### Schema
#### [Entity 1]
[Fields, types, constraints, indexes]
#### [Entity 2]
...
### Relationships
[ERD or relationship descriptions]
### Index Strategy
| Index | Columns | Type | Justification |
### Embed vs Reference Decisions (if document store)
[For each relationship: decision and rationale]
### Tradeoffs
[3-5 explicit tradeoffs: normalization vs performance, consistency vs flexibility, etc.]