Use when designing data models, database schemas, or choosing between modeling approaches. Covers dimensional modeling, star schema, data vault, entity-relationship design, and schema evolution.
Provides data modeling guidance covering OLTP vs OLAP, normalization, dimensional modeling (star/snowflake), and Data Vault. Use when designing database schemas, choosing modeling approaches, or planning schema evolution.
/plugin marketplace add melodic-software/claude-code-plugins/plugin install systems-design@melodic-softwareThis skill is limited to using the following tools:
Comprehensive guide to data modeling techniques for operational databases, data warehouses, and analytical systems.
Data Model Categories:
1. Conceptual Model
Purpose: Business understanding
Audience: Business stakeholders
Content: Entities, relationships, business rules
Detail: High-level, no implementation details
2. Logical Model
Purpose: Structure definition
Audience: Data architects
Content: Tables, columns, keys, relationships
Detail: Database-agnostic design
3. Physical Model
Purpose: Implementation
Audience: Database engineers
Content: Indexes, partitions, storage
Detail: Database-specific optimization
Model Evolution:
┌─────────────────────────────────────────────────────────────┐
│ │
│ Business Conceptual Logical Physical │
│ Requirements ──► Model ──► Model ──► Model │
│ │
│ "Customers Customer ─────► customers customers │
│ make orders" Order ├── id ├── id PK │
│ └──< makes ├── name ├── name │
│ └── email ├── email │
│ └── idx_* │
│ │
└─────────────────────────────────────────────────────────────┘
Operational (OLTP) vs Analytical (OLAP):
┌────────────────────────────────────────────────────────────┐
│ OLTP │
│ (Online Transaction Processing) │
├────────────────────────────────────────────────────────────┤
│ Purpose: Run the business │
│ Workload: Many small transactions │
│ Model: Normalized (3NF) │
│ Optimize: Write performance, consistency │
│ Users: Applications, services │
│ Example: INSERT new order, UPDATE inventory │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ OLAP │
│ (Online Analytical Processing) │
├────────────────────────────────────────────────────────────┤
│ Purpose: Analyze the business │
│ Workload: Few complex queries │
│ Model: Denormalized (star/snowflake) │
│ Optimize: Read performance, aggregation │
│ Users: Analysts, BI tools │
│ Example: SUM(sales) GROUP BY region, month │
└────────────────────────────────────────────────────────────┘
When to Use Each:
├── OLTP: User-facing applications, real-time operations
├── OLAP: Reporting, dashboards, machine learning
└── Hybrid: Some systems need both (HTAP)
Database Normalization:
1NF (First Normal Form):
├── Eliminate repeating groups
├── Create separate table for related data
├── Identify each row with primary key
└── Each cell contains single value
Example (SQL Server - PascalCase):
❌ Orders (Id, Item1, Item2, Item3)
✓ Orders (Id, ...) + OrderItems (OrderId, ItemId)
2NF (Second Normal Form):
├── Meet 1NF requirements
├── Remove partial dependencies
└── Non-key columns depend on entire primary key
Example (SQL Server - PascalCase):
❌ OrderItems (OrderId, ProductId, ProductName)
✓ OrderItems (OrderId, ProductId) + Products (ProductId, ProductName)
3NF (Third Normal Form):
├── Meet 2NF requirements
├── Remove transitive dependencies
└── Non-key columns depend only on primary key
Example (SQL Server - PascalCase):
❌ Orders (Id, CustomerId, CustomerCity)
✓ Orders (Id, CustomerId) + Customers (Id, City)
BCNF (Boyce-Codd Normal Form):
├── Meet 3NF requirements
├── Every determinant is a candidate key
└── Handles multi-valued dependencies
Higher Normal Forms (4NF, 5NF):
├── Handle complex multi-valued dependencies
├── Rarely used in practice
└── Can lead to over-normalization
Normalization Trade-offs:
Normalized (3NF):
├── Pros: Data integrity, no redundancy, smaller storage
├── Cons: Complex joins, slower reads
└── Use: OLTP, write-heavy workloads
Denormalized:
├── Pros: Faster reads, simpler queries
├── Cons: Redundancy, update anomalies, larger storage
└── Use: OLAP, read-heavy workloads
Denormalization Triggers:
1. Read performance is critical
2. Write frequency is low
3. Data changes infrequently
4. Queries frequently join same tables
5. Aggregations are common
Controlled Denormalization Patterns:
┌─────────────────────────────────────────────────────────────┐
│ Pattern: Summary Tables │
│ Keep normalized + materialized aggregates │
│ │
│ orders ──┐ │
│ items ──┼──► daily_sales_summary (materialized) │
│ products─┘ │
│ │
│ Best of both: Write to normalized, read from summary │
└─────────────────────────────────────────────────────────────┘
Star Schema:
Central fact table surrounded by dimension tables.
┌──────────────┐
│ dim_date │
│──────────────│
│ date_key PK │
│ date │
│ month │
│ quarter │
│ year │
└──────┬───────┘
│
┌──────────────┐ ┌──────┴───────┐ ┌──────────────┐
│ dim_product │ │ fact_sales │ │ dim_customer │
│──────────────│ │──────────────│ │──────────────│
│ product_key PK│◄──│ date_key FK │───►│ customer_key PK│
│ product_name │ │ product_key FK│ │ customer_name│
│ category │ │ customer_key FK│ │ segment │
│ brand │ │ store_key FK │ │ region │
└──────────────┘ │──────────────│ └──────────────┘
│ quantity │
│ revenue │ │
│ cost │ │
└──────┬───────┘ ┌──────┴───────┐
│ │ dim_store │
└───────────►│──────────────│
│ store_key PK│
│ store_name │
│ city │
│ state │
└──────────────┘
Star Schema Benefits:
├── Simple queries (few joins)
├── Optimized for aggregation
├── Intuitive for business users
├── Works well with BI tools
└── Predictable query performance
Fact Table Types:
├── Transaction: One row per event (sale, click)
├── Periodic Snapshot: One row per period (daily balance)
├── Accumulating Snapshot: Track progress (order lifecycle)
└── Factless: Events without measures (attendance)
Snowflake Schema:
Normalized dimensions (dimensions have sub-dimensions).
┌──────────────┐
│ dim_date │
└──────┬───────┘
│
┌──────────────┐ ┌──────┴───────┐ ┌──────────────┐
│ dim_category │◄───│ dim_product │ │ dim_customer │
└──────────────┘ └──────┬───────┘ └──────┬───────┘
│ │
┌──────────────────┼────────────────────┼──────┐
│ ┌──────┴───────┐ │ │
│ │ fact_sales │ │ │
│ └──────┬───────┘ │ │
│ │ │ │
│ ┌──────┴───────┐ ┌──────┴──────┐
│ │ dim_store │ │ dim_segment │
│ └──────┬───────┘ └─────────────┘
│ │
│ ┌──────┴───────┐
│ │ dim_city │
│ └──────────────┘
Star vs Snowflake:
┌────────────────────────────────────────────────────────────┐
│ Factor │ Star │ Snowflake │
├─────────────────┼───────────────────┼──────────────────────┤
│ Query simplicity│ Simpler │ More complex │
│ Query speed │ Faster (fewer joins)│ Slower │
│ Storage │ More (redundancy) │ Less (normalized) │
│ Maintenance │ Easier │ More complex │
│ BI tool support │ Better │ May need modeling │
└────────────────────────────────────────────────────────────┘
Recommendation: Prefer star schema unless storage is critical concern.
Slowly Changing Dimensions (SCD):
How to handle dimension changes over time.
Type 0: Retain Original
└── Never update dimension
└── Use for: Attributes that shouldn't change
Type 1: Overwrite
└── Update in place, lose history
└── Use for: Corrections, non-historical attributes
Type 2: Add New Row (Most Common)
┌─────────────────────────────────────────────────────────────┐
│ customer_key│ customer_id │ address │ valid_from │ valid_to │ current│
├─────────────┼─────────────┼──────────────┼────────────┼──────────┼────────┤
│ 1001 │ C123 │ 123 Oak St │ 2020-01-01 │ 2023-06-30│ false │
│ 1002 │ C123 │ 456 Pine Ave │ 2023-07-01 │ 9999-12-31│ true │
└─────────────────────────────────────────────────────────────┘
└── New surrogate key, track validity period
└── Use for: Full history required
Type 3: Add New Column
└── Previous and current value columns
└── Use for: Limited history (just previous)
Type 4: History Table
└── Current in main table, history in separate table
└── Use for: Frequent changes, large dimensions
Type 6: Hybrid (1+2+3)
└── Combines approaches for flexibility
└── Use for: Complex requirements
Data Vault Modeling:
Enterprise data warehouse pattern for agility and auditability.
Components:
┌─────────────────────────────────────────────────────────────┐
│ DATA VAULT │
├─────────────────────────────────────────────────────────────┤
│ │
│ HUBS (Business Keys) │
│ ┌────────────────────┐ │
│ │ hub_customer │ │
│ │ ├── hub_key PK │ ← Surrogate key │
│ │ ├── customer_id │ ← Business key │
│ │ ├── load_date │ ← When loaded │
│ │ └── source │ ← Where from │
│ └────────────────────┘ │
│ │
│ LINKS (Relationships) │
│ ┌────────────────────┐ │
│ │ link_customer_order│ │
│ │ ├── link_key PK │ │
│ │ ├── hub_customer_fk│ │
│ │ ├── hub_order_fk │ │
│ │ ├── load_date │ │
│ │ └── source │ │
│ └────────────────────┘ │
│ │
│ SATELLITES (Descriptive Data) │
│ ┌────────────────────┐ │
│ │ sat_customer_details│ │
│ │ ├── hub_customer_fk│ │
│ │ ├── load_date PK │ ← Part of PK │
│ │ ├── name │ ← Descriptive attributes │
│ │ ├── email │ │
│ │ ├── hash_diff │ ← Change detection │
│ │ └── source │ │
│ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Data Vault Benefits:
├── Full audit trail (all changes tracked)
├── Source system agnostic
├── Parallel loading possible
├── Handles schema changes gracefully
└── Good for regulatory requirements
Data Vault Challenges:
├── Complex queries (many joins)
├── Requires presentation layer
├── Steep learning curve
├── More storage than star schema
└── Not suitable for direct BI access
Data Vault Decision Matrix:
Use Data Vault When:
├── Multiple source systems with different schemas
├── Regulatory requirements for audit trails
├── Schema changes are frequent
├── Historical accuracy is critical
├── Large enterprise with complex data landscape
└── Data integration from acquisitions
Use Star Schema When:
├── Simpler reporting requirements
├── Single source system
├── Fast query performance priority
├── Business users query directly
├── Smaller scale, simpler needs
└── Quick time to value needed
Hybrid Approach:
┌─────────────────────────────────────────────────────────────┐
│ │
│ Source ──► Data Vault ──► Star Schema ──► BI Tools │
│ Systems (Raw Vault) (Presentation) (Reports) │
│ │
│ Benefits: │
│ ├── Raw vault: Full history, audit │
│ ├── Star schema: Query performance │
│ └── Decoupled: Change raw without breaking reports │
│ │
└─────────────────────────────────────────────────────────────┘
Schema Evolution Strategies:
1. Additive Changes (Safest)
├── Add new columns with defaults
├── Add new tables
└── Add new indexes
Safe: Does not break existing queries
ALTER TABLE users ADD COLUMN phone VARCHAR(20) DEFAULT NULL;
2. Non-Breaking Changes
├── Widen column types (VARCHAR(50) → VARCHAR(100))
├── Remove NOT NULL constraint
└── Add optional foreign keys
Usually safe, verify with testing
3. Breaking Changes (Dangerous)
├── Rename columns/tables
├── Remove columns
├── Change column types
└── Add NOT NULL without default
Requires migration strategy
Migration Patterns:
┌─────────────────────────────────────────────────────────────┐
│ Pattern: Expand-Contract │
│ │
│ Phase 1 (Expand): │
│ ├── Add new column (name_new) │
│ ├── Write to both old and new │
│ └── Backfill new column │
│ │
│ Phase 2 (Migrate): │
│ ├── Update readers to use new column │
│ └── Verify all systems migrated │
│ │
│ Phase 3 (Contract): │
│ └── Remove old column │
│ │
│ Timeline: Days to weeks per phase │
└─────────────────────────────────────────────────────────────┘
Schema Versioning:
1. Single Schema (Most Common)
├── One schema, evolve in place
├── Use migrations (Flyway, Liquibase)
└── All applications use same version
2. Multi-Version Schema
├── Multiple schema versions exist
├── Applications specify version
└── Complex but allows gradual migration
Example: v1_users, v2_users
3. Event Sourcing
├── Store events, not state
├── Replay to any schema version
└── Most flexible, most complex
Events: UserCreated, UserNameChanged, UserDeleted
→ Replay to construct current state
Migration Tools:
┌────────────────────────────────────────────────────────────┐
│ Tool │ Language │ Approach │
├─────────────┼────────────┼─────────────────────────────────┤
│ Flyway │ Java/SQL │ SQL migrations, versioned │
│ Liquibase │ Java/XML │ Changelog format, rollback │
│ Alembic │ Python │ SQLAlchemy integration │
│ Entity Framework│ .NET │ Code-first migrations │
│ Prisma │ TypeScript │ Declarative schema │
│ Atlas │ Go │ Declarative + imperative │
└────────────────────────────────────────────────────────────┘
Primary Key Options:
1. Natural Keys
├── Business identifier (email, SSN, ISBN)
├── Pros: Meaningful, unique in business
├── Cons: Can change, privacy concerns
└── Use: When truly immutable
2. Surrogate Keys (Recommended)
├── Auto-increment integer
├── Pros: Simple, performant, stable
├── Cons: Meaningless, DB-specific
└── Use: Most operational systems
3. UUIDs
├── Universally unique identifier
├── Pros: Globally unique, distributed generation
├── Cons: Larger, less performant
└── Use: Distributed systems, external exposure
4. ULIDs / Snowflake IDs
├── Time-sortable unique identifiers
├── Pros: Sortable, unique, distributed
├── Cons: More complex generation
└── Use: Time-series, event systems
Comparison:
┌────────────────────────────────────────────────────────────┐
│ Type │ Size │ Sortable │ Distributed │ Example │
├───────────────┼─────────┼──────────┼─────────────┼─────────┤
│ Auto-increment│ 4-8 bytes│ Yes │ No │ 12345 │
│ UUID v4 │ 16 bytes│ No │ Yes │ a1b2c...│
│ UUID v7 │ 16 bytes│ Yes │ Yes │ 0188... │
│ ULID │ 16 bytes│ Yes │ Yes │ 01H5... │
│ Snowflake │ 8 bytes │ Yes │ Yes │ 7890... │
└────────────────────────────────────────────────────────────┘
Composite Keys:
Primary keys with multiple columns.
Use Cases:
├── Junction tables (many-to-many)
├── Time-series with partitioning
├── Multi-tenant systems
└── Natural business keys
Example: Order Items (SQL Server - PascalCase)
┌─────────────────────────────────────────────────────────────┐
│ OrderItems │
│ ├── OrderId PK, FK │
│ ├── LineNumber PK (composite with OrderId) │
│ ├── ProductId FK │
│ ├── Quantity │
│ └── Price │
└─────────────────────────────────────────────────────────────┘
Trade-offs:
├── Pros: Enforces uniqueness, natural ordering
├── Cons: Complex foreign keys, ORM challenges
└── Alternative: Surrogate key + unique constraint
Data Modeling Best Practices:
1. Understand the Use Case First
├── OLTP vs OLAP requirements
├── Query patterns expected
├── Write vs read ratio
└── Growth expectations
2. Start Normalized, Denormalize with Reason
├── Begin with 3NF for operational
├── Denormalize only for proven performance needs
├── Document denormalization decisions
└── Consider materialized views first
3. Use Consistent Naming (Database-Specific)
├── Singular table names (User, not Users)
├── SQL Server: PascalCase (CustomerId, OrderDate, CreatedAt)
├── PostgreSQL: snake_case (customer_id, order_date, created_at)
├── Foreign keys: {Entity}Id or {entity}_id per convention
└── Avoid reserved words
4. Document Everything
├── Column descriptions
├── Relationship meanings
├── Business rules
└── Change history
5. Plan for Change
├── Use surrogate keys
├── Design for schema evolution
├── Version your schemas
└── Test migrations thoroughly
6. Consider Performance Early
├── Index strategy
├── Partitioning needs
├── Data types (smallest sufficient)
└── Query patterns
Data Modeling Anti-Patterns:
1. "One Table to Rule Them All"
❌ Single table with many nullable columns
✓ Proper entity separation
2. "Entity-Attribute-Value (EAV)"
❌ Generic key-value tables
✓ Proper columns or JSON fields
3. "CSV in a Column"
❌ Comma-separated values in one field
✓ Proper junction table
4. "Premature Denormalization"
❌ Denormalize without measuring
✓ Start normalized, optimize with data
5. "No Foreign Keys"
❌ Skipping FK for 'performance'
✓ Use FK for integrity, consider indexes
6. "Meaningless Names"
❌ table1, field_a, temp_data
✓ Descriptive, consistent naming
data-architecture - Data lake, lakehouse, data meshstream-processing - Real-time data modelingdatabase-scaling - Scaling data systemsetl-elt-patterns - Data transformationCreating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.
Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.