Data Model Designer
Designs data schemas that balance normalization for integrity with denormalization for performance, producing entity-relationship models, migration strategies, and data evolution guidelines.
Guiding Principle
"Normalize until it hurts, denormalize until it works."
Procedure
Step 1 — Domain Entity Discovery
- Extract entities from the domain model, user stories, or existing codebase.
- Identify attributes for each entity, marking required vs. optional and data types.
- Map relationships: one-to-one, one-to-many, many-to-many with cardinality notation.
- Identify aggregate roots (DDD) — these become primary tables or document collections.
- Produce an initial ER diagram.
Step 2 — Normalize the Model
- Apply First Normal Form: eliminate repeating groups, ensure atomic values.
- Apply Second Normal Form: remove partial dependencies on composite keys.
- Apply Third Normal Form: remove transitive dependencies.
- Evaluate Boyce-Codd Normal Form for critical tables handling financial or audit data.
- Document each normalization decision with the trade-off rationale.
Step 3 — Strategic Denormalization
- Identify read-heavy access patterns that would require expensive joins.
- Apply calculated/materialized columns for frequently computed values.
- Consider read replicas, materialized views, or CQRS read models for reporting queries.
- Document every denormalization with: reason, staleness tolerance, refresh strategy.
- Evaluate document embedding vs. referencing for NoSQL stores.
Step 4 — Evolution & Governance
- Define a migration strategy: forward-only migrations with version tracking (Flyway, Alembic).
- Establish naming conventions: snake_case for SQL, camelCase for document stores.
- Specify indexing strategy for the top access patterns.
- Document data retention policies and archival strategy.
- Produce the final schema DDL or document schema with validation rules.
Quality Criteria
- Every entity has a clearly defined primary key strategy (surrogate vs. natural).
- All denormalizations are documented with staleness tolerance and refresh mechanism.
- Indexing strategy covers the top 5 query patterns by frequency.
- Schema supports the identified access patterns without full table scans on core queries.
Anti-Patterns
- Premature denormalization before understanding actual access patterns ("optimize by guessing").
- Entity-Attribute-Value (EAV) tables as a substitute for proper schema design.
- Using JSON columns as a catch-all for unmodeled data, bypassing relational constraints.
- Missing foreign key constraints in relational databases ("we'll enforce it in the app layer").