From claude-starter-kit
Designs scalable software architectures using patterns like MVC, microservices, and event-driven; applies SOLID principles, evaluates trade-offs, and creates ADRs. Use for new projects, monolith refactoring, scaling, and multi-team work.
Complete framework for designing software systems that are scalable, maintainable, and aligned with business requirements.
Architecture Serves Business:
SOLID Principles:
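S - Single Responsibility Principle: a class or module should have one reason to change
O - Open/Closed Principle: open for extension, closed for modification
L - Liskov Substitution Principle: subtypes must be usable anywhere their base type is expected
I - Interface Segregation Principle: clients should not depend on methods they don't use
D - Dependency Inversion Principle: high-level modules depend on abstractions, not on low-level details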
Other Key Principles:
Functional Requirements:
## What the System Must Do
**User Stories:**
- As a [user], I want to [action] so that [benefit]
**Features:**
- User authentication
- Product catalog
- Shopping cart
- Payment processing
- Order tracking
**Business Rules:**
- Discount codes can only be used once per user
- Orders over $50 get free shipping
- Inventory decrements on successful payment
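Business rules stated this precisely can be encoded as small pure functions, which makes them easy to test. A minimal sketch, assuming amounts in integer cents, a hypothetical `Checkout` shape, and an in-memory redemption map; a real system would enforce "once per user" atomically in storage (for example with a unique constraint):

```typescript
// Hypothetical checkout model; amounts are integer cents to avoid float rounding.
interface Checkout {
  userId: string;
  subtotalCents: number;
  discountCode?: string;
}

const FREE_SHIPPING_THRESHOLD_CENTS = 5_000; // "Orders over $50 get free shipping"

// Rule: orders strictly over $50 ship free.
function qualifiesForFreeShipping(c: Checkout): boolean {
  return c.subtotalCents > FREE_SHIPPING_THRESHOLD_CENTS;
}

// Rule: a discount code can be used only once per user.
// `usedCodes` maps userId -> codes that user has already redeemed.
function canUseDiscount(c: Checkout, usedCodes: Map<string, Set<string>>): boolean {
  if (!c.discountCode) return false;
  return !(usedCodes.get(c.userId)?.has(c.discountCode) ?? false);
}

// Record a redemption so the same user cannot reuse the code.
function redeemDiscount(c: Checkout, usedCodes: Map<string, Set<string>>): void {
  if (!c.discountCode) return;
  const codes = usedCodes.get(c.userId) ?? new Set<string>();
  codes.add(c.discountCode);
  usedCodes.set(c.userId, codes);
}
```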
Non-Functional Requirements (The "ilities"):
## How the System Must Perform
**Scalability:**
- Support 10K concurrent users
- Handle 100K products in catalog
- Process 1K orders per hour
**Performance:**
- Page load <2 seconds
- API response <100ms (p95)
- Search results <500ms
**Reliability:**
- 99.9% uptime (~8.76 hours downtime/year)
- Zero data loss
- Graceful degradation under load
**Security:**
- PCI DSS compliant for payments
- GDPR compliant for EU users
- Data encrypted at rest and in transit
**Maintainability:**
- New developers productive in 1 week
- Deploy multiple times per day
- Rollback within 5 minutes
**Observability:**
- Full request tracing
- Error rate monitoring
- Performance metrics
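Uptime targets are easier to plan around as a downtime budget. A small helper (hypothetical, not part of the template) that converts an uptime percentage into allowed downtime, assuming a 365-day year and a 30-day month:

```typescript
const HOURS_PER_YEAR = 365 * 24; // 8,760 hours, ignoring leap years

// Allowed downtime per year, in hours, for a given uptime percentage.
function downtimeBudgetHours(uptimePercent: number): number {
  if (uptimePercent < 0 || uptimePercent > 100) {
    throw new RangeError("uptimePercent must be between 0 and 100");
  }
  return HOURS_PER_YEAR * (1 - uptimePercent / 100);
}

// Same budget expressed in minutes per 30-day month.
function downtimeBudgetMinutesPerMonth(uptimePercent: number): number {
  return 30 * 24 * 60 * (1 - uptimePercent / 100);
}

console.log(downtimeBudgetHours(99.9).toFixed(2)); // "8.76"
console.log(downtimeBudgetMinutesPerMonth(99.9).toFixed(1)); // "43.2"
```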
Monolith:
Best for:
- Small teams (<10 people)
- Simple domains
- Early-stage startups
- Rapid iteration
Architecture:
```
┌─────────────────────────┐
│     Web Application     │
│ ┌──────┬──────┬──────┐  │
│ │  UI  │Logic │ Data │  │
│ └──────┴──────┴──────┘  │
└─────────────────────────┘
             ↓
      Single Database
```
Pros:
✅ Simple to develop
✅ Simple to deploy
✅ Simple to test
✅ Low latency between components
Cons:
❌ Scaling requires scaling everything
❌ Tight coupling
❌ One failure affects all
❌ Hard for teams to work and deploy independently
Microservices:
Best for:
- Large teams (multiple squads)
- Complex domains
- Independent scaling needs
- Polyglot requirements
Architecture:
```
┌──────────┐   ┌──────────┐   ┌──────────┐
│   User   │   │  Order   │   │ Payment  │
│ Service  │   │ Service  │   │ Service  │
└────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │
     ↓              ↓              ↓
  User DB       Order DB      Payment DB
```
Pros:
✅ Independent deployment
✅ Technology flexibility
✅ Team autonomy
✅ Fault isolation
Cons:
❌ Network complexity
❌ Distributed transactions hard
❌ More operational overhead
❌ Debugging across services
Event-Driven:
Best for:
- Async workflows
- Real-time data processing
- Audit trails
- Decoupled systems
Architecture:
```
┌─────────┐       ┌────────────┐
│Producer │──────>│Event Queue │
└─────────┘       └─────┬──────┘
                        │
         ┌──────────────┼──────────────┐
         ↓              ↓              ↓
    Consumer 1     Consumer 2     Consumer 3
```
Pros:
✅ Loose coupling
✅ Easy to add consumers
✅ Natural audit log
✅ Handles spikes well
Cons:
❌ Eventual consistency
❌ Harder to debug
❌ Message ordering challenges
❌ More moving parts
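The fan-out above can be illustrated with a minimal in-process event bus: the producer publishes to a topic without knowing who listens, and consumers subscribe independently. This is a sketch with a hypothetical `EventBus` class and topic name; delivery is synchronous and in memory, so it shows the decoupling but none of the durability, retry, or ordering concerns a real broker (such as the RabbitMQ mentioned in the ADR example) has to handle:

```typescript
type Handler<T> = (event: T) => void;

// Minimal in-memory pub/sub: producers publish by topic, any number of consumers subscribe.
class EventBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(topic, list);
  }

  publish<T>(topic: string, event: T): void {
    // The producer never references consumers, so adding one needs no producer change.
    for (const handler of this.handlers.get(topic) ?? []) {
      handler(event);
    }
  }
}

// Usage: two independent consumers of the same event.
const bus = new EventBus();
const received: string[] = [];
bus.subscribe<{ orderId: string }>("order.created", (e) => received.push(`email:${e.orderId}`));
bus.subscribe<{ orderId: string }>("order.created", (e) => received.push(`analytics:${e.orderId}`));
bus.publish("order.created", { orderId: "o-1" });
```

Adding a third consumer is one more `subscribe` call, which is the "easy to add consumers" benefit listed above.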
Layered Architecture (N-Tier):
Best for:
- Traditional enterprise apps
- Clear separation of concerns
- Team specialization (frontend/backend/data)
Architecture:
```
┌─────────────────────────┐
│   Presentation Layer    │ (UI, API)
├─────────────────────────┤
│  Business Logic Layer   │ (Domain, Services)
├─────────────────────────┤
│    Data Access Layer    │ (Repositories, ORM)
├─────────────────────────┤
│     Database Layer      │ (PostgreSQL, etc.)
└─────────────────────────┘
```
Rules:
- Upper layers can call lower layers
- Lower layers cannot call upper layers
- Each layer has clear responsibility
Pros:
✅ Clear separation
✅ Testable layers
✅ Familiar pattern
Cons:
❌ Can become rigid
❌ Changes ripple across layers
❌ Performance overhead
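A minimal sketch of the layering rule, assuming hypothetical `UserController`, `UserService`, and `UserRepository` classes and an in-memory map standing in for the database layer. Each layer receives the one below it through its constructor and never references the one above:

```typescript
interface User { id: string; email: string; }

// Data Access Layer: stores and loads users; knows nothing about HTTP or business rules.
class UserRepository {
  private rows = new Map<string, User>(); // stands in for the Database Layer
  save(user: User): void {
    this.rows.set(user.id, user);
  }
  findByEmail(email: string): User | undefined {
    return Array.from(this.rows.values()).find((u) => u.email === email);
  }
}

// Business Logic Layer: enforces rules and calls only the layer below it.
class UserService {
  constructor(private readonly repo: UserRepository) {}
  register(id: string, rawEmail: string): User {
    const email = rawEmail.trim().toLowerCase(); // normalize before the uniqueness check
    if (this.repo.findByEmail(email)) throw new Error("email already registered");
    const user: User = { id, email };
    this.repo.save(user);
    return user;
  }
}

// Presentation Layer: maps a request to a service call and a response.
class UserController {
  constructor(private readonly service: UserService) {}
  post(body: { id: string; email: string }): { status: number; body: unknown } {
    try {
      return { status: 201, body: this.service.register(body.id, body.email) };
    } catch (err) {
      return { status: 409, body: { error: (err as Error).message } };
    }
  }
}

// Wiring is top-down: each layer is handed the one beneath it.
const controller = new UserController(new UserService(new UserRepository()));
```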
Hexagonal Architecture (Ports & Adapters):
Best for:
- Domain-driven design
- Testing-heavy environments
- Swappable infrastructure
Architecture:
```
          ┌─────────────┐
          │   Domain    │
          │   (Core)    │
          └──────┬──────┘
                 │
       ┌─────────┼─────────┐
       ↓         ↓         ↓
   HTTP API  Database    Queue
   (Adapter) (Adapter) (Adapter)
```
- Core never depends on adapters
- Adapters depend on core
Pros:
✅ Highly testable
✅ Infrastructure-agnostic
✅ DDD-friendly
Cons:
❌ More abstraction
❌ Steeper learning curve
❌ Can be over-engineered
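The dependency rule above can be sketched in a few lines: the core declares a port (an interface describing what it needs) and an adapter implements it, so the core compiles without knowing about any adapter. The names `OrderRepository`, `PlaceOrder`, and `InMemoryOrderRepository` are hypothetical; an HTTP or database adapter would plug in the same way:

```typescript
// ---- Core (domain): nothing here imports or names an adapter ----
interface Order { id: string; totalCents: number; }

// Port: what the core needs from the outside world.
interface OrderRepository {
  save(order: Order): Promise<void>;
  findById(id: string): Promise<Order | undefined>;
}

// Use case in the core, depending only on the port.
class PlaceOrder {
  constructor(private readonly orders: OrderRepository) {}

  async execute(id: string, totalCents: number): Promise<Order> {
    if (totalCents <= 0) throw new Error("order total must be positive");
    const order: Order = { id, totalCents };
    await this.orders.save(order);
    return order;
  }
}

// ---- Adapter: implements the port, so it depends on the core, not vice versa ----
class InMemoryOrderRepository implements OrderRepository {
  private store = new Map<string, Order>();

  async save(order: Order): Promise<void> {
    this.store.set(order.id, order);
  }

  async findById(id: string): Promise<Order | undefined> {
    return this.store.get(id);
  }
}
```

Because `PlaceOrder` only sees the port, tests can use the in-memory adapter while production wires in a database-backed one, which is where the "highly testable" and "infrastructure-agnostic" benefits come from.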
Component Design Template:
## [Component Name]
**Purpose:**
What does this component do?
**Responsibilities:**
- Responsibility 1
- Responsibility 2
**Dependencies:**
- Component A (for X)
- Component B (for Y)
**Interfaces:**
```typescript
interface ComponentAPI {
  operation1(input: Type): Promise<Result>;
  operation2(input: Type): Result;
}
```
**Data:**
What data does it own/manage?
**Events:**
What events does it emit/consume?
**Error Handling:**
How does it handle failures?
**Example - Order Service:**
## Order Service
**Purpose:**
Manage order lifecycle from creation to fulfillment
**Responsibilities:**
- Create orders
- Update order status
- Calculate totals with discounts
- Validate inventory availability
**Dependencies:**
- User Service (get user details)
- Inventory Service (check/reserve stock)
- Payment Service (process payment)
**Interfaces:**
```typescript
interface OrderService {
  createOrder(cart: Cart, userId: string): Promise<Order>;
  getOrder(orderId: string): Promise<Order>;
  updateStatus(orderId: string, status: OrderStatus): Promise<void>;
}
```
**Events Emitted:**
**Events Consumed:**
**Error Handling:**
### Step 4: Make Technology Choices
**Decision Framework:**
## Technology Decision: [Name]
**Problem:**
What are we trying to solve?
**Options:**
1. Option A
2. Option B
3. Option C
**Criteria:**
- Performance requirements
- Team expertise
- Community support
- Cost
- Scalability
- Security
**Evaluation:**
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| Performance | 8/10 | 9/10 | 7/10 |
| Expertise | 9/10 | 5/10 | 8/10 |
| Community | 10/10 | 7/10 | 9/10 |
| Cost | Free | $X/mo | Free |
| Scalability | 7/10 | 10/10 | 8/10 |
**Decision:** Option A
**Rationale:**
Why we chose this option.
**Trade-offs:**
What we're giving up.
**Review Date:**
When we'll reconsider this decision.
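The evaluation table can be reduced to one weighted score per option once the team agrees on how much each criterion matters. A sketch using the placeholder 1-10 scores from the template table and hypothetical weights; cost is left out because it is not on the same scale:

```typescript
type Scores = Record<string, number>; // criterion -> score out of 10

// Weighted average of scores; weights need not sum to 1 because we divide by their total.
function weightedScore(scores: Scores, weights: Scores): number {
  let total = 0;
  let weightSum = 0;
  for (const [criterion, weight] of Object.entries(weights)) {
    const score = scores[criterion];
    if (score === undefined) throw new Error(`missing score for ${criterion}`);
    total += score * weight;
    weightSum += weight;
  }
  return total / weightSum;
}

// Hypothetical weights: performance and expertise matter most for this team.
const weights = { performance: 3, expertise: 3, community: 1, scalability: 2 };
const options: Record<string, Scores> = {
  "Option A": { performance: 8, expertise: 9, community: 10, scalability: 7 },
  "Option B": { performance: 9, expertise: 5, community: 7, scalability: 10 },
  "Option C": { performance: 7, expertise: 8, community: 9, scalability: 8 },
};

for (const [name, scores] of Object.entries(options)) {
  console.log(name, weightedScore(scores, weights).toFixed(2));
}
// Option A 8.33, Option B 7.67, Option C 7.78
```

A score only ranks the options; the Rationale and Trade-offs sections still need to explain the choice.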
Example - Database Choice:
## Database for Order Service
**Problem:**
Need persistent storage for orders with ACID guarantees
**Options:**
1. PostgreSQL (Relational)
2. MongoDB (Document)
3. DynamoDB (NoSQL)
**Criteria:**
- ACID compliance (critical)
- Complex queries (important)
- Scalability (important)
- Team expertise (important)
**Evaluation:**
| Criteria | PostgreSQL | MongoDB | DynamoDB |
|----------|------------|---------|----------|
| ACID | ✅ Full | ⚠️ Multi-document transactions (since 4.0) | ⚠️ Transactions API with limits; reads eventually consistent by default |
| Queries | ✅ Excellent | ⚠️ Good | ❌ Limited |
| Scale | ✅ Vertical (plus read replicas) | ✅ Horizontal | ✅ Managed |
| Expertise | ✅ High | ⚠️ Medium | ❌ Low |
**Decision:** PostgreSQL
**Rationale:**
- ACID compliance is non-negotiable for financial transactions
- Team has 5 years PostgreSQL experience
- Can scale vertically to meet current needs
- Complex reporting queries needed
**Trade-offs:**
- Harder to horizontally scale than MongoDB
- More expensive at large scale than DynamoDB
- Self-managed (unless hosted on a managed service), whereas DynamoDB is fully managed
**Review Date:** When we hit 100K orders/day
Scaling Strategies:
## Vertical Scaling (Scale Up)
Add more resources to single machine
**When:**
- Quick fix needed
- Simple deployment
- Under 10K users
**How:**
- Bigger CPU
- More RAM
- Faster disk
**Limits:**
- Hardware ceiling
- Single point of failure
- Expensive at scale
---
## Horizontal Scaling (Scale Out)
Add more machines
**When:**
- Growth expected
- High availability needed
- Cost-effective at scale
**How:**
- Load balancer
- Stateless services
- Shared database or sharding
**Challenges:**
- Session management
- Distributed state
- Data consistency
---
## Caching Strategy
Reduce load on database/services
**Layers:**
Browser Cache → CDN → App Cache → Database Cache
Patterns:
Example:
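```typescript
// Cache-aside read; `cache`, `db`, and `User` are assumed to be defined elsewhere.
async function getUser(id: string): Promise<User | null> {
  const key = `user:${id}`;
  // 1. Check cache
  const cached = await cache.get(key);
  if (cached) return cached;
  // 2. Cache miss: fetch from DB
  const user = await db.users.findById(id);
  // Don't cache a missing user, or the miss would be served for an hour
  if (!user) return null;
  // 3. Store in cache (TTL: 1 hour = 3600 seconds)
  await cache.set(key, user, 3600);
  return user;
}
```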
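Because `cache` and `db` in the example above are left undefined, here is a self-contained version of the same cache-aside flow, with a hypothetical in-memory `TtlCache` and a `loadUser` callback standing in for the database. It is a sketch, not a production cache: there is no size limit and entries are only evicted when read after expiry:

```typescript
// Minimal in-memory cache with per-entry expiry (TTL in seconds).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private readonly now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

interface User { id: string; name: string; }

// Cache-aside: read from cache, fall back to the loader, then populate the cache.
async function getUserCached(
  id: string,
  cache: TtlCache<User>,
  loadUser: (id: string) => Promise<User | undefined>,
): Promise<User | undefined> {
  const key = `user:${id}`;
  const cached = cache.get(key);
  if (cached) return cached;
  const user = await loadUser(id);
  if (user) cache.set(key, user, 3600); // 1 hour, as in the example above
  return user;
}
```

Injecting `now` lets tests advance time without waiting for the TTL to elapse.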
Read Replicas:
```
┌────────┐
│Primary │ (writes)
└───┬────┘
    │
    ├──────────┬──────────┐
    ↓          ↓          ↓
 Replica    Replica    Replica
 (reads)    (reads)    (reads)
```
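In application code, the read-replica layout usually shows up as query routing: writes always go to the primary, reads are spread across replicas. A sketch with a hypothetical `ReplicaRouter` and connection type; because replicas can lag behind the primary, a read that must see a just-committed write can ask for the primary:

```typescript
// Hypothetical connection handle; only the name matters for this sketch.
interface DbConnection { name: string; }

class ReplicaRouter {
  private next = 0;

  constructor(
    private readonly primary: DbConnection,
    private readonly replicas: DbConnection[],
  ) {}

  // All writes go to the primary.
  forWrite(): DbConnection {
    return this.primary;
  }

  // Reads rotate round-robin across replicas. Falls back to the primary when there
  // are no replicas, or when the caller needs read-your-writes consistency.
  forRead(requireFresh = false): DbConnection {
    if (requireFresh || this.replicas.length === 0) return this.primary;
    const conn = this.replicas[this.next % this.replicas.length];
    this.next++;
    return conn;
  }
}
```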
Sharding:
```
User IDs 0-999     → Shard 1
User IDs 1000-1999 → Shard 2
User IDs 2000-2999 → Shard 3
```
Challenges:
- Rebalancing
- Cross-shard queries
- Transactions across shards
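The range scheme above translates directly into a routing function, and the same function shows why cross-shard queries are a challenge: a lookup for several users may need to contact several shards. A sketch assuming non-negative integer user IDs and 1,000 IDs per shard, as in the table; the function names are hypothetical:

```typescript
const IDS_PER_SHARD = 1000;

// Range-based routing: IDs 0-999 -> shard 1, 1000-1999 -> shard 2, and so on.
function rangeShardFor(userId: number): number {
  if (!Number.isInteger(userId) || userId < 0) throw new RangeError("invalid userId");
  return Math.floor(userId / IDS_PER_SHARD) + 1;
}

// A multi-user lookup must fan out to every shard that holds one of the IDs.
function groupByShard(userIds: number[]): Map<number, number[]> {
  const groups = new Map<number, number[]>();
  for (const id of userIds) {
    const shard = rangeShardFor(id);
    groups.set(shard, [...(groups.get(shard) ?? []), id]);
  }
  return groups;
}
```

Rebalancing is the other listed challenge: changing `IDS_PER_SHARD` moves most IDs to a different shard, so the data has to move with them.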
Partitioning:
```
Orders by date:
├── 2024-Q1 → Partition 1
├── 2024-Q2 → Partition 2
├── 2024-Q3 → Partition 3
└── 2024-Q4 → Partition 4
```
Benefits:
- Query performance
- Easier archival
- Smaller indexes
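Routing rows to quarterly partitions, and working out which partitions a date-range query must scan, takes only a few lines. A sketch assuming UTC dates and partition names matching the diagram; databases such as PostgreSQL implement this natively with declarative partitioning, so application code like this is mainly illustrative:

```typescript
// Map an order date to its quarterly partition name, e.g. "2024-Q3".
function quarterPartition(date: Date): string {
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1; // getUTCMonth() is 0-based
  return `${date.getUTCFullYear()}-Q${quarter}`;
}

// Partitions a query over [from, to] must scan; all others can be skipped (pruning).
function partitionsBetween(from: Date, to: Date): string[] {
  const result: string[] = [];
  let year = from.getUTCFullYear();
  let q = Math.floor(from.getUTCMonth() / 3); // 0-based quarter
  const endYear = to.getUTCFullYear();
  const endQ = Math.floor(to.getUTCMonth() / 3);
  while (year < endYear || (year === endYear && q <= endQ)) {
    result.push(`${year}-Q${q + 1}`);
    q++;
    if (q === 4) {
      q = 0;
      year++;
    }
  }
  return result;
}
```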
Architecture Decision Record Template:
# ADR [Number]: [Title]
**Status:** [Proposed | Accepted | Deprecated | Superseded]
**Date:** YYYY-MM-DD
**Deciders:** [Names]
---
## Context
What is the issue we're trying to solve?
**Current Situation:**
[Describe current state]
**Problem:**
[What needs to change and why]
**Constraints:**
- Technical constraints
- Business constraints
- Time constraints
---
## Decision
We will [decision].
**Details:**
[Explain the decision in detail]
---
## Options Considered
### Option 1: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
### Option 2: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
---
## Consequences
**Positive:**
- What improves
- What becomes easier
**Negative:**
- What becomes harder
- What we give up
**Risks:**
- What could go wrong
- Mitigation strategies
**Technical Debt:**
- What shortcuts are we taking
- When will we revisit
---
## Follow-up Actions
- [ ] Action 1 (Owner, Due Date)
- [ ] Action 2 (Owner, Due Date)
---
## References
- Link to design doc
- Link to RFC
- Related ADRs
Example ADR:
# ADR 001: Migrate from Monolith to Microservices
**Status:** Accepted
**Date:** 2026-01-15
**Deciders:** Architecture Team, Engineering Leads
---
## Context
**Current Situation:**
Single Rails monolith serving all traffic. 50K daily active users.
**Problem:**
- Deployment takes 30 minutes, blocks all teams
- Database at 80% capacity
- Cannot scale teams independently
- Different services have different scaling needs (API vs background jobs)
**Constraints:**
- Must maintain 99.9% uptime during migration
- Complete within 6 months
- Team of 15 engineers
---
## Decision
We will migrate to microservices using the Strangler Fig pattern.
**Approach:**
1. Start with highest-value, lowest-risk services (User Service, Notifications)
2. Extract one service per month
3. API Gateway routes to new services
4. Monolith remains for remaining functionality
5. Gradual data migration
**Tech Stack:**
- Services: Node.js/TypeScript
- Communication: REST + Message Queue (RabbitMQ)
- Deployment: Kubernetes
- Data: PostgreSQL per service
---
## Options Considered
### Option 1: Continue Scaling Monolith
**Pros:**
- Simplest
- Team already knows it
- No migration risk
**Cons:**
- Doesn't solve team scaling
- Database still bottleneck
- Deployment still blocking
### Option 2: Big Bang Rewrite
**Pros:**
- Fresh start
- Modern architecture
**Cons:**
- High risk
- 6+ months no features
- Likely to fail
### Option 3: Strangler Fig Migration (CHOSEN)
**Pros:**
- Low risk (gradual)
- Continuous value delivery
- Reversible
- Learn as we go
**Cons:**
- Longer timeline
- Temporary complexity
- Some duplication
---
## Consequences
**Positive:**
- Teams can deploy independently
- Services scale independently
- Technology flexibility
- Fault isolation
**Negative:**
- Operational complexity (15+ services)
- Distributed debugging harder
- Network latency between services
- More infrastructure cost
**Risks:**
- Data consistency across services
- Authentication/authorization complexity
- Monitoring/observability gaps
**Mitigation:**
- Event sourcing for data sync
- Shared auth service
- OpenTelemetry from day 1
**Technical Debt:**
- Monolith will coexist for 12-18 months
- Some duplication during migration
- Revisit architecture Q3 2026
---
## Follow-up Actions
- [x] Create migration roadmap (Sarah, 2026-01-20)
- [x] Set up Kubernetes cluster (DevOps, 2026-01-25)
- [ ] Extract User Service (Team A, 2026-02-15)
- [ ] Implement API Gateway (Team B, 2026-02-01)
- [ ] Set up observability (DevOps, 2026-01-30)
---
## References
- [Migration Roadmap](link)
- [Microservices RFC](link)
- Related: ADR 002 (Service Communication Pattern)
API Gateway Pattern:
```
Client
  ↓
API Gateway (routes, auth, rate limiting)
  ├──→ User Service
  ├──→ Order Service
  └──→ Payment Service
```
Benefits:
- Single entry point
- Handles cross-cutting concerns
- Can be tailored per client type (Backend for Frontend pattern)
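At its core the gateway is a routing table in front of the services, with cross-cutting checks applied once before any request is forwarded. A sketch with hypothetical route prefixes and upstream URLs, a static API-key check standing in for real authentication, and a fixed-window rate limit per client; it returns the upstream URL it would forward to instead of proxying the request:

```typescript
interface GatewayRequest { path: string; apiKey?: string; clientId: string; }
type GatewayResult =
  | { status: 200; upstream: string }
  | { status: 401 | 404 | 429; error: string };

// Hypothetical route table: path prefix -> upstream service base URL.
const routes: Array<[prefix: string, upstream: string]> = [
  ["/users", "http://user-service:8080"],
  ["/orders", "http://order-service:8080"],
  ["/payments", "http://payment-service:8080"],
];

class ApiGateway {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private readonly validKeys: Set<string>,
    private readonly limitPerWindow = 100,
    private readonly windowMs = 60_000,
    private readonly now: () => number = Date.now,
  ) {}

  handle(req: GatewayRequest): GatewayResult {
    // Cross-cutting concern 1: authentication (stand-in API-key check).
    if (!req.apiKey || !this.validKeys.has(req.apiKey)) {
      return { status: 401, error: "unauthorized" };
    }
    // Cross-cutting concern 2: fixed-window rate limiting per client.
    const t = this.now();
    const entry = this.counts.get(req.clientId);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(req.clientId, { windowStart: t, count: 1 });
    } else if (++entry.count > this.limitPerWindow) {
      return { status: 429, error: "rate limit exceeded" };
    }
    // Routing: first prefix that matches a whole path segment wins.
    const match = routes.find(([prefix]) => req.path === prefix || req.path.startsWith(prefix + "/"));
    if (!match) return { status: 404, error: "no route" };
    return { status: 200, upstream: match[1] + req.path };
  }
}
```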
Circuit Breaker Pattern:
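```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly threshold = 5,          // consecutive failures before opening
    private readonly resetTimeoutMs = 60000, // time to stay OPEN before a trial call
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      // Fail fast until the reset timeout has elapsed.
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit breaker OPEN');
      }
      // Then allow trial calls (this sketch does not cap concurrent trials).
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onFailure(): void {
    this.failures++;
    // A failed trial call, or too many failures while CLOSED, (re)opens the circuit.
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }

  private onSuccess(): void {
    // Any success, including a HALF_OPEN trial, closes the circuit and resets the count.
    this.failures = 0;
    this.state = 'CLOSED';
  }
}
```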
Saga Pattern (Distributed Transactions):
```
Order Saga:
1. Create Order → Success
2. Reserve Inventory → Success
3. Charge Payment → FAILS

Compensation (rollback):
3. Refund Payment ← (skipped, never charged)
2. Release Inventory ← Execute
1. Cancel Order ← Execute

Result: Consistent state, no partial orders
```
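An orchestrated saga can be expressed as a list of steps, each paired with a compensating action. On failure, the steps that already completed are compensated in reverse order, and the failing step itself is not compensated (matching "skipped, never charged" above). A sketch with a hypothetical `runSaga` helper; a real orchestrator would also persist its progress and retry failed compensations:

```typescript
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order; if one fails, undoes the already-completed steps in reverse.
async function runSaga(steps: SagaStep[]): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
      log.push(`${step.name}: done`);
    } catch {
      log.push(`${step.name}: failed`);
      for (const done of completed.slice().reverse()) {
        await done.compensate();
        log.push(`${done.name}: compensated`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```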
CQRS (Command Query Responsibility Segregation):
```
Commands (Writes):        Queries (Reads):
  Create Order              Get Order
  Update User               List Orders
  Delete Product            Search Products
       ↓                          ↑
   Write DB ────────────────→ Read DB
 (normalized)             (denormalized)
```
Benefits:
- Optimize read/write separately
- Scale independently
- Complex queries without impacting writes
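A minimal in-memory sketch of the split: commands validate and write to a normalized write model, a projector updates a denormalized read model, and queries touch only the read model. All names are hypothetical, and the projection here runs synchronously, whereas with a separate read database it is typically updated asynchronously and may lag behind writes:

```typescript
// Write model (normalized): orders reference customers by id.
interface OrderRow { id: string; customerId: string; totalCents: number; }
interface CustomerRow { id: string; name: string; }

// Read model (denormalized): what a "List Orders" view needs, already joined.
interface OrderSummary { orderId: string; customerName: string; totalCents: number; }

class WriteStore {
  customers = new Map<string, CustomerRow>();
  orders = new Map<string, OrderRow>();
}

class ReadStore {
  summariesByCustomer = new Map<string, OrderSummary[]>();
}

// Projector: keeps the read model in sync with write-side changes.
function projectOrderCreated(write: WriteStore, read: ReadStore, order: OrderRow): void {
  const customer = write.customers.get(order.customerId);
  const summary: OrderSummary = {
    orderId: order.id,
    customerName: customer?.name ?? "unknown",
    totalCents: order.totalCents,
  };
  const list = read.summariesByCustomer.get(order.customerId) ?? [];
  read.summariesByCustomer.set(order.customerId, [...list, summary]);
}

// Command side: validate, write, then project.
function createOrder(write: WriteStore, read: ReadStore, order: OrderRow): void {
  if (!write.customers.has(order.customerId)) throw new Error("unknown customer");
  write.orders.set(order.id, order);
  projectOrderCreated(write, read, order);
}

// Query side: reads only the denormalized store, no joins.
function listOrders(read: ReadStore, customerId: string): OrderSummary[] {
  return read.summariesByCustomer.get(customerId) ?? [];
}
```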
## Pre-Development
- [ ] Functional requirements documented
- [ ] Non-functional requirements defined
- [ ] Architecture pattern chosen
- [ ] Technology stack decided
- [ ] Data model designed
- [ ] API contracts defined
- [ ] Security reviewed
- [ ] Scalability plan created
## During Development
- [ ] Code organized by domain/feature
- [ ] Dependencies point inward (clean architecture)
- [ ] Interfaces define contracts
- [ ] Error handling consistent
- [ ] Logging and monitoring instrumented
- [ ] Tests cover critical paths
- [ ] Documentation up to date
## Pre-Production
- [ ] Load testing completed
- [ ] Security audit passed
- [ ] Monitoring dashboards ready
- [ ] Alerts configured
- [ ] Runbooks written
- [ ] Rollback plan tested
- [ ] DR plan documented
- [ ] Team trained
| Don't | Do |
|---|---|
| Microservices for everything | Start monolith, extract when needed |
| Premature optimization | Optimize when you have data |
| Architecture astronaut | Solve today's problems, not future maybes |
| Copy Big Tech architecture | Your scale != their scale |
| Ignore non-functional requirements | Performance/security/reliability matter |
| Big Bang rewrites | Incremental refactoring |
| One size fits all | Different components, different patterns |
| Skip documentation | ADRs, diagrams, runbooks |
Diagramming:
Books:
Patterns:
- /systems-decompose - Break down features
- /database-schema - Design data models
- /api-design - Design API contracts
- /code-review - Review architectural decisions

Last Updated: 2026-01-22