Architecture Design Skill

Design system architecture and make strategic technical decisions.

Core Principle

Good architecture enables change while maintaining simplicity.

Architecture vs Planning

Architecture Design (this skill):

Strategic: "How should the system be structured?"
Component interactions and boundaries
Technology and pattern choices
Long-term implications
System-level decisions

Technical Planning (technical-planning skill):

Tactical: "How do I implement feature X?"
Specific implementation tasks
Execution details
Short-term focus

Use architecture when:

Designing new systems or subsystems
Major refactors affecting multiple components
Technology selection decisions
Defining system boundaries and interfaces
Making decisions with long-term impact

Use planning when:

Implementing within existing architecture
Breaking down specific features
Task sequencing and execution

Architecture Process

1. Understand Context

Business context:

What problem are we solving?
Who are the users?
What are the business goals?
What are the success metrics?

Technical context:

What exists today?
What constraints exist?
What must we integrate with?
What scale must we support?

Team context:

What's our expertise?
What can we maintain?
What's our velocity?

2. Gather Requirements

Functional requirements:

What must the system do?
What are the features?
What are the user scenarios?

Non-functional requirements:

Performance: Response time, throughput
Scalability: Expected load, growth
Availability: Uptime requirements
Security: Compliance, data protection
Maintainability: Team size, skills
Cost: Budget constraints

Example:

## Requirements

### Functional
- Users can search products by name/category
- Users can add items to cart
- Users can checkout and pay

### Non-Functional
- Search response time < 200ms (p95)
- Support 10,000 concurrent users
- 99.9% uptime
- PCI DSS compliant for payments
- Team of 5 developers can maintain
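
A target such as "Search response time < 200ms (p95)" only helps if it can be checked against measurements. The following is a minimal sketch, assuming latency samples are collected in milliseconds and using the nearest-rank percentile method. The function names and the sample data are illustrative assumptions, not part of any specific monitoring tool.

```typescript
// Hypothetical helper: nearest-rank percentile over latency samples (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical check of a non-functional requirement: pN latency below a budget.
function meetsLatencyBudget(samples: number[], budgetMs: number, p = 95): boolean {
  return percentile(samples, p) < budgetMs;
}

// Illustrative data only: 20 samples with one slow outlier.
const searchLatenciesMs = [
  120, 95, 110, 130, 105, 98, 140, 115, 125, 101,
  99, 135, 108, 112, 118, 122, 97, 104, 480, 150,
];

console.log("p95 (ms):", percentile(searchLatenciesMs, 95));
console.log("within 200 ms budget:", meetsLatencyBudget(searchLatenciesMs, 200));
```

Stating requirements in a form that a script can evaluate makes it easier to tell later whether the architecture still meets them.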

3. Identify Constraints

Technical constraints:

Must use existing authentication system
Must integrate with legacy inventory system
Database must be PostgreSQL (existing infrastructure)

Business constraints:

Must launch in 3 months
Budget of $50k for infrastructure
Must support EU data residency

Team constraints:

Team experienced in Python, less in Go
No DevOps specialist on team
Remote team across timezones

4. Consider Alternatives

Never design in a vacuum - consider options:

Example: Data storage choice

Option 1: PostgreSQL

Pros: Team knows it, ACID guarantees, rich query support
Cons: Vertical scaling limits, setup complexity

Option 2: MongoDB

Pros: Flexible schema, horizontal scaling
Cons: Team unfamiliar, eventual consistency when reading from secondaries

Option 3: DynamoDB

Pros: Fully managed, auto-scaling
Cons: Vendor lock-in, query limitations, cost at scale

Decision: PostgreSQL

Team expertise outweighs scaling concerns
Can re-evaluate if scale becomes an issue
Faster initial development
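
One way to make a comparison like this explicit is a weighted decision matrix. The sketch below is illustrative only: the criteria, the weights, and the 1-5 scores are hypothetical values chosen to mirror the pros and cons listed above, not measurements.

```typescript
// Hypothetical criteria; adjust to your own context.
type Criterion = "teamExpertise" | "scalability" | "queryPower" | "opsSimplicity";

// Assumed weights (sum to 1) reflecting what matters most to this team.
const weights: Record<Criterion, number> = {
  teamExpertise: 0.4,
  scalability: 0.2,
  queryPower: 0.25,
  opsSimplicity: 0.15,
};

// Illustrative 1-5 scores per option, not benchmark results.
const options: Record<string, Record<Criterion, number>> = {
  PostgreSQL: { teamExpertise: 5, scalability: 3, queryPower: 5, opsSimplicity: 3 },
  MongoDB: { teamExpertise: 2, scalability: 4, queryPower: 3, opsSimplicity: 3 },
  DynamoDB: { teamExpertise: 2, scalability: 5, queryPower: 2, opsSimplicity: 5 },
};

// Weighted sum of an option's scores across all criteria.
function weightedScore(scores: Record<Criterion, number>): number {
  return (Object.keys(weights) as Criterion[]).reduce(
    (total, criterion) => total + weights[criterion] * scores[criterion],
    0,
  );
}

// Options sorted from highest to lowest weighted score.
function rankOptions(): Array<{ name: string; score: number }> {
  return Object.keys(options)
    .map((name) => ({ name, score: weightedScore(options[name]) }))
    .sort((a, b) => b.score - a.score);
}

console.log(rankOptions());
```

The numbers matter less than the discussion they force: disagreeing about a weight is a faster way to surface priorities than arguing about the final choice.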

5. Design System Structure

Define components and their responsibilities:

┌─────────────────────────────────────────────┐
│             Client Apps                      │
│  (Web, iOS, Android)                         │
└────────────────┬────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────┐
│          API Gateway / Load Balancer         │
└────────────────┬────────────────────────────┘
                 │
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐  ┌───────────────┐
│   Auth        │  │   Core API     │
│   Service     │  │   Service      │
└───────┬───────┘  └───────┬───────┘
        │                  │
        │         ┌────────┴────────┐
        │         ▼                 ▼
        │  ┌──────────────┐  ┌──────────────┐
        │  │  PostgreSQL  │  │   Redis      │
        │  │  (Primary)   │  │   (Cache)    │
        │  └──────────────┘  └──────────────┘
        │
        ▼
┌───────────────┐
│   User DB     │
└───────────────┘

Component descriptions:

## Components

### API Gateway
**Responsibility:** Route requests, rate limiting, authentication
**Technology:** Nginx
**Dependencies:** Auth Service, Core API Service
**Scale:** 2-3 instances behind load balancer

### Auth Service
**Responsibility:** User authentication, session management, JWT issuing
**Technology:** Python (Flask), PostgreSQL
**API:** REST
**Scale:** Stateless, 2-N instances

### Core API Service
**Responsibility:** Business logic, data access, external integrations
**Technology:** Python (FastAPI), PostgreSQL, Redis
**API:** REST
**Scale:** Stateless, 2-N instances

### PostgreSQL
**Responsibility:** Primary data store
**Scale:** Primary with read replica

### Redis
**Responsibility:** Session storage, caching, rate limiting
**Scale:** Cluster mode (3 nodes)

6. Define Interfaces

API contracts:

## API Design

### POST /api/auth/login
**Purpose:** Authenticate user, issue JWT

**Request:**
{
  "email": "user@example.com",
  "password": "secure_password"
}

**Response (200):**

{
  "token": "eyJ...",
  "user": {
    "id": "123",
    "email": "user@example.com",
    "name": "John Doe"
  }
}

**Errors:**
- 400: Invalid request
- 401: Invalid credentials
- 429: Rate limit exceeded


7. Plan for Failure

What can go wrong?

Database unavailable
External API down
Network partition
High load
Data corruption

Mitigation strategies:

Retry with exponential backoff
Circuit breakers for external services
Graceful degradation
Health checks and monitoring
Database backups

Example:

## Failure Scenarios

### Database Unavailable
**Impact:** Cannot read/write data
**Mitigation:**
- Automated failover to the read replica
- Circuit breaker opens after 3 consecutive failures
- Cache serves stale data for up to 5 minutes
- User sees degraded experience message
**Recovery:** Verify the replica promotion (fail over manually if automation did not trigger), then repair the primary

### External Payment API Down
**Impact:** Cannot process payments
**Mitigation:**
- Retry 3 times with exponential backoff
- Queue payments for later processing
- User notified of delay
- Alert on-call engineer
**Recovery:** Process queued payments once API recovers
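
Two of the mitigations above, exponential backoff and a circuit breaker, can be sketched in a few lines. This is a simplified, synchronous illustration under stated assumptions: the class and function names, the 100 ms base delay, the 5 s cap, and the 30 s reset window are all hypothetical defaults, and a production version would wrap asynchronous calls and usually add jitter to the delays.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

type BreakerState = "closed" | "open" | "half-open";

// Minimal circuit breaker: opens after `failureThreshold` consecutive failures.
// Once `resetAfterMs` has elapsed, calls are let through again (half-open):
// a success closes the breaker, a failure re-opens it.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private failureThreshold: number;
  private resetAfterMs: number;
  private now: () => number;

  constructor(
    failureThreshold = 3,
    resetAfterMs = 30000,
    now: () => number = () => Date.now(), // injectable clock, useful in tests
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  get state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.resetAfterMs ? "half-open" : "open";
  }

  // Runs `operation` unless the breaker is open, and records the outcome.
  call<T>(operation: () => T): T {
    if (this.state === "open") {
      throw new Error("Circuit open: failing fast without calling the dependency");
    }
    try {
      const result = operation();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While the breaker is open, callers fail immediately instead of piling more load onto a struggling dependency, which is what makes graceful degradation (for example, serving cached data) possible.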

8. Document Decisions

Architecture Decision Record (ADR):

# ADR-001: Use PostgreSQL for Primary Database

**Status:** Accepted
**Date:** 2024-01-15
**Deciders:** Tech Lead, Backend Team

## Context

We need to choose a primary database for user data, products, and orders.

Requirements:
- Strong consistency (ACID)
- Complex queries (joins, aggregations)
- < 200ms query time for 90% of queries
- Support 100k users initially

## Decision

Use PostgreSQL as primary database.

## Alternatives Considered

### MongoDB
- **Pros:** Flexible schema, horizontal scaling
- **Cons:** Team unfamiliar, eventual consistency when reading from secondaries
- **Why not:** Team expertise more valuable than flexibility

### DynamoDB
- **Pros:** Managed service, auto-scaling
- **Cons:** Vendor lock-in, limited query capability, cost
- **Why not:** Query limitations would hurt development velocity

### MySQL
- **Pros:** Similar to PostgreSQL, team knows it
- **Cons:** Less feature-rich than PostgreSQL
- **Why not:** PostgreSQL offers richer JSON support (indexable JSONB) and better full-text search

## Consequences

**Positive:**
- Team can be productive immediately
- Strong consistency guarantees
- Rich query capabilities
- JSON support for flexible data

**Negative:**
- Vertical scaling limits (mitigated: can add read replicas)
- More complex than managed services (mitigated: use RDS)
- Higher operational overhead

**Trade-offs:**
- Chose familiarity over horizontal scaling
- Chose rich queries and strong consistency over schema flexibility
- Can re-evaluate if scale requirements change

## Validation

- Team confirmed expertise in PostgreSQL
- Load testing shows it meets performance requirements
- Cost analysis shows costs are acceptable for the first year

Architecture Principles

1. Simplicity

Start simple, add complexity only when needed.

❌ BAD: Microservices from day 1 with 20 services
✅ GOOD: Start with monolith, split when needed

Apply YAGNI: You Aren't Gonna Need It

Don't build for a hypothetical future
Add features when they are actually needed
Simpler systems are easier to maintain

2. Separation of Concerns

Each component has one clear responsibility.

✅ GOOD:
- Auth Service: Authentication only
- User Service: User profile management
- Order Service: Order processing

❌ BAD:
- God Service: Does everything

Apply SOLID principles:

Single Responsibility
Open/Closed
Liskov Substitution
Interface Segregation
Dependency Inversion

3. Loose Coupling

Components depend on interfaces, not implementations.

// ❌ BAD: Tight coupling
class OrderService {
  constructor(private db: PostgresDatabase) {}
}

// ✅ GOOD: Loose coupling
class OrderService {
  constructor(private db: Database) {}  // Interface
}

Benefits:

Easier to test (mock interface)
Easier to swap implementations
Components can evolve independently

4. High Cohesion

Related functionality stays together.

✅ GOOD:
user/
  - create_user.ts
  - update_user.ts
  - delete_user.ts
  - user_repository.ts

❌ BAD:
create/
  - create_user.ts
  - create_order.ts
update/
  - update_user.ts
  - update_order.ts

5. Explicit Over Implicit

Make dependencies and contracts clear.

// ❌ BAD: Implicit dependency
function processOrder(orderId: string) {
  const db = global.database  // Where does this come from?
  // ...
}

// ✅ GOOD: Explicit dependency
function processOrder(
  orderId: string,
  db: Database,
  logger: Logger
) {
  // Dependencies are clear
}

6. Fail Fast

Detect and report errors early.

// ❌ BAD: Silent failure
function divide(a: number, b: number) {
  if (b === 0) return 0  // Wrong: hides the error behind a plausible-looking result
  return a / b
}

// ✅ GOOD: Fail fast
function divide(a: number, b: number) {
  if (b === 0) {
    throw new Error('Division by zero')
  }
  return a / b
}

7. Design for Testability

Make it easy to test.

// ❌ BAD: Hard to test
class OrderService {
  processOrder(orderId: string) {
    const db = new PostgresDatabase()  // Can't mock
    const api = new PaymentAPI()       // Can't mock
    // ...
  }
}

// ✅ GOOD: Easy to test
class OrderService {
  constructor(
    private db: Database,      // Can inject mock
    private api: PaymentAPI    // Can inject mock
  ) {}

  processOrder(orderId: string) {
    // ...
  }
}
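
To show what injecting mocks looks like in a test, here is a self-contained sketch. The `Database` and `PaymentAPI` method names, the return values of `processOrder`, and the test doubles are assumptions made for illustration, and the calls are synchronous to keep the example short. Real implementations would typically be asynchronous.

```typescript
// Hypothetical interfaces: the method names are assumptions for this sketch.
interface Database {
  getOrderTotalCents(orderId: string): number | undefined;
}

interface PaymentAPI {
  charge(orderId: string, amountCents: number): boolean;
}

class OrderService {
  private db: Database;
  private api: PaymentAPI;

  constructor(db: Database, api: PaymentAPI) {
    this.db = db;
    this.api = api;
  }

  // Charges the order total; throws if the order is unknown.
  processOrder(orderId: string): "paid" | "declined" {
    const total = this.db.getOrderTotalCents(orderId);
    if (total === undefined) {
      throw new Error(`Unknown order: ${orderId}`);
    }
    return this.api.charge(orderId, total) ? "paid" : "declined";
  }
}

// Test double: in-memory database seeded with known totals.
class InMemoryDatabase implements Database {
  private totals: Map<string, number>;

  constructor(totals: Map<string, number>) {
    this.totals = totals;
  }

  getOrderTotalCents(orderId: string): number | undefined {
    return this.totals.get(orderId);
  }
}

// Test double: payment stub that records every charge it receives.
class StubPaymentAPI implements PaymentAPI {
  charges: Array<{ orderId: string; amountCents: number }> = [];
  private approve: boolean;

  constructor(approve: boolean) {
    this.approve = approve;
  }

  charge(orderId: string, amountCents: number): boolean {
    this.charges.push({ orderId, amountCents });
    return this.approve;
  }
}

// Usage: no real database or payment provider is needed.
const demoPayments = new StubPaymentAPI(true);
const demoService = new OrderService(
  new InMemoryDatabase(new Map([["order-1", 2500]])),
  demoPayments,
);
console.log(demoService.processOrder("order-1"), demoPayments.charges);
```

Because the stub records its calls, a test can assert both the outcome and the interaction (which order was charged, and for how much) without touching external systems.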

Common Architecture Patterns

Layered Architecture

┌─────────────────────┐
│  Presentation       │ (UI, API controllers)
├─────────────────────┤
│  Business Logic     │ (Domain, services)
├─────────────────────┤
│  Data Access        │ (Repositories, ORMs)
├─────────────────────┤
│  Database           │ (Storage)
└─────────────────────┘

When to use: Simple to moderate complexity
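
A layered structure only holds if dependencies point downward. The sketch below shows one way such a rule could be checked automatically. It assumes strict layering (a module may depend only on its own layer or the layer directly below), and the module names and the `ModuleInfo` shape are hypothetical.

```typescript
// Layers ordered top to bottom, matching the diagram above.
const LAYERS = ["presentation", "business", "data-access", "database"] as const;
type Layer = (typeof LAYERS)[number];

interface ModuleInfo {
  name: string;
  layer: Layer;
  dependsOn: string[]; // names of other modules
}

// Returns a description of each dependency that points upward, skips a layer,
// or references a module that is not declared.
function findLayerViolations(modules: ModuleInfo[]): string[] {
  const byName = new Map<string, ModuleInfo>();
  for (const m of modules) byName.set(m.name, m);

  const violations: string[] = [];
  for (const m of modules) {
    for (const depName of m.dependsOn) {
      const dep = byName.get(depName);
      if (dep === undefined) {
        violations.push(`${m.name} -> ${depName}: unknown module`);
        continue;
      }
      const distance = LAYERS.indexOf(dep.layer) - LAYERS.indexOf(m.layer);
      // Assumption: strict layering allows the same layer or exactly one layer down.
      if (distance < 0 || distance > 1) {
        violations.push(`${m.name} (${m.layer}) -> ${dep.name} (${dep.layer})`);
      }
    }
  }
  return violations;
}

// Hypothetical module graph: ReportController bypasses the business layer.
const exampleModules: ModuleInfo[] = [
  { name: "OrderController", layer: "presentation", dependsOn: ["OrderService"] },
  { name: "OrderService", layer: "business", dependsOn: ["OrderRepository"] },
  { name: "OrderRepository", layer: "data-access", dependsOn: [] },
  { name: "ReportController", layer: "presentation", dependsOn: ["OrderRepository"] },
];

console.log(findLayerViolations(exampleModules));
```

Some teams prefer relaxed layering, where skipping layers is allowed; in that case only the upward check (`distance < 0`) would apply.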

Hexagonal Architecture (Ports & Adapters)

        ┌───────────────────────┐
        │   External Systems    │
        │  (UI, DB, APIs)       │
        └──────────┬────────────┘
                   │
        ┌──────────▼────────────┐
        │      Adapters         │ (Implementation)
        │  (REST, PostgreSQL)   │
        └──────────┬────────────┘
                   │
        ┌──────────▼────────────┐
        │       Ports           │ (Interfaces)
        │  (IUserRepo, IAuth)   │
        └──────────┬────────────┘
                   │
        ┌──────────▼────────────┐
        │    Core Domain        │ (Business logic)
        │    (Pure logic)       │
        └───────────────────────┘

When to use: Want to isolate business logic, multiple frontends
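
The ports-and-adapters split can be sketched in code as follows. `IUserRepo` is the port name from the diagram; its method signatures, the `registerUser` rule set, the in-memory adapter, and the request handler are assumptions made for illustration. A PostgreSQL adapter would implement the same port.

```typescript
type User = { email: string; name: string };

// Port (outbound): what the core needs from storage, expressed as an interface.
interface IUserRepo {
  findByEmail(email: string): User | undefined;
  save(user: User): void;
}

type RegisterResult =
  | { ok: true; user: User }
  | { ok: false; reason: "invalid-email" | "duplicate" };

// Core domain: business rules only, with no knowledge of HTTP or SQL.
function registerUser(repo: IUserRepo, email: string, name: string): RegisterResult {
  const normalized = email.trim().toLowerCase();
  if (normalized.indexOf("@") === -1) {
    return { ok: false, reason: "invalid-email" };
  }
  if (repo.findByEmail(normalized) !== undefined) {
    return { ok: false, reason: "duplicate" };
  }
  const user: User = { email: normalized, name };
  repo.save(user);
  return { ok: true, user };
}

// Outbound adapter: in-memory stand-in for a database-backed implementation.
class InMemoryUserRepo implements IUserRepo {
  private users = new Map<string, User>();

  findByEmail(email: string): User | undefined {
    return this.users.get(email);
  }

  save(user: User): void {
    this.users.set(user.email, user);
  }
}

// Inbound adapter: translates a transport-level request into a core call
// and maps the domain result back to a status code.
function handleRegisterRequest(
  repo: IUserRepo,
  body: { email?: string; name?: string },
): { status: number; message: string } {
  if (body.email === undefined || body.name === undefined) {
    return { status: 400, message: "email and name are required" };
  }
  const result = registerUser(repo, body.email, body.name);
  if (result.ok) {
    return { status: 201, message: `Registered ${result.user.email}` };
  }
  return result.reason === "duplicate"
    ? { status: 409, message: "Email already registered" }
    : { status: 400, message: "Invalid email" };
}
```

The core function never mentions status codes or storage details, so adding a second frontend (for example a CLI) means writing another inbound adapter rather than touching the business rules.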

Microservices

┌─────────┐  ┌─────────┐  ┌─────────┐
│  User   │  │  Order  │  │ Payment │
│ Service │  │ Service │  │ Service │
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     └────────────┴────────────┘
                  │
          ┌───────▼────────┐
          │  Message Bus   │
          │  (Event-driven)│
          └────────────────┘

When to use: Large team, need for independent deployments, clear service boundaries

Avoid when: Small team, unclear boundaries, early stage

Event-Driven Architecture

┌─────────┐       ┌─────────────┐       ┌─────────┐
│Producer │──────▶│ Event Bus   │──────▶│Consumer │
└─────────┘       └─────────────┘       └─────────┘

When to use: Async processing, decoupled systems, audit trails
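
The producer, bus, and consumer roles can be illustrated with a minimal in-process event bus. This is a sketch only: delivery here is synchronous and in memory, whereas a real deployment would normally use a message broker with asynchronous delivery. The `EventBus` class, the `ShopEvents` topics, and their payloads are hypothetical.

```typescript
type Handler<T> = (payload: T) => void;

// Minimal in-process event bus: producers publish by topic, consumers subscribe.
class EventBus<Events> {
  private handlers = new Map<keyof Events, Array<Handler<unknown>>>();

  // Registers a consumer and returns a function that unsubscribes it.
  subscribe<K extends keyof Events>(topic: K, handler: Handler<Events[K]>): () => void {
    const wrapped = handler as Handler<unknown>;
    const list = this.handlers.get(topic) || [];
    list.push(wrapped);
    this.handlers.set(topic, list);
    return () => {
      const current = this.handlers.get(topic) || [];
      this.handlers.set(topic, current.filter((h) => h !== wrapped));
    };
  }

  // Delivers the payload to every consumer of the topic; returns how many were called.
  publish<K extends keyof Events>(topic: K, payload: Events[K]): number {
    const list = this.handlers.get(topic) || [];
    for (const handler of list) {
      handler(payload);
    }
    return list.length;
  }
}

// Hypothetical event catalogue for a shop.
type ShopEvents = {
  "order.placed": { orderId: string; totalCents: number };
  "payment.failed": { orderId: string; reason: string };
};

const bus = new EventBus<ShopEvents>();
const auditLog: string[] = [];

// Consumer: an audit trail that knows nothing about who produced the event.
bus.subscribe("order.placed", (event) => {
  auditLog.push(`order ${event.orderId} placed for ${event.totalCents} cents`);
});

// Producer: publishes without knowing which consumers exist.
bus.publish("order.placed", { orderId: "order-1", totalCents: 2500 });
console.log(auditLog);
```

The producer and consumer share only the event shape, which is what keeps them decoupled and makes the event stream usable as an audit trail.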

Anti-Patterns

❌ Premature Optimization

Don't optimize for scale you don't have.

BAD: Build microservices for 100 users
GOOD: Start with monolith, split when needed

❌ Resume-Driven Architecture

Don't choose technology to pad your resume.

BAD: "I want to learn Kubernetes, let's use it"
GOOD: "Kubernetes fits our scale needs"

❌ Distributed Monolith

Microservices that are tightly coupled.

BAD: Service A can't deploy without Service B
GOOD: Services are independently deployable

❌ Big Ball of Mud

No structure, everything depends on everything.

BAD: Any code can call any other code
GOOD: Clear layers and boundaries

❌ Analysis Paralysis

Over-analyzing, never shipping.

BAD: Spend 6 months on perfect architecture
GOOD: Design enough to start, iterate

Architecture Review Checklist

Integration with Other Skills

Apply solid-principles - Guide component design
Apply simplicity-principles - KISS, YAGNI
Apply orthogonality-principle - Independent components
Apply structural-design-principles - Composition patterns
Use technical-planning - For implementation after design

Remember

Simplicity first - Start simple, add complexity when needed
Document decisions - Future you will thank you
Consider alternatives - Never settle on the first idea
State trade-offs - Every decision has consequences
Design for change - Systems evolve

The best architecture is the one that's simple enough to ship and flexible enough to evolve.
