Guidance for scaling systems from startup to enterprise scale. Use when planning for growth, diagnosing bottlenecks, or designing systems that need to handle 10x-1000x current load.
/plugin marketplace add alirezarezvani/claude-cto-team
/plugin install cto-team@cto-team-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Provides systematic guidance for scaling systems at different growth stages, identifying bottlenecks, and designing for horizontal scalability.
┌─────────────────────────────────────────────────────────────────────┐
│ SCALING JOURNEY │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1 Stage 2 Stage 3 Stage 4 │
│ Startup Growth Scale Enterprise │
│ 0-10K users 10K-100K 100K-1M 1M+ users │
│ │
│ Single Add caching, Horizontal Global, │
│ server read replicas scaling multi-region │
│ │
│ $100/mo $1K/mo $10K/mo $100K+/mo │
└─────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ Single Server │
│ ┌──────────────────────────────────┐ │
│ │ App Server (Node/Python/etc) │ │
│ │ + Database (PostgreSQL) │ │
│ │ + File Storage (local/S3) │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
| Metric | Target | Warning |
|---|---|---|
| Response time (P95) | < 500ms | > 1s |
| Database queries/request | < 10 | > 20 |
| Server CPU | < 70% | > 85% |
| Database connections | < 50% pool | > 80% pool |
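The P95 response time in the table is the number to watch continuously. A minimal sketch of computing it from logged request durations, using the nearest-rank method; the sample values are illustrative, and the 1s threshold matches the warning column above:

```python
# Nearest-rank P95 over logged request durations; sample values are illustrative.
import math

def p95_ms(durations_ms: list[float]) -> float:
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))        # nearest-rank percentile
    return ordered[rank - 1]

samples = [120, 95, 480, 210, 1500, 300, 250, 180, 90, 610]
latency = p95_ms(samples)
print(f"P95: {latency:.0f} ms")
if latency > 1000:                               # warning threshold from the table
    print("WARNING: P95 above 1s")
```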
DO:
- Keep the architecture simple: one app server, one managed database
- Set up monitoring, error tracking, and automated backups early
- Add indexes for the queries you actually run

DON'T:
- Build microservices or multi-region infrastructure before you need it
- Optimize for traffic you don't have yet
- Ignore the warning thresholds above until users complain
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────────────────────────┐ │
│ │ CDN │ │ Load Balancer │ │
│ └────┬────┘ └──────────────┬──────────────────┘ │
│ │ │ │
│ │ ┌──────────────┼──────────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Static │ │ App 1 │ │ App 2 │ │ App 3 │ │
│ │ Assets │ └────┬────┘ └────┬────┘ └────┬────┘ │
│ └─────────┘ │ │ │ │
│ └──────────────┼────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Primary │ │ Read │ │ Redis │ │
│ │ DB │───│ Replica │ │ Cache │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
| Component | Purpose | When to Add |
|---|---|---|
| CDN | Static asset caching | Images, JS, CSS taking > 20% bandwidth |
| Load Balancer | Distribute traffic | Single server CPU > 70% |
| Read Replicas | Offload reads | > 80% database ops are reads |
| Redis Cache | Application caching | Same queries repeated frequently |
| Job Queue | Async processing | Background tasks blocking requests |
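Once read replicas exist (third row of the table), the application has to route queries to them. A minimal read/write-splitting sketch, assuming psycopg2; the DSNs and the `users` table are placeholders, and a real setup would add a pooler and replica health checks:

```python
# Route reads to a random replica, writes to the primary. DSNs are placeholders.
import random
import psycopg2

PRIMARY_DSN = "postgresql://app@db-primary:5432/app"       # hypothetical
REPLICA_DSNS = [
    "postgresql://app@db-replica-1:5432/app",               # hypothetical
    "postgresql://app@db-replica-2:5432/app",
]

def get_connection(readonly: bool):
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

def fetch_user(user_id: int):
    with get_connection(readonly=True) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()

def update_email(user_id: int, email: str):
    with get_connection(readonly=False) as conn, conn.cursor() as cur:
        cur.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
```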
Request Flow with Caching:
1. Check CDN (static assets) ─► HIT: Return cached
   │ MISS
2. Check Application Cache (Redis) ─► HIT: Return cached
   │ MISS
3. Check Database ─► Return + Cache result
What to Cache:
- Expensive or frequently repeated query results
- Session data and rendered page fragments
- Rarely changing reference data (plans, settings, lookup tables)
- Responses from slow third-party APIs
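A minimal cache-aside sketch of the request flow above, assuming redis-py and psycopg2; the key format, TTL, and connection string are illustrative:

```python
# Cache-aside: check Redis first, fall back to the database, then cache the result.
import json
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300                                   # illustrative TTL

def get_user(user_id: int) -> dict | None:
    key = f"user:{user_id}"
    cached = cache.get(key)                         # step 2: application cache
    if cached is not None:
        return json.loads(cached)

    with psycopg2.connect("postgresql://app@db:5432/app") as conn:   # step 3: database
        with conn.cursor() as cur:
            cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
    if row is None:
        return None
    user = {"id": row[0], "email": row[1]}
    cache.setex(key, TTL_SECONDS, json.dumps(user)) # cache the result for next time
    return user
```

On writes, delete or overwrite the key so the cache never serves stale data longer than the TTL.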
Finding database bottlenecks (PostgreSQL):

```sql
-- Find the slowest queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and older the columns are total_time / mean_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find tables doing heavy sequential scans (likely missing indexes)
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000 AND seq_scan > idx_scan
ORDER BY seq_tup_read DESC;
```
┌──────────────────────────────────────────────────────────────────────┐
│ CDN / Edge │
└──────────────────────────────────────────────────────────────────────┘
│
┌──────────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ (Rate limiting, Auth, Routing) │
└──────────────────────────────────────────────────────────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Service A │ │ Service B │ │ Service C │
│ (Users) │ │ (Orders) │ │ (Search) │
│ Auto-scale │ │ Auto-scale │ │ Auto-scale │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ User DB │ │ Order DB │ │ Elasticsearch │
│ (Sharded) │ │ (Sharded) │ │ (Cluster) │
└───────────────┘ └───────────────┘ └───────────────┘
│
▼
┌───────────────────────────┐
│ Message Queue │
│ (Kafka / SQS) │
└───────────────────────────┘
Sharding Strategies:
1. Hash-based (user_id % num_shards)
PRO: Even distribution
CON: Hard to add shards
2. Range-based (user_id 1-1M → shard 1)
PRO: Easy to add shards
CON: Hotspots possible
3. Directory-based (lookup table)
PRO: Flexible
CON: Lookup overhead
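A minimal sketch of strategy 1 (hash-based sharding); the shard count and DSN map are hypothetical, and real systems keep the map in config or a service registry:

```python
# Hash-based sharding: every read/write for a user lands on one shard.
import hashlib

NUM_SHARDS = 4
SHARD_DSNS = {
    0: "postgresql://app@users-shard-0:5432/app",   # hypothetical DSNs
    1: "postgresql://app@users-shard-1:5432/app",
    2: "postgresql://app@users-shard-2:5432/app",
    3: "postgresql://app@users-shard-3:5432/app",
}

def shard_for(user_id: int) -> int:
    # Hash first so sequential IDs spread evenly instead of clustering.
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def dsn_for(user_id: int) -> str:
    return SHARD_DSNS[shard_for(user_id)]

print(dsn_for(12345))
```

Changing NUM_SHARDS remaps most keys, which is the "hard to add shards" drawback; consistent hashing or a directory lookup avoids the wholesale reshuffle.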
Synchronous → Asynchronous
Before:
API → Service A → Service B → Service C → Response (slow)
After:
API → Service A → Queue → Response (fast)
↓
Service B, C process async
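A minimal sketch of the "after" shape, using an in-process queue and worker thread as stand-ins for Kafka or SQS; the function names and sleep are illustrative:

```python
# The API handler enqueues work and returns immediately; a worker drains the queue.
# In production the in-memory queue would be Kafka or SQS so jobs survive restarts.
import queue
import threading
import time

jobs: queue.Queue = queue.Queue()

def handle_order_request(order: dict) -> dict:
    jobs.put(order)                         # Service A: enqueue and return fast
    return {"status": "accepted", "order_id": order["id"]}

def worker() -> None:
    while True:
        order = jobs.get()                  # Services B/C: process asynchronously
        time.sleep(0.5)                     # stand-in for slow downstream work
        print(f"processed order {order['id']}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
print(handle_order_request({"id": 42}))     # returns before the work finishes
jobs.join()                                 # wait here only for demo purposes
```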
┌─────────────────────────────────────────────────────────────────────────┐
│ Global Load Balancer │
│ (GeoDNS, Anycast, Route53) │
└─────────────────────────────────────────────────────────────────────────┘
│ │
┌────────┴────────┐ ┌───────┴────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ US-East │ │ US-West │ │ EU-West │ │ AP-South │
│ Region │ │ Region │ │ Region │ │ Region │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │Services│ │ │ │Services│ │ │ │Services│ │ │ │Services│ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │Database│ │ │ │Database│ │ │ │Database│ │ │ │Database│ │
│ │Primary │ │  │ │Replica │ │  │ │Primary │ │  │ │Replica │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
│ │
└─────────┬─────────┘
│
Cross-Region Replication
| Pattern | Consistency | Latency | Complexity |
|---|---|---|---|
| Active-Passive | Strong | High during failover | Low |
| Active-Active | Eventual | Low | High |
| Follow-the-Sun | Strong per region | Medium | Medium |
CAP Theorem Trade-offs:
Strong Consistency (CP):
- All regions see same data
- Higher latency for writes
- Use for: Financial transactions, inventory
Eventual Consistency (AP):
- Regions may have stale data briefly
- Low latency always
- Use for: Social feeds, analytics, non-critical
Causal Consistency:
- Related operations ordered correctly
- Balance of latency and correctness
- Use for: Messaging, collaboration
Systematic Diagnosis:
1. Where is time spent?
└─► Distributed tracing (Jaeger, Datadog)
2. Is it the database?
└─► Check slow query logs, connection pool
3. Is it the application?
└─► CPU profiling, memory analysis
4. Is it the network?
└─► Latency between services, DNS resolution
5. Is it external services?
└─► Third-party API latency, rate limits
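Before full tracing is in place, a per-step timer is often enough to answer step 1 and pick the next question. A minimal sketch; the step names and sleep calls are stand-ins for real work:

```python
# Time each step of a request and report where the milliseconds go.
import functools
import time

def timed(step: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                print(f"{step}: {elapsed_ms:.1f} ms")   # send to metrics in production
        return wrapper
    return decorator

@timed("db_query")
def load_order(order_id: int) -> dict:
    time.sleep(0.12)                # stand-in for a database call
    return {"id": order_id}

@timed("external_api")
def fetch_shipping_quote(order: dict) -> float:
    time.sleep(0.30)                # stand-in for a third-party API call
    return 9.99

order = load_order(1)
fetch_shipping_quote(order)
```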
| Layer | Symptoms | Solutions |
|---|---|---|
| Database | Slow queries, high CPU | Indexing, read replicas, caching |
| Application | High CPU, memory | Optimize code, scale horizontally |
| Network | High latency, timeouts | CDN, edge caching, connection pooling |
| Storage | Slow I/O, high wait | SSD, object storage, caching |
| External APIs | Timeouts, rate limits | Circuit breakers, caching, fallbacks |
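For the External APIs row, a minimal circuit-breaker sketch; the thresholds and fallback value are illustrative assumptions rather than a production implementation:

```python
# Circuit breaker: after repeated failures, stop calling the flaky dependency
# for a cool-down period and return a fallback instead.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback          # open: skip the dependency entirely
            self.opened_at = None        # half-open: let one request probe it
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback
        self.failures = 0
        return result

breaker = CircuitBreaker()
quote = breaker.call(lambda: 9.99, fallback=0.0)   # stands in for a third-party call
```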
## Quick Database Health Check
1. Connection Pool
- Current connections vs max?
- Connection wait time?
- Pool exhaustion events?
2. Query Performance
- Slowest queries (pg_stat_statements)?
- Missing indexes (seq scans > 10K)?
- Lock contention?
3. Replication
- Replica lag?
- Write throughput?
- Read distribution?
4. Storage
- Disk I/O wait?
- Table/index bloat?
- WAL write latency?
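A few of these checks as plain SQL run from Python; the DSN is a placeholder, and the replica-lag query only returns a value when run against a replica:

```python
# Quick PostgreSQL health probe: connections in use, pool ceiling, replica lag.
import psycopg2

CHECKS = {
    "connections_in_use":
        "SELECT count(*) FROM pg_stat_activity WHERE state <> 'idle'",
    "max_connections":
        "SHOW max_connections",
    "replica_lag_seconds":
        "SELECT extract(epoch FROM now() - pg_last_xact_replay_timestamp())",
}

with psycopg2.connect("postgresql://app@db-replica-1:5432/app") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            print(name, cur.fetchone()[0])
```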
Required Capacity = Peak Traffic × Growth Factor × Safety Margin
Example:
- Current peak: 1,000 req/sec
- Expected growth: 3x in 12 months
- Safety margin: 1.5x
Required: 1,000 × 3 × 1.5 = 4,500 req/sec capacity
Connection Pool Size:
connections = (num_cores × 2) + effective_spindle_count
Example: 8 cores, SSD
connections = (8 × 2) + 1 = 17 connections per instance
Read Replica Sizing:
replicas = ceiling(read_traffic / single_replica_capacity)
Example: 10,000 reads/sec, 3,000/replica capacity
replicas = ceiling(10,000 / 3,000) = 4 replicas
Cache Size:
memory = working_set_size × (1 + overhead_factor)
Working set = frequently accessed data (usually 10-20% of total)
Overhead = ~1.5x for Redis data structures
Example: 10GB working set
Redis memory = 10GB × 1.5 = 15GB
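The same formulas as small helpers, reproducing the worked examples above:

```python
# Back-of-the-envelope sizing helpers matching the formulas above.
import math

def required_capacity(peak_rps: float, growth: float, safety: float) -> float:
    return peak_rps * growth * safety

def pool_size(num_cores: int, effective_spindles: int) -> int:
    return num_cores * 2 + effective_spindles

def replicas_needed(read_rps: float, per_replica_rps: float) -> int:
    return math.ceil(read_rps / per_replica_rps)

def cache_memory_gb(working_set_gb: float, overhead: float = 1.5) -> float:
    return working_set_gb * overhead

print(required_capacity(1_000, 3, 1.5))     # 4500.0 req/sec
print(pool_size(8, 1))                      # 17 connections per instance
print(replicas_needed(10_000, 3_000))       # 4 replicas
print(cache_memory_gb(10))                  # 15.0 GB
```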
| Symptom | First Try | Then Try | Finally |
|---|---|---|---|
| Slow page loads | Add caching | CDN | Edge compute |
| Database slow | Add indexes | Read replicas | Sharding |
| API timeouts | Async processing | Circuit breakers | Event-driven |
| High server CPU | Vertical scale | Horizontal scale | Optimize code |
| High memory | Increase RAM | Fix memory leaks | Redesign data structures |
| Users | Architecture | Monthly Cost |
|---|---|---|
| 10K | Single server | $100-300 |
| 100K | Load balanced + cache | $1,000-3,000 |
| 1M | Microservices + sharding | $10,000-30,000 |
| 10M | Multi-region | $100,000+ |