Scalability engineering. Horizontal/vertical decisions, auto-scaling, read replicas, connection pooling, rate limiting, capacity planning.
references/scalability-patterns.md
Triggers: /godmode:scale, "scaling", "auto-scale", "read replica"

SCALABILITY CONTEXT:
Architecture: Monolith | Microservices | Serverless
Current Scale: <RPS, concurrent users, data volume>
Target Scale: <target RPS, users, volume>
Bottleneck: CPU | Memory | I/O | Network | DB | API
SLA: <latency p50/p99, availability>
Database: <type, size, read/write ratio>
Peak vs Baseline: <ratio>
IF single-threaded workload: scale UP
IF DB write bottleneck AND not at ceiling: scale UP
IF at instance ceiling: scale OUT
IF stateless app tier AND read-heavy: scale OUT
IF peak-to-baseline ratio > 5x: scale OUT
IF need fault tolerance: scale OUT
WHEN quick fix needed AND below ceiling: scale UP first
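The rules above can be sketched as an ordered decision helper (a sketch; the flag names are illustrative, and the first matching rule wins, mirroring the list order):

```python
def scale_direction(*, single_threaded=False, db_write_bottleneck=False,
                    at_instance_ceiling=False, stateless_read_heavy=False,
                    peak_to_baseline=1.0, need_fault_tolerance=False):
    """Return 'UP' or 'OUT', evaluating the rules in the order listed above."""
    if single_threaded:
        return "UP"
    if db_write_bottleneck and not at_instance_ceiling:
        return "UP"
    if at_instance_ceiling:
        return "OUT"
    if stateless_read_heavy or peak_to_baseline > 5 or need_fault_tolerance:
        return "OUT"
    return "UP"  # quick fix when below ceiling: scale up first
```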
# K8s HPA: target CPU 70% (a memory target of 80% requires an
# autoscaling/v2 HPA manifest; this command sets CPU only)
kubectl autoscale deployment myapp \
--cpu-percent=70 --min=2 --max=20
# Check current HPA status
kubectl get hpa myapp -o wide
POLICIES:
Scale out: CPU >70% for 3min -> +2 instances
Scale fast: CPU >90% for 1min -> +4 instances
Scale in: CPU <30% for 10min -> -1 instance
Stabilization: scale-up 60s, scale-down 300s
KEDA for event-driven (queue depth, Prometheus)
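The policies above reduce to a single evaluation-step function (a sketch; thresholds mirror the policy list, min/max mirror the HPA flags, and stabilization windows would gate how often it runs):

```python
def scaling_delta(cpu_pct, sustained_sec, current, minimum=2, maximum=20):
    """Instance-count change for one evaluation, clamped to [minimum, maximum]."""
    if cpu_pct > 90 and sustained_sec >= 60:       # scale fast
        delta = 4
    elif cpu_pct > 70 and sustained_sec >= 180:    # scale out
        delta = 2
    elif cpu_pct < 30 and sustained_sec >= 600:    # scale in
        delta = -1
    else:
        delta = 0
    return max(minimum, min(maximum, current + delta)) - current
```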
Primary (Writer) -> Replica 1, 2, 3 (Read)
Writes -> primary | Strong reads -> primary
Eventually-consistent reads -> replica (round-robin)
Read-after-write -> primary for 5s, then replica
IF replication lag > 500ms: route reads to primary
IF lag > 2s: circuit breaker, alert, investigate
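The routing rules above can be sketched as a small router (thresholds taken from the text; the per-user read-after-write tracking is an illustrative assumption):

```python
import time

class ReadRouter:
    """Route reads: recent writers, strong reads, and high lag go to primary."""
    def __init__(self, replicas, raw_window_sec=5.0, lag_cutoff_sec=0.5):
        self.replicas = replicas
        self.raw_window = raw_window_sec    # read-after-write window (5 s)
        self.lag_cutoff = lag_cutoff_sec    # replication-lag cutoff (500 ms)
        self.last_write = {}                # user_id -> monotonic timestamp
        self._rr = 0

    def note_write(self, user_id):
        self.last_write[user_id] = time.monotonic()

    def route(self, user_id, replication_lag_sec, strong=False):
        if strong or replication_lag_sec > self.lag_cutoff:
            return "primary"
        last = self.last_write.get(user_id)
        if last is not None and time.monotonic() - last < self.raw_window:
            return "primary"                # read-after-write: stick to primary
        self._rr = (self._rr + 1) % len(self.replicas)
        return self.replicas[self._rr]      # round-robin across replicas
```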
Formula: pool_size = (core_count * 2) + 1
Start small (10-20), monitor wait time
Problem: 100 instances * 20 pool = 2000 connections
Solution: PgBouncer multiplexes 2000 app connections -> 100 DB connections
pool_mode = transaction
default_pool_size = 50
max_client_conn = 2000
max_db_connections = 100
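The arithmetic behind those numbers, as a sketch (the per-database formula and the 2000 -> 100 fan-in come from the text above):

```python
def pool_sizing(core_count, instances, per_instance_pool, db_connection_cap):
    """Connection math: app-side fan-out vs. what the database actually holds."""
    per_db_pool = core_count * 2 + 1                 # formula above
    client_conns = instances * per_instance_pool     # e.g. 100 * 20 = 2000
    return {
        "per_db_pool": per_db_pool,
        "client_conns": client_conns,
        "multiplex_ratio": client_conns / db_connection_cap,  # PgBouncer fan-in
    }
```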
RATE LIMITING:
API Gateway: Token bucket, 1000 req/min per key
Per-user: Sliding window, 100 req/min
Per-endpoint: Fixed window, 50 req/min
Global: Leaky bucket, 10000 req/min
Burst: 100 tokens/sec refill, bucket 200
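A minimal token bucket matching the burst line above (100 tokens/sec refill, bucket of 200); a single-process sketch, not a distributed production limiter:

```python
import time

class TokenBucket:
    def __init__(self, refill_rate=100.0, capacity=200):
        self.rate = refill_rate            # tokens added per second
        self.capacity = capacity           # burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                       # caller responds 429 + Retry-After
```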
BACKPRESSURE:
Bounded queues (reject when full)
Load shedding priority:
health > auth > critical > standard > background
Circuit breaker on failing dependencies
Response: 429 + Retry-After + X-RateLimit-* headers.
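The shedding priority above can be sketched as load-dependent admission (the cutoff percentages are illustrative assumptions, not from the text):

```python
# Highest priority first, per the shedding order above.
SHED_CUTOFF = {"health": 100, "auth": 99, "critical": 95,
               "standard": 90, "background": 80}

def admit(request_class, load_pct):
    """Admit a request iff current load is below its class cutoff."""
    return load_pct < SHED_CUTOFF[request_class]
```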
# Measure current utilization
kubectl top pods --sort-by=cpu
kubectl top nodes
# Project headroom
echo "At 70% CPU threshold, runway = ..."
1. Measure baseline under normal load
2. Project growth (3/6/12 month)
3. Identify ceiling per component
4. Calculate runway (when each hits 70%/90%)
5. Plan scaling actions with cost estimates
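Step 4 (runway) reduces to compound-growth arithmetic; a sketch assuming steady monthly growth:

```python
import math

def runway_months(current_util_pct, monthly_growth_pct, threshold_pct=70.0):
    """Months until utilization crosses the threshold (0 if already past it)."""
    if current_util_pct >= threshold_pct:
        return 0.0
    growth = 1.0 + monthly_growth_pct / 100.0
    return math.log(threshold_pct / current_util_pct) / math.log(growth)
```

For example, a component at 35% CPU growing 10% per month hits the 70% threshold in roughly 7.3 months.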
LAYERS:
CDN: static 24h, API 60s
Application: Redis, TTL-based
Query: in-process 30s
PATTERNS: Cache-aside | Write-through | Write-behind
Target >95% hit rate
Stampede: locking, probabilistic early refresh
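Cache-aside with both stampede defenses above, as a sketch: probabilistic early refresh ("XFetch"-style jitter scaled by recompute cost) plus a coarse lock so only one caller recomputes at a time:

```python
import math, random, threading, time

class Cache:
    def __init__(self, ttl=30.0, beta=1.0):
        self.ttl, self.beta = ttl, beta
        self.store = {}                   # key -> (value, expiry, compute_cost)
        self.lock = threading.Lock()      # coarse: one recompute at a time

    def get(self, key, loader):
        now = time.time()
        hit = self.store.get(key)
        if hit:
            value, expiry, cost = hit
            # Refresh early with probability rising as expiry approaches;
            # log(1 - rand) <= 0, so the jitter term is non-negative.
            if now - cost * self.beta * math.log(1.0 - random.random()) < expiry:
                return value
        with self.lock:
            start = time.time()
            value = loader()              # recompute on miss or early refresh
            cost = time.time() - start
            self.store[key] = (value, time.time() + self.ttl, cost)
        return value
```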
ls docker-compose.yml k8s/ terraform/ 2>/dev/null
grep -r "pgbouncer\|ProxySQL\|redis\|memcached" . \
--include="*.yml" --include="*.yaml" -l 2>/dev/null
kubectl get hpa 2>/dev/null
FOR EACH bottleneck:
1. Measure -> 2. Choose strategy -> 3. Implement
4. Load test at 2x peak -> 5. Measure improvement
IF improvement < 20%: wrong bottleneck, re-measure
IF improvement >= 20%: commit
IF cost exceeds budget: optimize first
Log to .godmode/scale-results.tsv:
timestamp\tsystem\tcurrent_rps\ttarget_rps\tbottleneck\tdirection\tcost_delta_pct\tverdict
Print: Scale: {bottleneck}. Strategy: {horizontal|vertical|caching}. Capacity: {before} -> {after}. Cost: {est}. Status: {DONE|PARTIAL}.
KEEP if: capacity improved AND latency maintained
AND cost within budget
DISCARD if: latency regressed OR cost exceeds budget
OR auto-scaler flapping. Revert on discard.
STOP when ALL of:
- Target capacity met with acceptable latency
- Auto-scaling configured and tested
- Cost within budget with headroom for spikes