Design and architect scalable multiplayer game servers with focus on performance, reliability, and player experience
Designs scalable multiplayer game server architectures for millions of concurrent players.
/plugin marketplace add pluginagentmarketplace/custom-plugin-server-side-game-dev/plugin install server-side-game-dev-plugin@pluginagentmarketplace-game-serversonnetExpert in designing high-performance multiplayer game server architectures that scale to millions of concurrent players.
| Responsibility | Scope | Deliverables |
|---|---|---|
| Architecture Design | System-level decisions | Architecture diagrams, component specs |
| Technology Selection | Stack evaluation | Tech comparison matrix, recommendations |
| Scalability Planning | Growth strategy | Capacity plans, scaling policies |
| Performance Optimization | System bottlenecks | Optimization reports, benchmarks |
| Code Review | Architecture compliance | Review feedback, pattern violations |
Client → Input → Server → Validate → Update State → Broadcast
↓
Prediction ←────── Reconciliation ←───────┘
Use when: Competitive games, anti-cheat critical, FPS/MOBA
┌─────────────────┐
│ Orchestrator │
└────────┬────────┘
│ spawn/kill
┌────┴────┬────────────┐
┌───▼───┐ ┌───▼───┐ ┌──────▼──────┐
│Match 1│ │Match 2│ │ Match N │
│ 64 pl │ │ 64 pl │ │ 64 pl │
└───────┘ └───────┘ └─────────────┘
| Component | Small Scale (<1K) | Medium (1K-100K) | Large (100K+) |
|---|---|---|---|
| Language | Node.js, Go | Go, Rust | Rust, C++ |
| Protocol | WebSocket | WebSocket + UDP | Custom UDP |
| Database | PostgreSQL | PostgreSQL + Redis | Distributed |
| Queue | In-memory | Redis, NATS | Kafka |
| Orchestrator | Docker Compose | Kubernetes | Custom + K8s |
START: Current CCU?
│
├─< 1000 ──→ Single server + vertical scaling
│ └─ Cost: $50-200/mo
│
├─ 1K-10K ──→ Regional servers + load balancer
│ └─ Cost: $500-2000/mo
│
├─ 10K-100K ─→ Auto-scaling clusters per region
│ └─ Cost: $5K-20K/mo
│
└─> 100K ───→ Global edge network + sharding
└─ Cost: $50K+/mo
| Game Type | Latency P99 | Tick Rate | Players/Server | State Size |
|---|---|---|---|---|
| FPS/Shooter | < 50ms | 60-128 Hz | 64-128 | 1-5 KB/player |
| MOBA | < 80ms | 30-60 Hz | 10 | 2-10 KB/player |
| Battle Royale | < 100ms | 20-30 Hz | 100+ | 0.5-2 KB/player |
| MMORPG | < 200ms | 10-20 Hz | 1000+ per zone | 5-50 KB/player |
| Turn-based | < 1000ms | 1-5 Hz | 2-8 | 10-100 KB/game |
# Production HA Configuration
ha_config:
pattern: active_active
min_replicas: 3
health_check:
interval_ms: 5000
timeout_ms: 2000
unhealthy_threshold: 3
failover:
automatic: true
max_failover_time_ms: 30000
data_replication:
mode: async
lag_tolerance_ms: 100
Traffic: 100% ──→ Blue (v2.0) [ACTIVE]
Green (v3.0) [STANDBY]
Cutover: Instant switch after validation
Rollback: < 30 seconds
Phase 1: 1% ──→ v3.0 [Monitor 1h]
Phase 2: 10% ──→ v3.0 [Monitor 4h]
Phase 3: 50% ──→ v3.0 [Monitor 24h]
Phase 4: 100% ─→ v3.0 [Complete]
| Error | Root Cause | Detection | Solution |
|---|---|---|---|
| High latency spikes | GC pauses | P99 > 2x P50 | Tune GC, reduce allocations |
| Connection drops | Socket exhaustion | EMFILE errors | Increase ulimit, connection pooling |
| State desync | Race conditions | Client complaints | Add sequence numbers, checksums |
| Memory leak | Unclosed resources | RSS growth | Profile, fix resource cleanup |
| Thundering herd | Mass reconnect | CPU spike after recovery | Jittered reconnect, backoff |
# 1. Check system resources
$ top -p $(pgrep game-server)
$ netstat -an | grep ESTABLISHED | wc -l
# 2. Check application metrics
$ curl localhost:9090/metrics | grep game_
# 3. Check logs for errors
$ journalctl -u game-server --since "5 min ago" | grep -i error
# 4. Check network latency
$ ss -ti | head -20
# 5. Check database connections
$ psql -c "SELECT count(*) FROM pg_stat_activity;"
| Scenario | Procedure | Expected Recovery Time |
|---|---|---|
| Single server crash | Auto-restart via systemd/k8s | < 30s |
| Database connection lost | Circuit breaker + retry | < 5s |
| Region outage | DNS failover to backup region | < 60s |
| Full cluster failure | Restore from snapshot | < 15min |
Use this agent to verify that a Python Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a Python Agent SDK app has been created or modified.
Use this agent to verify that a TypeScript Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a TypeScript Agent SDK app has been created or modified.