This skill should be used when the user asks about "Temporal sizing", "history shards", "cluster capacity", "Temporal resources", "scale Temporal", "Temporal performance", "how many shards", or needs guidance on capacity planning for Temporal clusters.
Guidance for sizing Temporal clusters based on workload requirements.
| Factor | Impact | Fixed After Creation |
|---|---|---|
| History Shards | Workflow parallelism | Yes (set at creation) |
| History Replicas | Throughput, availability | No |
| Matching Replicas | Task dispatch rate | No |
| Frontend Replicas | API request rate | No |
| Database Size | History storage | No |
Critical: History shards cannot be changed after cluster creation.
Shards determine maximum workflow parallelism. Each workflow belongs to one shard.
| Concurrent Workflows | Recommended Shards |
|---|---|
| < 10,000 | 128 |
| 10,000 - 100,000 | 256 |
| 100,000 - 500,000 | 512 |
| 500,000 - 2,000,000 | 1024 |
| > 2,000,000 | 2048 or 4096 |
```
shards = ceil(max_concurrent_workflows / 1000) * safety_factor

# Round up to the nearest power of 2
# safety_factor = 2-4x to allow for growth
```
Example: expecting 50,000 concurrent workflows with 3x growth headroom:

```
base = 50,000 / 1,000 = 50
with_growth = 50 * 3 = 150
nearest_power_of_2 = 256 shards
```
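The calculation above can be sketched as a small helper. This is illustrative only; the function name and rounding approach are not part of any Temporal API:

```python
import math

def recommended_shards(max_concurrent_workflows: int, safety_factor: int = 3) -> int:
    """Estimate numHistoryShards: ~1 shard per 1,000 concurrent workflows,
    multiplied by a growth safety factor, rounded up to a power of 2."""
    base = math.ceil(max_concurrent_workflows / 1000)
    with_growth = base * safety_factor
    # Round up to the nearest power of 2 (shards are fixed at cluster creation).
    return 1 << max(0, with_growth - 1).bit_length()

# 50,000 concurrent workflows with 3x growth headroom
print(recommended_shards(50_000, safety_factor=3))  # -> 256
```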
Shards distribute across history service replicas:
```
shards_per_replica = total_shards / history_replicas

# Example: 512 shards, 4 replicas = 128 shards/replica
```
More history replicas means fewer shards per replica, which improves distribution and raises aggregate throughput.
The frontend service handles API requests, authentication, and rate limiting.
| Load Level | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<100 rps) | 1-2 | 500m | 1Gi |
| Medium (100-1000 rps) | 3 | 1 | 2Gi |
| High (1000-5000 rps) | 5 | 2 | 4Gi |
| Very High (>5000 rps) | 10+ | 4 | 8Gi |
The history service manages workflow state and event history.
| Shards | Replicas | CPU/replica | Memory/replica |
|---|---|---|---|
| 128 | 2 | 1 | 2Gi |
| 256 | 3 | 2 | 4Gi |
| 512 | 4-6 | 2 | 4Gi |
| 1024 | 8-12 | 4 | 8Gi |
| 2048 | 16-24 | 4 | 8Gi |
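For automation, the history sizing table above can be encoded as a lookup. The minimum-replica values come straight from the table; the function name is made up for illustration:

```python
# Minimum history replicas per shard count, from the sizing table above.
HISTORY_REPLICAS_BY_SHARDS = {128: 2, 256: 3, 512: 4, 1024: 8, 2048: 16}

def min_history_replicas(num_shards: int) -> int:
    """Return the minimum recommended history replicas for a shard count,
    rounding in-between values up to the next known tier."""
    for shards in sorted(HISTORY_REPLICAS_BY_SHARDS):
        if num_shards <= shards:
            return HISTORY_REPLICAS_BY_SHARDS[shards]
    # Beyond the largest tier, fall back to the 2048-shard recommendation.
    return HISTORY_REPLICAS_BY_SHARDS[2048]

print(min_history_replicas(512))  # -> 4
```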
The matching service dispatches tasks to workers.
| Task Rate | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<1000/s) | 2 | 500m | 1Gi |
| Medium (1000-10000/s) | 3 | 1 | 2Gi |
| High (>10000/s) | 5+ | 2 | 4Gi |
The worker service handles internal system workflows. Scale it with overall cluster size:
| Cluster Size | Replicas | CPU | Memory |
|---|---|---|---|
| Small | 1 | 200m | 256Mi |
| Medium | 1 | 500m | 512Mi |
| Large | 2 | 1 | 1Gi |
Persistence database sizing scales with workflow volume:

| Workflow Volume | CPU | Memory | Storage | IOPS |
|---|---|---|---|---|
| < 100K workflows | 2 | 8GB | 100GB | 3000 |
| 100K-1M workflows | 4 | 16GB | 500GB | 6000 |
| 1M-10M workflows | 8 | 32GB | 1TB | 12000 |
| > 10M workflows | 16+ | 64GB+ | 2TB+ | 20000+ |
```
storage_per_workflow = avg_history_events * event_size
                     = 100 events * 1KB = 100KB

total_storage = workflows * storage_per_workflow * retention_multiplier
              = 1,000,000 * 100KB * 1.5 = 150GB
```
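The same storage estimate as a small helper. Names are illustrative, and decimal units are assumed (1 GB = 10^6 KB):

```python
def estimated_storage_gb(workflows: int,
                         avg_history_events: int = 100,
                         event_size_kb: float = 1.0,
                         retention_multiplier: float = 1.5) -> float:
    """Estimate history storage in GB: workflows x per-workflow history size,
    padded by a retention multiplier for histories awaiting deletion."""
    per_workflow_kb = avg_history_events * event_size_kb
    return workflows * per_workflow_kb * retention_multiplier / 1e6

print(estimated_storage_gb(1_000_000))  # -> 150.0
```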
Retention: configure an appropriate workflow retention period so completed histories are eventually deleted and storage stays bounded.
Elasticsearch handles visibility queries (optional but recommended for production):
| Indexed Workflows | Nodes | CPU/node | Memory/node | Storage/node |
|---|---|---|---|---|
| < 1M | 3 | 1 | 2Gi | 50Gi |
| 1M-10M | 3 | 2 | 4Gi | 200Gi |
| > 10M | 5+ | 4 | 8Gi | 500Gi |
Small cluster (development):

```yaml
server:
  config:
    numHistoryShards: 128
  replicaCount:
    frontend: 1
    history: 1
    matching: 1
    worker: 1
  resources:
    frontend:
      requests: {cpu: "250m", memory: "512Mi"}
    history:
      requests: {cpu: "500m", memory: "1Gi"}
    matching:
      requests: {cpu: "250m", memory: "512Mi"}
```
Medium cluster (production):

```yaml
server:
  config:
    numHistoryShards: 256
  replicaCount:
    frontend: 3
    history: 3
    matching: 3
    worker: 1
  resources:
    frontend:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}
    history:
      requests: {cpu: "1", memory: "2Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    matching:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}
```
Large cluster:

```yaml
server:
  config:
    numHistoryShards: 1024
  replicaCount:
    frontend: 5
    history: 10
    matching: 5
    worker: 2
  resources:
    frontend:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    history:
      requests: {cpu: "4", memory: "8Gi"}
      limits: {cpu: "8", memory: "16Gi"}
    matching:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
```
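Since numHistoryShards cannot be changed later, it can help to sanity-check a sizing plan before the first install. A minimal sketch; the function and warning messages are hypothetical:

```python
def check_sizing(num_history_shards: int,
                 history_replicas: int,
                 frontend_replicas: int) -> list:
    """Return a list of warnings about a proposed Temporal sizing plan."""
    warnings = []
    # Shard counts should be a power of 2 and are fixed at cluster creation.
    if num_history_shards <= 0 or num_history_shards & (num_history_shards - 1):
        warnings.append("numHistoryShards should be a power of 2")
    # Shards should divide evenly across history replicas for balanced load.
    if history_replicas > 0 and num_history_shards % history_replicas:
        warnings.append("shards do not divide evenly across history replicas")
    # A single frontend is a single point of failure.
    if frontend_replicas < 2:
        warnings.append("run at least 2 frontend replicas for HA")
    return warnings

print(check_sizing(512, 4, 3))  # -> []
```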
Scale replicas out when sustained CPU utilization stays above roughly 70-80% or schedule-to-start latency climbs. Increase per-replica CPU and memory when individual pods are throttled or memory-constrained at the current replica count. Key metrics to watch:
```promql
# History service load
sum(rate(temporal_persistence_requests_total[5m])) by (operation)

# Task latency (indicates matching capacity)
histogram_quantile(0.99, rate(temporal_schedule_to_start_latency_bucket[5m]))

# Workflow throughput
sum(rate(temporal_workflow_completed_total[5m]))

# Shard distribution
temporal_history_shard_count
```
| Mistake | Impact | Solution |
|---|---|---|
| Too few shards | Cannot scale later | Start with more shards |
| Undersized history | Latency spikes | Increase memory, replicas |
| Single frontend | Single point of failure | Minimum 2 for HA |
| No Elasticsearch | Slow visibility queries | Enable for production |
For detailed sizing calculations, consult:

- references/sizing-calculator.md - Detailed sizing formulas
- references/benchmark-results.md - Performance benchmark data