This skill should be used when the user asks about "workflow stuck", "workflow failing", "temporal error", "debug workflow", "diagnose temporal", "workflow not completing", "activity timeout", "non-deterministic error", or needs help resolving Temporal issues.
Guidance for diagnosing and resolving common Temporal workflow issues.
```bash
# Describe workflow
temporal workflow describe --workflow-id <id>

# List running workflows
temporal workflow list --query "ExecutionStatus='Running'"

# List failed workflows
temporal workflow list --query "ExecutionStatus='Failed'"

# Show all events
temporal workflow show --workflow-id <id>

# Get JSON for parsing
temporal workflow show --workflow-id <id> --output json > history.json

# Find specific events
temporal workflow show --workflow-id <id> --output json | \
  jq '.events[] | select(.eventType | contains("Failed"))'

# Describe task queue
temporal task-queue describe --task-queue <queue>
```
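If jq is not available, the same "Failed" filter can be applied to an exported history in Go. This is a minimal sketch, assuming the export has a top-level `events` array whose entries carry `eventId` and `eventType`. Because `eventType` may appear either as `ActivityTaskFailed` or as `EVENT_TYPE_ACTIVITY_TASK_FAILED` depending on the exporter, the hypothetical `normalizeEventType` helper folds both into the short form before matching.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// historyEvent keeps only the fields this sketch reads. json.Number accepts
// eventId whether it is exported as a string ("7") or a number (7).
type historyEvent struct {
	EventID   json.Number `json:"eventId"`
	EventType string      `json:"eventType"`
}

// normalizeEventType folds "EVENT_TYPE_ACTIVITY_TASK_FAILED" into
// "ActivityTaskFailed"; names already in short form pass through unchanged.
func normalizeEventType(t string) string {
	rest, ok := strings.CutPrefix(t, "EVENT_TYPE_")
	if !ok {
		return t
	}
	var b strings.Builder
	for _, word := range strings.Split(rest, "_") {
		if word != "" {
			b.WriteString(word[:1] + strings.ToLower(word[1:]))
		}
	}
	return b.String()
}

// failedEvents mirrors the jq filter: keep events whose type contains "Failed".
func failedEvents(data []byte) ([]historyEvent, error) {
	var h struct {
		Events []historyEvent `json:"events"`
	}
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	var out []historyEvent
	for _, e := range h.Events {
		e.EventType = normalizeEventType(e.EventType)
		if strings.Contains(e.EventType, "Failed") {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	// Usage: go run . history.json (falls back to a tiny inline sample).
	data := []byte(`{"events":[{"eventId":"5","eventType":"EVENT_TYPE_ACTIVITY_TASK_FAILED"}]}`)
	if len(os.Args) > 1 {
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	events, err := failedEvents(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range events {
		fmt.Println(e.EventID, e.EventType)
	}
}
```

Normalizing once at parse time keeps every later comparison on a single spelling of the event names.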
Symptoms:
Diagnosis Tree:
```
Workflow stuck
├── Last event is ActivityTaskScheduled?
│   ├── Workers running? → Check/start workers
│   ├── Correct task queue? → Fix queue name
│   └── Activity timed out? → Check timeout config
├── Last event is TimerStarted?
│   └── Timer still pending → Wait or reset
├── Last event is SignalExternalWorkflow?
│   └── Target workflow not responding → Check target
└── Last event is WorkflowTaskScheduled?
    ├── Workers running? → Check/start workers
    └── Workflow error? → Check worker logs
```
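The tree above is essentially a lookup keyed on the last event in the history. A minimal sketch of that lookup follows, assuming event types in the short form (for example `ActivityTaskScheduled`). It also assumes that the tree's "SignalExternalWorkflow" branch corresponds to the `SignalExternalWorkflowExecutionInitiated` event type. `stuckHint` is a hypothetical helper, not part of any Temporal tool.

```go
package main

import "fmt"

// stuckHint follows the "Workflow stuck" tree: given the type of the last
// event in the history, it returns what to check next.
func stuckHint(lastEventType string) string {
	switch lastEventType {
	case "ActivityTaskScheduled":
		return "check workers are running on the correct task queue, then review activity timeouts"
	case "TimerStarted":
		return "timer still pending: wait for it to fire or reset the workflow"
	case "SignalExternalWorkflowExecutionInitiated":
		return "target workflow not responding: check the target workflow"
	case "WorkflowTaskScheduled":
		return "check workers are running, then check worker logs for workflow errors"
	default:
		return "not covered by the tree: inspect the full history"
	}
}

func main() {
	for _, t := range []string{"ActivityTaskScheduled", "TimerStarted", "WorkflowTaskScheduled"} {
		fmt.Printf("%s -> %s\n", t, stuckHint(t))
	}
}
```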
Commands:
```bash
# Check what workflow is waiting for
temporal workflow show --workflow-id <id> | tail -20

# Check if workers are connected
temporal task-queue describe --task-queue <queue>
```
Solutions:
| Waiting For | Solution |
|---|---|
| Activity | Start workers, fix task queue |
| Timer | Wait or reset workflow |
| Signal | Send signal or reset |
| Child workflow | Check child status |
| Nexus operation | Check endpoint and handler workers |
Symptoms:
- non-deterministic workflow definition
- history mismatch

Common Causes:
Time-based decisions

```go
// BAD
if time.Now().Hour() < 12 { }

// GOOD
if workflow.Now(ctx).Hour() < 12 { }
```
Random values

```go
// BAD
id := uuid.New()

// GOOD
var id string
workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
	return uuid.New().String()
}).Get(&id)
```
Map iteration order

```go
// BAD - order varies
for k, v := range myMap {
	// use k and v
}

// GOOD - deterministic order
keys := make([]string, 0, len(myMap))
for k := range myMap {
	keys = append(keys, k)
}
sort.Strings(keys)
for _, k := range keys {
	v := myMap[k]
	// use k and v
}
```
Code changes without versioning

```go
// When changing workflow logic, use GetVersion
v := workflow.GetVersion(ctx, "change-id", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
	// Old logic
} else {
	// New logic
}
```
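The reason `GetVersion` keeps old histories replayable can be shown with a simplified model. Three cases apply. If the history already contains a version marker for the change ID, that recorded version is returned. If the history is being replayed but predates the change and has no marker, the result is `DefaultVersion`. On a first execution through that point, `maxSupported` is recorded and returned. A value outside `[minSupported, maxSupported]` is an error. The sketch below is a standalone model with hypothetical types (`run`, `getVersion`), not the SDK implementation.

```go
package main

import "fmt"

// Version mirrors the idea of workflow.Version; DefaultVersion (-1) stands for
// "code that ran before GetVersion was added". Simplified model only.
type Version int

const DefaultVersion Version = -1

// run is a hypothetical stand-in for one execution's history state.
type run struct {
	replaying bool
	markers   map[string]Version // change ID -> version recorded in history
}

// getVersion models the documented GetVersion decision.
func (r *run) getVersion(changeID string, minSupported, maxSupported Version) (Version, error) {
	v, recorded := r.markers[changeID]
	switch {
	case recorded:
		// Replay of a history that already passed this point: reuse the marker.
	case r.replaying:
		// History predates the change, so the old code path must be taken.
		v = DefaultVersion
	default:
		// First execution through this point: record the newest version.
		v = maxSupported
		r.markers[changeID] = v
	}
	if v < minSupported || v > maxSupported {
		return v, fmt.Errorf("version %d of %q outside supported range [%d, %d]", v, changeID, minSupported, maxSupported)
	}
	return v, nil
}

func main() {
	oldRun := &run{replaying: true, markers: map[string]Version{}}
	newRun := &run{markers: map[string]Version{}}
	v1, _ := oldRun.getVersion("change-id", DefaultVersion, 1)
	v2, _ := newRun.getVersion("change-id", DefaultVersion, 1)
	fmt.Println(v1, v2) // -1 1
}
```

The error case is why raising `minSupported` (dropping the old branch) is only safe once no running executions still depend on histories without the marker.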
Diagnosis:
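```bash
# Export history
temporal workflow show --workflow-id <id> --output json > history.json
```

```go
// Create replay test
import (
	"testing"
	"github.com/stretchr/testify/require"
	"go.temporal.io/sdk/worker"
)

func TestReplay(t *testing.T) {
	replayer := worker.NewWorkflowReplayer()
	replayer.RegisterWorkflow(YourWorkflow)
	// Fails if the current workflow code no longer matches history.json
	err := replayer.ReplayWorkflowHistoryFromJSONFile(nil, "history.json")
	require.NoError(t, err)
}
```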
Symptoms:
- ActivityTaskTimedOut events in history

Timeout Types:
| Timeout | Meaning | Solution |
|---|---|---|
| ScheduleToStart | Task waited too long for a worker to pick it up | Add workers, check task queue name |
| StartToClose | A single attempt ran too long | Increase timeout or optimize activity |
| ScheduleToClose | Total time across all attempts exceeded | Increase timeout or split activity |
| Heartbeat | No heartbeat received within HeartbeatTimeout | Add heartbeats, check worker health |
Diagnosis:
```bash
# Find timeout events
temporal workflow show --workflow-id <id> --output json | \
  jq '.events[] | select(.eventType == "ActivityTaskTimedOut")'
```
Solutions:
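```go
ao := workflow.ActivityOptions{
	// Increase timeouts if needed
	StartToCloseTimeout: 30 * time.Minute,
	// Add heartbeat for long activities; the activity must call
	// activity.RecordHeartbeat more often than this interval
	HeartbeatTimeout: 30 * time.Second,
	// Configure retries
	RetryPolicy: &temporal.RetryPolicy{
		MaximumAttempts: 5,
	},
}
// Apply the options to activities scheduled with this context
ctx = workflow.WithActivityOptions(ctx, ao)
```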
Symptoms:
- ActivityTaskFailed events

Diagnosis:
```bash
# Find failure details
temporal workflow show --workflow-id <id> --output json | \
  jq '.events[] | select(.eventType == "ActivityTaskFailed") | .activityTaskFailedEventAttributes'
```
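The attributes printed by the command above usually carry a `failure` object whose `cause` chain often holds the underlying error, such as a connection error wrapped by an application error. A minimal sketch that pulls out the error type, the top-level message, and the innermost cause follows. It assumes the proto JSON field names `failure`, `message`, `cause`, and `applicationFailureInfo.type`. The sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// failure models the parts of Temporal's Failure message used here
// (field names assumed from the proto JSON encoding).
type failure struct {
	Message                string   `json:"message"`
	Cause                  *failure `json:"cause"`
	ApplicationFailureInfo *struct {
		Type string `json:"type"`
	} `json:"applicationFailureInfo"`
}

// summarize reads one activityTaskFailedEventAttributes object and returns the
// application error type (empty for non-application failures), the top-level
// message, and the message of the innermost cause.
func summarize(attrs []byte) (string, string, string, error) {
	var a struct {
		Failure *failure `json:"failure"`
	}
	if err := json.Unmarshal(attrs, &a); err != nil {
		return "", "", "", err
	}
	if a.Failure == nil {
		return "", "", "", errors.New("attributes contain no failure")
	}
	errType := ""
	if a.Failure.ApplicationFailureInfo != nil {
		errType = a.Failure.ApplicationFailureInfo.Type
	}
	root := a.Failure
	for root.Cause != nil {
		root = root.Cause
	}
	return errType, a.Failure.Message, root.Message, nil
}

func main() {
	// Replace with one object from the jq output above.
	sample := []byte(`{"failure":{"message":"charge failed","cause":{"message":"connection refused"},"applicationFailureInfo":{"type":"PaymentError"}}}`)
	typ, msg, root, err := summarize(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("type=%q message=%q root cause=%q\n", typ, msg, root)
}
```

The root-cause message is usually what maps onto a row in the table of common causes.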
Common Causes:
| Error Type | Cause | Solution |
|---|---|---|
| Connection error | Network issue | Add retries, check connectivity |
| Panic | Code bug | Fix activity code |
| Application error | Business logic | Check error handling |
| Resource exhausted | Rate limiting | Add backoff, reduce load |
Symptoms:
Check Points:
```promql
# Schedule-to-start latency (Prometheus)
histogram_quantile(0.99, rate(temporal_schedule_to_start_latency_bucket[5m]))

# Persistence latency
histogram_quantile(0.99, rate(temporal_persistence_latency_bucket[5m]))
```
Solutions:
| Latency Location | Solution |
|---|---|
| Schedule-to-start | Add workers |
| Activity execution | Optimize activity |
| Database | Scale database |
| Network | Check connectivity |
Symptoms:
- NexusOperationFailed or NexusOperationTimedOut events in caller workflow history

Diagnosis Tree:
```
Nexus operation issue
├── NexusOperationTimedOut?
│   ├── scheduleToCloseTimeout too short → Increase timeout
│   └── Handler workflow stuck → Debug handler workflow in handler namespace
├── NexusOperationFailed?
│   ├── OperationError → Check handler operation logic
│   ├── HandlerError → Check handler worker logs/infrastructure
│   └── Endpoint misconfigured → Verify endpoint config
└── NexusOperationScheduled but never started?
    ├── Endpoint exists? → temporal operator nexus endpoint list
    ├── Handler workers running? → Check handler task queue
    └── Target namespace accessible? → Verify namespace exists
```
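The Nexus tree can be walked the same way from the caller's Nexus events, oldest first. This sketch assumes short-form event type names and a single Nexus operation in the history; with several operations, the events would first need grouping per operation. `nexusHint` is a hypothetical helper.

```go
package main

import "fmt"

// nexusHint walks the Nexus diagnosis tree from the Nexus event types in the
// caller's history, oldest first. Assumes a single Nexus operation.
func nexusHint(events []string) string {
	if len(events) == 0 {
		return "no Nexus events"
	}
	started := false
	for _, e := range events {
		if e == "NexusOperationStarted" {
			started = true
		}
	}
	switch last := events[len(events)-1]; {
	case last == "NexusOperationTimedOut":
		return "check scheduleToCloseTimeout and the handler workflow"
	case last == "NexusOperationFailed":
		return "check the failure: operation error, handler error, or endpoint config"
	case last == "NexusOperationScheduled" && !started:
		return "check endpoint, handler workers, and target namespace"
	default:
		return "not covered by the tree"
	}
}

func main() {
	fmt.Println(nexusHint([]string{"NexusOperationScheduled"}))
	fmt.Println(nexusHint([]string{"NexusOperationScheduled", "NexusOperationStarted", "NexusOperationTimedOut"}))
}
```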
Diagnostic Commands:
```bash
# Check Nexus events in caller workflow
temporal workflow show --workflow-id <caller-wf-id> --output json | \
  jq '.events[] | select(.eventType | contains("Nexus"))'

# List Nexus endpoints
temporal operator nexus endpoint list

# Show a specific endpoint
temporal operator nexus endpoint get --name <endpoint-name>

# Check handler task queue in handler namespace
temporal task-queue describe --task-queue <handler-tq> --namespace <handler-ns>
```
Common Nexus Issues:
| Issue | Cause | Solution |
|---|---|---|
| Endpoint not found | Endpoint not created or wrong name | Create/verify endpoint |
| Handler not responding | No workers on handler task queue | Start handler workers |
| Operation timeout | scheduleToCloseTimeout too short | Increase caller timeout |
| Handler error | Bug in handler operation code | Fix handler code |
| Cross-namespace auth | Permissions not configured | Configure namespace access |
Reset to retry from a specific point:
```bash
# Reset to specific event
temporal workflow reset \
  --workflow-id <id> \
  --event-id <event-id> \
  --reason "Reset for retry"

# Reset to last workflow task
temporal workflow reset \
  --workflow-id <id> \
  --type LastWorkflowTask \
  --reason "Reset after fix"
```
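To reset to a specific event, a common choice is the last WorkflowTaskCompleted event before the point that went wrong, such as the scheduling of a failing activity. The sketch below picks that event ID from a history that is already parsed and sorted by ID. It assumes WorkflowTaskCompleted is an accepted `--event-id` target, which matches common usage; confirm with `temporal workflow reset --help` for your CLI version. The `event` type and `resetPoint` function are hypothetical.

```go
package main

import "fmt"

// event is a minimal view of a history event (hypothetical type for this sketch).
type event struct {
	ID   int64
	Type string
}

// resetPoint returns the ID of the last WorkflowTaskCompleted event that comes
// before beforeID, or false if none exists. Events must be sorted by ID.
func resetPoint(events []event, beforeID int64) (int64, bool) {
	var id int64
	found := false
	for _, e := range events {
		if e.ID >= beforeID {
			break
		}
		if e.Type == "WorkflowTaskCompleted" {
			id, found = e.ID, true
		}
	}
	return id, found
}

func main() {
	history := []event{
		{1, "WorkflowExecutionStarted"}, {2, "WorkflowTaskScheduled"}, {3, "WorkflowTaskStarted"},
		{4, "WorkflowTaskCompleted"}, {5, "ActivityTaskScheduled"}, {6, "ActivityTaskStarted"},
		{7, "ActivityTaskFailed"},
	}
	// Reset to just before the failing activity was scheduled (event 5).
	if id, ok := resetPoint(history, 5); ok {
		fmt.Printf("temporal workflow reset --workflow-id <id> --event-id %d --reason \"Reset for retry\"\n", id)
	}
}
```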
Force stop a stuck workflow:
```bash
temporal workflow terminate \
  --workflow-id <id> \
  --reason "Manual termination - issue description"
```
Request graceful cancellation:
```bash
temporal workflow cancel --workflow-id <id>
```
For detailed error catalogs, consult:
- references/error-catalog.md - Complete error reference
- references/diagnostic-queries.md - Prometheus queries