From agentic-toolkit
**Role:** YARP Reverse Proxy Timeout Analysis and Resolution Specialist
npx claudepluginhub corbinatorx/devops-ai-toolkit-claude-plugin --plugin agentic-toolkit
# /yarp-timeout-playbook
You are an expert in diagnosing and resolving timeout issues in YARP (Yet Another Reverse Proxy) configurations. This command delegates to the dotnet-performance-analyst agent for comprehensive .NET and YARP analysis.
## Usage
`/yarp-timeout-playbook`
**No arguments required** - The command will guide you through interactive troubleshooting.
## Overview
YARP timeout issues occur when the reverse proxy cannot get a timely response from backend destinations. This command systematically investigates YARP configuration, backend performance, and connection pool issues.
Ask the user:
Invoke the dotnet-performance-analyst agent to perform comprehensive analysis.
The agent will investigate:
1. YARP Configuration Review
```json
{
  "ReverseProxy": {
    "Routes": {
      "route1": {
        "ClusterId": "backend",
        "Match": { "Path": "/api/{**catch-all}" },
        "Timeout": "00:01:00" // Total request timeout for this route (YARP 2.1+ on .NET 8+)
      }
    },
    "Clusters": {
      "backend": {
        "HttpClient": {
          "MaxConnectionsPerServer": 100
        },
        "HttpRequest": {
          "ActivityTimeout": "00:02:00" // Idle timeout between proxy operations (default 100 seconds)
        }
      }
    }
  }
}
```
2. Timeout Hierarchy
3. Backend Response Times
```kusto
// Application Insights
dependencies
| where timestamp > ago(1h)
| where type == "Http"
| where target contains "{backend-host}"
| summarize avg_duration = avg(duration),
            p50 = percentile(duration, 50),
            p95 = percentile(duration, 95),
            p99 = percentile(duration, 99)
| order by avg_duration desc
```
4. YARP Logs Analysis
Enable detailed logging:
```csharp
builder.Logging.AddFilter("Yarp", LogLevel.Debug);
```
5. Connection Pool Metrics
Based on the agent's findings, identify the most likely cause:
Common Causes:
1. Backend Timeout
2. Connection Pool Exhaustion
3. YARP Configuration Error
4. Unhealthy Destinations
5. Network Latency
## Resolution: Adjust YARP Timeout Configuration
**Current Issue**: Backend responses exceed timeout threshold
### Step 1: Identify Slow Endpoints
```kusto
// Application Insights - Find slow operations
dependencies
| where timestamp > ago(1h)
| where type == "Http"
| summarize avg_duration = avg(duration),
            max_duration = max(duration),
            p95 = percentile(duration, 95),
            p99 = percentile(duration, 99)
    by name
| where p99 > 30000 // > 30 seconds (duration is in milliseconds)
| order by p99 desc
```
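Option A: Increase `HttpRequest.ActivityTimeout` (per cluster)
```json
{
  "Clusters": {
    "backend": {
      "HttpRequest": {
        "ActivityTimeout": "00:02:00" // Allow up to 120 seconds without activity
      }
    }
  }
}
```
Option B: Increase Per-Route Timeout (specific routes)
```json
{
  "Routes": {
    "slow-route": {
      "ClusterId": "backend",
      "Match": { "Path": "/api/slow-endpoint" },
      "Timeout": "00:03:00" // 180 seconds for this route only
    }
  }
}
```
The route-level `Timeout` requires YARP 2.1+ on .NET 8+ with the request timeouts middleware registered (`AddRequestTimeouts()` and `UseRequestTimeouts()`).
Option C: Raise Both Together
`ActivityTimeout` is an idle timeout: it resets whenever headers or body data flow, and it does not cover retries (YARP does not retry requests by default). A backend that stays silent longer than `ActivityTimeout` is canceled even when the route `Timeout` is longer, so for slow endpoints `ActivityTimeout` must be at least as long as the expected time to first byte.
```json
{
  "Routes": {
    "slow-route": { "ClusterId": "backend", "Timeout": "00:03:00" }
  },
  "Clusters": {
    "backend": { "HttpRequest": { "ActivityTimeout": "00:05:00" } }
  }
}
```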
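Apply the change. YARP reloads file-based configuration when it changes on disk; restart only if your deployment does not pick up the new configuration (for example, settings baked into a container image):
```bash
# Restart App Service
az webapp restart --name {yarp-app} --resource-group {rg}
# Or restart container/pod
kubectl rollout restart deployment/{yarp-deployment}
```
Test affected endpoint:
```bash
time curl -v https://{yarp-endpoint}/api/slow-endpoint
# Should complete within new timeout
```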
Best Practice: Set timeout to P99 response time + 30% buffer
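The P99-plus-buffer rule can be turned into a concrete setting value. The following is a minimal sketch, assuming the durations were exported from Application Insights in milliseconds; the function name `recommended_timeout` and the sample values are hypothetical.

```python
import math


def recommended_timeout(durations_ms, buffer=0.30):
    """Return (seconds, 'hh:mm:ss') for the P99 of the samples plus a buffer."""
    if not durations_ms:
        raise ValueError("need at least one duration sample")
    ordered = sorted(durations_ms)
    # Nearest-rank P99: the smallest sample with at least 99% of samples at or below it.
    rank = -(-99 * len(ordered) // 100)  # integer ceiling of 0.99 * n
    p99_ms = ordered[rank - 1]
    # Round up to whole seconds so the timeout never undershoots the target.
    seconds = math.ceil(p99_ms * (1 + buffer) / 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return seconds, f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical sample: mostly fast calls with two slow outliers.
samples = [800] * 98 + [41_000, 60_000]
print(recommended_timeout(samples))
```

The returned string uses the same `hh:mm:ss` format as the timeout values in the configuration examples, so it can be pasted into the config after review.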
#### If Connection Pool Exhaustion:
## Resolution: Increase Connection Pool Size
**Issue**: All connections in use, requests queued
### Step 1: Check Current Configuration
```json
{
  "Clusters": {
    "backend": {
      "HttpClient": {
        "MaxConnectionsPerServer": 100 // Current limit
      }
    }
  }
}
```
Raise the limit:
```json
{
  "Clusters": {
    "backend": {
      "HttpClient": {
        "MaxConnectionsPerServer": 200, // Increase limit
        "EnableMultipleHttp2Connections": true // Open extra HTTP/2 connections when the stream limit is reached
      }
    }
  }
}
```
For more control:
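```csharp
// YARP builds its own SocketsHttpHandler per cluster, so customize it through
// ConfigureHttpClient rather than a named IHttpClientFactory client.
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .ConfigureHttpClient((context, handler) =>
    {
        handler.PooledConnectionLifetime = TimeSpan.FromMinutes(2);
        handler.PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1);
        handler.MaxConnectionsPerServer = 200;
        handler.ConnectTimeout = TimeSpan.FromSeconds(10);
    });
```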
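```csharp
// Add metrics to track connection pool usage.
// activeCount and waitTimeMs are placeholders: collect them from your own
// instrumentation before reporting them.
telemetryClient.TrackMetric("YarpActiveConnections", activeCount);
telemetryClient.TrackMetric("YarpConnectionWaitTime", waitTimeMs);
```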
Warning: Increasing connections also increases backend load
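A starting value for `MaxConnectionsPerServer` can be estimated from Little's law (requests in flight = arrival rate × time each request spends in the system). This is a rough sketch, assuming HTTP/1.1 to the backend, where each connection carries one request at a time; the function name and the headroom factor are hypothetical.

```python
import math


def estimate_connections(requests_per_second, avg_latency_seconds, headroom=1.5):
    """Estimate the connections needed to keep requests from queueing."""
    # Little's law: average number of requests in flight at any moment.
    in_flight = requests_per_second * avg_latency_seconds
    # Headroom absorbs bursts and latency above the average.
    return math.ceil(in_flight * headroom)


# Hypothetical load: 400 requests/s at 250 ms average backend latency.
print(estimate_connections(400, 0.25))
```

If the estimate is far above what the backend can serve concurrently, raising the limit only moves the queue to the backend, which is the risk the warning above describes.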
#### If YARP Configuration Error:
## Resolution: Fix YARP Configuration
**Issue**: Misconfigured timeouts or invalid settings
### Common Configuration Mistakes:
**1. Timeout Too Short**
```json
// BAD: 5 second timeout for slow API
{
  "HttpRequest": {
    "ActivityTimeout": "00:00:05" // 5 seconds - too short!
  }
}
// GOOD: Appropriate timeout
{
  "HttpRequest": {
    "ActivityTimeout": "00:01:30" // 90 seconds
  }
}
```
**2. ActivityTimeout Lower Than Route Timeout**
```json
// BAD: Activity timeout lower than the route timeout
{
  "Routes": {
    "route1": { "ClusterId": "backend", "Timeout": "00:02:00" } // 120 seconds - a silent backend is canceled at 60
  },
  "Clusters": {
    "backend": { "HttpRequest": { "ActivityTimeout": "00:01:00" } } // 60 seconds
  }
}
// GOOD: Activity timeout higher
{
  "Routes": {
    "route1": { "ClusterId": "backend", "Timeout": "00:01:30" } // 90 seconds in total
  },
  "Clusters": {
    "backend": { "HttpRequest": { "ActivityTimeout": "00:03:00" } } // 180 seconds without activity
  }
}
```
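This ordering mistake can be caught before deployment by scanning the configuration. The sketch below assumes the YARP 2.1+ key names (`Timeout` on a route, `HttpRequest.ActivityTimeout` on a cluster), a 100-second default activity timeout, and values in plain `hh:mm:ss` form; the function names are hypothetical.

```python
DEFAULT_ACTIVITY_TIMEOUT_S = 100  # assumed YARP default when the key is absent


def parse_timespan(text):
    """Convert an 'hh:mm:ss' string to seconds (day and fraction parts unsupported)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def routes_outliving_activity_timeout(proxy):
    """List (route_id, route_s, activity_s) where the route timeout is the longer one."""
    findings = []
    clusters = proxy.get("Clusters", {})
    for route_id, route in proxy.get("Routes", {}).items():
        if "Timeout" not in route:
            continue  # no route-level timeout to compare
        cluster = clusters.get(route.get("ClusterId"), {})
        activity = cluster.get("HttpRequest", {}).get("ActivityTimeout")
        activity_s = parse_timespan(activity) if activity else DEFAULT_ACTIVITY_TIMEOUT_S
        route_s = parse_timespan(route["Timeout"])
        if route_s > activity_s:
            findings.append((route_id, route_s, activity_s))
    return findings


proxy = {
    "Routes": {
        "fast": {"ClusterId": "backend", "Timeout": "00:00:30"},
        "slow": {"ClusterId": "backend", "Timeout": "00:02:00"},
    },
    "Clusters": {"backend": {"HttpRequest": {"ActivityTimeout": "00:01:00"}}},
}
print(routes_outliving_activity_timeout(proxy))
```

A finding is a prompt to review, not necessarily an error: a streaming endpoint that sends data steadily can legitimately run longer than the idle timeout.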
**3. Missing Cluster Configuration**
```json
// BAD: Route references non-existent cluster
{
  "Routes": {
    "route1": {
      "ClusterId": "backend" // Cluster "backend" not defined!
    }
  }
}
// GOOD: Cluster defined
{
  "Routes": {
    "route1": {
      "ClusterId": "backend"
    }
  },
  "Clusters": {
    "backend": {
      "Destinations": {
        "destination1": {
          "Address": "https://backend-api.example.com"
        }
      }
    }
  }
}
```
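A dangling `ClusterId` is easy to detect mechanically. The following sketch checks a parsed configuration for routes that point at undefined clusters; it assumes the layout shown in this playbook and compares IDs case-insensitively (an assumption based on .NET configuration key semantics). The function name is hypothetical.

```python
import json


def find_missing_clusters(config):
    """Return {route_id: cluster_id} for routes whose cluster is not defined."""
    # Accept either the whole appsettings document or just the ReverseProxy section.
    proxy = config.get("ReverseProxy", config)
    defined = {name.lower() for name in proxy.get("Clusters", {})}
    missing = {}
    for route_id, route in proxy.get("Routes", {}).items():
        cluster_id = route.get("ClusterId") or ""
        if cluster_id.lower() not in defined:
            missing[route_id] = cluster_id
    return missing


config = json.loads("""
{
  "ReverseProxy": {
    "Routes": {
      "route1": { "ClusterId": "backend" },
      "route2": { "ClusterId": "reports" }
    },
    "Clusters": {
      "backend": { "Destinations": { "destination1": { "Address": "https://backend-api.example.com" } } }
    }
  }
}
""")
print(find_missing_clusters(config))
```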
#### If Unhealthy Destinations:
## Resolution: Fix Backend Health
**Issue**: YARP health checks marking destinations as unhealthy
### Step 1: Review Health Check Configuration
```json
{
"Clusters": {
"backend": {
"HealthCheck": {
"Active": {
"Enabled": true,
"Interval": "00:00:10", // Check every 10 seconds
"Timeout": "00:00:05", // 5 second timeout
"Policy": "ConsecutiveFailures", // Mark unhealthy after X failures
"Path": "/health"
},
"Passive": {
"Enabled": true,
"Policy": "TransportFailureRate", // Based on request failures
"ReactivationPeriod": "00:01:00"
}
}
}
}
}
```
```bash
# Test backend health endpoint directly
curl -v https://backend-api.example.com/health
# Should return HTTP 200
# Response time should be < 5 seconds
```
```csharp
// Enable health check logging
builder.Logging.AddFilter("Yarp.ReverseProxy.Health", LogLevel.Debug);
```
Look for:
If health endpoint slow:
If health endpoint missing:
Add a `/health` endpoint returning HTTP 200
If backend down:
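When tuning the active health check values above, it helps to know roughly how long a failed backend keeps receiving traffic. This is a rough upper-bound sketch under a simple model (probes start every `Interval`, each failing probe may take the full `Timeout`); the failure threshold of the `ConsecutiveFailures` policy is treated as an input, and the value 2 used here is an assumption.

```python
def worst_case_detection_seconds(interval_s, timeout_s, threshold):
    """Rough upper bound on the time until a dead destination is marked unhealthy."""
    # The backend fails right after a passing probe, so each of the `threshold`
    # failing probes starts one interval after the previous probe, and the last
    # one may take the full timeout before it counts as a failure.
    return interval_s * threshold + timeout_s


# Interval 10 s and timeout 5 s as in the configuration above; threshold 2 is assumed.
print(worst_case_detection_seconds(10, 5, 2))
```

Shorter intervals reduce this window but add probe traffic to every destination, so the interval is a trade-off rather than a value to minimize.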
#### If Network Latency:
## Resolution: Optimize Network Configuration
**Issue**: High latency between YARP and backend
### Step 1: Measure Network Latency
```bash
# Ping backend (if ICMP allowed)
ping backend-api.example.com
# Measure HTTP latency
time curl -w "@curl-format.txt" -o /dev/null -s https://backend-api.example.com/health
# curl-format.txt:
# time_namelookup: %{time_namelookup}\n
# time_connect: %{time_connect}\n
# time_appconnect: %{time_appconnect}\n
# time_pretransfer: %{time_pretransfer}\n
# time_starttransfer: %{time_starttransfer}\n
# time_total: %{time_total}\n
```
Reuse pooled connections so requests avoid repeated TCP and TLS handshakes. These `SocketsHttpHandler` settings are not part of the `HttpClient` config section; set them in code through `ConfigureHttpClient`:
```csharp
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .ConfigureHttpClient((context, handler) =>
    {
        handler.PooledConnectionLifetime = TimeSpan.FromMinutes(2); // Recycle connections periodically
        handler.PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1);
        handler.ConnectTimeout = TimeSpan.FromSeconds(10); // Fail fast on connection
    });
```
Enable HTTP/2 to the backend:
```json
{
  "Clusters": {
    "backend": {
      "HttpClient": {
        "EnableMultipleHttp2Connections": true
      },
      "HttpRequest": {
        "Version": "2"
      }
    }
  }
}
```
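The curl timing fields measured in Step 1 are cumulative from the start of the request, so the time spent in each phase is the difference between consecutive fields. The sketch below splits them into phases to show whether DNS, connection setup, or the backend itself dominates; the phase names and the sample output are hypothetical.

```python
def parse_curl_output(text):
    """Parse 'name: value' lines produced by the curl-format.txt template."""
    timings = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip():
            timings[name.strip()] = float(value)
    return timings


def phase_breakdown(timings):
    """Turn curl's cumulative timings (seconds) into per-phase durations."""
    order = [
        ("dns", "time_namelookup"),
        ("tcp_connect", "time_connect"),
        ("tls_handshake", "time_appconnect"),  # reported as 0 for plain HTTP
        ("request_sent", "time_pretransfer"),
        ("server_wait", "time_starttransfer"),
        ("download", "time_total"),
    ]
    phases, previous = {}, 0.0
    for phase, field in order:
        value = timings[field]
        # Clamp at zero so a field reported as 0 does not produce a negative phase.
        phases[phase] = round(max(value - previous, 0.0), 6)
        previous = max(previous, value)
    return phases


sample = """time_namelookup: 0.004
time_connect: 0.020
time_appconnect: 0.080
time_pretransfer: 0.081
time_starttransfer: 1.250
time_total: 1.300"""

phases = phase_breakdown(parse_curl_output(sample))
print(max(phases, key=phases.get), phases)
```

In this sample the largest phase is the wait for the first byte, which points at backend processing time rather than the network.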
If cross-region:
If DNS slow:
Validation:
### Step 5: Validation & Monitoring
After implementing resolution:
## Validation Steps
### 1. Test Affected Endpoints
```bash
# Test via YARP
time curl -v https://{yarp-endpoint}/api/test
# Should complete without timeout
# Check response time is within expectations
```
```kusto
dependencies
| where timestamp > ago(15m)
| where type == "Http"
| where target contains "{backend}"
| summarize count(), avg(duration), max(duration) by resultCode
| order by count_ desc
```
Expected:
Check YARP telemetry:
## Error Handling
### If YARP Logs Unavailable
❌ Unable to Access YARP Logs
Troubleshooting:
```json
{
  "Logging": {
    "LogLevel": {
      "Yarp": "Debug"
    }
  }
}
```
### If Multiple Timeout Types
⚠️ Multiple Timeout Issues Detected
Found:
Recommendation: Address in order:
Tackle one at a time and validate.
## Prevention & Best Practices
**1. Timeout Configuration**:
- Set timeouts based on P99 response time + buffer
- Use per-route timeouts for known slow endpoints
- Cluster `HttpRequest.ActivityTimeout` should be at least as long as any route `Timeout` that must be allowed to run in full
**2. Connection Pooling**:
- Set MaxConnectionsPerServer based on expected load
- Enable HTTP/2 for multiplexing
- Configure connection lifetime (2-5 minutes)
**3. Health Checks**:
- Use lightweight `/health` endpoint
- Set appropriate interval (10-30 seconds)
- Enable both active and passive health checks
**4. Monitoring**:
- Track YARP metrics (request count, duration, errors)
- Alert on timeout spikes
- Monitor connection pool utilization
- Track destination health status
**5. Load Testing**:
- Test YARP under expected peak load
- Verify connection pool sizing
- Validate timeout configuration
- Check backend capacity
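The "alert on timeout spikes" practice can be expressed as a simple rate comparison against a baseline. This is a minimal sketch, not a production alert rule; the function name, the factor of 3, and the minimum request count are hypothetical starting points to tune against real traffic.

```python
def is_timeout_spike(timeouts, requests, baseline_rate, factor=3.0, min_requests=50):
    """Flag a window whose timeout rate is well above the baseline rate."""
    if requests < min_requests:
        return False  # too little traffic in the window to judge
    return timeouts / requests > baseline_rate * factor


# Hypothetical 5-minute window: 12 timeouts in 200 requests against a 1% baseline.
print(is_timeout_spike(12, 200, baseline_rate=0.01))
```

Comparing rates rather than raw counts keeps the alert quiet during normal traffic growth, and the minimum request count avoids firing on a handful of requests at night.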
## Integration
**Related Commands**:
- `/triage-504` - If timeouts caused by Azure Front Door
- `/create-incident` - Document incident in Azure DevOps
**Related Agents**:
- **dotnet-performance-analyst** - Primary agent for this playbook
- **azure-edge-specialist** - If AFD is in front of YARP
**Escalation**:
- .NET team: For YARP platform issues
- Backend team: For backend performance issues
- Network team: For network latency issues
## Notes
- YARP uses SocketsHttpHandler for connection pooling
- Default `HttpRequest.ActivityTimeout` is 100 seconds
- Connection pool is per-destination, not global
- HTTP/2 allows multiple requests per connection
- Active health check probes are sent through the cluster's HTTP client, so they share its connection settings
- TaskCanceledException usually indicates timeout, not user cancellation
- Always validate changes under realistic load, not just single requests