**Purpose**: Troubleshoot backend services, APIs, and application-level performance issues.
Diagnose backend performance issues like slow APIs, 5xx errors, and memory leaks. Provides targeted commands to check logs, metrics, and resource usage for root cause analysis.
Symptoms:
Diagnosis:
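# Check for slow requests (assumes the duration in ms is the 5th whitespace-separated field)
grep "duration" /var/log/application.log | awk '$5 > 1000'
# Count errors (divide by the total request count to get an error rate)
grep -c "ERROR" /var/log/application.log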
# Check recent errors
tail -f /var/log/application.log | grep "ERROR"
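When the log needs more than a one-off grep, the same checks can be scripted. The following is a minimal sketch, assuming a hypothetical log format in which each request line carries a `duration=<ms>` token and failed requests contain the word `ERROR`; the function name and sample lines are illustrative only.

```javascript
// Summarize slow requests and errors from application log lines.
// Assumption: lines look like "... GET /users duration=1520" (hypothetical format).
function summarizeLog(lines, slowThresholdMs = 1000) {
  let total = 0;
  let errors = 0;
  const slow = [];
  for (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    total += 1;
    if (line.includes('ERROR')) errors += 1;
    const match = /duration=(\d+)/.exec(line);
    if (match && Number(match[1]) > slowThresholdMs) slow.push(line);
  }
  return { total, errors, slow };
}

const sample = [
  '2024-01-01T10:00:00Z INFO GET /users duration=120',
  '2024-01-01T10:00:01Z INFO GET /orders duration=1520',
  '2024-01-01T10:00:02Z ERROR POST /orders duration=30 upstream failed',
];
const summary = summarizeLog(sample);
console.log(summary.total, summary.errors, summary.slow.length);
```

For a real file, the lines could come from `fs.readFileSync(path, 'utf8').split('\n')`; dividing `errors` by `total` gives an error rate rather than a raw count.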
Red flags:
# CPU usage
top -bn1 | grep "node\|java\|python"
# Memory usage (%MEM and command); the [x] brackets keep grep from matching itself
ps aux | grep "[n]ode\|[j]ava\|[p]ython" | awk '{print $4, $11}'
# Thread count
ps -eLf | grep "[n]ode\|[j]ava\|[p]ython" | wc -l
# Open file descriptors
lsof -p <PID> | wc -l
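If the service is a Node.js process, comparable numbers can also be read from inside the process and exposed on a metrics or debug endpoint. A minimal sketch using only built-in modules; the function name and the shape of the returned object are assumptions made for illustration.

```javascript
const os = require('os');

// Snapshot of the current Node.js process's resource usage.
function resourceSnapshot() {
  const mem = process.memoryUsage(); // values in bytes
  const cpu = process.cpuUsage();    // microseconds of CPU time used so far
  return {
    rssMb: mem.rss / 1024 / 1024,
    heapUsedMb: mem.heapUsed / 1024 / 1024,
    cpuUserMs: cpu.user / 1000,
    cpuSystemMs: cpu.system / 1000,
    loadAvg1m: os.loadavg()[0],      // always 0 on Windows
  };
}

const snapshot = resourceSnapshot();
console.log(snapshot);
```

Taking two snapshots a few seconds apart and comparing the CPU values shows how much CPU time the process consumed in that window.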
Red flags:
# If queries are slow, the database is the likely bottleneck
# See database-diagnostics.md
# Check if query time matches API response time
# API response time = Query time + Application processing
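The relation above can be applied directly once both timings are known. A small worked sketch, with hypothetical timings in milliseconds and an illustrative function name:

```javascript
// Split a request's total time into query time and application processing,
// using: API response time = query time + application processing.
function breakdown(totalMs, queryMs) {
  const appMs = totalMs - queryMs;
  return {
    queryMs,
    appMs,
    queryShare: totalMs > 0 ? queryMs / totalMs : 0, // fraction spent in the database
  };
}

const result = breakdown(1200, 950); // hypothetical: 1200 ms total, 950 ms in queries
console.log(result);
```

A query share close to 1 points at the database; a low share points at application code or external calls.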
# Check if calling external APIs
grep "http.request" /var/log/application.log
# Check external API response time
# Use APM tools or custom instrumentation
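Where no APM tool is available, custom instrumentation can be as small as a wrapper that times each call. This is a sketch under stated assumptions: `instrument`, `fakeExternalCall` and the `payments-api` label are hypothetical names, and the external call is simulated with a timer.

```javascript
const { performance } = require('perf_hooks');

// Wrap an async function so every call reports how long it took,
// whether it resolves or rejects.
function instrument(name, fn, record) {
  return async (...args) => {
    const start = performance.now();
    try {
      return await fn(...args);
    } finally {
      record({ name, durationMs: performance.now() - start });
    }
  };
}

// Stand-in for an external API call (hypothetical); resolves after about 20 ms.
const fakeExternalCall = (value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), 20));

const timings = [];
const timedCall = instrument('payments-api', fakeExternalCall, (t) => timings.push(t));

const done = timedCall('ok').then((value) => {
  console.log(value, timings[0].name, Math.round(timings[0].durationMs));
  return value;
});
```

In a real service the `record` callback would write to the application log or a metrics client instead of an array.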
Red flags:
Mitigation:
Symptoms:
Diagnosis by Error Code:
**500 Internal Server Error**
Cause: Application code error
Diagnosis:
# Check application logs for exceptions
grep "Exception\|Error" /var/log/application.log | tail -20
# Check stack traces
tail -100 /var/log/application.log
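Raw log tails show individual failures; counting lines by exception type shows which failure dominates. A minimal sketch, assuming error lines contain an identifier ending in `Error` or `Exception` (the sample lines are made up):

```javascript
// Count log lines by exception/error type, most frequent first.
function countByErrorType(lines) {
  const counts = new Map();
  for (const line of lines) {
    // First identifier ending in "Error" or "Exception" on the line
    const match = /\b([A-Za-z_]\w*(?:Error|Exception))\b/.exec(line);
    if (!match) continue;
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const errorLines = [
  'TypeError: Cannot read properties of undefined (reading "id")',
  'java.lang.NullPointerException: null',
  'TypeError: x is not a function',
  'INFO request completed',
];
const errorCounts = countByErrorType(errorLines);
console.log(errorCounts);
```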
Common causes:
Mitigation:
**502 Bad Gateway**
Cause: Reverse proxy can't reach backend
Diagnosis:
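# Check if application is running (the [x] brackets keep grep from matching itself)
ps aux | grep "[n]ode\|[j]ava\|[p]ython"
# Check application port (ss -tlnp is the equivalent where netstat is not installed)
netstat -tlnp | grep <PORT>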
# Check reverse proxy logs (nginx, apache)
tail -f /var/log/nginx/error.log
Common causes:
Mitigation:
**503 Service Unavailable**
Cause: Application overloaded or unhealthy
Diagnosis:
# Check application health
curl http://localhost:<PORT>/health
# Check connection pool
# Database connections, HTTP connections
# Check queue depth
# Message queues, task queues
Common causes:
Mitigation:
**504 Gateway Timeout**
Cause: Application took too long to respond
Diagnosis:
# Check what's slow
# Database query? External API? Long computation?
# Check application logs for slow operations
grep "slow\|timeout" /var/log/application.log
Common causes:
Mitigation:
Symptoms:
Diagnosis:
# Monitor memory over time (Linux); columns are %MEM, VSZ, RSS
watch -n 5 'ps aux | grep <PROCESS> | awk "{print \$4, \$5, \$6}"'
# Get heap dump (Java)
jmap -dump:format=b,file=heap.bin <PID>
# Get heap snapshot (Node.js)
node --inspect index.js
# Chrome DevTools → Memory → Take heap snapshot
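Watching the numbers by eye works for a short session; for longer runs, periodic heap samples can be checked for sustained growth. A minimal sketch: the 80% "mostly rising" cutoff and the 1.2 growth ratio are arbitrary assumptions, not recommended thresholds, and the function names are illustrative.

```javascript
// Decide whether heap samples show sustained growth.
// samples: heapUsed values in bytes, oldest first.
function isSteadilyGrowing(samples, minGrowthRatio = 1.2) {
  if (samples.length < 3) return false; // not enough data to judge
  let rises = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] > samples[i - 1]) rises += 1;
  }
  const mostlyRising = rises / (samples.length - 1) >= 0.8;
  const grewEnough = samples[samples.length - 1] >= samples[0] * minGrowthRatio;
  return mostlyRising && grewEnough;
}

// Take one sample from the running process.
function sampleHeap() {
  return process.memoryUsage().heapUsed;
}

const leaking = isSteadilyGrowing([100, 120, 150, 190, 240]);
const healthy = isSteadilyGrowing([100, 140, 95, 130, 100]);
console.log(leaking, healthy, sampleHeap() > 0);
```

In a service, `sampleHeap()` could be called from an interval timer and the result logged. Node.js also provides `v8.writeHeapSnapshot()`, which writes a snapshot file that Chrome DevTools can load without attaching the inspector.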
Red flags:
// 1. Event listeners not removed
emitter.on('event', handler); // Never removed
// 2. Timers not cleared
setInterval(() => { /* ... */ }, 1000); // Never cleared
// 3. Global variables growing
global.cache = {}; // Grows forever
// 4. Closures holding references
function createHandler() {
  const largeData = new Array(1000000);
  return () => {
    // The closure references largeData, so it stays in memory
    // for as long as the returned handler is reachable
    return largeData.length;
  };
}
// 5. Connection leaks
const conn = await db.connect();
// Never closed → connection pool exhausted
Mitigation:
// 1. Remove event listeners
const handler = () => { /* ... */ };
emitter.on('event', handler);
// Later:
emitter.off('event', handler);
// 2. Clear timers
const intervalId = setInterval(() => { /* ... */ }, 1000);
// Later:
clearInterval(intervalId);
// 3. Use LRU cache
const { LRUCache } = require('lru-cache'); // lru-cache v10+; older versions export the class itself
const cache = new LRUCache({ max: 1000 });
// 4. Be careful with closures
function createHandler() {
  return () => {
    const largeData = loadData(); // Load when needed; eligible for GC after the call
  };
}
// 5. Always close connections
const conn = await db.connect();
try {
  await conn.query(/* ... */);
} finally {
  await conn.close();
}
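To show what bounding a cache means without depending on a library, here is a minimal LRU sketch built on `Map`, which iterates in insertion order. `BoundedCache` is a hypothetical name; a maintained package is usually the better choice in production.

```javascript
// Minimal LRU cache: the Map's first key is always the least recently used.
class BoundedCache {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Evict the least recently used entry
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
  get size() {
    return this.map.size;
  }
}

const bounded = new BoundedCache(2);
bounded.set('a', 1);
bounded.set('b', 2);
bounded.get('a');    // 'a' becomes most recently used
bounded.set('c', 3); // evicts 'b'
console.log(bounded.size, bounded.get('b'));
```

Because the size never exceeds `max`, memory used by the cache stays flat no matter how many distinct keys are seen.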
Symptoms:
Diagnosis:
# Top CPU processes
top -bn1 | head -20
# CPU per thread (Java)
top -H -p <PID>
# Profile application (Node.js)
node --prof index.js
node --prof-process isolate-*.log
Common causes:
Mitigation:
// 1. Break up heavy computation
async function processLargeArray(items) {
  for (let i = 0; i < items.length; i++) {
    await processItem(items[i]);
    // Yield to event loop every 100 items
    if (i % 100 === 0) {
      await new Promise(resolve => setImmediate(resolve));
    }
  }
}
// 2. Use worker threads (Node.js)
const { Worker } = require('worker_threads');
const worker = new Worker('./heavy-computation.js');
// 3. Cache results (bound the cache size in long-running processes)
const cache = new Map();
function expensiveOperation(input) {
  if (cache.has(input)) return cache.get(input);
  const result = heavyComputation(input); // placeholder for the real computation
  cache.set(input, result);
  return result;
}
// 4. Fix regex
// Bad: /(.+)*/ (nested quantifiers cause catastrophic backtracking)
// Good: /.+/ (single quantifier; a lazy quantifier alone does not remove the nesting)
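The cost of nested quantifiers is easiest to see by timing a match that almost succeeds. The sketch below uses the classic `(a+)+` pattern rather than the pattern quoted above, and keeps the input short so it finishes quickly; absolute timings vary by machine.

```javascript
const { performance } = require('perf_hooks');

// Time a single regex test against an input.
function timeMatch(regex, input) {
  const start = performance.now();
  const matched = regex.test(input);
  return { matched, ms: performance.now() - start };
}

// A run of 'a' followed by one character that makes the match fail.
const input = 'a'.repeat(20) + '!';

// Nested quantifiers: the engine tries every way of splitting the run,
// so work roughly doubles with each extra 'a'.
const nested = timeMatch(/^(a+)+$/, input);
// Single quantifier: fails after one linear pass.
const flat = timeMatch(/^a+$/, input);

console.log(nested, flat);
```

Both patterns reject the input, but the nested one takes noticeably longer, and increasing the run length by a handful of characters is enough to stall the event loop.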
Symptoms:
Diagnosis:
# Database connections
# PostgreSQL:
SELECT count(*) FROM pg_stat_activity;
# MySQL:
SHOW PROCESSLIST;
# Application connection pool
# Check application metrics/logs
Red flags:
Common causes:
Mitigation:
// 1. Always close connections
async function queryDatabase() {
  const conn = await pool.connect();
  try {
    const result = await conn.query('SELECT * FROM users');
    return result;
  } finally {
    conn.release(); // CRITICAL
  }
}
// 2. Use connection pool wrapper
const { Pool } = require('pg'); // assuming node-postgres, whose option names these match
const pool = new Pool({
  max: 20, // max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
// 3. Monitor pool metrics
pool.on('error', (err) => {
  console.error('Pool error:', err);
});
// 4. Increase pool size (if needed)
// But investigate leaks first!
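Beyond error events, pool exhaustion shows up in the pool's counters. A minimal sketch, assuming a node-postgres style pool that exposes `totalCount`, `idleCount` and `waitingCount`; `poolStats` and the stand-in object are hypothetical, and other pool libraries name these counters differently.

```javascript
// Summarize pool counters into a few health indicators.
function poolStats(pool, max) {
  const inUse = pool.totalCount - pool.idleCount;
  return {
    inUse,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
    utilization: max > 0 ? inUse / max : 0,
    // Callers queued while no connection is idle: the pool is exhausted
    exhausted: pool.waitingCount > 0 && pool.idleCount === 0,
  };
}

// Stand-in with the same counter names (hypothetical values).
const fakePool = { totalCount: 20, idleCount: 0, waitingCount: 7 };
const stats = poolStats(fakePool, 20);
console.log(stats);
```

Logging these values periodically makes a leak visible as `inUse` climbing toward `max` while traffic stays flat.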
Response Time:
Throughput:
Error Rate:
Resource Usage:
Availability:
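Response time, throughput and error rate can all be derived from per-request records. A minimal sketch, assuming each record has a `durationMs` and an HTTP `status` (a hypothetical shape) and using the nearest-rank percentile definition:

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarize the requests observed during a window of windowSeconds.
function summarize(requests, windowSeconds) {
  const durations = requests.map((r) => r.durationMs);
  const serverErrors = requests.filter((r) => r.status >= 500).length;
  return {
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    throughputPerSec: requests.length / windowSeconds,
    errorRate: requests.length ? serverErrors / requests.length : 0,
  };
}

const requests = [
  { durationMs: 80, status: 200 },
  { durationMs: 120, status: 200 },
  { durationMs: 95, status: 200 },
  { durationMs: 2400, status: 504 },
  { durationMs: 110, status: 200 },
];
const metrics = summarize(requests, 10);
console.log(metrics);
```

Percentiles are used instead of the mean because a single slow request, like the 2400 ms one here, distorts an average while leaving the median unchanged.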
When diagnosing a slow backend:
Tools:
top, htop, ps, lsof
curl with timing