From harness-claude
Prevents cascading failures in microservices using circuit breakers with closed/open/half-open states and fallbacks. Includes opossum Node.js/TypeScript example and manual implementation for slow/unavailable dependencies.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> Prevent cascading failures with circuit breaker, half-open state, and fallback logic.
Circuit breaker states:
  CLOSED     → normal operation, requests pass through
      ↓ (error rate exceeds threshold)
  OPEN       → requests fail immediately (fail fast)
      ↓ (after reset timeout)
  HALF-OPEN  → let one request through as a probe
      ↓ (probe succeeds)          ↓ (probe fails)
  CLOSED                          OPEN
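The diagram can be read as a small transition table. The sketch below is a minimal illustration and assumes nothing beyond the diagram itself. The event names (`THRESHOLD_EXCEEDED`, `RESET_TIMEOUT_ELAPSED`, `PROBE_SUCCEEDED`, `PROBE_FAILED`) and the `nextState` helper are hypothetical labels for the arrows. Any event with no arrow out of the current state leaves the state unchanged.

```typescript
// Hypothetical names for the states and arrows in the diagram above.
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';
type BreakerEvent =
  | 'THRESHOLD_EXCEEDED'
  | 'RESET_TIMEOUT_ELAPSED'
  | 'PROBE_SUCCEEDED'
  | 'PROBE_FAILED';

// Each state lists only the events that move it; everything else is ignored.
const transitions: Record<BreakerState, Partial<Record<BreakerEvent, BreakerState>>> = {
  CLOSED: { THRESHOLD_EXCEEDED: 'OPEN' },
  OPEN: { RESET_TIMEOUT_ELAPSED: 'HALF_OPEN' },
  HALF_OPEN: { PROBE_SUCCEEDED: 'CLOSED', PROBE_FAILED: 'OPEN' },
};

function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  return transitions[state][event] ?? state; // no arrow: stay put
}
```

A table-driven form keeps the allowed transitions in one place. An open circuit therefore cannot jump straight back to `CLOSED` without passing through a probe.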
opossum (Node.js circuit breaker library):
import CircuitBreaker from 'opossum';

// Shape of the user record (assumed for this example)
interface User {
  id: string;
  name: string;
  cached?: boolean;
}

// Hypothetical metrics client; substitute your own (StatsD, Prometheus, etc.)
declare const metrics: { increment(name: string): void };

// Wrap the fragile operation
async function fetchUserFromService(userId: string): Promise<User> {
  const response = await fetch(`${process.env.USER_SERVICE_URL}/users/${userId}`, {
    signal: AbortSignal.timeout(3_000), // abort the underlying HTTP request after 3s
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return (await response.json()) as User;
}

const breaker = new CircuitBreaker(fetchUserFromService, {
  timeout: 3_000, // breaker-level call timeout; a timed-out call counts as a failure
  errorThresholdPercentage: 50, // open after 50% errors in rolling window
  resetTimeout: 30_000, // try half-open after 30s
  rollingCountTimeout: 10_000, // rolling window for error rate
  rollingCountBuckets: 10, // 10 × 1s buckets
  volumeThreshold: 5, // min requests in the window before circuit can open
});

// Fallback: returned when the call fails, times out, or the circuit is open
breaker.fallback((userId: string) => ({
  id: userId,
  name: 'Unknown User',
  cached: true,
}));

// Events for monitoring
breaker.on('open', () => {
  console.error('Circuit OPENED — user service is unavailable');
  metrics.increment('circuit_breaker.user_service.open');
});
breaker.on('halfOpen', () => {
  console.log('Circuit HALF-OPEN — probing user service');
});
breaker.on('close', () => {
  console.log('Circuit CLOSED — user service recovered');
  metrics.increment('circuit_breaker.user_service.closed');
});
breaker.on('fallback', (result) => {
  console.warn('Fallback triggered:', result);
  metrics.increment('circuit_breaker.user_service.fallback');
});

// Usage: fire() takes the same arguments as the wrapped function
const user = await breaker.fire('user-123');
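The `errorThresholdPercentage`, `volumeThreshold`, `rollingCountTimeout` and `rollingCountBuckets` options together describe a bucketed rolling window. The sketch below is a simplified mental model of that window, not opossum's actual implementation. `RollingWindow`, `record` and `shouldOpen` are hypothetical names. Time is passed in explicitly so the logic is deterministic. The `>=` comparison against the percentage is a choice made for this sketch.

```typescript
// Simplified model of a bucketed rolling error-rate window (not opossum internals).
interface WindowOptions {
  rollingCountTimeout: number; // total window length in ms
  rollingCountBuckets: number; // number of equal-width buckets
  errorThresholdPercentage: number; // open when failure rate reaches this
  volumeThreshold: number; // minimum requests in the window before opening
}

class RollingWindow {
  private buckets: { start: number; successes: number; failures: number }[] = [];
  private readonly bucketMs: number;

  constructor(private readonly opts: WindowOptions) {
    this.bucketMs = opts.rollingCountTimeout / opts.rollingCountBuckets;
  }

  // Record one call outcome at time `now` (ms); assumes `now` never goes backwards.
  record(ok: boolean, now: number): void {
    const start = now - (now % this.bucketMs); // align to a bucket boundary
    let bucket = this.buckets[this.buckets.length - 1];
    if (!bucket || bucket.start !== start) {
      bucket = { start, successes: 0, failures: 0 };
      this.buckets.push(bucket);
    }
    if (ok) bucket.successes++;
    else bucket.failures++;
    this.evict(now);
  }

  shouldOpen(now: number): boolean {
    this.evict(now);
    let successes = 0;
    let failures = 0;
    for (const b of this.buckets) {
      successes += b.successes;
      failures += b.failures;
    }
    const total = successes + failures;
    if (total < this.opts.volumeThreshold) return false; // too little traffic to judge
    return (failures / total) * 100 >= this.opts.errorThresholdPercentage;
  }

  // Drop buckets that started before the window's left edge.
  private evict(now: number): void {
    const cutoff = now - this.opts.rollingCountTimeout;
    this.buckets = this.buckets.filter((b) => b.start > cutoff);
  }
}
```

With the settings above, four straight failures do not open the circuit, because only four requests have been seen and `volumeThreshold` is 5. A fifth call that succeeds brings the error rate to 80% over five requests, so the window reports that it should open. Once all of those buckets age out of the 10s window, the counts reset.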
Manual circuit breaker (without library):
enum CircuitState {
  CLOSED,
  OPEN,
  HALF_OPEN,
}

class CircuitBreaker<T, Args extends unknown[]> {
  private state = CircuitState.CLOSED;
  private failureCount = 0;
  private openedAt = 0;
  private probeInFlight = false; // allow only one HALF_OPEN probe at a time

  constructor(
    private readonly operation: (...args: Args) => Promise<T>,
    private readonly options: {
      failureThreshold: number; // consecutive failures before opening
      resetTimeoutMs: number; // time to stay OPEN before allowing a probe
      fallback?: (...args: Args) => T;
    }
  ) {}

  async execute(...args: Args): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.openedAt < this.options.resetTimeoutMs) {
        return this.rejectFast(args); // still cooling down: fail fast
      }
      this.state = CircuitState.HALF_OPEN;
    }
    const isProbe = this.state === CircuitState.HALF_OPEN;
    if (isProbe) {
      if (this.probeInFlight) return this.rejectFast(args); // a probe is already running
      this.probeInFlight = true;
    }
    try {
      const result = await this.operation(...args);
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure(isProbe);
      if (this.options.fallback) return this.options.fallback(...args);
      throw err;
    } finally {
      if (isProbe) this.probeInFlight = false;
    }
  }

  private rejectFast(args: Args): T {
    if (this.options.fallback) return this.options.fallback(...args);
    throw new Error('Circuit is OPEN');
  }

  private onSuccess(): void {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED;
  }

  private onFailure(isProbe: boolean): void {
    this.failureCount++;
    // A failed probe reopens immediately; otherwise open once the threshold is hit
    if (isProbe || this.failureCount >= this.options.failureThreshold) {
      this.state = CircuitState.OPEN;
      this.openedAt = Date.now();
      console.error(`Circuit opened after ${this.failureCount} failures`);
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}
Combining with retry:
// Retry: handle transient failures (network glitch)
// Circuit breaker: handle systemic failures (service is down)
// Order matters: retries run inside the breaker, so the breaker records a
// failure only after all retries for a call are exhausted
const withRetry = async <T>(fn: () => Promise<T>, attempts = 3): Promise<T> => {
  if (attempts < 1) throw new RangeError('attempts must be >= 1');
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts) throw err;
      await new Promise<void>((resolve) => setTimeout(() => resolve(), 200 * i)); // 200ms, 400ms
    }
  }
  throw new Error('Unreachable'); // satisfies the compiler; the loop always returns or throws
};

// Circuit breaker (the manual class above, not opossum) wraps the retry-enabled call
const robustFetchUser = new CircuitBreaker(
  (userId: string) => withRetry(() => fetchUserFromService(userId), 2),
  { failureThreshold: 5, resetTimeoutMs: 30_000 }
);
// Usage: await robustFetchUser.execute(userId)
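The comments above split the work: retry handles transient failures, and the breaker handles systemic ones. One refinement is to retry only errors that are plausibly transient, so a 4xx response fails at once instead of burning attempts. The sketch below assumes that split. `retryTransient` and `isHttp5xx` are hypothetical helpers. The classifier matches the `HTTP ${status}` message thrown by `fetchUserFromService`. The backoff is exponential with "full jitter", meaning a random delay between 0 and the cap. That is a swapped-in technique, not the linear backoff used by `withRetry`.

```typescript
// Hypothetical classifier: treat HTTP 5xx (as thrown by fetchUserFromService) as transient.
const isHttp5xx = (err: unknown): boolean =>
  err instanceof Error && /^HTTP 5\d\d$/.test(err.message);

// Retries only errors the caller classifies as transient, with exponential
// backoff plus full jitter (random delay in [0, cap)).
async function retryTransient<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  if (attempts < 1) throw new RangeError('attempts must be >= 1');
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt, or on errors that retrying will not fix
      if (i + 1 >= attempts || !isTransient(err)) throw err;
      const cap = baseDelayMs * 2 ** i; // 100, 200, 400, ... with the defaults
      await new Promise<void>((resolve) => setTimeout(() => resolve(), Math.random() * cap));
    }
  }
}
```

Jitter spreads retries from many clients across time, so they do not hit a recovering service in lockstep. The breaker still sees a single failure once retries are exhausted or an error is classified as permanent.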
Fallback strategies:
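One common strategy is to serve the last known good value and fall back to a static default, such as the 'Unknown User' record above, only when nothing has been cached yet. The sketch below assumes an in-process `Map` cache and a `stale` flag so callers can tell live data from degraded data. `withLastKnownGood`, `Result` and `stale` are hypothetical names.

```typescript
// Illustrative "last known good" fallback: cache each successful result per key
// and serve it (flagged stale) when the live call fails or the circuit rejects.
type Result<V> = { value: V; stale: boolean };

function withLastKnownGood<V>(
  fetchLive: (key: string) => Promise<V>,
  defaultValue: (key: string) => V // used when nothing has been cached yet
): (key: string) => Promise<Result<V>> {
  const cache = new Map<string, V>();
  return async (key) => {
    try {
      const value = await fetchLive(key);
      cache.set(key, value); // refresh the cache on every success
      return { value, stale: false };
    } catch {
      // Degrade gracefully: stale data first, then a static default
      return cache.has(key)
        ? { value: cache.get(key) as V, stale: true }
        : { value: defaultValue(key), stale: true };
    }
  };
}
```

For this to kick in on an open circuit, `fetchLive` should call a breaker that rejects when open, for example the manual `CircuitBreaker` configured without its own `fallback`. A breaker with its own fallback never rejects, so the cache is never consulted.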
Threshold tuning:
errorThresholdPercentage: 50% is a good default. Lower it for critical dependencies; raise it for noisy but non-critical ones.
resetTimeout: 30s is a good starting point. If recovery takes longer (e.g., a database restart), increase it.
volumeThreshold: the minimum number of requests in the rolling window before the circuit may open. It prevents the circuit from opening on the first few calls during startup.
Anti-patterns:
Bulkhead + circuit breaker: use both. A bulkhead limits concurrency, which prevents a slow service from consuming all threads, connections, or other concurrency slots. A circuit breaker detects failure and stops sending requests. They complement each other.
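A bulkhead can be as small as a counter of in-flight calls per dependency. The sketch below makes a few assumptions. `Bulkhead` and `run` are hypothetical names, not from opossum or any other library. When the limit is reached, new calls are rejected immediately rather than queued.

```typescript
// Minimal bulkhead sketch: cap concurrent calls to one dependency and
// reject immediately (shed load) when the cap is reached.
class Bulkhead {
  private active = 0;

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      throw new Error('Bulkhead full'); // no queueing in this sketch
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--; // free the slot whether the task succeeded or failed
    }
  }
}
```

One way to compose the two is `bulkhead.run(() => breaker.fire(userId))`. Placing the bulkhead outside the breaker means a "Bulkhead full" rejection never reaches the breaker, so it does not count toward the breaker's error rate. Whether it should count is a design choice.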
microservices.io/patterns/reliability/circuit-breaker.html