From harness-claude
Implements /health (liveness) and /ready (readiness) endpoints for containerized microservices in Kubernetes/ECS. Express.js TypeScript example checks Prisma DB, Redis, and external services.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> Implement /health and /ready endpoints for liveness and readiness probes in containers.
Two endpoints — always implement both:
GET /health → liveness probe: "Is the process alive?"
- Returns 200 if the process is running (even if dependencies are down)
- Kubernetes restarts the container if this fails repeatedly
- Should almost never fail (only if process is deadlocked)
GET /ready → readiness probe: "Can this instance handle traffic?"
- Returns 200 only if all critical dependencies are healthy
- Kubernetes removes the instance from the Service's load-balanced endpoints while this fails; the container is not restarted
- Common reasons it returns 503: DB not connected, cache not available, still starting
Full implementation with Express:
import express from 'express';
import { PrismaClient } from '@prisma/client';
import { Redis } from 'ioredis';

const app = express();
const prisma = new PrismaClient();
const redis = new Redis(process.env.REDIS_URL!);

// Liveness: is the process alive?
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    pid: process.pid,
  });
});
// Readiness: can we handle traffic?
app.get('/ready', async (req, res) => {
  const checks: Record<string, { status: 'ok' | 'error'; latencyMs?: number; error?: string }> = {};
  let allHealthy = true;

  // Check database
  const dbStart = Date.now();
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = { status: 'ok', latencyMs: Date.now() - dbStart };
  } catch (err) {
    checks.database = { status: 'error', error: (err as Error).message };
    allHealthy = false;
  }

  // Check Redis
  const redisStart = Date.now();
  try {
    await redis.ping();
    checks.redis = { status: 'ok', latencyMs: Date.now() - redisStart };
  } catch (err) {
    checks.redis = { status: 'error', error: (err as Error).message };
    allHealthy = false; // or set false only if Redis is required
  }

  // Check external critical dependencies (abort after 2 seconds)
  try {
    const response = await fetch(`${process.env.PAYMENT_SERVICE_URL}/health`, {
      signal: AbortSignal.timeout(2_000),
    });
    checks.paymentService = {
      status: response.ok ? 'ok' : 'error',
      error: response.ok ? undefined : `HTTP ${response.status}`,
    };
    if (!response.ok) allHealthy = false;
  } catch (err) {
    checks.paymentService = { status: 'error', error: (err as Error).message };
    allHealthy = false;
  }

  // 200 when every check passed, 503 otherwise
  const httpStatus = allHealthy ? 200 : 503;
  res.status(httpStatus).json({
    status: allHealthy ? 'ready' : 'not ready',
    timestamp: new Date().toISOString(),
    checks,
  });
});

// Start the server; port 8080 matches the port used in the Kubernetes probe configuration
app.listen(8080);
// Example readiness response (healthy):
// {
//   "status": "ready",
//   "timestamp": "2024-01-15T10:30:00.000Z",
//   "checks": {
//     "database": { "status": "ok", "latencyMs": 3 },
//     "redis": { "status": "ok", "latencyMs": 1 },
//     "paymentService": { "status": "ok" }
//   }
// }
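The database and Redis checks in the handler above have no time limit of their own, so a hung connection could hold the readiness response open past the probe's timeout. One way to bound them is sketched below. The names `withTimeout`, `runCheck`, `runChecks`, and `NamedProbe` are hypothetical helpers, not part of Express, Prisma, or ioredis, and the 2-second default simply mirrors the `AbortSignal.timeout(2_000)` used for the payment service. The checks run concurrently, so total latency is roughly that of the slowest check.

```typescript
// Same result shape as the `checks` record in the /ready handler.
type CheckResult = { status: 'ok' | 'error'; latencyMs?: number; error?: string };

// Hypothetical input shape: a label plus a function that performs the check.
interface NamedProbe {
  name: string;
  probe: () => Promise<unknown>;
}

// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying operation, it only stops waiting for it.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Runs one probe and converts success, failure, or timeout into a CheckResult.
async function runCheck(probe: () => Promise<unknown>, timeoutMs: number): Promise<CheckResult> {
  const start = Date.now();
  try {
    await withTimeout(probe(), timeoutMs);
    return { status: 'ok', latencyMs: Date.now() - start };
  } catch (err) {
    return { status: 'error', error: err instanceof Error ? err.message : String(err) };
  }
}

// Runs all probes concurrently and reports whether every one passed.
async function runChecks(
  probes: NamedProbe[],
  timeoutMs = 2_000,
): Promise<{ allHealthy: boolean; checks: Record<string, CheckResult> }> {
  const checks: Record<string, CheckResult> = {};
  let allHealthy = true;
  await Promise.all(
    probes.map(async ({ name, probe }) => {
      const result = await runCheck(probe, timeoutMs);
      checks[name] = result;
      if (result.status !== 'ok') allHealthy = false;
    }),
  );
  return { allHealthy, checks };
}
```

In the handler, the probes might be passed as `[{ name: 'database', probe: () => prisma.$queryRaw`SELECT 1` }, { name: 'redis', probe: () => redis.ping() }]`, with the returned `allHealthy` and `checks` used exactly as before.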
Kubernetes probe configuration:
containers:
  - name: order-service
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 10 # wait for app to start
      periodSeconds: 10 # check every 10s
      failureThreshold: 3 # restart after 3 consecutive failures
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5 # start checking earlier (just connectivity)
      periodSeconds: 5
      failureThreshold: 2 # remove from LB after 2 consecutive failures
      successThreshold: 1 # put back on LB after 1 success
      timeoutSeconds: 3
    startupProbe:
      # For apps with slow startup (loading large ML models, etc.)
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30 # 30 × 5s = 150 seconds to start
      periodSeconds: 5
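The probe settings above share the same arithmetic as the startup comment (30 × 5s = 150 seconds): the time Kubernetes tolerates consecutive failures is roughly `periodSeconds × failureThreshold`. The sketch below applies that to all three probes. `failureBudgetSeconds` is a hypothetical helper, and the estimate is an approximation that ignores `timeoutSeconds` and `initialDelaySeconds`. The fallback values are the Kubernetes defaults for the two fields.

```typescript
// Subset of the Kubernetes probe fields that drive failure timing.
interface ProbeTiming {
  periodSeconds?: number;    // Kubernetes default: 10
  failureThreshold?: number; // Kubernetes default: 3
}

// Approximate seconds of consecutive failures before Kubernetes acts on the probe.
function failureBudgetSeconds(probe: ProbeTiming): number {
  const period = probe.periodSeconds ?? 10;
  const threshold = probe.failureThreshold ?? 3;
  return period * threshold;
}

// Values taken from the probe configuration above.
const liveness = failureBudgetSeconds({ periodSeconds: 10, failureThreshold: 3 });
const readiness = failureBudgetSeconds({ periodSeconds: 5, failureThreshold: 2 });
const startup = failureBudgetSeconds({ periodSeconds: 5, failureThreshold: 30 });

console.log({ liveness, readiness, startup });
// { liveness: 30, readiness: 10, startup: 150 }
```

Read this way, a deadlocked container is restarted after about 30 seconds, an unready one stops receiving traffic after about 10 seconds, and a slow starter gets up to 150 seconds before it is killed.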
Graceful shutdown with readiness:
let isReady = true;

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, starting graceful shutdown');
  // 1. Stop accepting new traffic
  isReady = false; // readiness probe starts returning 503
  // 2. Pause so the load balancer stops routing here and in-flight requests can finish
  //    (a fixed 5-second delay; it does not track individual requests)
  await new Promise((r) => setTimeout(r, 5_000));
  // 3. Close connections
  await prisma.$disconnect();
  redis.disconnect();
  process.exit(0);
});

// In readiness endpoint
app.get('/ready', async (req, res) => {
  if (!isReady) {
    res.status(503).json({ status: 'shutting down' });
    return;
  }
  // ... other checks
});
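The shutdown handler above waits a fixed five seconds, which is a guess at how long in-flight requests need. A possible refinement is to count active requests and wait until the count reaches zero or a deadline passes. The sketch below shows only the counting part. `InFlightTracker` is a hypothetical class, not an Express or Node API, and the polling interval is an arbitrary choice.

```typescript
// Counts requests that have started but not yet finished.
class InFlightTracker {
  private active = 0;

  // Call when a request begins.
  start(): void {
    this.active += 1;
  }

  // Call when a request ends; never drops below zero.
  finish(): void {
    this.active = Math.max(0, this.active - 1);
  }

  get count(): number {
    return this.active;
  }

  // Resolves true once no requests are active, or false if the deadline passes first.
  async waitForDrain(deadlineMs: number, pollMs = 50): Promise<boolean> {
    const deadline = Date.now() + deadlineMs;
    while (this.active > 0 && Date.now() < deadline) {
      await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    }
    return this.active === 0;
  }
}
```

Assuming an Express app, a middleware could call `tracker.start()` for each request and `tracker.finish()` from a `res.on('close', ...)` listener, and the SIGTERM handler could `await tracker.waitForDrain(5_000)` in place of the fixed delay.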
What to check in readiness vs. liveness:
| | /health (liveness) | /ready (readiness) |
|---|---|---|
| Purpose | Is the process alive? | Can it handle traffic? |
| DB connectivity | No | Yes |
| Redis connectivity | No | Yes (if required) |
| External services | No | Critical ones only |
| Response time | Must be fast (< 50 ms) | Can check dependencies (< 3 s) |
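The "if required" and "critical ones only" rows imply that a failing optional dependency should not take the instance out of rotation. A minimal sketch of that rule follows. The `critical` flag, `summarizeReadiness`, and the `failedCritical` / `failedOptional` fields are assumptions introduced for illustration; the handler shown earlier treats every check as critical.

```typescript
type CheckStatus = 'ok' | 'error';

// `critical` is a hypothetical flag marking dependencies the service cannot run without.
interface DependencyCheck {
  status: CheckStatus;
  critical: boolean;
}

interface ReadinessSummary {
  httpStatus: 200 | 503;
  status: 'ready' | 'not ready';
  failedCritical: string[];
  failedOptional: string[];
}

// Only failed critical checks make the instance unready; optional failures are just reported.
function summarizeReadiness(checks: Record<string, DependencyCheck>): ReadinessSummary {
  const failedCritical: string[] = [];
  const failedOptional: string[] = [];
  for (const name of Object.keys(checks)) {
    const check = checks[name];
    if (check === undefined || check.status === 'ok') continue;
    if (check.critical) {
      failedCritical.push(name);
    } else {
      failedOptional.push(name);
    }
  }
  const ready = failedCritical.length === 0;
  return {
    httpStatus: ready ? 200 : 503,
    status: ready ? 'ready' : 'not ready',
    failedCritical,
    failedOptional,
  };
}
```

For example, a Redis instance used only as a cache could be marked `critical: false`, so an outage shows up in `failedOptional` while `/ready` keeps returning 200.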
Startup probes: Use startupProbe for services with long startup times (model loading, schema migration). While a startup probe is configured and has not yet succeeded, Kubernetes does not run the liveness or readiness probes, so the service gets time to initialize before the liveness probe can restart it.
Anti-patterns:
Security: Health endpoints should not require auth (they're called by infrastructure). But they should not expose sensitive information like connection strings or internal IPs.
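Raw error messages from database or HTTP clients can contain hostnames, IPs, or connection strings, so returning `err.message` from an unauthenticated endpoint may leak them. One option, sketched below, is to map errors to a small set of coarse reasons and keep the full error in server-side logs. `publicErrorReason` and `toPublicCheck` are hypothetical helpers. The `code` values are standard Node.js system error codes and `TimeoutError` is the error name produced by `AbortSignal.timeout`, but the mapping itself is only an illustration.

```typescript
type PublicCheck = { status: 'ok' | 'error'; error?: string };

// Maps an arbitrary error to a coarse reason that reveals no connection details.
function publicErrorReason(err: unknown): string {
  if (typeof err !== 'object' || err === null) return 'unavailable';
  const { code, name } = err as { code?: unknown; name?: unknown };
  if (name === 'TimeoutError' || code === 'ETIMEDOUT') return 'timeout';
  if (code === 'ECONNREFUSED') return 'connection refused';
  if (code === 'ENOTFOUND') return 'host not found';
  return 'unavailable';
}

// Builds the entry to expose in the health response; the raw message is never copied.
function toPublicCheck(err: unknown): PublicCheck {
  return { status: 'error', error: publicErrorReason(err) };
}
```

In a `catch` block this could replace `(err as Error).message`, for example `checks.database = toPublicCheck(err)`, with the original error written to the application log instead.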
microservices.io/patterns/observability/health-check-api.html