Design fair on-call schedules that distribute burden, prevent burnout, and maintain coverage. Use when establishing on-call practices or scaling incident response.
From process-engineering
npx claudepluginhub sethdford/claude-skills --plugin tech-lead-process-engineering
This skill uses the workspace's default tool permissions.
Build on-call schedules that feel sustainable to engineers while maintaining 24/7 system reliability.
You are a senior tech lead managing on-call for $ARGUMENTS. Poor on-call rotations create burnout, and burned-out on-call engineers make mistakes and leave. Good rotations are invisible: coverage exists and engineers don't resent the duty.
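Choose rotation schedule: A weekly rotation is common; one week on and four weeks off corresponds to a five-person rotation. Two-person escalation (a primary on-call plus a secondary backup) reduces single-point-of-failure risk. For an 8-person team with 1 primary and 1 secondary each week, each engineer serves as primary one week in eight and as secondary one week in eight.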
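The weekly primary/secondary rotation can be sketched as a simple round-robin. This is a minimal illustration, not a prescribed tool: the function name build_rotation, the engineer names, the start date, and the secondary_offset parameter are all assumptions made for the example.

```python
from datetime import date, timedelta


def build_rotation(engineers, start_monday, weeks, secondary_offset=1):
    """Return a list of (week_start, primary, secondary) tuples.

    secondary_offset is a hypothetical knob: with the default of 1, this
    week's secondary becomes next week's primary.
    """
    n = len(engineers)
    if n < 2:
        raise ValueError("need at least two engineers for primary + secondary")
    if secondary_offset % n == 0:
        raise ValueError("secondary must be a different person than primary")

    schedule = []
    for week in range(weeks):
        primary = engineers[week % n]
        secondary = engineers[(week + secondary_offset) % n]
        week_start = start_monday + timedelta(weeks=week)
        schedule.append((week_start, primary, secondary))
    return schedule


# Hypothetical 8-person team, rotation starting Monday 2024-01-01.
team = [f"eng{i}" for i in range(1, 9)]
schedule = build_rotation(team, date(2024, 1, 1), weeks=8)
for week_start, primary, secondary in schedule:
    print(week_start.isoformat(), "primary:", primary, "secondary:", secondary)
```

With the default offset, each engineer is on duty two weeks in a row (secondary, then primary), which gives them context before taking primary. A larger offset such as 4 spreads the two duties apart instead; which is preferable depends on the team.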
Define on-call responsibilities: The primary responds to alerts within 30 minutes. For major incidents, the primary escalates to the secondary and the team lead. For a P3, the primary may choose not to fix it immediately if it can wait for business hours. Be explicit about what the on-call engineer owns.
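Writing the responsibilities down as data makes them harder to misread at 3am. The sketch below encodes only the rules stated above; the "P1" label for a major incident, the function name route_alert, and the returned dictionary shape are assumptions for illustration.

```python
ACK_DEADLINE_MINUTES = 30  # primary's response target from the policy above


def route_alert(severity, in_business_hours):
    """Return who is involved and whether the fix may wait.

    Severity labels are assumed: "P1" stands in for a major incident,
    "P3" for a low-priority one.
    """
    involved = ["primary"]
    if severity == "P1":
        # Major incident: the primary escalates to secondary and team lead.
        involved += ["secondary", "team_lead"]

    # A P3 outside business hours may be deferred until business hours.
    may_defer_fix = severity == "P3" and not in_business_hours

    return {
        "involved": involved,
        "respond_within_minutes": ACK_DEADLINE_MINUTES,
        "may_defer_fix": may_defer_fix,
    }


print(route_alert("P1", in_business_hours=False))
print(route_alert("P3", in_business_hours=False))
```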
Minimize disruption: Either dedicate on-call weeks, with no concurrent project work, or pair: two engineers share the week and both are present for incidents. Pairing reduces the burden of solo interruptions.
Plan handoff process: Hand off on-call at a consistent time (for example, 9am Monday). The outgoing engineer gives the status of open incidents, infrastructure concerns, and any quirks. This takes about 30 minutes and prevents loss of institutional memory.
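A fixed handoff slot and a fixed note structure are easy to automate. The following is a rough sketch under stated assumptions: next_handoff and HandoffNote are hypothetical names, the Monday 9am default mirrors the example time above, and the sample incident text is invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


def next_handoff(now, weekday=0, hour=9):
    """Return the next handoff datetime strictly after `now`.

    weekday follows datetime.weekday(): 0 is Monday.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


@dataclass
class HandoffNote:
    outgoing: str
    incoming: str
    open_incidents: list = field(default_factory=list)
    infrastructure_concerns: list = field(default_factory=list)
    quirks: list = field(default_factory=list)

    def render(self):
        """Format the note as plain text, marking empty sections as 'none'."""
        sections = [
            ("Open incidents", self.open_incidents),
            ("Infrastructure concerns", self.infrastructure_concerns),
            ("Quirks", self.quirks),
        ]
        lines = [f"On-call handoff: {self.outgoing} -> {self.incoming}"]
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in (items or ["none"]))
        return "\n".join(lines)


note = HandoffNote("eng1", "eng2", open_incidents=["example: queue backlog"])
print(next_handoff(datetime(2024, 1, 3, 15, 0)))
print(note.render())
```

Rendering empty sections as "none" forces the outgoing engineer to state explicitly that nothing is open, rather than leaving the incoming engineer to guess.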
Track and measure: Monitor pages per week (is on-call actually busy?), time to response, and resolution time. Also track engineer satisfaction with on-call. If satisfaction is low and pages are high, the rotation is unsustainable; adjust it.
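The sustainability rule above (low satisfaction plus high page volume) can be turned into a periodic check. This is a sketch only: the thresholds of 10 pages per week and a 3.5 satisfaction score on a 1 to 5 scale are placeholder assumptions, not recommendations from this guide, and the sample numbers are made up.

```python
from statistics import mean


def review_rotation(weekly_pages, satisfaction_scores,
                    max_pages_per_week=10, min_satisfaction=3.5):
    """Classify a rotation from page counts and survey scores.

    Both thresholds are hypothetical defaults; tune them to your team.
    """
    avg_pages = mean(weekly_pages)
    avg_satisfaction = mean(satisfaction_scores)
    high_load = avg_pages > max_pages_per_week
    low_satisfaction = avg_satisfaction < min_satisfaction

    if high_load and low_satisfaction:
        verdict = "unsustainable"  # the condition named in the text
    elif high_load or low_satisfaction:
        verdict = "watch"          # one warning sign only
    else:
        verdict = "healthy"

    return {
        "avg_pages_per_week": avg_pages,
        "avg_satisfaction": avg_satisfaction,
        "verdict": verdict,
    }


# Made-up sample data: four weeks of page counts, four survey responses.
print(review_rotation([14, 18, 11, 17], [2, 3, 2, 3]))
print(review_rotation([2, 3, 1, 2], [4, 5, 4, 4]))
```

The intermediate "watch" verdict is an addition for the example; the text itself only names the combined condition as unsustainable.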