From datadops
Quick service health assessment providing a comprehensive overview of current service status using key metrics, active alerts, and recent events. Perfect for daily health checks, incident triage, or getting rapid service insights. Use when you need fast service status or as a starting point for deeper investigation.
npx claudepluginhub ahmidbbc/datadops --plugin datadops
How this skill is triggered (by the user, by Claude, or both):
Slash command
/datadops:service-health-overview
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Rapid service health assessment providing actionable insights in under 60 seconds.
When invoked directly with /datadops:service-health-overview, use $ARGUMENTS as the scope for the report.
If the scope is unclear, ask whether to focus on a service, a team, or an environment.
✅ Green (95-100%): All systems operational
🟡 Yellow (75-94%): Minor issues, monitoring required
🔴 Red (0-74%): Critical issues, immediate action needed
Indicators:
- HTTP success rate
- Service uptime
- Critical endpoint availability
Metrics:
- Response time percentiles
- Throughput consistency
- Resource utilization
- Database performance
Thresholds:
- Excellent: All metrics within targets
- Good: Minor performance variations
- Poor: Significant performance degradation
Analysis:
- Error rate trends
- New error types
- Error severity distribution
- Customer impact assessment
Classifications:
- Low: Error rate < 1%, no critical errors
- Medium: Error rate 1-5%, some customer impact
- High: Error rate > 5%, significant customer impact
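The classifications above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the function name and parameters are hypothetical, and because the source does not say how to treat a sub-1% error rate that still includes critical errors, the sketch assumes that case is Medium.

```python
def classify_error_severity(error_rate_pct: float, has_critical_errors: bool = False) -> str:
    """Map an error rate (in percent) to Low / Medium / High.

    Thresholds follow the classification list: < 1% is Low, 1-5% is Medium,
    > 5% is High. ASSUMPTION: a rate below 1% with critical errors present
    is treated as Medium, since the list does not cover that case.
    """
    if error_rate_pct < 0:
        raise ValueError("error_rate_pct must be non-negative")
    if error_rate_pct > 5.0:
        return "High"
    if error_rate_pct >= 1.0 or has_critical_errors:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for rate in (0.2, 3.0, 7.5):
        print(rate, classify_error_severity(rate))
```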
Components:
- Host/container health
- Resource availability
- Network connectivity
- Storage performance
Assessment:
- Healthy: All infrastructure optimal
- Warning: Resource constraints detected
- Critical: Infrastructure failures present
Overall Score: 95-100 (Green)
Characteristics:
- Success rate > 99.5%
- Latency within targets
- No active alerts
- Stable resource usage
- Recent deployments successful
Recommended Actions:
- Continue monitoring
- Review performance trends
- Plan capacity for growth
Overall Score: 75-94 (Yellow)
Characteristics:
- Success rate 95-99.5%
- Latency slightly elevated
- Minor alerts present
- Increasing resource usage
- Some deployment issues
Recommended Actions:
- Investigate warning indicators
- Review recent changes
- Prepare mitigation plans
- Increase monitoring frequency
Overall Score: 0-74 (Red)
Characteristics:
- Success rate < 95%
- High latency or timeouts
- Critical alerts firing
- Resource exhaustion
- Failed deployments
Recommended Actions:
- Immediate investigation required
- Escalate to on-call team
- Prepare rollback procedures
- Customer communication
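As a rough sketch of how the three Overall Score bands (95-100, 75-94, 0-74) and their recommended actions might be applied programmatically: the function and dictionary names here are hypothetical, and the handling of fractional scores (for example 94.5 falling into the middle band) is an assumption, since the bands are given only as whole numbers.

```python
# Recommended actions copied from the three score bands described above.
RECOMMENDED_ACTIONS = {
    "green": [
        "Continue monitoring",
        "Review performance trends",
        "Plan capacity for growth",
    ],
    "yellow": [
        "Investigate warning indicators",
        "Review recent changes",
        "Prepare mitigation plans",
        "Increase monitoring frequency",
    ],
    "red": [
        "Immediate investigation required",
        "Escalate to on-call team",
        "Prepare rollback procedures",
        "Customer communication",
    ],
}


def classify_overall_score(score: float) -> str:
    """Return 'green', 'yellow', or 'red' for an overall score in [0, 100].

    ASSUMPTION: fractional scores use lower bounds of 95 and 75, so 94.5
    is classified as yellow.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 95:
        return "green"
    if score >= 75:
        return "yellow"
    return "red"


if __name__ == "__main__":
    tier = classify_overall_score(82)
    print(tier, RECOMMENDED_ACTIONS[tier])
```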
"Give me a health overview of the payment service."
Expected Response:
"Check the health of all checkout-related services."
Expected Response:
"Prepare a service health summary for our team standup."
Expected Response:
{
  "payment_service": {
    "success_rate_target": 99.9,
    "latency_p95_target": 100,
    "error_rate_threshold": 0.1
  },
  "search_service": {
    "success_rate_target": 99.5,
    "latency_p95_target": 500,
    "error_rate_threshold": 0.5
  }
}
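One way such per-service targets could be checked against measured values is sketched below. Everything beyond the JSON keys is an assumption made for illustration: the `find_breaches` helper, the shape of the `measured` dictionary, and the units (success and error rates in percent, p95 latency in milliseconds) do not come from the skill.

```python
import json

# The per-service targets from the configuration above.
TARGETS_JSON = """
{
  "payment_service": {
    "success_rate_target": 99.9,
    "latency_p95_target": 100,
    "error_rate_threshold": 0.1
  },
  "search_service": {
    "success_rate_target": 99.5,
    "latency_p95_target": 500,
    "error_rate_threshold": 0.5
  }
}
"""


def find_breaches(service: str, measured: dict, targets: dict) -> list:
    """Return the names of measured values that miss the service's targets.

    `measured` is a hypothetical dict with keys 'success_rate', 'latency_p95',
    and 'error_rate' (units assumed: percent, milliseconds, percent).
    """
    target = targets[service]
    breaches = []
    if measured["success_rate"] < target["success_rate_target"]:
        breaches.append("success_rate")
    if measured["latency_p95"] > target["latency_p95_target"]:
        breaches.append("latency_p95")
    if measured["error_rate"] > target["error_rate_threshold"]:
        breaches.append("error_rate")
    return breaches


if __name__ == "__main__":
    targets = json.loads(TARGETS_JSON)
    sample = {"success_rate": 99.95, "latency_p95": 120, "error_rate": 0.05}
    print(find_breaches("payment_service", sample, targets))
```

Keeping the targets in data rather than in code means a stricter service such as payments and a more tolerant one such as search can share the same check.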
# Claude Code interactive invocation
/datadops:service-health-overview critical production services
# Prompt-based automation examples
0 9 * * * claude -p "Give me a health overview of all critical production services. Summarize health scores, active alerts, recent events, and the top recommended actions."
0 * * * * claude -p "Give me a health overview of critical services and mention only active alerts, regressions, and immediate actions."