From wicked-garden
System health overview from discovered observability sources. Aggregates errors, performance metrics, and SLO status across services. Correlates with deployments and code changes. Use for proactive health monitoring and post-deployment validation. Use when: "system health", "health check", "deployment health", "production status", "how is production"
```shell
npx claudepluginhub mikeparcewski/wicked-garden --plugin wicked-garden
```

This skill uses the workspace's default tool permissions.
Aggregate system health from discovered observability sources with deployment correlation.
Use capability-based discovery to find available integrations:
Scan server descriptions and resources for these capabilities:

- **error-tracking**: exception/error tracking and reporting
- **apm**: application performance monitoring and metrics
- **logging**: log aggregation, search, and analysis
- **tracing**: distributed tracing and service mapping
- **telemetry**: metrics collection and custom instrumentation
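Capability detection could be sketched as keyword matching over server descriptions. The keyword lists, server names, and function below are illustrative assumptions, not a real API:

```python
# Hypothetical sketch: map discovered servers to observability capabilities
# by keyword-matching their descriptions. Keywords mirror the list above.
CAPABILITY_KEYWORDS = {
    "error-tracking": ["exception", "error tracking", "crash"],
    "apm": ["performance monitoring", "service metrics", "observability"],
    "logging": ["log aggregation", "log search", "log analysis"],
    "tracing": ["distributed tracing", "trace"],
    "telemetry": ["metrics collection", "instrumentation", "time-series"],
}

def detect_capabilities(servers: dict[str, str]) -> dict[str, list[str]]:
    """Return a capability -> server-names map from description text."""
    found: dict[str, list[str]] = {}
    for name, description in servers.items():
        text = description.lower()
        for capability, keywords in CAPABILITY_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                found.setdefault(capability, []).append(name)
    return found
```

Matching on description text keeps discovery independent of any hard-coded vendor list, which is the point of capability-based discovery.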
For each discovered source, collect current metrics, active alerts, and recent trends, then classify overall health:

- **HEALTHY**: all metrics within SLO, no active alerts, stable trends
- **DEGRADED**: some metrics elevated, minor alerts, or negative trends
- **CRITICAL**: SLO violations, critical alerts, or severe degradation
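The classification can be sketched as a simple rollup; the signal counts used as inputs are an assumption about how the collected data is summarized:

```python
def classify_health(slo_violations: int, critical_alerts: int,
                    minor_alerts: int, negative_trends: int) -> str:
    """Roll per-source signals up into an overall status.

    Severity wins: any SLO violation or critical alert is CRITICAL,
    any minor alert or negative trend is DEGRADED, else HEALTHY.
    """
    if slo_violations or critical_alerts:
        return "CRITICAL"
    if minor_alerts or negative_trends:
        return "DEGRADED"
    return "HEALTHY"
```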
Check for recent deployments or code changes that might correlate with health issues. Recommendations then depend on the resulting health status.
This skill discovers integrations at runtime based on capability:
| Capability | What to Look For | Provides |
|---|---|---|
| error-tracking | Exception tracking, error reporting, crash analytics | Error rates, stack traces, user impact |
| apm | Performance monitoring, service metrics, observability | Latency, throughput, service health |
| logging | Log aggregation, log search, log analysis | Log search, error patterns |
| tracing | Distributed tracing, request tracing, trace analysis | Distributed traces, dependencies |
| telemetry | Metrics collection, custom instrumentation, time-series data | Custom metrics, instrumentation |
Fallback: If no integrations are found, perform local analysis via `wicked-garden:search` for error patterns in code.
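Source selection with this fallback might look like the following sketch, assuming discovery returns a capability-to-servers map; the `local:` sentinel name is an illustrative assumption:

```python
def health_sources(discovered: dict[str, list[str]]) -> list[str]:
    """Flatten the capability -> servers map into a source list.

    When nothing was discovered, fall back to a single local
    code-search source (sentinel name is hypothetical).
    """
    sources = [s for servers in discovered.values() for s in servers]
    return sources or ["local:wicked-garden-search"]
```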
See refs/sources.md for detailed capability discovery patterns.
## System Health Report
**Overall Status**: [HEALTHY | DEGRADED | CRITICAL]
**Assessment Time**: {timestamp}
**Data Sources**: {list of integrations used}
### Health Summary
| Service | Status | Error Rate | Latency (p95) | SLO Status |
|---------|--------|------------|---------------|------------|
| {service} | {status} | {rate} | {latency} | {✓ or ✗} |
### Issues Detected
[For each issue]
**{Service}: {Issue Description}**
- Severity: [CRITICAL | HIGH | MEDIUM | LOW]
- Started: {timestamp}
- Metric: {specific metric and values}
- Pattern: {error pattern or behavior}
- Correlation: {deployment or change if found}
- Blast Radius: {impact scope}
### Trends (24h)
- Error Rates: {trend with percentage}
- Latency: {trend with percentage}
- Traffic: {trend with percentage}
### Recommendations
**Immediate**:
{critical actions needed now}
**Short-term**:
{optimizations and improvements}
**Capacity**:
{capacity planning insights}
**Deployment regression**: Error rates or latency increase after a deployment. Correlate metrics with the deployment time and consider rollback.

**Gradual degradation**: Metrics slowly degrade over hours or days. Investigate memory leaks, growing data sets, and cache efficiency.

**Load-driven degradation**: Performance degrades under traffic spikes. Check capacity utilization and scaling policies.

**Cascading failure**: A single service failure causes downstream issues. Use traces to identify the root cause and implement circuit breakers.
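The deployment correlation described above can be sketched as a time-window heuristic; the one-hour window and the data shapes are assumptions, not the skill's actual algorithm:

```python
from datetime import datetime, timedelta

def correlate_with_deploys(incident_start: datetime,
                           deploys: list[tuple[str, datetime]],
                           window: timedelta = timedelta(hours=1)) -> list[str]:
    """Return services deployed within `window` before the incident began."""
    return [service for service, deployed_at in deploys
            if timedelta(0) <= incident_start - deployed_at <= window]
```

A match does not prove causation, but it prioritizes which deployments to inspect or roll back first.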
When the crew enters the build phase, emit events:

- `observe:health:checked:success`
- `observe:health:degraded:warning`
- `observe:health:critical:failure`

When debugging issues, provide observability context.
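The three event names map directly from the overall status; a minimal sketch (the function name is hypothetical, the event strings come from this document):

```python
def health_event(status: str) -> str:
    """Map an overall health status to its emitted event name."""
    return {
        "HEALTHY": "observe:health:checked:success",
        "DEGRADED": "observe:health:degraded:warning",
        "CRITICAL": "observe:health:critical:failure",
    }[status]
```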