Design and review logs, metrics, traces, SLOs, and alerting for reliable systems. Use for telemetry strategy and coverage gaps. NOT for live incident command or vendor-specific setup.
npx claudepluginhub wyattowalsh/agents --plugin agents

This skill uses the workspace's default tool permissions.
Design and review telemetry that helps teams detect, diagnose, and improve service behavior before and during reliability problems.
Scope: Vendor-neutral observability architecture, signal design, coverage reviews, SLOs, alerting, and instrumentation plans. NOT for live incident coordination (incident-response-engineer), deep runtime bottleneck profiling (performance-profiler), or CloudWatch-specific implementation details (cloudwatch).
| Term | Definition |
|---|---|
| telemetry | Logs, metrics, traces, profiles, and events emitted by a system |
| signal | A measurable indicator used to detect or explain behavior |
| metric | Numeric time-series measurement aggregated over time |
| log | Structured event record capturing context for a specific occurrence |
| trace | End-to-end record of work moving through distributed components |
| span | A timed unit of work within a trace |
| SLI | Concrete measurement of a user-relevant reliability property |
| SLO | Target threshold and window for an SLI |
| error budget | Allowed unreliability implied by an SLO over its window |
| cardinality | Number of unique label or attribute values attached to telemetry |
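
The SLO and error budget definitions above reduce to simple arithmetic, sketched below. This is a minimal illustration and not part of the skill: the `Slo` class, its field names, and the example numbers (a 99.9% target over 30 days) are all assumptions chosen for the example. A 99.9% target over 30 days leaves roughly 43.2 minutes of full unavailability, and the remaining budget is the unspent share of the allowed bad events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO record: a target ratio over a rolling window."""
    target: float      # e.g. 0.999 means 99.9% of events must be good
    window_days: int   # length of the evaluation window in days


def error_budget_fraction(slo: Slo) -> float:
    """Allowed fraction of bad events implied by the SLO target."""
    return 1.0 - slo.target


def allowed_bad_minutes(slo: Slo) -> float:
    """Error budget expressed as minutes of full unavailability in the window."""
    window_minutes = slo.window_days * 24 * 60
    return error_budget_fraction(slo) * window_minutes


def budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if total_events == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_bad = error_budget_fraction(slo) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target leaves no budget: any bad event overspends it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=30)
    print(round(allowed_bad_minutes(slo), 1))
    print(round(budget_remaining(slo, good_events=999_500, total_events=1_000_000), 2))
```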
| $ARGUMENTS | Mode |
|---|---|
| design <system> | Design an observability architecture for a service or workflow |
| review <service or stack> | Audit existing telemetry, dashboards, and alerts |
| instrument <service or path> | Plan what to emit and where to add instrumentation |
| alert <service or journey> | Design actionable alerting and escalation |
| slo <service or journey> | Define SLIs, SLOs, and error budget policy |
| investigate <signal or symptom> | Structure cross-signal diagnosis for an issue |
| Natural language about logs, metrics, traces, dashboards, or alerting | Auto-detect the closest mode |
| Empty | Show the mode menu with examples |
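
The dispatch table above could be sketched as a small router. This is only an illustration of the table's logic, not the skill's actual implementation: the `route` function, the `KEYWORD_HINTS` mapping, and the choice to fall back to the menu when nothing matches are all assumptions made for the example. An explicit mode word wins, empty input shows the menu, and natural-language input is matched against a few hypothetical keywords.

```python
import re

# The six explicit modes listed in the dispatch table.
MODES = ("design", "review", "instrument", "alert", "slo", "investigate")

# Hypothetical keyword hints for natural-language input (an assumption for
# this sketch; the skill does not specify how auto-detection works).
KEYWORD_HINTS = {
    "audit": "review",
    "coverage": "review",
    "alerting": "alert",
    "paging": "alert",
    "budget": "slo",
    "emit": "instrument",
    "architecture": "design",
    "diagnose": "investigate",
}


def route(arguments: str) -> tuple[str, str]:
    """Return (mode, subject) for a raw $ARGUMENTS string."""
    text = arguments.strip()
    if not text:
        return ("menu", "")  # empty input: show the mode menu

    parts = text.split(None, 1)
    first = parts[0].lower()
    rest = parts[1] if len(parts) > 1 else ""
    if first in MODES:
        return (first, rest)  # explicit mode word leads the input

    # Natural language: pick the first hinted mode whose keyword appears
    # as a whole word in the input.
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    for keyword, mode in KEYWORD_HINTS.items():
        if keyword in tokens:
            return (mode, text)

    return ("menu", text)  # assumed fallback when nothing matches


if __name__ == "__main__":
    print(route("slo customer webhook delivery"))
    print(route("why is paging so noisy for login"))
```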
| # | Mode | Example |
|---|---|---|
| 1 | Design | design observability for multi-region checkout service |
| 2 | Review | review telemetry coverage for payments-api |
| 3 | Instrument | instrument order placement workflow across api and workers |
| 4 | Alert | alert strategy for login availability and latency |
| 5 | SLO | slo for customer webhook delivery |
| 6 | Investigate | investigate rising 5xx with queue lag and timeout traces |
| File | Use When |
|---|---|
| references/signal-selection-matrix.md | Choosing between metrics, logs, traces, profiles, and workflow events |
| references/alert-anti-patterns.md | Reviewing noisy, duplicate, or unactionable alerts |
| references/sli-slo-examples.md | Defining availability, latency, freshness, or correctness SLIs and SLOs |
| references/investigation-workflows.md | Structuring symptom-first diagnosis across signals and dependency boundaries |
| references/output-templates.md | Formatting design, review, instrumentation, alert, SLO, and investigation deliverables |

- references/signal-selection-matrix.md when signal tradeoffs, sampling, or join strategy are unclear.
- references/output-templates.md#design-template when producing the final deliverable.
- references/alert-anti-patterns.md when alert noise, duplication, or escalation quality is part of the review.
- references/output-templates.md#review-template when formatting the audit.
- references/signal-selection-matrix.md before choosing signal types for each boundary.
- references/output-templates.md#instrumentation-template for the emitted deliverable shape.
- references/alert-anti-patterns.md before recommending thresholds, paging, or deduplication changes.
- references/output-templates.md#alert-template when presenting the alert plan.
- references/sli-slo-examples.md when choosing SLI type, exclusions, windows, or error-budget policy.
- references/output-templates.md#slo-template for the final deliverable.
- references/investigation-workflows.md when building the hypothesis tree or evidence order.
- references/output-templates.md#investigation-template for the final response.

references/signal-selection-matrix.md, references/alert-anti-patterns.md, references/sli-slo-examples.md, references/investigation-workflows.md, references/output-templates.md

Treat SKILL.md as the operator contract and use the references for matrices, examples, and output shapes.

IS for: telemetry design, coverage reviews, instrumentation strategy, SLO definition, alert quality, cross-signal diagnosis.
NOT for: live incident command, low-level profiler output analysis, or vendor-specific configuration walkthroughs.