Decision heuristics for interpreting Honeycomb SLO compliance, budget burn rates, and trigger status — what the numbers mean and what action to take, including detecting misconfigured SLIs, deciding when to freeze deploys vs page on-call, and designing burn alert thresholds. Load this skill before calling get_slos or get_triggers. Trigger phrases: "check our SLOs", "are we meeting our SLOs", "which SLOs are healthy", "is the error budget OK", "are any alerts firing", "what's the burn rate", "set up an SLO", "create a trigger", "configure alerts", "set up burn alerts", "check trigger status", "starting on-call", "reliability picture", "should we freeze deploys", "is this SLO misconfigured", "are we within budget", "SLO is broken", "budget is negative", or any request about service level objectives, error budgets, burn rates, or alerting in Honeycomb.
From honeycomb

npx claudepluginhub honeycombio/agent-skill --plugin honeycomb

This skill uses the workspace's default tool permissions.
- references/alerting-strategy.md
- references/slo-design-guide.md
- references/trigger-examples.md
Guidance for configuring and reasoning about reliability in Honeycomb. The get_slos
and get_triggers tools document their own parameters — this skill focuses on
designing effective SLOs, choosing between SLOs and triggers, and interpreting
what the numbers mean.
Availability: SLOs require a Pro or Enterprise plan. Triggers are available on all plans.
| Question | SLO | Trigger |
|---|---|---|
| "Are we meeting our reliability commitments?" | Yes | No |
| "Is something broken right now?" | No | Yes |
| "How fast are we burning our error budget?" | Yes (burn alerts) | No |
| "Did error count exceed a threshold?" | No | Yes |
| "Should we slow down deploys?" | Yes (budget remaining) | No |
Rule of thumb: SLOs measure reliability against commitments over time. Triggers catch immediate operational issues.
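The budget questions in the table reduce to simple arithmetic. The sketch below is a minimal illustration, assuming an event-based SLO with a 99.9% target and made-up event counts. The function names are hypothetical and not part of Honeycomb's API, and the exhaustion estimate is a naive linear projection rather than Honeycomb's own burn alert calculation.

```python
def error_budget(total_events: int, target: float) -> float:
    """Failed events the SLO tolerates over the window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    return total_events * (1 - target)


def budget_remaining(total_events: int, failed_events: int, target: float) -> float:
    """Fraction of the budget still unspent; negative means the SLO is violated."""
    budget = error_budget(total_events, target)
    return (budget - failed_events) / budget


def hours_until_exhausted(remaining_events: float, failures_per_hour: float) -> float:
    """Naive linear projection: hours left if failures continue at the current rate."""
    if failures_per_hour <= 0:
        return float("inf")
    return remaining_events / failures_per_hour


# Illustrative numbers: 1,000,000 events in the window, 99.9% target.
print(round(error_budget(1_000_000, 0.999)))               # 1000
print(round(budget_remaining(1_000_000, 250, 0.999), 3))   # 0.75
print(round(budget_remaining(1_000_000, 1500, 0.999), 3))  # -0.5
print(hours_until_exhausted(750, 50))                      # 15.0
```

A negative remaining fraction is the "budget is negative" case: more events have failed than the target allows for the window.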
An SLI is a per-event boolean: was this event successful? Implemented as a calculated field returning 1 (success) or 0 (failure).
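To make the per-event boolean concrete, here is a minimal sketch in Python rather than Honeycomb's calculated-field syntax. The `Event` type, its fields, and the sample values are assumptions for illustration only. Each event scores 1 or 0, and compliance is the fraction of events that score 1.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Event:
    # Hypothetical event shape; real events carry whatever fields you send.
    duration_ms: float
    status_code: int


def sli_latency(event: Event, threshold_ms: float = 500) -> int:
    """Return 1 if the event met the latency goal, else 0."""
    return 1 if event.duration_ms <= threshold_ms else 0


def compliance(events: Sequence[Event], sli: Callable[[Event], int]) -> float:
    """Fraction of events the SLI marks as successful."""
    if not events:
        raise ValueError("no events to evaluate")
    return sum(sli(e) for e in events) / len(events)


events = [Event(120, 200), Event(480, 200), Event(950, 200), Event(300, 503)]
print(compliance(events, sli_latency))  # 0.75
```

Note that the latency SLI counts the 503 response as a success because it only looks at duration. An SLI is only as meaningful as the condition it encodes.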
- LTE(duration_ms, 500): requests that complete in 500 ms or less
- LTE(http.status_code, 499): non-5xx responses
- EQUALS(checkout.status, "completed"): successful checkouts

At minimum, two alerts:
When reviewing SLOs with get_slos:
"50 requests slower than 2s" is more actionable than "P99 is 2100ms."
Use COUNT WHERE duration_ms > threshold instead of P99 triggers.
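The difference between the two signals can be sketched on synthetic data. The durations below are invented, and the helper names are hypothetical. The percentile uses the nearest-rank method, which may differ from how Honeycomb computes P99.

```python
def slow_request_count(durations_ms, threshold_ms):
    """Count requests strictly slower than the threshold."""
    return sum(1 for d in durations_ms if d > threshold_ms)


def p99(durations_ms):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(durations_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


# Synthetic traffic: 950 fast requests and 50 slow ones.
durations = [100] * 950 + [2100] * 50

print(p99(durations))                       # 2100
print(slow_request_count(durations, 2000))  # 50
```

Both numbers describe the same traffic, but the count says how many requests were affected, which is what a threshold on COUNT lets you alert on directly.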
Share a single error budget across up to 10 services.
- ${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/slo-design-guide.md: Detailed SLO design methodology, multi-service SLOs, error budget math
- ${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/trigger-examples.md: Complete trigger example library organized by use case
- ${CLAUDE_PLUGIN_ROOT}/skills/slos-and-triggers/references/alerting-strategy.md: How to combine SLO burn alerts and triggers into a cohesive alerting strategy