Manages Rootly incidents: create, search, triage, update, resolve using MCP tools. Covers lifecycle, severity/status, AI analysis (find_related_incidents, suggest_solutions), alerts, action items.
npx claudepluginhub wyre-technology/msp-claude-plugins --plugin rootly

This skill uses the workspace's default tool permissions.
Incidents are the primary resource in Rootly — the central record for any production issue, outage, or service degradation. Rootly is used by SRE and platform engineering teams to coordinate response in real time and drive continuous improvement via postmortems. In MSP environments, Rootly typically manages internal infrastructure incidents rather than per-client ticketing, but can be configured with team-based routing to support multi-customer workflows.
The incident system supports:

- `find_related_incidents` and `suggest_solutions` surface past patterns and recommendations

All read and write operations on incidents are available through the Rootly MCP tools.

| Tool | Description | Key Parameters |
|---|---|---|
| `incidents_get` | List and search incidents | `status`, `severity`, `page[number]`, `page[size]` |
| `incidents_post` | Create a new incident | `title`, `severity_id`, `team_ids`, `service_ids` |
| `incidents_by_incident_id_alerts_post` | Attach an alert to an incident | `incident_id`, alert data |
| `incidents_by_incident_id_alerts_get` | List alerts attached to an incident | `incident_id` |
| `incidents_by_incident_id_action_items_post` | Create an action item on an incident | `incident_id`, `summary`, `assignee_id` |
| `incidents_by_incident_id_action_items_get` | List action items on an incident | `incident_id` |
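
The `page[number]` and `page[size]` parameters of `incidents_get` suggest page-by-page listing. The sketch below shows one way to walk all pages. It assumes pages are numbered from 1 and that a short or empty page marks the end; `fetch_page` is a hypothetical callable standing in for however your client invokes `incidents_get`, not part of the Rootly tools.

```python
from typing import Callable, Iterator


def iter_incidents(
    fetch_page: Callable[[dict], list],
    page_size: int = 25,
    **filters,
) -> Iterator[dict]:
    """Yield incidents page by page until a short or empty page is returned."""
    page = 1  # assumption: pages are 1-indexed
    while True:
        params = {**filters, "page[number]": page, "page[size]": page_size}
        batch = fetch_page(params)  # hypothetical wrapper around incidents_get
        yield from batch
        if len(batch) < page_size:
            return
        page += 1
```

A call such as `list(iter_incidents(fetch_page, status="resolved"))` would collect every matching incident, at the cost of one `fetch_page` call per page.
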

| Tool | Description | Key Parameters |
|---|---|---|
| `find_related_incidents` | Find historically similar incidents using TF-IDF | `query` or `incident_id` |
| `suggest_solutions` | Suggest remediation steps based on past resolutions | `incident_id` or `description` |

| Tool | Description |
|---|---|
| `severities_get` | List configured severity levels (ID, slug, color) |
| `services_get` | List services to scope the incident to the right owner |
| `teams_get` | List teams for incident assignment and routing |
| `incident_types_get` | List incident types (bug, outage, performance, etc.) |
| `environments_get` | List environments (production, staging, etc.) |
| `users_get` | List users for assignment |

Call list_endpoints to get the current list of all available API endpoints and tool names from the Rootly OpenAPI specification.

    ┌───────────┐   Triage starts    ┌────────────┐   Contained   ┌───────────┐
    │ detected  │ ─────────────────> │ in_triage  │ ────────────> │ mitigated │
    └───────────┘                    └────────────┘               └───────────┘
                                                                        │
                                                                  Full fix done
                                                                        ▼
                                                                  ┌──────────┐
                                                                  │ resolved │
                                                                  └──────────┘
                                                                        │
                                                                  PIR complete
                                                                        ▼
                                                                   ┌────────┐
                                                                   │ closed │
                                                                   └────────┘

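
The lifecycle diagram can be read as a simple forward-only state table. The sketch below encodes only the arrows drawn in the diagram. Treating them as the complete set of transitions is an assumption: Rootly itself may allow other moves, such as cancelling or skipping a stage. The helper names are illustrative.

```python
# Forward transitions taken from the lifecycle diagram.
# Assumption: illustrative only; Rootly may permit other moves
# (for example cancelling an incident or skipping a stage).
NEXT_STATUS = {
    "detected": "in_triage",
    "in_triage": "mitigated",
    "mitigated": "resolved",
    "resolved": "closed",
}


def can_advance(current: str, target: str) -> bool:
    """Return True if target is the stage directly after current in the diagram."""
    return NEXT_STATUS.get(current) == target


def remaining_stages(current: str) -> list[str]:
    """List the stages still ahead of current, in diagram order."""
    stages = []
    while current in NEXT_STATUS:
        current = NEXT_STATUS[current]
        stages.append(current)
    return stages
```
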
Key lifecycle timestamps on the incident record:
| Timestamp | Meaning |
|---|---|
| `detected_at` | Alert fired or issue first observed |
| `acknowledged_at` | Responder acknowledged the page |
| `in_triage_at` | Active investigation started |
| `started_at` | Response team coordinating |
| `mitigated_at` | Immediate impact contained |
| `resolved_at` | Issue fully resolved |
| `closed_at` | Post-incident review complete |
| `cancelled_at` | False alarm; incident cancelled |
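
These timestamps make it straightforward to derive durations such as time to mitigate or time to resolve. The sketch below assumes the timestamps arrive as ISO 8601 strings (possibly with a trailing `Z`) and that unset ones are null. The sample record and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed(incident: dict, start_field: str, end_field: str) -> Optional[timedelta]:
    """Time between two lifecycle timestamps, or None if either is missing."""
    start = parse_ts(incident.get(start_field))
    end = parse_ts(incident.get(end_field))
    if start is None or end is None:
        return None
    return end - start


# Hypothetical incident record; the values are made up for illustration.
incident = {
    "detected_at": "2024-01-01T10:00:00Z",
    "mitigated_at": "2024-01-01T10:45:00Z",
    "resolved_at": None,
}
time_to_mitigate = elapsed(incident, "detected_at", "mitigated_at")
time_to_resolve = elapsed(incident, "detected_at", "resolved_at")
```

Returning `None` for an open incident keeps averages over many incidents from being skewed by placeholder values.
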
Severities are configurable per Rootly organization. Common conventions:
| Severity | Typical Name | Description | SLA Target |
|---|---|---|---|
| SEV-1 / Critical | P1 | Complete outage or data loss; business-critical impact | Immediate (15 min) |
| SEV-2 / High | P2 | Major feature degraded; significant user impact | 30 minutes |
| SEV-3 / Medium | P3 | Partial degradation; workaround available | 2 hours |
| SEV-4 / Low | P4 | Minor issue; minimal user impact | Next business day |
Note: Severity IDs are UUIDs in Rootly. Always call `severities_get` to map severity slugs to IDs before creating an incident.
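
As a rough sketch of that lookup, the helper below maps a slug to its UUID and fails loudly if the slug is not configured. It assumes the `severities_get` result has been flattened into records with `id` and `slug` keys; the actual response shape may be nested differently. The UUIDs, slugs and incident title are invented.

```python
def severity_id_for(slug: str, severities: list[dict]) -> str:
    """Look up a severity UUID by slug; raise if the slug is not configured."""
    by_slug = {s["slug"]: s["id"] for s in severities}
    if slug not in by_slug:
        raise KeyError(f"unknown severity slug {slug!r}; known: {sorted(by_slug)}")
    return by_slug[slug]


# Hypothetical severities_get result (shape and UUIDs are assumptions).
severities = [
    {"id": "11111111-1111-1111-1111-111111111111", "slug": "sev1", "color": "red"},
    {"id": "22222222-2222-2222-2222-222222222222", "slug": "sev2", "color": "orange"},
]

# Parameters for incidents_post, with the severity resolved at call time.
payload = {
    "title": "Checkout API returning 500s",
    "severity_id": severity_id_for("sev1", severities),
}
```
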
Rootly uses free-text status descriptors alongside timestamps. Common values:

- `detected`: Newly created, not yet acknowledged
- `in_triage`: Actively being investigated
- `mitigated`: Impact contained, monitoring for recurrence
- `resolved`: Issue fixed, normal operation restored
- `cancelled`: False alarm or invalid incident

| Field | Type | Description |
|---|---|---|
| `id` | string | UUID of the incident |
| `sequential_id` | integer | Human-readable incident number (e.g., INC-342) |
| `title` | string | Short summary of the incident |
| `summary` | string | Detailed description of impact and current status |
| `status` | string | Current lifecycle status |
| `severity` | object | Severity record with slug, color, description |
| `services` | array | Affected services |
| `environments` | array | Affected environments (production, staging) |
| `teams` | array | Teams assigned to respond |
| `labels` | array | Custom tags for filtering and reporting |
| `started_at` | datetime | When the incident started |
| `resolved_at` | datetime | When the incident was resolved (null if open) |
| `slack_channel_id` | string | Auto-created Slack channel for coordination |
| `url` | string | Web UI link to the incident |

To triage open incidents:

1. Call `incidents_get` with `status=in_triage` or `status=detected`, sorted by severity
2. Call `find_related_incidents` with the `incident_id` to surface similar past incidents
3. Call `suggest_solutions` with the `incident_id` to get AI-generated remediation suggestions
4. Call `incidents_by_incident_id_action_items_post` for each remediation step

To create an incident:

1. Call `severities_get` to find the correct severity ID for the impact level
2. Call `services_get` to identify which service is affected
3. Call `teams_get` to find the on-call team to assign
4. Call `incidents_post` with `title`, `severity_id`, `service_ids`, `team_ids`
5. Attach any related alerts with `incidents_by_incident_id_alerts_post`

To investigate an existing incident:

1. Find the incident with `incidents_get` or by `sequential_id`
2. Call `find_related_incidents`: look for recurring patterns (same service, same time of day, same symptoms)
3. Call `suggest_solutions`: surface past resolutions that worked for similar incidents
4. Review action items with `incidents_by_incident_id_action_items_get`
5. Review attached alerts with `incidents_by_incident_id_alerts_get`

To review recently resolved incidents:

1. Call `incidents_get` with a 24-hour window and `status=resolved`
2. Compare lifecycle timestamps (`detected_at` vs `resolved_at`)

Rootly incidents often correspond to PSA service tickets for MSP billing:

- Include the `sequential_id` (e.g., INC-342) and `url` in the PSA ticket body
- Include the incident `summary`

Before handing off to the next on-call engineer:

1. Call `get_oncall_handoff_summary` to see current and next on-call status plus open incidents
2. Call `incidents_get` with `status=in_triage` to review any actively open incidents

| Error | HTTP Code | Resolution |
|---|---|---|
| Invalid API token | 401 | Regenerate at Account > Manage API Keys |
| Insufficient permissions | 403 | Token may be Team-scoped; use an Account or Global token |
| Incident not found | 404 | Verify incident_id; use incidents_get to list active incidents |
| Severity ID invalid | 422 | Call severities_get to get valid IDs before creating |
| Rate limited | 429 | Back off 30 seconds; retry with exponential backoff |
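
The 429 guidance (back off 30 seconds, then retry with exponential backoff) could be sketched as follows. `RateLimited` is a hypothetical exception standing in for however your client reports HTTP 429. The doubling factor and retry count are assumptions rather than documented Rootly values.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's client on HTTP 429."""


def backoff_delays(base: float = 30.0, factor: float = 2.0, retries: int = 4):
    """Yield wait times: base, base*factor, base*factor**2, ..."""
    for attempt in range(retries):
        yield base * factor ** attempt


def call_with_backoff(fn, sleep=time.sleep, **kwargs):
    """Call fn, retrying after each RateLimited error until delays run out."""
    for delay in backoff_delays(**kwargs):
        try:
            return fn()
        except RateLimited:
            sleep(delay)
    return fn()  # final attempt; let any error propagate
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.
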
401 Unauthorized
Verify your Rootly API token:
- Generate at Account > Manage API Keys
- Token type: Global (full access) or Team (limited to team resources)
- Pass as: Authorization: Bearer <token>

Best practices:

- Run `find_related_incidents` and `suggest_solutions` before manual investigation
- Set `team_ids` when creating incidents so on-call routing works correctly
- Call `severities_get` to confirm slug-to-ID mapping rather than hardcoding
- Reference `INC-{sequential_id}` in PSA tickets and Slack
- Run `check_oncall_health_risk` before major deployments or planned maintenance
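
The `INC-{sequential_id}` reference format mentioned above could be produced with a small helper like this one. It is a sketch that assumes the incident record is a dict with the `sequential_id` and `url` fields; the function name and the sample URL are hypothetical.

```python
def incident_reference(incident: dict) -> str:
    """Format 'INC-<sequential_id>', plus the web URL when one is present."""
    ref = f"INC-{incident['sequential_id']}"
    url = incident.get("url")
    return f"{ref} ({url})" if url else ref


# Hypothetical record; the URL is a placeholder, not a real Rootly link.
incident = {"sequential_id": 342, "url": "https://example.invalid/incidents/342"}
reference = incident_reference(incident)
```
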