Analyzes SLO performance trends over 28 days using gcx, computes SLI statistics like mean and std dev, and recommends data-backed improvements for objectives, alerting, labels, or windows.
Analyze SLO timeline trends, compute statistics over the past 28 days, and generate advisory recommendations backed by real metric values. Never modify SLO definitions directly — route to slo-manage when the user wants to apply a recommendation.
Use -o json for agent processing of structured output; the default format is for user display.

Requires gcx configured with a context pointing to the target Grafana instance.
If the user does not supply a UUID, list available SLOs first:
gcx slo definitions list
Ask the user which SLO to analyze if the target is ambiguous.
gcx slo definitions get <UUID> -o json
Extract and note:
- spec.name — display name
- spec.objectives[0].value — current objective (e.g., 0.999)
- spec.objectives[0].window — compliance window (e.g., 28d)
- spec.query.type — ratio | freeform | threshold
- spec.query.ratio.groupByLabels — dimensional labels (may be empty)
- spec.alerting — fastBurn / slowBurn configuration (may be absent)
- spec.destinationDatasource.uid — datasource UID for metric queries

# Default graph output for user display
gcx slo definitions timeline <UUID> --from now-28d --to now
# JSON output for statistical analysis
gcx slo definitions timeline <UUID> --from now-28d --to now -o json
Parse the JSON output to extract SLI values across the time series. Compute:
- mean_sli — average SLI over the 28-day window
- min_sli — lowest observed SLI point
- max_sli — highest observed SLI point
- std_dev — variability indicator

If the timeline returns no data (NODATA), note it and skip to Step 3 for current status.
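The statistics above can be sketched in Python. The exact shape of the timeline JSON is an assumption here — adapt the extraction to the actual gcx output schema — and sli_stats is a hypothetical helper name, not a gcx command:

```python
import json
import statistics

def sli_stats(timeline_json: str):
    """Summarize SLI values from `timeline ... -o json` output.

    Assumes the JSON decodes to a list of [timestamp, value] pairs;
    adjust the extraction to the real gcx schema.
    """
    points = json.loads(timeline_json)
    values = [v for _, v in points if v is not None]  # drop NODATA points
    if not values:
        return None  # all NODATA: fall back to Step 3 for current status
    return {
        "mean_sli": statistics.fmean(values),
        "min_sli": min(values),
        "max_sli": max(values),
        "std_dev": statistics.pstdev(values),
    }
```

Returning None on an empty series mirrors the NODATA rule above: skip straight to the status check rather than reporting meaningless statistics.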
gcx slo definitions status <UUID> -o wide
Extract from the wide output: the current SLI, budget remaining (%), burn rate, and the 1h/1d SLI values.
When timeline data is sparse (< 7 days of points) or all NODATA, query raw metrics directly using the datasource UID from Step 1:
# SLI window metric (primary trend signal)
gcx metrics query <datasource-uid> \
'grafana_slo_sli_window{slo_uuid="<UUID>"}' \
--from now-28d --to now --step 6h
# Success and total rate for ratio SLOs
gcx metrics query <datasource-uid> \
'grafana_slo_success_rate_5m{slo_uuid="<UUID>"}' \
--from now-28d --to now --step 6h
gcx metrics query <datasource-uid> \
'grafana_slo_total_rate_5m{slo_uuid="<UUID>"}' \
--from now-28d --to now --step 6h
If the datasource UID is not in the definition, resolve it:
gcx datasources list --type prometheus
Classify the pattern using the timeline data from Steps 2 and 4:
Sustained decline — SLI trending downward for 7 or more consecutive days. Compute the slope over the last 7 days vs. the preceding 7 days to confirm direction.
Periodic dips — SLI drops recur at regular intervals (e.g., every weekend, every night). Look for temporal correlation in the min points.
Sudden drops — Step-change in SLI at a specific timestamp (deployment, config change). Identify the onset timestamp and estimate error budget consumed by the event.
Budget exhaustion rate — Project when the error budget will reach 0 based on the current
burn rate from Step 3. Formula:
days_until_exhausted = budget_remaining_pct / (burn_rate * 100 / window_days)
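As a sanity check, the exhaustion formula can be expressed as a small helper (days_until_exhausted is a hypothetical name for illustration, not part of gcx):

```python
def days_until_exhausted(budget_remaining_pct: float,
                         burn_rate: float,
                         window_days: int = 28) -> float:
    """Project days until the error budget reaches 0 at the current burn rate.

    At burn rate 1.0 the whole budget is spent over exactly the window,
    i.e. 100 / window_days percent per day; higher rates scale linearly.
    """
    if burn_rate <= 0:
        return float("inf")  # budget is not being consumed
    daily_consumption_pct = burn_rate * 100 / window_days
    return budget_remaining_pct / daily_consumption_pct
```

For example, 40% budget remaining at a 2x burn rate over a 28d window projects to exhaustion in 5.6 days.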
Produce numbered recommendations. Each recommendation requires:
Objective tuning
If mean_sli < objective - 0.005 (more than 0.5 pp below the objective): propose floor(mean_sli * 1000) / 1000 (rounded down to 3 dp).
If mean_sli > objective + 0.010 (more than 1 pp above the objective): propose mean_sli - 0.005.

groupByLabels addition (ratio query type only)
If spec.query.ratio.groupByLabels is empty or absent:
cluster, service, endpoint, or status_code, depending on what labels exist in the underlying metric series.

Alerting configuration
If spec.alerting is absent or empty:
fastBurn burnRateThreshold: 14.4 over 1h (consumes 2% budget/hour),
slowBurn burnRateThreshold: 1 over 6h.

If alerting is configured and the current burn rate (from Step 3) has been above 2x for the past 7 days (compare the burn rate from status with recent timeline values):
Window adjustment
If the SLO window is 7d and periodic dips are detected (weekend pattern):
If the SLO window is 28d or 30d and mean_sli is very stable (std_dev < 0.001):
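The objective-tuning thresholds above can be sketched as follows; propose_objective is a hypothetical helper mirroring the 0.005/0.010 rules, not a gcx API:

```python
import math

def propose_objective(mean_sli: float, objective: float):
    """Apply the objective-tuning rules; return a proposed value or None."""
    if mean_sli < objective - 0.005:
        # Chronically missed: lower to mean_sli rounded down to 3 dp
        return math.floor(mean_sli * 1000) / 1000
    if mean_sli > objective + 0.010:
        # Comfortably beaten: raise, keeping a 0.5 pp safety margin
        return mean_sli - 0.005
    return None  # within tolerance: no change recommended
```

Rounding down (rather than to nearest) keeps the proposed objective achievable given the observed mean.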
Present all recommendations as advisory text. Do not apply any changes.
After presenting recommendations, ask:
"Would you like me to apply any of these recommendations? If so, I'll switch to slo-manage to pull the current definition and implement the changes with a dry-run first."
If the user confirms, invoke the slo-manage skill to handle the update workflow.
SLO: <name>
UUID: <uuid>
Objective: <value> over <window>
Analysis period: now-28d to now
SLI Statistics (28d):
Mean: <value> Min: <value> Max: <value>
Std dev: <value>
Current Status:
SLI: <value> Budget remaining: <pct>% Burn rate: <value>x
SLI (1h): <value> SLI (1d): <value>
[28-day timeline graph]
Trend classification: <Sustained decline | Periodic dips | Sudden drops | Stable>
<One sentence describing the dominant pattern with supporting data>
Advisory Recommendations:
1. <Recommendation title>
Current: <value>
Proposed: <value>
Why: <rationale with numbers>
2. <Recommendation title>
...
[If no recommendations apply:]
No objective or alerting changes recommended. The SLO configuration appears well-calibrated
for the observed performance over the past 28 days.
---
To apply a recommendation: slo-manage will pull the definition and apply the change with
a dry-run. Confirm which recommendation(s) you want to apply.
Collect errors; report them at the end of the analysis, not interleaved with findings.
gcx slo definitions get fails (not found): Confirm the UUID and context.
Run gcx slo definitions list to show available SLOs.
Timeline returns NODATA: Recording rule metrics may not be populating. Check the destination datasource configuration. Proceed with raw metric queries in Step 4. If raw metrics also return NODATA, report the data gap and recommend verifying that the SLO recording rules are evaluating correctly.
Datasource UID not in definition: Run gcx datasources list --type prometheus
and present the list to the user. Do not block the analysis — use the remaining timeline
data from Step 2.
Timeline data < 7 days of points: The SLO may be newly created. Note the limited analysis window, proceed with available data, and suppress trend classifications that require 7+ days of data.
Status returns BREACHING: Note the breach in the output. Include budget exhaustion rate in the recommendations. Route to slo-investigate for deeper root cause analysis if the user wants to understand why the SLO is breaching (not just optimize it).
gcx command not found or auth error: Check gcx config view to verify
the active context and credentials.