From devops-skills
Generates LogQL queries, stream selectors, metric queries, and alerting rules for Grafana Loki via interactive workflow handling versions, labels, and use cases like debugging or dashboards.
npx claudepluginhub akin-ozer/cc-devops-skills --plugin devops-skills
This skill uses the workspace's default tool permissions.
Interactive workflow for generating production-ready LogQL queries. LogQL is Grafana Loki's query language with indexed label selection, line filtering, parsing, and metric aggregation.
Use this skill for query generation, dashboard queries, alerting expressions, and troubleshooting with Loki logs.
Always run stages in order. Do not skip required stages.
Use AskUserQuestion to collect goal and use case.
Template:
Fallback if AskUserQuestion is unavailable:
Collect:
- Labels (job, namespace, app, service_name, cluster)
- Fields (status, level, duration, path, trace_id)
Ambiguity and partial-answer handling:
- List Assumptions: in the output so the user can correct them quickly.
Collect or infer:
- Loki version (2.9.x, 3.0+, unknown)
- Grafana version (10.x, 11.x, unknown)
Version compatibility policy:
Avoid by default when version is unknown:
- |> and !>
- approx_topk
- Structured metadata behavior (detected_level, accelerated metadata filtering assumptions)
Present a plain-English plan, then ask the user to choose output mode.
Plan template:
LogQL Query Plan
Goal: <goal>
Query type: <log or metric>
Streams: <selector>
Filters/parsing: <filters + parser>
Aggregation window: <function and [range]>
Compatibility mode: <version-aware or compatibility-first>
Mode selection template:
Ask: final query only (default) or incremental build (step-by-step)?
If the user does not choose, default to final query only.
Complex query triggers:
- Nested aggregations (topk(sum by(...)), multiple sum by, percentiles)
- Formatting stages (line_format, label_format)
Blocking checkpoint rule:
- examples/common_queries.logql for syntax and query patterns
- references/best_practices.md for performance and alerting guidance
Fallback when file-read tools are unavailable:
- Mark the output as: Unverified against local references.
Use external lookup only for version-specific behavior, unclear syntax, or advanced features not covered in local references.
Decision order:
1. mcp__context7__resolve-library-id with libraryName="grafana loki"
2. mcp__context7__query-docs for the exact topic
WebSearch fallback constraints:
Return one production-ready query plus short explanation.
Use this when requested or when debugging complex pipelines.
Step-by-step template:
Always include:
- … logcli)
- "… <field>. Should I assume <default> so I can continue?"
{job="app"} |= "error" |= "timeout"
{job="app"} |~ "error|fatal|critical"
{job="app"} != "debug"
{app="api"} | json | level="error" | status_code >= 500
{app="api"} | logfmt | caller="database.go"
{job="nginx"} | pattern "<ip> - - [<_>] \"<method> <path>\" <status> <size>"
rate({job="app"} | json | level="error" [5m])
sum by (app) (count_over_time({namespace="prod"} | json [5m]))
sum(rate({app="api"} | json | level="error" [5m])) / sum(rate({app="api"}[5m])) * 100
quantile_over_time(0.95, {app="api"} | json | unwrap duration [5m])
topk(10, sum by (error_type) (count_over_time({job="app"} | json | level="error" [1h])))
{job="app"} | json | line_format "{{.level}}: {{.message}}"
{job="app"} | json | label_format env=`{{.environment}}`
{job="nginx"} | logfmt | remote_addr = ip("192.168.4.0/24")
Parser performance (fastest first): pattern > logfmt > json > regexp.
{app="api"} | json | regexp "user_(?P<user_id>\\d+)"
sum(sum_over_time({app="api"} | json | unwrap duration [5m]))
{service_name=`app`} |> "<_> level=debug <_>"
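The |> operator is one of the features this workflow avoids when the Loki version is unknown. A compatibility-first alternative might look like the sketch below. It assumes the same stream as the query above and approximates the pattern with a plain substring filter, so it is not an exact translation.

```logql
# Substring line filter: an approximation of the pattern "<_> level=debug <_>"
{service_name="app"} |= "level=debug"
```

The substring form also matches lines where level=debug appears at the very start or end of the line, which the pattern would not.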
{app="api"} | json | (status_code >= 400 and status_code < 500) or level="error"
sum(rate({app="api"} | json | level="error" [5m])) - sum(rate({app="api"} | json | level="error" [5m] offset 1d))
{app="api"} | json | keep namespace, pod, level
{app="api"} | json | drop pod, instance
Note: LogQL has no dedup or distinct operators. Use metric aggregations like sum by (field) for programmatic deduplication.
High-cardinality data without indexing (trace_id, user_id, request_id):
# Filter AFTER stream selector, NOT in it
{app="api"} | trace_id="abc123" | json | level="error"
Place structured metadata filters BEFORE parsers:
# ACCELERATED
{cluster="prod"} | detected_level="error" | logfmt | json
# NOT ACCELERATED
{cluster="prod"} | logfmt | json | detected_level="error"
approx_topk(10, sum by (endpoint) (rate({app="api"}[5m])))
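When approx_topk cannot be assumed (it is on the avoid-by-default list for unknown versions), an exact topk over the same aggregation is a reasonable fallback. A minimal sketch, assuming endpoint is available as a label on the stream, as in the query above:

```logql
# Exact top 10 endpoints by request rate; no approx_topk required
topk(10, sum by (endpoint) (rate({app="api"}[5m])))
```

The exact form may cost more on very large result sets, which is the trade-off the approximate variant is meant to address.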
sum(count_over_time({app="api"} | json | level="error" [5m])) or vector(0)
discover_log_levels: true (stored as structured metadata)

| Function | Description |
|---|---|
| rate(log-range) | Entries per second |
| count_over_time(log-range) | Count entries |
| bytes_rate(log-range) | Bytes per second |
| bytes_over_time(log-range) | Total bytes in time range |
| absent_over_time(log-range) | Returns 1 if no logs |
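Two of the functions in this table, bytes_rate and absent_over_time, have no example elsewhere in this document. The queries below are a small sketch with hypothetical selectors; run each one separately.

```logql
# Ingested bytes per second, grouped by namespace
sum by (namespace) (bytes_rate({cluster="prod"}[5m]))

# Returns 1 when the selector matched no log lines in the last 15 minutes
absent_over_time({app="api"}[15m])
```

The second query is a common basis for "service went silent" alerts, since it returns a value only when logs are missing.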
Rule:
- Use bytes_over_time(<log-range>) for raw log-byte volume.
- Use | unwrap bytes(field) with unwrapped range aggregations for numeric byte fields extracted from log content.

| Function | Description |
|---|---|
| sum_over_time, avg_over_time, max_over_time, min_over_time | Aggregate numeric values |
| quantile_over_time(φ, range) | φ-quantile (0 ≤ φ ≤ 1) |
| first_over_time, last_over_time | First/last value in interval |
| stddev_over_time | Population standard deviation of unwrapped values |
| stdvar_over_time | Population variance of unwrapped values |
| rate_counter | Per-second rate treating values as a monotonically increasing counter |
sum, avg, min, max, count, stddev, topk, bottomk, approx_topk, sort, sort_desc
With grouping: sum by (label1, label2) or sum without (label1)
| Function | Description |
|---|---|
| duration_seconds(label) | Convert duration string |
| bytes(label) | Convert byte string (KB, MB) |
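The conversion functions above are applied inside the unwrap stage. The sketch below assumes logfmt logs with two hypothetical fields, request_time (a duration string such as "250ms") and response_size (a byte string such as "5KB"); run each query separately.

```logql
# Average latency in seconds per path
avg_over_time({app="api"} | logfmt | unwrap duration_seconds(request_time) [5m]) by (path)

# Total response bytes per path
sum by (path) (sum_over_time({app="api"} | logfmt | unwrap bytes(response_size) [5m]))
```

The grouping is written differently in the two queries because sum_over_time does not accept its own by clause, so it is wrapped in an outer sum by.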
label_replace(rate({job="api"} |= "err" [1m]), "foo", "$1", "service", "(.*):.*")
| logfmt [--strict] [--keep-empty]
- --strict: Error on malformed entries
- --keep-empty: Keep standalone keys
| json # All fields
| json method="request.method", status="response.status" # Specific fields
| json first_server="servers[0]", headers="request.headers[\"User-Agent\"]" # Nested/array (each expression needs a label name)
| pattern "<ip> - - [<timestamp>] \"<method> <path> <_>\" <status> <size>"
Named placeholders become extracted labels; <_> discards a field.
| regexp "(?P<level>\\w+): (?P<message>.+)"
Uses named capture groups (?P<name>). Slower than pattern/logfmt/json.
| decolorize
Strips ANSI color escape codes. Apply before parsing when logs come from terminal output.
| unpack
Unpacks log entries that were packed by Promtail's pack pipeline stage. Restores the original log line and any embedded labels.
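Both stages are normally placed before other parsers or filters. A brief sketch with hypothetical selectors and labels (container is assumed to be a label embedded by the pack stage); run each query separately.

```logql
# Strip ANSI colors first so logfmt sees clean key=value pairs
{job="app"} | decolorize | logfmt | level="error"

# Restore packed entries, then filter on a label that pack embedded
{job="app"} | unpack | container="api"
```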
Common functions for line_format and label_format:
String: trim, upper, lower, replace, trunc, substr, printf, contains, hasPrefix
Math: add, sub, mul, div, addf, subf, floor, ceil, round
Date: date, now, unixEpoch, toDate, duration_seconds
Regex: regexReplaceAll, count
Other: fromJson, default, int, float64, __line__, __timestamp__
See examples/common_queries.logql for detailed usage.
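As a quick illustration of how these functions combine inside a template, the sketch below assumes JSON logs with level, message, and environment fields (hypothetical names); run each query separately.

```logql
# Uppercase the level and cap the message at 80 characters
{app="api"} | json | line_format `{{ .level | upper }}: {{ .message | trunc 80 }}`

# Fall back to "unknown" when the environment field is empty
{app="api"} | json | label_format env=`{{ default "unknown" .environment }}`
```

Backtick-quoted templates avoid having to escape the double quotes used for string arguments such as "unknown".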
# Alert when error rate exceeds 5%
(sum(rate({app="api"} | json | level="error" [5m])) / sum(rate({app="api"}[5m]))) > 0.05
# With vector() to avoid "no data"
(sum(rate({app="api"} | json | level="error" [5m])) or vector(0)) > 10
| Issue | Solution |
|---|---|
| No results | Check labels exist, verify time range, test stream selector alone |
| Query slow | Use specific selectors, filter before parsing, reduce time range |
| Parse errors | Verify log format matches parser, test JSON validity |
| High cardinality | Use line filters not label filters for unique values, aggregate |
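For the "Parse errors" row, the __error__ label that Loki attaches to lines failing a pipeline stage can help separate bad lines from good ones. A minimal sketch, assuming a json parser and a hypothetical app="api" stream; run each query separately.

```logql
# Show only the lines that failed JSON parsing
{app="api"} | json | __error__="JSONParserErr"

# Exclude lines with any pipeline error before aggregating
sum(count_over_time({app="api"} | json | __error__="" [5m]))
```

If the first query returns most of the stream, the logs are probably not JSON and a different parser is worth trying.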
Use Stage 6 policy. Trigger external docs for:
| Trigger | Topic to Search | Tool to Use |
|---|---|---|
| User mentions Loki 3.x features | structured metadata, bloom filters, detected_level | Context7 first |
| approx_topk function needed | approx_topk probabilistic | Context7 first |
| Pattern match operators (\|>, !>) | pattern match operator | Context7 first |
| vector() function for alerting | vector function alerting | Context7 first |
| Recording rules configuration | recording rules loki | Context7 first |
| Unclear syntax or edge cases | Specific function/operator | Context7 first |
| Version-specific behavior questions | Version + feature | WebSearch fallback |
| Grafana Alloy integration | grafana alloy loki | WebSearch fallback |
- examples/common_queries.logql: Query patterns, template function examples
- references/best_practices.md: Optimization, anti-patterns, alerting guidance
- … json).
- … final query only).
sum by (service) (rate({namespace="prod", app="api"} | json | status_code >= 500 [15m]))
{app="auth"}
{app="auth"} |= "login failed"
{app="auth"} |= "login failed" | json
sum(count_over_time({app="auth"} |= "login failed" | json [5m]))
Mark task done only when all checks pass:
- … final query only).
- Pattern match operators (|>, !>)
- approx_topk function
Deprecations: Promtail (use Alloy), BoltDB store (use TSDB with v13 schema)