Validates syntax, semantics, and best practices in PromQL queries and alerting rules; detects anti-patterns, suggests optimizations, and explains queries.
npx claudepluginhub akin-ozer/cc-devops-skills --plugin devops-skills
This skill uses the workspace's default tool permissions.
This skill performs multi-level validation and provides interactive query planning:
When a user provides a PromQL query, follow this workflow:
Run validation commands from the repository root so relative paths resolve correctly:
cd "$(git rev-parse --show-toplevel)"
If running from another location, use absolute paths to scripts/ files.
Run the syntax validation script to check for basic correctness:
python3 devops-skills-plugin/skills/promql-validator/scripts/validate_syntax.py "<query>"
Output parsing notes:
0: syntax valid
The script will check for:
Run the best practices checker to detect anti-patterns and optimization opportunities:
python3 devops-skills-plugin/skills/promql-validator/scripts/check_best_practices.py "<query>"
Output parsing notes:
manual-review
The script will identify:
Parse and explain what the query does in plain English:
Required Output Details (always include these explicitly):
**Output Labels**: [list labels that will be in the result, or "None (fully aggregated to scalar)"]
**Expected Result Structure**: [instant vector / range vector / scalar] with [N series / single value]
Example:
**Output Labels**: job, instance
**Expected Result Structure**: Instant vector with one series per job/instance combination
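As a rough aid for filling in the Output Labels field, the following sketch reads the labels out of the first `by (...)` clause in a query. It is a text heuristic, not a PromQL parser: the helper name `outer_by_labels` is hypothetical, `without (...)` is not handled, and nested aggregations may give misleading results.

```python
import re

# Aggregation operators recognised by this sketch (not an exhaustive list).
_AGG = r"\b(?:sum|avg|min|max|count|group|stddev|stdvar)\b"


def outer_by_labels(query: str):
    """Guess the output labels of an aggregated query.

    Returns the labels of the first `by (...)` clause, an empty list when an
    aggregation is present without `by`, or None when no aggregation is found.
    """
    match = re.search(r"\bby\s*\(([^)]*)\)", query)
    if match:
        return [label.strip() for label in match.group(1).split(",") if label.strip()]
    if re.search(_AGG + r"\s*\(", query):
        return []  # aggregated with no grouping: all labels are dropped
    return None  # no aggregation seen: labels come from the underlying series


print(outer_by_labels("sum by (job, instance) (rate(http_requests_total[5m]))"))
```

A `None` result means the heuristic cannot tell, in which case the labels should be described from the metric itself.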
When citing examples/docs in recommendations, include file path + 1-based line numbers:
examples/good_queries.promql:42
docs/best_practices.md:88
Rules:
line number unavailable and provide file path
Ask the user clarifying questions to verify the query matches their intent:
Understand the Goal: "What are you trying to monitor or measure?"
Verify Metric Type: "Is this a counter (always increasing), gauge (can go up/down), histogram, or summary?"
Clarify Time Range: "What time window do you need?"
Confirm Aggregation: "Do you need to aggregate data across labels? If so, which labels?"
Check Output Intent: "Are you using this for alerting, dashboarding, or ad-hoc analysis?"
IMPORTANT: Two-Phase Dialogue
After presenting Steps 1-4 results (Syntax, Best Practices, Query Explanation, and Intent Questions):
⏸️ STOP HERE AND WAIT FOR USER RESPONSE
Do NOT proceed to Steps 5-7 until the user answers the clarifying questions. This ensures the subsequent recommendations are tailored to the user's actual intent.
Only proceed to this step after the user has answered the clarifying questions from Step 4.
After understanding the user's intent:
When relevant, mention known limitations:
_bytes suffix. Please confirm if this is correct.")
Based on validation results:
Reference Examples: When suggesting corrections, cite relevant examples using this format:
As shown in `examples/bad_queries.promql` (lines 91-97):
❌ BAD: `avg(http_request_duration_seconds{quantile="0.95"})`
✅ GOOD: Use histogram_quantile() with histogram buckets
Citation sources:
examples/good_queries.promql - for well-formed patterns
examples/optimization_examples.promql - for before/after comparisons
examples/bad_queries.promql - for showing what to avoid
docs/best_practices.md - for detailed explanations
docs/anti_patterns.md - for anti-pattern deep dives
Citation Format: file_path (lines X-Y) with the relevant code snippet quoted
Give the user control:
[a-zA-Z_:][a-zA-Z0-9_:]* or use UTF-8 quoting syntax (Prometheus 3.0+):
{"my.metric.with.dots"}
{__name__="my.metric.with.dots"}
= (equal), != (not equal), =~ (regex match), !~ (regex not match)
[0-9]+(ms|s|m|h|d|w|y) - e.g., 5m, 1h, 7d
metric_name[duration] - e.g., http_requests_total[5m]
offset <duration> - e.g., metric_name offset 5m
@ <timestamp> or @ start() / @ end()
_total, _count, _sum, or _bucket)
rate() or increase(), not raw values
rate() or increase()
histogram_quantile() with le label and rate() on _bucket metrics
_sum and _count
by() or without() to control output labels
= instead of =~ when possible for exact matches
[2m] minimum)
rate() for longer periods
❌ Bad: http_requests_total{}
✅ Good: http_requests_total{job="api", instance="prod-1"}
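The metric-name pattern quoted in the syntax rules and the empty-selector case shown in this example can both be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, UTF-8 quoted names are not handled, and label values containing `}` would confuse the selector check.

```python
import re

# Metric-name pattern as quoted in the syntax rules above.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """True when the whole name matches the classic metric-name pattern."""
    return METRIC_NAME.fullmatch(name) is not None


def has_label_filters(selector: str) -> bool:
    """True when the selector has at least one matcher inside {...}."""
    match = re.search(r"\{([^}]*)\}", selector)
    return bool(match and match.group(1).strip())


print(is_valid_metric_name("http_requests_total"))
print(has_label_filters("http_requests_total{}"))
```

A selector that fails `has_label_filters` is not necessarily wrong (recording-rule metrics are often queried without filters), so a result of `False` is best treated as a prompt for review.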
❌ Bad: http_requests_total{status=~"2.."}
✅ Good: http_requests_total{status="200"}
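One way to find `=~` matchers that can safely become `=` is to look for patterns with no regex metacharacters. The sketch below does that with a hypothetical helper, `literal_regex_matchers`; it is a heuristic over the query text and does not handle escaped quotes inside label values. Note that a pattern such as `2..` matches more than one value, so it is deliberately not flagged: swapping it for `=` would change the result.

```python
import re

# Characters that give a pattern regex meaning.
_REGEX_META = set(".*+?()[]{}|^$\\")


def literal_regex_matchers(query: str):
    """Return (label, value) pairs that use =~ with a purely literal pattern."""
    pairs = re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=~\s*"([^"]*)"', query)
    return [(label, value) for label, value in pairs if not (set(value) & _REGEX_META)]


print(literal_regex_matchers('http_requests_total{status=~"200"}'))
```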
❌ Bad: http_requests_total
✅ Good: rate(http_requests_total[5m])
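The counter rule illustrated here can be approximated by looking for counter-style suffixes in a query that contains no `rate()`, `irate()`, or `increase()` call. This is a sketch only: the helper name `raw_counters` is hypothetical, the suffix list follows the naming conventions listed above, and because it scans plain text it can also pick up words inside label values.

```python
import re

COUNTER_SUFFIXES = ("_total", "_count", "_sum", "_bucket")

# Functions that turn a raw counter into a meaningful value.
_WRAPPED = re.compile(r"\b(?:rate|irate|increase)\s*\(")


def raw_counters(query: str):
    """Return counter-like names when the query has no rate-style function at all."""
    if _WRAPPED.search(query):
        return []
    names = re.findall(r"[a-zA-Z_:][a-zA-Z0-9_:]*", query)
    return [name for name in names if name.endswith(COUNTER_SUFFIXES)]


print(raw_counters("http_requests_total"))
```

The check is coarse: a query that wraps one counter in `rate()` and leaves another raw would pass, so it complements manual review and does not replace it.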
❌ Bad: rate(memory_usage_bytes[5m])
✅ Good: memory_usage_bytes
avg_over_time()
❌ Bad: avg(http_request_duration_seconds{quantile="0.95"})
✅ Good: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
❌ Bad: rate(metric[5m])[90d:1m]
✅ Good: Use recording rules or limit range to necessary duration
❌ Bad: irate(metric[1h])
✅ Good: rate(metric[1h]) or irate(metric[5m])
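The `irate()` example can be turned into a simple check by converting the range to seconds and comparing it with a threshold. In this sketch the 300-second default is an assumption taken from the `irate(metric[5m])` example, the helper names are hypothetical, and only single-unit durations in the `[0-9]+(ms|s|m|h|d|w|y)` format are parsed (compound forms such as `1h30m` are not).

```python
import re

# Seconds per unit; a year is counted as 365 days and a week as 7 days.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def duration_seconds(text: str) -> float:
    """Convert a single-unit duration such as '5m' or '1h' to seconds."""
    match = re.fullmatch(r"([0-9]+)(ms|s|m|h|d|w|y)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def long_irate_ranges(query: str, max_seconds: float = 300):
    """Return the ranges used directly inside irate() that exceed max_seconds."""
    ranges = re.findall(r"\birate\s*\([^\[\]]*\[([0-9]+(?:ms|s|m|h|d|w|y))\]", query)
    return [r for r in ranges if duration_seconds(r) > max_seconds]


print(long_irate_ranges("irate(metric[1h])"))
```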
❌ Bad: avg(http_request_duration_seconds{quantile="0.95"}) / rate(node_memory_usage_bytes[1h]) + sum(http_requests_total)
✅ Good: Keep each metric type in separate, purpose-specific queries:
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
node_memory_usage_bytes{instance="prod-1"}
rate(http_requests_total{job="api"}[5m])
Provide validation results in this structure:
## PromQL Validation Results
### Syntax Check
- Status: ✅ VALID / ⚠️ WARNING / ❌ ERROR
- Issues: [list any syntax errors with line/position]
### Semantic Check
- Status: ✅ VALID / ⚠️ WARNING / ❌ ERROR
- Issues: [list any logical problems]
### Performance Analysis
- Status: ✅ OPTIMIZED / ⚠️ CAN BE IMPROVED / ❌ INEFFICIENT
- Issues: [list optimization opportunities]
- Suggestions: [specific improvements]
### Query Explanation
Your query: `<query>`
This query does:
- [Plain English explanation]
- Metrics: [list metrics and their types]
- Functions: [explain each function]
- Output: [describe result structure]
### Intent Verification
Let me verify this matches your needs:
1. What are you trying to measure? [your goal here]
2. Is this a counter/gauge/histogram/summary? [metric type]
3. What time range interests you? [time window]
4. Do you need aggregation? If so, by which labels? [aggregation needs]
5. Is this for alerting, dashboarding, or analysis? [use case]
### Recommendations
[Based on the analysis, suggest improvements or alternatives]
After validation, engage in dialogue:
Claude: "I've validated your query. It's syntactically correct, but I notice it queries http_requests_total without any label filters. This could match thousands of time series. What specific service or endpoint are you trying to monitor?"
User: [provides intent]
Claude: "Great! Based on that, here's an optimized version: rate(http_requests_total{job="api-service", path="/users"}[5m]). This calculates the per-second rate of requests to the /users endpoint over the last 5 minutes. Does this match what you need?"
User: [confirms or asks for changes]
Claude: [provides refined query or alternatives]
See the examples/ directory for:
good_queries.promql: Well-written queries following best practices
bad_queries.promql: Common mistakes and anti-patterns (with corrections)
optimization_examples.promql: Before/after optimization examples
See the docs/ directory for:
best_practices.md: Comprehensive PromQL best practices guide
anti_patterns.md: Detailed anti-pattern reference with explanations
This skill can be used:
The skill uses two main Python scripts:
Both scripts output JSON for programmatic parsing and human-readable messages for display.
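A caller that wants the JSON output could wrap either script along the following lines. This is a sketch under assumptions: the exact JSON schema and whether stdout contains only JSON are not specified here, so the helper (a hypothetical `run_validator`) returns `None` for the parsed value when stdout is not a single JSON document and always returns the raw text alongside it.

```python
import json
import subprocess
import sys


def validator_cmd(script_path: str, query: str):
    """Build the command line for a validation script using the current interpreter."""
    return [sys.executable, script_path, query]


def run_validator(cmd):
    """Run a command and return (exit_code, parsed_json_or_None, raw_stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        parsed = json.loads(proc.stdout)
    except json.JSONDecodeError:
        parsed = None  # stdout was not a single JSON document
    return proc.returncode, parsed, proc.stdout
```

For example, `run_validator(validator_cmd("devops-skills-plugin/skills/promql-validator/scripts/validate_syntax.py", "up"))` would run the syntax check from the repository root; the exit code can then be read together with whatever JSON the script emitted.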
A successful validation session should:
The validation scripts have some limitations to be aware of:
_total, _bytes)
job:http_requests:rate5m) are valid without label filters
The scripts detect common anti-patterns but cannot catch:
The goal is not just to validate queries, but to help users write better PromQL and understand their monitoring data. Always be educational, interactive, and helpful!