Prometheus instrumentation discipline: right metric type, right name, right labels. Invoke whenever a task involves any interaction with Prometheus metrics: instrumenting application code, writing PromQL queries, defining alerting or recording rules, choosing metric types, managing label cardinality, building exporters, or reviewing monitoring configuration.
Install: `npx claudepluginhub xobotyi/cc-foundry --plugin backend`

This skill uses the workspace's default tool permissions.
Choose the right metric type, name it clearly, label it sparingly. Prometheus is a pull-based monitoring system built on a dimensional data model — every metric is a time series identified by a name and key-value label pairs. Getting this right at instrumentation time prevents expensive rework later.
Reference files:

- `${CLAUDE_SKILL_DIR}/references/metric-types.md`: extended type comparison, histogram bucket tuning, summary configuration
- `${CLAUDE_SKILL_DIR}/references/naming.md`: full naming examples, base units table, character rules, label best practices
- `${CLAUDE_SKILL_DIR}/references/instrumentation.md`: code patterns per system type, library instrumentation, performance tuning
- `${CLAUDE_SKILL_DIR}/references/promql.md`: full operator catalog, vector matching, over-time aggregation, operator precedence
- `${CLAUDE_SKILL_DIR}/references/alerting-and-rules.md`: alert design, recording rule naming, aggregation patterns, anti-patterns
- `${CLAUDE_SKILL_DIR}/references/exporters.md`: exporter architecture, collectors, help strings, push-based sources

Choose correctly at instrumentation time — changing later requires migration of dashboards, alerts, and recording rules.
| Question | Answer | Type |
|---|---|---|
| Can the value decrease? | No | Counter |
| Is it a snapshot of current state? | Yes | Gauge |
| Observing a distribution needing cross-instance aggregation? | Yes | Histogram |
| Need accurate quantiles from a single instance, known at instrumentation time? | Yes | Summary |
| None of the above | — | Gauge |
**Counter.** Monotonically increasing value — resets to zero only on restart.

- Methods: `inc()`, `inc(v)` where `v >= 0`
- Suffix `_total`: `http_requests_total`
- Always wrap in `rate()` or `increase()` in queries — raw counter values are meaningless; alert on `rate()`

**Gauge.** Value that goes up and down arbitrarily.

- Methods: `inc()`, `dec()`, `set(v)`, `set_to_current_time()`
- No `_total` suffix
- Never apply `rate()` to a gauge — use `deriv()` or `delta()`
- Timestamps: `myapp_last_success_timestamp_seconds`; compute elapsed time with `time() - metric` in PromQL
- Info metrics: `myapp_build_info{version="1.2.3", commit="abc"} 1` — metadata as labels with a constant value of 1

**Histogram.** Samples observations into configurable buckets. Produces `_bucket{le="..."}`, `_sum`, `_count`.

- Method: `observe(v)`
- Buckets are cumulative: `le="0.5"` includes all observations `<= 0.5`
- Always includes a `+Inf` bucket (equal to `_count`)
- Use `histogram_quantile()` in PromQL to calculate percentiles

**Summary.** Calculates streaming quantiles on the client side. Produces `{quantile="..."}`, `_sum`, `_count`.

- Quantiles cannot be aggregated: `avg(x{quantile="0.95"})` is statistically invalid
- `_sum` and `_count` without quantiles is a valid and useful configuration
- Use summary over histogram only when ALL of these are true: you need accurate quantiles from a single instance, the required quantiles are known at instrumentation time, and cross-instance aggregation is not needed
Default choice: histogram. See ${CLAUDE_SKILL_DIR}/references/metric-types.md for detailed comparison.
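The cumulative-bucket math behind `histogram_quantile()` can be sketched in plain Python. This is a simplified model for intuition only, not the actual Prometheus implementation:

```python
import math

def histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile() for a single histogram.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    bound, ending with (math.inf, total_count) -- the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: return the highest
                # finite bound, as Prometheus does.
                return prev_bound
            # Linear interpolation within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# 100 observations: 60 under 0.1s, 90 under 0.5s, all under 1.0s
buckets = [(0.1, 60), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)  # interpolates inside (0.5, 1.0]
```

The interpolation step is why bucket boundaries matter: accuracy depends on how tightly the buckets bracket the quantile you care about.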
Format: `<namespace>_<subsystem>_<name>_<unit>_<suffix>`. Not all parts are required — the minimum is namespace + meaningful name + unit/suffix.

- snake_case — lowercase with underscores, matching `[a-zA-Z_:][a-zA-Z0-9_:]*`
- Colons (`:`) are reserved for recording rules — never use them in direct instrumentation
- Double underscore (`__`) is reserved for Prometheus internals
- Unit comes before the suffix, as in `http_request_duration_seconds_total`; name a counter-with-unit as `_<unit>_total` (e.g., `process_cpu_seconds_total`)
- Info metrics end in `_info`, timestamps in `_timestamp_seconds`
- `sum()` or `avg()` across all dimensions should be meaningful. If nonsensical, split into separate metrics.

See `${CLAUDE_SKILL_DIR}/references/naming.md` for base units table, component ordering, and full examples.
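The character rules above can be checked mechanically. A small sketch — `check_metric_name` is a hypothetical helper, not part of any Prometheus library:

```python
import re

# Valid metric name characters; colons are legal in the grammar but
# reserved for recording rules.
METRIC_NAME_RE = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*$')

def check_metric_name(name):
    """Return a list of problems for a directly instrumented metric."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append("invalid characters")
    if ':' in name:
        problems.append("colons are reserved for recording rules")
    if name.startswith('__'):
        problems.append("double underscore prefix is reserved")
    if name != name.lower():
        problems.append("use snake_case (lowercase)")
    return problems

check_metric_name("http_requests_total")       # no problems
check_metric_name("job:http_requests:rate5m")  # fine as a recording rule
                                               # name, flagged here
```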
Use labels for dimensions you will filter or aggregate by: http_requests_total{method="GET", status="200"} — not
separate metrics per status. Do not put label names in metric names.
Every unique label combination is a new time series. Each costs RAM, CPU, disk, and network. Cardinality math: total series = metric cardinality x number of targets.
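A quick sanity check of that math in Python — `series_count` is a hypothetical helper, and the label counts are made up:

```python
from math import prod

def series_count(label_value_counts, targets):
    """Worst-case series for one metric: the product of distinct
    values per label, times the number of scrape targets."""
    return prod(label_value_counts.values()) * targets

# method x status x endpoint, scraped from 20 pods:
series_count({"method": 5, "status": 8, "endpoint": 40}, targets=20)
# 5 * 8 * 40 * 20 = 32,000 series from one metric
```

Because label cardinalities multiply, one unbounded label (user ID, request path) can dominate an entire server's series budget.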
Per-label value count guidelines:

- `< 10`: safe for most metrics
- `10-100`: acceptable, monitor growth
- `100-1000`: investigate alternatives
- `> 1000`: move analysis out of Prometheus. If you only ever aggregate a label away with `sum()`, remove it.

**Online-serving systems.** Key metrics: request rate (`_total`), error rate, latency (histogram), in-progress requests (gauge).
**Offline processing pipelines.** Key metrics per stage: items in (`_total`), items out (`_total`), in progress (gauge), last processed timestamp (gauge), processing duration (histogram).

**Batch jobs.** Key metrics (push to Pushgateway): last success timestamp (gauge), last completion timestamp (gauge), duration (gauge — single run, not a distribution).
**Libraries.** Instrument transparently — users get metrics without configuration. Minimum for external resource access: request count (counter), error count (counter), latency (histogram).

**Logging.** `log_messages_total{level="..."}` — a counter per log level.

**Caches.** `cache_requests_total{result="hit|miss"}`, evictions (counter), size (gauge), lookup latency (histogram). Also instrument the downstream system.

See `${CLAUDE_SKILL_DIR}/references/instrumentation.md` for threadpool patterns, custom collectors, and performance tuning in hot paths.
- `rate(counter[5m])` — per-second rate. Use for alerts and dashboards.
- `increase(counter[5m])` — total increase over the range. Sugar for `rate() * range_seconds`.
- `irate(counter[5m])` — instant rate from the last two samples. Only for graphing volatile counters.
- `rate()` first, then aggregate: `sum(rate(x[5m]))`, never `rate(sum(x)[5m])`. Rate must see individual counter resets.
- Never `rate()` a gauge — use `deriv()` or `delta()`.
- `histogram_quantile(0.95, rate(metric_bucket[5m]))` — single histogram. When aggregating, keep `le` in the `by` clause: `histogram_quantile(0.95, sum by (job, le) (rate(metric_bucket[5m])))`
- Average latency: `rate(metric_sum[5m]) / rate(metric_count[5m])`
- `rate()` needs at least 2 samples. The range should be at least 4x the scrape interval. With a 15s scrape, use `rate(x[5m])` or wider.

See `${CLAUDE_SKILL_DIR}/references/promql.md` for data types, selectors, aggregation operators, vector matching, over-time aggregation, binary operators, and operator precedence.
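The rules above combine into queries like these (illustrative only; `http_request_duration_seconds_bucket` and `http_requests_total` are hypothetical metric names):

```promql
# p95 latency per job, aggregated across instances (keep `le`)
histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))

# Error ratio: rate() each counter first, then divide
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
```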
Alert on symptoms (user-visible impact), not causes. Use dashboards to pinpoint causes after an alert fires.
Only page on latency at one point in the stack — if overall user latency is fine, don't page on a slow sub-component. Avoid noisy alerts — if an alert fires and there's nothing to do, remove it.
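A symptom-based alert following these rules might look like this — a sketch, with the metric name, threshold, and durations chosen for illustration, not prescribed:

```yaml
groups:
  - name: api-symptoms
    rules:
      # Symptom alert: user-visible error ratio, not an internal cause
      - alert: HighErrorRatio
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "More than 5% of requests are failing"
```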
See ${CLAUDE_SKILL_DIR}/references/alerting-and-rules.md for alert design, naming conventions, and recording rule
details.
Pre-compute frequently used or expensive expressions. Format: `level:metric:operations`.

Use `without` for aggregation — it preserves all labels except those being removed. Prefer it over `by`.

See `${CLAUDE_SKILL_DIR}/references/alerting-and-rules.md` for the full naming convention and recording rule anti-patterns.
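A minimal recording rule group following the `level:metric:operations` convention and the `without` preference (metric and group names are illustrative):

```yaml
groups:
  - name: http-aggregations
    rules:
      # level:metric:operations -- job-level request rate
      - record: job:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))
```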
Write an exporter when the target system does not expose Prometheus metrics natively. For your own code, use a client library directly.

Expose raw values as the target system provides them (e.g., `haproxy_up`, `mysql_global_status_threads_connected`) and let `rate()` handle the rest — do not compute rates or deltas inside the exporter.

See `${CLAUDE_SKILL_DIR}/references/exporters.md` for architecture, collectors, help strings, label rules, and push-based sources.
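An exporter ultimately just renders the Prometheus text exposition format. A minimal sketch in plain Python — `render_metric` is a hypothetical helper; real exporters should use a client library, which also handles label escaping:

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs. Simplified: no label
    value escaping and no timestamps.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            body = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines)

text = render_metric("haproxy_up", "gauge",
                     "Whether the last HAProxy scrape succeeded.",
                     [({}, 1)])
```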
When writing Prometheus instrumentation:

- Choose the metric type using the decision table above; default to histogram for distributions.
- Name metrics per the `<namespace>_<subsystem>_<name>_<unit>_<suffix>` convention.
- Keep label value sets small and bounded.

When writing PromQL queries:

- Wrap counters in `rate()` or `increase()` before further operations.
- Prefer `without` over `by` for aggregation.

When writing alerting or recording rules:

- Alert on symptoms, not causes.
- Use `level:metric:operations` naming for recording rules.

When reviewing Prometheus code:

- Check for the anti-patterns above: wrong metric types, unbounded label values, `rate()` on gauges, aggregation before `rate()`.

The coding skill governs workflow; this skill governs Prometheus implementation choices.