From grafana-app-sdk
Monitors Grafana Cloud costs, sets usage alerts, attributes spending by label, and reduces cardinality with Adaptive Metrics/Logs. Use for observability budget analysis and optimization.
Install: npx claudepluginhub grafana/skills --plugin grafana-core
Slash command: /grafana-app-sdk:cost-management
> **Docs**: https://grafana.com/docs/grafana-cloud/cost-management-and-billing/
Access: My Account → Cost Management (or within your Grafana Cloud stack)
FOCUS-compliant (FinOps Open Cost and Usage Specification) billing dashboards showing:
Tag your telemetry at ingestion to enable per-team cost reporting:
// Add cost attribution labels in Alloy
prometheus.remote_write "cloud" {
  endpoint {
    url = sys.env("PROMETHEUS_URL")
    basic_auth {
      username = sys.env("PROM_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
    env     = "production",
  }
}

loki.write "cloud" {
  endpoint {
    url = sys.env("LOKI_URL")
    basic_auth {
      username = sys.env("LOKI_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
  }
}
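To illustrate what the `team` and `project` labels make possible downstream, here is a minimal Python sketch that splits a bill across teams in proportion to their active series. The function name `attribute_cost`, the sample counts, and the bill amount are hypothetical; this is not how Grafana Cloud computes attribution internally.

```python
from collections import defaultdict


def attribute_cost(series_by_labels, total_cost):
    """Split total_cost across teams in proportion to their active series.

    series_by_labels: iterable of (labels_dict, series_count) pairs.
    Series without a `team` label are grouped under "unattributed".
    """
    per_team = defaultdict(int)
    for labels, count in series_by_labels:
        per_team[labels.get("team", "unattributed")] += count
    total = sum(per_team.values())
    if total == 0:
        return {}
    return {team: total_cost * n / total for team, n in per_team.items()}


# Hypothetical usage figures, for illustration only.
usage = [
    ({"team": "platform", "project": "checkout-service"}, 6000),
    ({"team": "payments", "project": "ledger"}, 3000),
    ({"project": "legacy-batch"}, 1000),  # missing team label
]
shares = attribute_cost(usage, total_cost=500.0)
for team, cost in sorted(shares.items()):
    print(f"{team}: {cost:.2f}")
```

The "unattributed" bucket shows why labels should be set on every Alloy config: any telemetry without them cannot be charged back to a team.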
Set alerts before you hit quota or budget thresholds:
# Alert when approaching metrics quota
groups:
  - name: grafana-cloud-usage
    rules:
      - alert: MetricsUsageHigh
        expr: grafana_cloud_metrics_active_series / grafana_cloud_metrics_limit > 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud metrics usage >80% of quota"
      - alert: LogsIngestionHigh
        expr: increase(grafana_cloud_logs_bytes_ingested_total[24h]) > 50e9  # 50 GB over the trailing 24h
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud log ingestion >50GB in the last 24h"
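Before deploying the rules, it can help to check the threshold arithmetic against sample numbers. The following Python sketch mirrors the two alert expressions; the function names and sample values are hypothetical.

```python
def metrics_usage_high(active_series, series_limit, threshold=0.8):
    """Mirror of MetricsUsageHigh: fires when usage ratio exceeds threshold."""
    if series_limit <= 0:
        raise ValueError("series_limit must be positive")
    return active_series / series_limit > threshold


def logs_ingestion_high(bytes_last_24h, limit_bytes=50e9):
    """Mirror of LogsIngestionHigh: fires above 50 GB in the trailing 24h."""
    return bytes_last_24h > limit_bytes


# Hypothetical sample values.
print(metrics_usage_high(85_000, 100_000))   # ratio 0.85
print(metrics_usage_high(80_000, 100_000))   # ratio exactly 0.8, not above it
print(logs_ingestion_high(62e9))             # 62 GB in the last 24h
```

Note that the comparison is strict, so usage at exactly 80% of quota does not fire the alert.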
Automatically identifies unused or high-cardinality metrics and generates aggregation rules.
# View recommendations
curl https://yourstack.grafana.net/api/plugins/grafana-adaptive-metrics-app/resources/v1/recommendations \
-H "Authorization: Bearer <token>"
# Apply aggregation rule: drops high-cardinality labels from a metric
- match: "^http_request_duration_seconds.*"
  action: keep
  match_labels:
    - method
    - status_code
    - service
# Drops: pod, container, instance, node (reduces series from 10k → 50)
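The series reduction quoted above comes from collapsing label combinations. As a rough illustration, this Python sketch counts distinct series before and after keeping only `method`, `status_code`, and `service`. The synthetic label values and the helper `aggregate_series` are assumptions chosen to reproduce the 10k to 50 figure; real Adaptive Metrics aggregation also combines sample values, which is not modeled here.

```python
from itertools import product


def aggregate_series(series, keep_labels):
    """Return the distinct label sets left after dropping labels not in keep_labels."""
    return {
        tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        for labels in series
    }


# Synthetic data: 2 methods x 5 status codes x 5 services x 200 pods.
methods = ["GET", "POST"]
codes = ["200", "201", "400", "404", "500"]
services = [f"svc-{i}" for i in range(5)]
pods = [f"pod-{i}" for i in range(200)]

series = [
    {"method": m, "status_code": c, "service": s, "pod": p}
    for m, c, s, p in product(methods, codes, services, pods)
]
reduced = aggregate_series(series, keep_labels={"method", "status_code", "service"})
print(len(series), "->", len(reduced))  # 10000 -> 50
```

The `pod` label alone multiplies the series count by 200, which is why per-instance labels are usually the first candidates to drop.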
Workflow:
Drop or sample log lines before ingestion using Loki's pipeline stages in Alloy:
loki.process "filter_logs" {
  forward_to = [loki.write.cloud.receiver]

  // Drop health check logs (high volume, low value)
  stage.drop {
    expression = ".*GET /health.*"
  }

  // Drop debug logs in production
  stage.drop {
    source     = "level"
    expression = "debug"
  }

  // Sample verbose info logs (keep 10%).
  // stage.sampling only takes a rate, so it is scoped with stage.match;
  // this assumes `level` is a stream label.
  stage.match {
    selector = "{level=\"info\"}"

    stage.sampling {
      rate = 0.1
    }
  }
}
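To estimate how much volume a pipeline like this removes, the same decisions can be simulated offline. This Python sketch is an approximation under stated assumptions: `keep_line` is a hypothetical helper, the level is matched by equality rather than by regular expression, and the random generator is seeded only to make the run repeatable.

```python
import random
import re

HEALTH_RE = re.compile(r".*GET /health.*")


def keep_line(line, level, rng, info_rate=0.1):
    """Return True if the line would survive the drop and sampling steps."""
    if HEALTH_RE.search(line):
        return False          # health checks are always dropped
    if level == "debug":
        return False          # debug logs are always dropped
    if level == "info":
        return rng.random() < info_rate  # keep roughly info_rate of info logs
    return True               # warn, error, etc. pass through


rng = random.Random(0)
# Hypothetical log mix for illustration.
lines = (
    [("GET /health 200", "info")] * 1000
    + [("cache miss for key", "debug")] * 1000
    + [("order placed", "info")] * 10000
    + [("payment failed", "error")] * 100
)
kept = sum(keep_line(text, level, rng) for text, level in lines)
print(f"kept {kept} of {len(lines)} lines")
```

Running this against a sample of real logs gives a quick estimate of the ingestion savings before the Alloy config is changed.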
Use Alloy tail-based sampling to keep only important traces:
otelcol.processor.tail_sampling "cost_control" {
  decision_wait = "10s"

  policy {
    name = "keep-errors"
    type = "status_code"
    status_code { status_codes = ["ERROR"] }
  }

  policy {
    name = "keep-slow"
    type = "latency"
    latency { threshold_ms = 1000 }
  }

  policy {
    name = "sample-rest"
    type = "probabilistic"
    probabilistic { sampling_percentage = 5 }
  }

  output {
    traces = [otelcol.exporter.otlp.cloud.input]
  }
}
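The three policies above combine so that a trace is kept if any one of them matches. A minimal Python sketch of that decision follows; the `Trace` class and `keep_trace` function are hypothetical, and the real processor evaluates complete traces after `decision_wait` rather than single records.

```python
import random
from dataclasses import dataclass


@dataclass
class Trace:
    status: str         # "OK" or "ERROR"
    duration_ms: float  # end-to-end latency of the trace


def keep_trace(trace, rng, slow_ms=1000, sample_pct=5):
    """Keep errors, keep slow traces, and sample the remainder."""
    if trace.status == "ERROR":
        return True                       # keep-errors
    if trace.duration_ms > slow_ms:
        return True                       # keep-slow
    return rng.random() * 100 < sample_pct  # sample-rest


rng = random.Random(0)
print(keep_trace(Trace("ERROR", 12.0), rng))   # True: error
print(keep_trace(Trace("OK", 1500.0), rng))    # True: slower than 1000 ms
fast_ok = [Trace("OK", 50.0) for _ in range(10000)]
kept = sum(keep_trace(t, rng) for t in fast_ok)
print(f"kept {kept} of {len(fast_ok)} fast, successful traces")
```

With these settings, span volume for healthy traffic falls to roughly 5%, while every error and slow request is still retained.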
# Active metric series (billed unit for metrics)
grafana_cloud_metrics_active_series
# Series by label (find high-cardinality sources)
topk(20, count by (__name__) ({__name__=~".+"}))
# Log bytes ingested per stream
sum(increase(loki_ingester_chunk_size_bytes_sum[24h])) by (namespace, app)
# Trace spans ingested
rate(tempo_distributor_spans_received_total[5m])
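The cardinality query returns a standard Prometheus instant-query response. The sketch below parses such a response into a ranked list; `top_metrics` is a hypothetical helper and the sample payload is made up, shaped like the JSON that the Prometheus HTTP API returns from `/api/v1/query`.

```python
import json


def top_metrics(response_text, limit=20):
    """Return (metric_name, series_count) pairs, highest count first."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    rows = [
        (item["metric"].get("__name__", ""), int(float(item["value"][1])))
        for item in body["data"]["result"]
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Made-up payload for illustration.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up"}, "value": [1700000000, "120"]},
            {"metric": {"__name__": "http_request_duration_seconds_bucket"},
             "value": [1700000000, "9800"]},
            {"metric": {"__name__": "process_cpu_seconds_total"},
             "value": [1700000000, "340"]},
        ],
    },
})
for name, count in top_metrics(sample, limit=2):
    print(f"{count:>6}  {name}")
```

Sample values arrive as strings in the API response, which is why the count is converted before sorting.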
Find high-cardinality metrics with topk(20, count by (__name__) ({__name__=~".+"}))
Add cost attribution labels (team, project) to all Alloy configs

| Signal | Billing Unit |
|---|---|
| Metrics | Active series (unique label combinations) |
| Logs | Bytes ingested |
| Traces | Spans ingested |
| Profiles | Bytes ingested |
| Synthetic Monitoring | Check executions |
| k6 | VUh (Virtual User hours) |
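Because each signal has its own billing unit, a rough bill estimate is usage multiplied by a per-unit rate, summed across signals. The Python sketch below shows that arithmetic only. The rates are placeholders, not Grafana Cloud prices, and `estimate_bill` is a hypothetical helper; substitute the rates from your own plan.

```python
def estimate_bill(usage, rates):
    """Multiply each signal's usage by its per-unit rate.

    usage and rates are dicts keyed by signal name; units must match
    (for example, thousands of active series, or GB of logs).
    """
    missing = set(usage) - set(rates)
    if missing:
        raise KeyError(f"no rate configured for: {sorted(missing)}")
    return {signal: usage[signal] * rates[signal] for signal in usage}


# Placeholder numbers for illustration only.
usage = {"metrics_k_series": 50, "logs_gb": 120}
rates = {"metrics_k_series": 2.0, "logs_gb": 0.5}

bill = estimate_bill(usage, rates)
for signal, cost in bill.items():
    print(f"{signal}: {cost:.2f}")
print(f"total: {sum(bill.values()):.2f}")
```

Running the same estimate with usage figures from before and after an optimization gives a per-signal view of the expected savings.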