Monitors Grafana Cloud usage and costs, attributes spending by labels and teams, sets quota alerts, manages invoices, and optimizes with Adaptive Metrics/Logs for cardinality reduction. Use for budgeting and FinOps.
npx claudepluginhub grafana/skills --plugin grafana-app-sdk
This skill uses the workspace's default tool permissions.
> **Docs**: https://grafana.com/docs/grafana-cloud/cost-management-and-billing/
Access: My Account → Cost Management (or within your Grafana Cloud stack)
Cost Management provides FOCUS-compliant (FinOps Open Cost and Usage Specification) billing dashboards.
Tag your telemetry at ingestion to enable per-team cost reporting:
// Add cost attribution labels in Alloy
prometheus.remote_write "cloud" {
  endpoint {
    url = sys.env("PROMETHEUS_URL")
    basic_auth {
      username = sys.env("PROM_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
    env     = "production",
  }
}

loki.write "cloud" {
  endpoint {
    url = sys.env("LOKI_URL")
    basic_auth {
      username = sys.env("LOKI_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
  }
}
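
To illustrate how a `team` label like the one above can feed a per-team cost report, here is a minimal Python sketch that splits a bill in proportion to each team's active series. The function name `attribute_cost`, the sample series, and the `monthly_cost` figure are all hypothetical; Grafana Cloud's own attribution reports may allocate costs differently.

```python
from collections import defaultdict


def attribute_cost(series_labels, monthly_cost, label="team"):
    """Split monthly_cost across values of `label`, proportional to series count.

    series_labels: iterable of dicts, one per active series (hypothetical input).
    Series that lack the label are grouped under "unattributed".
    """
    counts = defaultdict(int)
    for labels in series_labels:
        counts[labels.get(label, "unattributed")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {owner: monthly_cost * n / total for owner, n in counts.items()}


# Hypothetical active series; the last one was ingested without a team label.
series = [
    {"__name__": "http_requests_total", "team": "platform"},
    {"__name__": "http_requests_total", "team": "platform"},
    {"__name__": "queue_depth", "team": "payments"},
    {"__name__": "up"},
]
shares = attribute_cost(series, monthly_cost=100.0)
print(shares)
```

The "unattributed" bucket shows why the labels need to be applied consistently: any telemetry ingested without them cannot be charged back to a team.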
Set alerts before you hit quota or budget thresholds:
# Alert when approaching metrics quota
groups:
  - name: grafana-cloud-usage
    rules:
      - alert: MetricsUsageHigh
        expr: grafana_cloud_metrics_active_series / grafana_cloud_metrics_limit > 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud metrics usage >80% of quota"
      - alert: LogsIngestionHigh
        expr: increase(grafana_cloud_logs_bytes_ingested_total[24h]) > 50e9  # 50 GB over the last 24h
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud log ingestion >50GB in the last 24h"
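
The arithmetic behind the two rules above can be checked offline. The following Python sketch mirrors the threshold comparisons only; it ignores the `for: 1h` hold period, and the function name `usage_alerts` and the sample readings are hypothetical.

```python
def usage_alerts(active_series, series_limit, log_bytes_24h,
                 quota_ratio=0.8, log_budget_bytes=50e9):
    """Return the names of the alerts whose threshold is exceeded."""
    firing = []
    # MetricsUsageHigh: active series above 80% of the quota
    if series_limit > 0 and active_series / series_limit > quota_ratio:
        firing.append("MetricsUsageHigh")
    # LogsIngestionHigh: more than 50 GB ingested over the last 24h
    if log_bytes_24h > log_budget_bytes:
        firing.append("LogsIngestionHigh")
    return firing


print(usage_alerts(850_000, 1_000_000, 42e9))  # 85% of quota, 42 GB of logs
print(usage_alerts(500_000, 1_000_000, 61e9))  # 50% of quota, 61 GB of logs
```

The first call reports `MetricsUsageHigh` only and the second reports `LogsIngestionHigh` only, which is a quick way to sanity-check thresholds before deploying the rule group.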
Adaptive Metrics automatically identifies unused or high-cardinality metrics and generates aggregation rules.
# View recommendations
curl https://yourstack.grafana.net/api/plugins/grafana-adaptive-metrics-app/resources/v1/recommendations \
  -H "Authorization: Bearer <token>"

# Apply aggregation rule: drops high-cardinality labels from a metric
- match: "^http_request_duration_seconds.*"
  action: keep
  match_labels:
    - method
    - status_code
    - service
# Drops: pod, container, instance, node (reduces series from 10k → 50)
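
To see why keeping only `method`, `status_code`, and `service` shrinks the series count, the Python sketch below groups series by the kept labels and sums their values. It illustrates the idea of label aggregation, not the Adaptive Metrics implementation; `aggregate_series` and the sample data are hypothetical.

```python
def aggregate_series(series, keep_labels):
    """Sum sample values over every label that is not in keep_labels."""
    out = {}
    for labels, value in series:
        # The kept label pairs form the identity of the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        out[key] = out.get(key, 0.0) + value
    return out


# Hypothetical input: 2 methods x 5 pods = 10 raw series, each with value 1.0
raw = [
    ({"method": m, "status_code": "200", "service": "checkout", "pod": f"pod-{i}"}, 1.0)
    for m in ("GET", "POST")
    for i in range(5)
]
agg = aggregate_series(raw, keep_labels={"method", "status_code", "service"})
print(len(raw), "->", len(agg))
```

The remaining series count is the product of the distinct values of the kept labels, so dropping a label such as `pod` pays off most when it has many values.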
Workflow:
Drop or sample log lines before ingestion using Loki's pipeline stages in Alloy:
loki.process "filter_logs" {
  forward_to = [loki.write.cloud.receiver]

  // Drop health check logs (high volume, low value)
  stage.drop {
    expression = ".*GET /health.*"
  }

  // Drop debug logs in production
  stage.drop {
    source     = "level"
    expression = "debug"
  }

  // Sample verbose info logs (keep 10%).
  // stage.sampling has no per-level filter, so it is scoped with stage.match;
  // this assumes `level` is already a label on the stream.
  stage.match {
    selector = "{level=\"info\"}"

    stage.sampling {
      rate = 0.1
    }
  }
}
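
The intent of the three stages above can be modeled in a few lines of Python, which is handy for estimating how much volume a filter would remove from a sample of logs. This is a simplified stand-in, not Alloy's pipeline code; `keep_line` and the sample log lines are hypothetical.

```python
import random
import re

HEALTH = re.compile(r".*GET /health.*")


def keep_line(line, level, rng, info_rate=0.1):
    """Mirror the stages: drop health checks, drop debug, sample info lines."""
    if HEALTH.match(line):
        return False
    if level == "debug":
        return False
    if level == "info":
        return rng.random() < info_rate  # keep roughly info_rate of info lines
    return True  # warnings and errors always pass through


rng = random.Random(0)
logs = [
    ("GET /health 200", "info"),
    ("cache miss", "debug"),
    ("payment failed", "error"),
]
kept = [line for line, level in logs if keep_line(line, level, rng)]
print(kept)
```

Running the same function over a day of exported logs gives a rough estimate of the ingestion savings before the pipeline is changed in production.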
Use Alloy tail-based sampling to keep only important traces:
otelcol.processor.tail_sampling "cost_control" {
  decision_wait = "10s"

  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  policy {
    name = "keep-slow"
    type = "latency"
    latency {
      threshold_ms = 1000
    }
  }

  policy {
    name = "sample-rest"
    type = "probabilistic"
    probabilistic {
      sampling_percentage = 5
    }
  }

  output {
    traces = [otelcol.exporter.otlp.cloud.input]
  }
}
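
The combined effect of the three policies above is that a trace is kept if any one of them matches. The Python sketch below models only that final keep-or-drop decision; the real processor buffers spans for `decision_wait` and evaluates whole traces, and the random number generator here is a stand-in for its probabilistic policy. `keep_trace` is a hypothetical name.

```python
import random


def keep_trace(status, duration_ms, rng, threshold_ms=1000, sample_pct=5):
    """Keep a trace if it errored, was slow, or falls in the random sample."""
    if status == "ERROR":          # keep-errors policy
        return True
    if duration_ms > threshold_ms:  # keep-slow policy
        return True
    return rng.random() * 100 < sample_pct  # sample-rest policy


rng = random.Random(7)
print(keep_trace("ERROR", 120, rng))  # errors are always kept
print(keep_trace("OK", 2500, rng))    # slow traces are always kept
```

With these settings, the stored volume is roughly all error and slow traces plus about 5% of the remainder, which is the number to plug into a span-ingestion estimate.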
# Active metric series (billed unit for metrics)
grafana_cloud_metrics_active_series
# Series by label (find high-cardinality sources)
topk(20, count by (__name__) ({__name__=~".+"}))
# Log bytes ingested per stream
sum(increase(loki_ingester_chunk_size_bytes_sum[24h])) by (namespace, app)
# Trace spans ingested
rate(tempo_distributor_spans_received_total[5m])
Add cost attribution labels (team, project) to all Alloy configs

| Signal | Billing Unit |
|---|---|
| Metrics | Active series (unique label combinations) |
| Logs | Bytes ingested |
| Traces | Spans ingested |
| Profiles | Bytes ingested |
| Synthetic Monitoring | Check executions |
| k6 | VUh (Virtual User hours) |