From grafana-app-sdk
Reduces Grafana Cloud Metrics costs by analyzing Prometheus usage and generating Adaptive Metrics aggregation rules to manage cardinality and drop unused labels.
npx claudepluginhub grafana/skills --plugin grafana-app-sdk
This skill uses the workspace's default tool permissions.
Adaptive Metrics analyses your Prometheus metrics usage and suggests aggregation rules that reduce series count without breaking any queries. Rules pre-aggregate high-cardinality metrics into lower-cardinality forms before storage.
How it works:
Billing: Grafana Cloud charges per Active Series (series that received a sample in the last hour). Adaptive Metrics reduces your Active Series count, directly reducing your bill.
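The pre-aggregation idea can be illustrated with a small sketch. It is not how Adaptive Metrics is implemented; it only shows, on made-up sample data, how dropping a label and summing collapses several series into fewer. The function name `aggregate_series` and the sample labels are hypothetical.

```python
from collections import defaultdict

def aggregate_series(series, drop_labels):
    """Sum samples that become identical once drop_labels are removed.

    series: list of (labels_dict, value) pairs, one per series.
    Returns a dict mapping the remaining label set to the summed value.
    """
    out = defaultdict(float)
    for labels, value in series:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        out[kept] += value
    return dict(out)

# Made-up data: 2 jobs x 3 versions = 6 series before aggregation.
samples = [
    ({"job": job, "version": ver}, 1.0)
    for job in ("api", "worker")
    for ver in ("1.0", "1.1", "1.2")
]
aggregated = aggregate_series(samples, drop_labels={"version"})
print(len(samples), "->", len(aggregated))  # 6 -> 2
```

Each remaining series still carries the `job` label, so a query grouped by `job` returns the same totals as before, while the series count drops from six to two.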
In Grafana Cloud: Home > Adaptive Metrics (or via the app menu).
You need the Grafana Cloud Metrics plan. Adaptive Metrics is available on all paid plans.
Key views:
Recommendations are sorted by estimated series reduction (highest savings first).
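The same ordering can be reproduced offline from a saved API response. The sketch below assumes the response shape implied by the API example in this document (a `recommendations` array with `metric_name`, `current_series` and `projected_series` fields); the function name `top_recommendations` and the sample values are hypothetical.

```python
def top_recommendations(payload, min_saved=1000):
    """Return (metric_name, series_saved) pairs, largest saving first."""
    rows = []
    for rec in payload.get("recommendations", []):
        saved = rec["current_series"] - rec["projected_series"]
        if saved >= min_saved:
            rows.append((rec["metric_name"], saved))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up response body for illustration only.
example = {
    "recommendations": [
        {"metric_name": "http_requests_total", "current_series": 50000, "projected_series": 5000},
        {"metric_name": "go_goroutines", "current_series": 1200, "projected_series": 900},
        {"metric_name": "process_cpu_seconds_total", "current_series": 8000, "projected_series": 2000},
    ]
}
for name, saved in top_recommendations(example):
    print(f"{name}: {saved} series saved")
```

The `min_saved` threshold is an arbitrary cut-off to hide recommendations too small to be worth reviewing; adjust it to the size of your stack.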
Each recommendation shows:
Review before applying:
# Check if any dashboards or alerts use the label being dropped
# Replace METRIC_NAME and LABEL_NAME with actual values
grep -r "METRIC_NAME" /path/to/dashboards/ --include="*.json" | grep "LABEL_NAME"
Or in Grafana: use Explore > Metrics to query the metric and check which labels are present and used.
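If the dashboards are exported as JSON, a slightly more targeted check than `grep` is to look only at query expressions. The following is a minimal sketch, assuming queries are stored under an `expr` key as in typical Prometheus panel targets; the function names are hypothetical, and the match is a plain substring test, so a label such as `version` also matches `go_version`.

```python
import json
from pathlib import Path

def _exprs(node):
    """Yield every string stored under an "expr" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from _exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _exprs(item)

def find_label_usage(dashboard_dir, metric_name, label_name):
    """Return dashboard files with a query expression mentioning both names."""
    hits = []
    for path in sorted(Path(dashboard_dir).rglob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if any(metric_name in e and label_name in e for e in _exprs(doc)):
            hits.append(str(path))
    return hits
```

An empty result from `find_label_usage("/path/to/dashboards", "METRIC_NAME", "LABEL_NAME")` only covers the exported dashboards; alert and recording rules need a separate check.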
Via the UI:
Via the API:
# List recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
"https://adaptive-metrics.grafana.net/api/v1/recommendations" | \
jq '.recommendations[] | {metric_name, current_series, projected_series, estimated_reduction_percent}'
# Apply a recommendation by ID
curl -s -X POST \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
"https://adaptive-metrics.grafana.net/api/v1/recommendations/<RECOMMENDATION_ID>/apply"
If you know which labels to drop without waiting for recommendations, create rules directly.
Rule format:
# Aggregation rule: keep only job and instance labels for process_cpu_seconds_total
rules:
  - match_metric: process_cpu_seconds_total
    drop_labels:
      - version
      - go_version
      - service_name
    aggregations:
      - type: sum
        without: []  # empty = keep only the labels not in drop_labels
Via the API:
curl -s -X POST \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
"https://adaptive-metrics.grafana.net/api/v1/rules" \
-d '{
"rules": [
{
"metric_name": "process_cpu_seconds_total",
"match_type": "MATCH_TYPE_EXACT",
"drop_labels": ["version", "go_version"],
"aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
}
]
}'
Aggregation types:
| Type | Use case |
|---|---|
| sum | Counters, request counts, byte totals |
| max | Gauges where you want the worst-case (e.g. CPU max across pods) |
| min | Gauges where you want the best-case |
| avg | Rate metrics, averages |
For counters, always use sum. Averaging counters produces incorrect rates.
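A small worked example shows why. The numbers below are made up, and the `rate` helper is a simplified stand-in for PromQL's `rate()` that ignores counter resets and extrapolation.

```python
def rate(first, last, seconds):
    """Per-second increase of a counter between two samples."""
    return (last - first) / seconds

# Made-up counter samples (value at t=0s, value at t=60s) for three pods.
pods = {"pod-a": (100, 160), "pod-b": (200, 320), "pod-c": (50, 50)}

# Ground truth: total request rate across all pods (1 + 2 + 0 per second).
true_total = sum(rate(a, b, 60) for a, b in pods.values())

# Aggregating with sum first, then taking the rate, preserves the total.
sum_first = sum(a for a, _ in pods.values())
sum_last = sum(b for _, b in pods.values())
rate_of_sum = rate(sum_first, sum_last, 60)

# Aggregating with avg first under-reports by a factor of the series count.
n = len(pods)
rate_of_avg = rate(sum_first / n, sum_last / n, 60)

print(true_total, rate_of_sum, rate_of_avg)
```

The rate of the summed counter equals the true total of 3 requests per second, while the rate of the averaged counter comes out at roughly 1, a third of the real value.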
Use regex rules to cover families of metrics with similar label patterns:
# Apply a rule to all metrics matching a pattern
curl -s -X POST \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
"https://adaptive-metrics.grafana.net/api/v1/rules" \
-d '{
"rules": [
{
"metric_name": "go_.*",
"match_type": "MATCH_TYPE_REGEX",
"drop_labels": ["go_version", "version", "service_instance_id"],
"aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
}
]
}'
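Before submitting a regex rule, it can help to preview which metric names it would cover. The sketch below assumes the pattern must match the whole metric name, as Prometheus relabeling regexes do; whether Adaptive Metrics anchors patterns the same way is an assumption here, and Python's `re` syntax differs from RE2 in some advanced features. The function name `metrics_matched` is hypothetical.

```python
import re

def metrics_matched(pattern, metric_names):
    """Return the names a regex rule would cover, assuming full-string matching."""
    compiled = re.compile(pattern)
    return [name for name in metric_names if compiled.fullmatch(name)]

names = [
    "go_goroutines",
    "go_gc_duration_seconds",
    "cgo_calls_total",
    "process_cpu_seconds_total",
]
print(metrics_matched("go_.*", names))
# ['go_goroutines', 'go_gc_duration_seconds']
```

Note that `go_.*` also covers gauges such as `go_goroutines`, for which `sum` may not be the aggregation you want; checking the matched list first makes such cases visible.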
Common label families safe to drop globally:
- version, app_version, go_version - rarely queried in PromQL
- service_instance_id, pod_uid, container_id - ultra-high cardinality
- git_commit, build_date - static labels that inflate series for no query value

Unused metrics (never queried in any dashboard, alert, or recording rule) can be dropped entirely.
In the UI: Adaptive Metrics > Usage analysis > "Unused metrics" tab
Via the API:
curl -s -H "Authorization: Bearer <API_KEY>" \
"https://adaptive-metrics.grafana.net/api/v1/usage-analysis?filter=unused" | \
jq '.metrics[] | {metric_name, series_count, last_queried}'
Before dropping a metric entirely:
Drop unused metrics via remote_write filtering in Alloy:
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-XX.grafana.net/api/prom/push"

    write_relabel_config {
      source_labels = ["__name__"]
      regex         = "unused_metric_name|another_unused_metric"
      action        = "drop"
    }
  }
}
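With more than a handful of metrics, writing the `regex` value by hand gets error-prone. A minimal sketch for generating it from a list of names follows; the function name `build_drop_regex` is hypothetical, and the output is meant to be pasted into the `regex` field of the relabel block.

```python
import re

def build_drop_regex(metric_names):
    """Join metric names into one alternation, escaping regex metacharacters."""
    unique = sorted(set(metric_names))
    if not unique:
        raise ValueError("no metric names given")
    return "|".join(re.escape(name) for name in unique)

print(build_drop_regex(["unused_metric_name", "another_unused_metric"]))
# another_unused_metric|unused_metric_name
```

Names are deduplicated and sorted so the generated value stays stable between runs, which keeps configuration diffs small.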
For log volume reduction, Adaptive Logs works the same way for Loki:
# Check log volume recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
"https://adaptive-logs.grafana.net/api/v1/recommendations" | \
jq '.recommendations[] | {stream_selector, estimated_reduction_percent}'
Log pattern: drops low-value log streams (e.g. debug logs from non-critical services) during high-volume periods or permanently.
After applying rules, monitor the effect over 24-48 hours:
# Active Series count over time (visible in Grafana Cloud Metrics Usage dashboard)
grafanacloud_instance_active_series
# Series reduction from adaptive metrics
grafanacloud_instance_active_series_dropped_by_aggregation_rules
In Grafana Cloud: Home > Usage > Metrics shows before/after series counts and the billing impact of active rules.
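To put a number on the change, compare two Active Series readings taken before and after the rules went live. This is plain arithmetic on values you read off the usage dashboard; the function name and the sample figures are made up.

```python
def series_reduction(before, after):
    """Return (series_saved, percent_saved) for two Active Series readings."""
    if before <= 0:
        raise ValueError("before must be positive")
    saved = before - after
    return saved, round(100.0 * saved / before, 1)

print(series_reduction(1_200_000, 900_000))  # (300000, 25.0)
```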
Expected timeline: