A Grafana Professional Services tool for identifying which Prometheus metrics drive high Data Points per Minute (DPM). Analyzes metric-level DPM with per-label breakdown to help optimize Grafana Cloud costs.
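For orientation, DPM is a direct function of scrape interval and series count; a quick sketch of the arithmetic (not part of the tool):

```python
def dpm(scrape_interval_seconds: float, series_count: int = 1) -> float:
    """Data points per minute produced by series scraped at a fixed interval."""
    return 60.0 / scrape_interval_seconds * series_count

print(dpm(15))       # one series at a 15s scrape interval -> 4.0 DPM
print(dpm(60, 500))  # 500 series scraped every 60s -> 500.0 DPM
```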
Source: https://github.com/grafana-ps/dpm-finder
```bash
git clone https://github.com/grafana-ps/dpm-finder.git
cd dpm-finder
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Configure credentials by copying `.env_example` to `.env` and filling in the values:

- `PROMETHEUS_ENDPOINT` -- The Prometheus endpoint URL (must end in `.net`, nothing after)
- `PROMETHEUS_USERNAME` -- Tenant ID / stack ID (numeric)
- `PROMETHEUS_API_KEY` -- Grafana Cloud API key (`glc_...` format)

If gcx is available, use it to find stack details:
```bash
gcx config check           # Show active stack context
gcx config list-contexts   # List all configured stacks
gcx config view            # Full config with endpoints
```
The Prometheus endpoint follows the pattern:
```
https://prometheus-{cluster_slug}.grafana.net
```
The username is the numeric stack ID. gcx auto-discovers service URLs from the stack slug via GCOM.
Look up the stack in the Grafana Cloud portal, or query the usage datasource:
```promql
grafanacloud_instance_info{name=~"STACK_NAME.*"}
```
Extract `cluster_slug` for the endpoint URL and `id` for the username.
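That extraction step can be scripted against the usage datasource's instant-query response; a minimal sketch, assuming the standard Prometheus `/api/v1/query` response shape and the label names described above:

```python
def extract_stack_details(api_response: dict, stack_name: str) -> dict:
    """Pull cluster_slug and id from a grafanacloud_instance_info query result.

    api_response is the parsed JSON body of an instant query against the
    usage datasource (the standard Prometheus /api/v1/query result shape).
    """
    for result in api_response["data"]["result"]:
        labels = result["metric"]
        if labels.get("name", "").startswith(stack_name):
            return {
                "endpoint": f"https://prometheus-{labels['cluster_slug']}.grafana.net",
                "username": labels["id"],
            }
    raise ValueError(f"no stack matching {stack_name!r}")
```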
Example run:

```bash
./dpm-finder.py -f json -m 2.0 -t 8 --timeout 120 -l 10
```
| Flag | Default | Description |
|---|---|---|
| `-f, --format` | `csv` | Output format: `csv`, `text`, `txt`, `json`, `prom` |
| `-m, --min-dpm` | `1.0` | Minimum DPM threshold to include a metric |
| `-t, --threads` | `10` | Concurrent processing threads |
| `-l, --lookback` | `10` | Lookback window in minutes for DPM calculation |
| `--timeout` | `60` | API request timeout in seconds |
| `--cost-per-1000-series` | (none) | Dollar cost per 1000 series; adds `estimated_cost` column |
| `-q, --quiet` | `false` | Suppress progress output |
| `-v, --verbose` | `false` | Enable debug logging |
| `-e, --exporter` | `false` | Run as Prometheus exporter instead of one-shot |
| `-p, --port` | `9966` | Exporter server port |
| `-u, --update-interval` | `86400` | Exporter metric refresh interval in seconds |
Output files are written to the current working directory.
**JSON (`-f json`) -> `metric_rates.json`**

Best for programmatic analysis. Includes per-series DPM breakdown:

- `metrics[].metric_name` -- the metric name
- `metrics[].dpm` -- data points per minute (maximum across this metric's individual series)
- `metrics[].series_count` -- number of active time series
- `metrics[].series_detail[]` -- per-label-set DPM breakdown (sorted by DPM descending)
- `total_metrics_above_threshold` -- count of metrics above the threshold
- `performance_metrics.total_runtime_seconds` -- total processing time
- `performance_metrics.average_metric_processing_seconds` -- average time per metric
- `performance_metrics.total_metrics_processed` -- total metrics analyzed
- `performance_metrics.metrics_per_second` -- processing throughput

**CSV (`-f csv`) -> `metric_rates.csv`**

Columns: `metric_name`, `dpm`, `series_count` (plus `estimated_cost` if `--cost-per-1000-series` is set).

**Text (`-f text`) -> `metric_rates.txt`**

Human-readable format with per-series breakdown and performance statistics.

**Prometheus (`-f prom`) -> `metric_rates.prom`**

Prometheus exposition format suitable for Alloy's `prometheus.exporter.unix` textfile collector.
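The JSON report lends itself to quick post-processing; a minimal sketch that pulls the highest-DPM metrics out of `metric_rates.json`, using the field names documented above:

```python
import json

def top_offenders(path: str, n: int = 5) -> list[tuple[str, float, int]]:
    """Return (metric_name, dpm, series_count) for the n highest-DPM metrics."""
    with open(path) as f:
        report = json.load(f)
    rows = [(m["metric_name"], m["dpm"], m["series_count"])
            for m in report["metrics"]]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```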
- Use `series_detail` to identify which label combinations drive the highest DPM
- If `--cost-per-1000-series` is set, use `estimated_cost` to prioritize by spend

When running dpm-finder against multiple stacks, limit to a maximum of 3 concurrent runs. Batch the stacks and wait for each batch to complete before starting the next.
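The batching guidance above can be sketched as follows; `run_one` is a placeholder for whatever invokes dpm-finder for a single stack (e.g. a subprocess call with that stack's credentials):

```python
from concurrent.futures import ThreadPoolExecutor

def batched(items: list, size: int) -> list[list]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_in_batches(stacks: list[str], run_one, batch_size: int = 3) -> list:
    """Run run_one(stack) for every stack, at most batch_size concurrently,
    waiting for each batch to finish before starting the next."""
    results = []
    for batch in batched(stacks, batch_size):
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            results.extend(pool.map(run_one, batch))
    return results
```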
The tool automatically excludes:
- Metrics with `*_count`, `*_bucket`, `*_sum` suffixes
- Metrics with the `grafana_*` prefix
- Aggregated metrics (`/aggregations/rules`)

Run as a long-lived Prometheus exporter instead of one-shot analysis:
```bash
./dpm-finder.py -e -p 9966 -u 86400
```
Serves metrics at http://localhost:PORT/metrics. Recalculates at the configured interval (default: daily). See README.md for full exporter and Docker documentation.
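In exporter mode the endpoint can be scraped like any other target; a minimal Prometheus scrape-config sketch (job name and interval are illustrative, since the underlying data only refreshes at the configured update interval):

```yaml
scrape_configs:
  - job_name: dpm-finder           # illustrative name
    scrape_interval: 5m            # data only changes at the refresh interval
    static_configs:
      - targets: ["localhost:9966"]
```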
Alternative to local Python setup:
```bash
docker build -t dpm-finder:latest .
docker run --rm --env-file .env -v $(pwd)/output:/app/output \
  dpm-finder:latest --format json --min-dpm 2.0
```
See README.md for full Docker Compose, production deployment, and monitoring integration docs.
- **Authentication errors:** ensure the API key has the `metrics:read` scope. Confirm `PROMETHEUS_USERNAME` matches the numeric stack ID.
- **Timeouts:** increase `--timeout` for large metric sets. The default is 60s; use 120s or higher for stacks with thousands of metrics.
- **No results:** lower the `--min-dpm` threshold. Check that `PROMETHEUS_ENDPOINT` does not have a trailing path after `.net`.

The tool retries failed API requests with exponential backoff (up to 10 retries). Rate-limited responses (HTTP 429) are backed off automatically. HTTP 4xx errors other than 429 are not retried.
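The retry behavior described above follows a standard exponential-backoff pattern; a simplified sketch (not the tool's actual implementation — `request_fn` is a placeholder returning an HTTP status code):

```python
import time

def with_backoff(request_fn, max_retries: int = 10, base_delay: float = 1.0) -> int:
    """Call request_fn until it succeeds or retries are exhausted.

    Retries server errors and HTTP 429 with exponentially growing delays;
    returns other 4xx statuses immediately, since retrying won't help.
    """
    for attempt in range(max_retries + 1):
        status = request_fn()
        if status < 400:
            return status
        if 400 <= status < 500 and status != 429:
            return status
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status
```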
```
dpm-finder.py        # Main CLI tool (one-shot + exporter modes)
requirements.txt     # Python dependencies
.env_example         # Template for credential configuration
Dockerfile           # Multi-stage Docker build
docker-compose.yml   # Docker Compose orchestration
README.md            # Full project documentation
```