From mlflow
Queries MLflow tracking servers for aggregated trace metrics (token usage, latency, trace counts, and quality evaluations) via a Python CLI script.
Install: `npx claudepluginhub mlflow/skills`

This skill uses the workspace's default tool permissions.
Run `scripts/fetch_metrics.py` to query metrics from an MLflow tracking server.
Token usage summary:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG
```

Output: `AVG: 223.91 SUM: 7613`
Hourly token trend (last 24h):

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
  -t 3600 --start-time="-24h" --end-time=now
```

Output: time-bucketed token sums per hour.
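Conceptually, `-t 3600` bucketing floors each trace's timestamp to the start of its hour and sums values within each bucket. A minimal illustrative sketch of that semantics (not the script's actual implementation):

```python
from collections import defaultdict

def bucket_sums(points, interval_s):
    """Sum metric values into fixed-width time buckets.

    points: iterable of (epoch_seconds, value) pairs.
    interval_s: bucket width in seconds (e.g. 3600 for hourly).
    Returns {bucket_start_epoch: summed_value}.
    """
    sums = defaultdict(float)
    for ts, value in points:
        bucket_start = ts - (ts % interval_s)  # floor to the bucket boundary
        sums[bucket_start] += value
    return dict(sums)
```
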
Latency percentiles by trace:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name
```
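To make the percentile aggregations concrete: `P95` is the latency value below which roughly 95% of observations fall. A sketch using the nearest-rank convention (one common convention; the server may compute percentiles differently):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p/100 * n)."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Illustrative latency samples in milliseconds.
latencies = [120, 80, 250, 95, 300, 110, 90, 105, 130, 101]
p95 = percentile(latencies, 95)
avg = sum(latencies) / len(latencies)
```
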
Error rate by status:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
```
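The command above returns a trace count per status; turning those counts into an error rate is a one-liner. A sketch, assuming the status dimension uses MLflow's `ERROR` trace status value:

```python
def error_rate(status_counts):
    """Fraction of traces with ERROR status, from COUNT-by-trace_status output.

    status_counts: {trace_status: count}, e.g. {"OK": 95, "ERROR": 5}.
    """
    total = sum(status_counts.values())
    return status_counts.get("ERROR", 0) / total if total else 0.0
```
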
Quality scores by evaluator (assessments):

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
  -m assessment_value -a AVG,P50 -d assessment_name
```

Output: average and median scores for each evaluator (e.g., correctness, relevance).
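The `-d assessment_name` flag groups the aggregation by dimension value. A minimal sketch of group-by averaging (illustrative only, not the server's implementation):

```python
from collections import defaultdict

def avg_by_dimension(rows):
    """Average a metric grouped by a dimension value.

    rows: iterable of (dimension_value, metric_value) pairs,
    e.g. (assessment_name, assessment_value).
    Returns {dimension_value: average}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for dim, value in rows:
        totals[dim][0] += value
        totals[dim][1] += 1
    return {dim: s / n for dim, (s, n) in totals.items()}
```
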
Assessment count by name:

```shell
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
  -m assessment_count -a COUNT -d assessment_name
```
JSON output: add `-o json` to any command.
| Arg | Required | Description |
|---|---|---|
| `-s, --server` | Yes | MLflow server URL |
| `-x, --experiment-ids` | Yes | Experiment IDs (comma-separated) |
| `-m, --metric` | Yes | `trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens` |
| `-a, --aggregations` | Yes | `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99` |
| `-d, --dimensions` | No | Group by: `trace_name`, `trace_status` |
| `-t, --time-interval` | No | Bucket size in seconds (3600 = hourly, 86400 = daily) |
| `--start-time` | No | `-24h`, `-7d`, `now`, ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as `--start-time` |
| `-o, --output` | No | `table` (default) or `json` |
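The `--start-time`/`--end-time` values mix relative offsets, `now`, and absolute timestamps. A hypothetical helper showing how such values could normalize to epoch milliseconds (ISO 8601 handling omitted for brevity; the script's actual parser may differ):

```python
import re
import time

def parse_time_arg(value, now_ms=None):
    """Normalize 'now', relative offsets like '-24h'/'-7d', or raw epoch ms."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if value == "now":
        return now_ms
    m = re.fullmatch(r"-(\d+)([hd])", value)
    if m:
        amount, unit = int(m.group(1)), m.group(2)
        offset_ms = amount * (3_600_000 if unit == "h" else 86_400_000)
        return now_ms - offset_ms
    return int(value)  # assume epoch milliseconds
```
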
For SPANS metrics (`span_count`, `latency`), add `-v SPANS`.
For ASSESSMENTS metrics, add `-v ASSESSMENTS`.
See `references/api_reference.md` for filter syntax and full API details.