Help us improve
Share bugs, ideas, or general feedback.
Queries and browses MLflow evaluation results: find runs by invocation ID, compare metrics, fetch artifacts, and set up the MLflow MCP server.
npx claudepluginhub nvidia-nemo/evaluatorHow this skill is triggered — by the user, by Claude, or both
Slash command
/nemo-evaluator-skills:accessing-mlflowThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
[mlflow-mcp](https://github.com/kkruglik/mlflow-mcp) gives agents direct access to MLflow — query runs, compare metrics, browse artifacts, all through natural language.
Dispatches MLflow tasks to the appropriate sub-skill for tracing, evaluation, debugging, and onboarding. Use when the user needs MLflow help but hasn't specified a sub-skill.
Sets up MLflow tracking server, autologging for scikit-learn/PyTorch/TensorFlow/XGBoost, run comparisons with metrics/visuals, artifact management for reproducible ML workflows. For new projects, log migration, or CI/CD integration.
Queries, tags, evaluates, and manages MLflow traces via MCP tools. Used for debugging, performance analysis, feedback logging, custom scoring, and trace cleanup.
Share bugs, ideas, or general feedback.
mlflow-mcp gives agents direct access to MLflow — query runs, compare metrics, browse artifacts, all through natural language.
When the user provides a hex ID (e.g. 71f3f3199ea5e1f0) without specifying what it is, assume it is an invocation_id (not an MLflow run_id). An invocation_id identifies a launcher invocation and is stored as both a tag and a param on MLflow runs. One invocation can produce multiple MLflow runs (one per task). You may need to search across multiple experiments if you don't know which experiment the run belongs to.
# Find runs by invocation_id
MLflow:search_runs_by_tags(experiment_id, {"invocation_id": "<invocation_id>"})
# Query for example model/task runs
MLflow:query_runs(experiment_id, "tags.model LIKE '%<model>%'")
MLflow:query_runs(experiment_id, "tags.task_name LIKE '%<task_name>%'")
# Get a config from run's artifacts
MLflow:get_artifact_content(run_id, "config.yml")
# Get nested stats from run's artifacts
MLflow:get_artifact_content(run_id, "artifacts/eval_factory_metrics.json")
NOTE: You WILL NOT find PENDING, RUNNING, KILLED, or FAILED runs in MLflow! Only SUCCESSFUL runs are exported to MLflow.
When comparing metrics across runs, fetch the data via MCP, then run the computation in Python for exact results rather than doing math in-context:
uv run --with pandas python3 << 'EOF'
import pandas as pd
# ... compute deltas, averages, etc.
EOF
<harness>.<task>/
├── artifacts/
│ ├── config.yml # Fully resolved config used during the evaluation
│ ├── launcher_unresolved_config.yaml # Unresolved config passed to the launcher
│ ├── results.yml # All results in YAML format
│ ├── eval_factory_metrics.json # Runtime stats (latency, tokens count, memory)
│ ├── report.html # Request-Response Pairs samples in HTML format (if enabled)
│ └── report.json # Request-Response Pairs samples in JSON format (if enabled)
└── logs/
├── client-*.log # Evaluation client
├── server-*-N.log # Deployment per node
├── slurm-*.log # Slurm job
└── proxy-*.log # Request proxy
If the MLflow MCP server fails to load or its tools are unavailable:
uvx not found — install uv using whichever option matches your environment:
# Recommended: install via an existing package manager (no remote script execution)
pipx install uv # if you have pipx
pip install uv # in a virtualenv
brew install uv # macOS
If you prefer the official shell installer, download it, inspect it, and only then run it — do not pipe directly to sh:
curl -LsSf https://astral.sh/uv/install.sh -o /tmp/uv-install.sh
less /tmp/uv-install.sh # review the contents before executing
sh /tmp/uv-install.sh
See the official uv installation docs for further options and checksum verification.
MCP server not configured — add the config and restart the agent:
For Claude Code — add to .claude/settings.json (project or user level), under "mcpServers":
"MLflow": {
"command": "uvx",
"args": ["mlflow-mcp"],
"env": {
"MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
}
}
For Cursor — edit ~/.cursor/mcp.json (Settings > Tools & MCP > New MCP Server):
{
"mcpServers": {
"MLflow": {
"command": "uvx",
"args": ["mlflow-mcp"],
"env": {
"MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
}
}
}
}