Skill

accessing-mlflow

Queries and browses MLflow evaluation results: find runs by invocation ID, compare metrics, fetch artifacts, and set up the MLflow MCP server.

Python

ai-ml

Popularity

Stars

287

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/nemo-evaluator-skills:accessing-mlflow

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

[mlflow-mcp](https://github.com/kkruglik/mlflow-mcp) gives agents direct access to MLflow — query runs, compare metrics, browse artifacts, all through natural language.

Supporting Files

BENCHMARK.md, skill-card.md, skill.oms.sig

SKILL.md

110 lines · ~1.1k tokens

Stats

Language: Python

Stars: 287

Forks: 48

Maintenance: Excellent

Last Commit: May 28, 2026


Accessing MLflow

MCP Server

[mlflow-mcp](https://github.com/kkruglik/mlflow-mcp) gives agents direct access to MLflow: query runs, compare metrics, and browse artifacts, all through natural language.

ID Convention

When the user provides a hex ID (e.g. 71f3f3199ea5e1f0) without specifying what it is, assume it is an invocation_id (not an MLflow run_id). An invocation_id identifies a launcher invocation and is stored as both a tag and a param on MLflow runs. One invocation can produce multiple MLflow runs (one per task). You may need to search across multiple experiments if you don't know which experiment the run belongs to.
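The convention above can be sketched in a few lines of Python. This is a minimal illustration, not part of mlflow-mcp: the helper names `invocation_filter` and `group_by_task` and the sample run records are hypothetical, and the filter uses the `tags.<key> = '<value>'` form of MLflow's search syntax.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration; not part of mlflow-mcp or MLflow.
_HEX_ID = re.compile(r"^[0-9a-fA-F]+$")


def invocation_filter(raw_id: str) -> str:
    """Build an MLflow filter string that treats a bare hex ID as an invocation_id."""
    candidate = raw_id.strip()
    if not _HEX_ID.match(candidate):
        raise ValueError(f"not a hex ID: {raw_id!r}")
    return f"tags.invocation_id = '{candidate}'"


def group_by_task(runs: list[dict]) -> dict[str, list[str]]:
    """Map task_name -> run_ids, since one invocation can yield one run per task."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["tags"]["task_name"]].append(run["run_id"])
    return dict(grouped)


# Made-up run records, shaped loosely like search results.
sample_runs = [
    {"run_id": "run-a", "tags": {"invocation_id": "71f3f3199ea5e1f0", "task_name": "task_one"}},
    {"run_id": "run-b", "tags": {"invocation_id": "71f3f3199ea5e1f0", "task_name": "task_two"}},
]

print(invocation_filter("71f3f3199ea5e1f0"))
print(group_by_task(sample_runs))
```

Grouping by task makes it explicit that a single invocation_id may resolve to several run_ids, so later artifact lookups should be made per run.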

Querying Runs

# Find runs by invocation_id
MLflow:search_runs_by_tags(experiment_id, {"invocation_id": "<invocation_id>"})

# Query runs by model or task name (examples)
MLflow:query_runs(experiment_id, "tags.model LIKE '%<model>%'")
MLflow:query_runs(experiment_id, "tags.task_name LIKE '%<task_name>%'")

# Get the config from a run's artifacts
MLflow:get_artifact_content(run_id, "config.yml")

# Get nested stats from a run's artifacts
MLflow:get_artifact_content(run_id, "artifacts/eval_factory_metrics.json")

NOTE: You WILL NOT find PENDING, RUNNING, KILLED, or FAILED runs in MLflow! Only SUCCESSFUL runs are exported to MLflow.
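Once the nested stats JSON has been fetched, flattening it into dotted keys can make individual values easier to compare. The sketch below is illustrative only: the payload is hypothetical, and the real schema of eval_factory_metrics.json may differ.

```python
import json


def flatten(stats: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in stats.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical payload; the real eval_factory_metrics.json schema may differ.
raw = '{"latency": {"avg_s": 1.5, "max_s": 4.0}, "tokens": {"prompt": 1200}}'
flat_stats = flatten(json.loads(raw))
print(flat_stats)
```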

Workflow Tips

When comparing metrics across runs, fetch the data via MCP, then run the computation in Python for exact results rather than doing math in-context:

uv run --with pandas python3 << 'EOF'
import pandas as pd
# ... compute deltas, averages, etc.
EOF
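As one possible shape for that computation, the sketch below computes per-task deltas and their average between two runs. It uses only the standard library so it runs without extra packages; a pandas version would follow the same pattern. The metric values and task names are made up, and `metric_deltas` is a hypothetical helper.

```python
from statistics import mean

# Made-up metric values for two runs, keyed by task name.
baseline = {"task_one": 0.712, "task_two": 0.655}
candidate = {"task_one": 0.748, "task_two": 0.640}


def metric_deltas(base: dict, cand: dict) -> dict:
    """Return candidate - baseline for every task present in both runs."""
    shared = sorted(base.keys() & cand.keys())
    return {task: round(cand[task] - base[task], 6) for task in shared}


deltas = metric_deltas(baseline, candidate)
avg_delta = round(mean(deltas.values()), 6)
print(deltas)
print(avg_delta)
```

Restricting the comparison to tasks present in both runs avoids silently treating a missing task as a zero score.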

Artifacts Structure

<harness>.<task>/
├── artifacts/
│   ├── config.yml                # Fully resolved config used during the evaluation
│   ├── launcher_unresolved_config.yaml # Unresolved config passed to the launcher
│   ├── results.yml               # All results in YAML format
│   ├── eval_factory_metrics.json # Runtime stats (latency, token count, memory)
│   ├── report.html               # Request-response pair samples in HTML format (if enabled)
│   └── report.json               # Request-response pair samples in JSON format (if enabled)
└── logs/
    ├── client-*.log              # Evaluation client
    ├── server-*-N.log            # Deployment per node
    ├── slurm-*.log               # Slurm job
    └── proxy-*.log               # Request proxy
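When browsing a run's logs, the file name patterns in the tree above can be mapped back to the component that wrote them. The sketch below is a hypothetical helper: it assumes the `N` in `server-*-N.log` is a per-node suffix and matches it with a wildcard, and the example file names are invented.

```python
from fnmatch import fnmatch

# Patterns and descriptions taken from the logs/ listing above.
# Assumption: the "N" in server-*-N.log is a per-node suffix, matched with "*".
LOG_KINDS = [
    ("client-*.log", "Evaluation client"),
    ("server-*-*.log", "Deployment per node"),
    ("slurm-*.log", "Slurm job"),
    ("proxy-*.log", "Request proxy"),
]


def describe_log(path: str) -> str:
    """Return the component a log file belongs to, or 'unknown'."""
    name = path.rsplit("/", 1)[-1]
    for pattern, kind in LOG_KINDS:
        if fnmatch(name, pattern):
            return kind
    return "unknown"


# Invented file names for illustration.
for example in ["logs/client-123.log", "logs/server-123-0.log", "artifacts/results.yml"]:
    print(example, "->", describe_log(example))
```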

Troubleshooting

If the MLflow MCP server fails to load or its tools are unavailable:

uvx not found — install uv using whichever option matches your environment:

# Recommended: install via an existing package manager (no remote script execution)
pipx install uv     # if you have pipx
pip install uv      # in a virtualenv
brew install uv     # macOS

If you prefer the official shell installer, download it, inspect it, and only then run it — do not pipe directly to sh:

curl -LsSf https://astral.sh/uv/install.sh -o /tmp/uv-install.sh
less /tmp/uv-install.sh   # review the contents before executing
sh /tmp/uv-install.sh

See the official uv installation docs for further options and checksum verification.
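Before or after installing, a quick way to confirm that `uvx` is on the PATH is a short standard-library check. The function name `uvx_available` is hypothetical.

```python
import shutil


def uvx_available() -> bool:
    """True when a uvx executable is found on PATH."""
    return shutil.which("uvx") is not None


print("uvx found" if uvx_available() else "uvx missing: install uv first")
```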

MCP server not configured — add the config and restart the agent:

For Claude Code — add to .claude/settings.json (project or user level), under "mcpServers":

"MLflow": {
  "command": "uvx",
  "args": ["mlflow-mcp"],
  "env": {
    "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
  }
}

For Cursor — edit ~/.cursor/mcp.json (Settings > Tools & MCP > New MCP Server):

{
  "mcpServers": {
    "MLflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
      }
    }
  }
}
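If the config file already lists other MCP servers, the MLflow entry should be merged in rather than replacing the whole file. The sketch below shows one way to do that on an in-memory dict. The helper name `add_mlflow_server`, the existing `other-server` entry, and the example URI are all hypothetical; reading and writing the actual file is left out.

```python
import json


def add_mlflow_server(config: dict, tracking_uri: str) -> dict:
    """Return a copy of an MCP config with the MLflow server entry added."""
    merged = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["MLflow"] = {
        "command": "uvx",
        "args": ["mlflow-mcp"],
        "env": {"MLFLOW_TRACKING_URI": tracking_uri},
    }
    return merged


# Hypothetical existing config and placeholder URI, for illustration only.
existing = {"mcpServers": {"other-server": {"command": "example"}}}
updated = add_mlflow_server(existing, "https://mlflow.example.com/")
print(json.dumps(updated, indent=2))
```

Working on a copy leaves the original config untouched, so the result can be reviewed before anything is written back to disk.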
