From agent-eval-harness
Integrates the AI evaluation harness with MLflow: syncs datasets to MLflow, logs run results and judge scores to traces, pushes and pulls feedback, and lets you view results in the MLflow UI.
npx claudepluginhub opendatahub-io/agent-eval-harness --plugin agent-eval-harness

This skill is limited to using the following tools:
You are an MLflow integration agent. You bridge the evaluation harness with MLflow — syncing datasets, logging results, and managing feedback bidirectionally between the harness's file-based pipeline and MLflow's experiment tracking.
Parse $ARGUMENTS for:
| Argument | Required | Default | Description |
|---|---|---|---|
| --action <action> | no | all | One of: sync-dataset, log-results, push-feedback, pull-feedback, all |
| --config <path> | no | eval.yaml | Path to eval config |
| --run-id <id> | for log/push/pull | — | Which eval run to log or attach feedback to |
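A minimal sketch of parsing $ARGUMENTS with these defaults and the --run-id requirement; parse_mlflow_args and its error messages are illustrative, not part of the harness:

import shlex

VALID_ACTIONS = {"sync-dataset", "log-results", "push-feedback", "pull-feedback", "all"}

def parse_mlflow_args(arguments: str) -> dict:
    # Defaults from the table above.
    opts = {"action": "all", "config": "eval.yaml", "run_id": None}
    tokens = shlex.split(arguments)
    # Pair each flag with the value that follows it.
    for flag, value in zip(tokens[::2], tokens[1::2]):
        key = flag.lstrip("-").replace("-", "_")  # --run-id -> run_id
        if key in opts:
            opts[key] = value
    if opts["action"] not in VALID_ACTIONS:
        raise ValueError(f"unknown --action: {opts['action']}")
    if opts["action"] in {"log-results", "push-feedback", "pull-feedback"} and not opts["run_id"]:
        raise ValueError("--run-id is required for log-results, push-feedback, and pull-feedback")
    return opts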
Check MLflow is configured:
PYTHONPATH=${CLAUDE_SKILL_DIR}/scripts python3 -c "
from agent_eval.mlflow.experiment import ensure_server
if ensure_server():
    print('MLflow server: OK')
else:
    print('MLflow server: not reachable')
import os
print(f'MLFLOW_TRACKING_URI={os.environ.get(\"MLFLOW_TRACKING_URI\", \"not set\")}')
"
If not configured, suggest running /eval-setup first. The scripts resolve the tracking URI from mlflow.tracking_uri in eval.yaml first, then MLFLOW_TRACKING_URI env var, then default to http://127.0.0.1:5000. If the server is unreachable but a remote URI is set, proceed — the scripts handle connectivity errors gracefully.
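The resolution order can be pictured as a small helper; resolve_tracking_uri is illustrative, and only the mlflow.tracking_uri key, the MLFLOW_TRACKING_URI variable, and the default URL come from this document:

import os
import yaml

def resolve_tracking_uri(config_path: str = "eval.yaml") -> str:
    # 1. mlflow.tracking_uri in eval.yaml, if the file and key exist
    try:
        with open(config_path) as f:
            cfg = yaml.safe_load(f) or {}
    except FileNotFoundError:
        cfg = {}
    uri = (cfg.get("mlflow") or {}).get("tracking_uri")
    if uri:
        return uri
    # 2. MLFLOW_TRACKING_URI environment variable
    if os.environ.get("MLFLOW_TRACKING_URI"):
        return os.environ["MLFLOW_TRACKING_URI"]
    # 3. Local default
    return "http://127.0.0.1:5000"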
Read eval.yaml to understand:
- mlflow.experiment — the experiment name
- dataset.path and dataset.schema — where cases are and what they look like
- judges — what was scored (for feedback context)

Sync dataset (--action sync-dataset or all)

This is a two-phase process: you interpret the schema, then a script syncs deterministically.
Read dataset.schema from eval.yaml. Then browse one case directory at dataset.path:
ls <dataset_path>/ | head -5
Read the first case directory to see what files exist and their structure.
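A minimal sketch of that inspection, assuming eval.yaml nests the keys as dataset.path and dataset.schema; the file names inside a case directory depend on the dataset:

from pathlib import Path
import yaml

# Illustrative: find the dataset from eval.yaml and peek at the first case.
cfg = yaml.safe_load(Path("eval.yaml").read_text())
dataset_path = Path(cfg["dataset"]["path"])   # assumes dataset: {path: ..., schema: ...}
print("declared schema:", cfg["dataset"].get("schema"))

first_case = sorted(p for p in dataset_path.iterdir() if p.is_dir())[0]
for f in sorted(first_case.iterdir()):
    print(f.name)   # e.g. input.yaml, reference.md; actual files depend on the dataset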
Based on your understanding of dataset.schema and the sample case, create tmp/schema_mapping.json. This maps MLflow record fields to source files and field paths:
{
  "inputs": {
    "<field_name>": "<filename>:<field_path_or___file__>"
  },
  "expectations": {
    "<field_name>": "<filename>:<field_path_or___file__>"
  }
}
Rules for the mapping:
"input.yaml:prompt" → extract the prompt field from input.yaml"input.yaml:context.details" → extract nested field context.details"reference.md:__file__" → use the entire file content as the valueWrite the mapping:
mkdir -p tmp
cat > tmp/schema_mapping.json << 'EOF'
<your mapping here>
EOF
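As a concrete illustration of the rules above, a hypothetical case holding input.yaml (a prompt plus context.details) and a reference.md answer might map like this; the field names are invented for the example, and writing the file from Python is equivalent to the heredoc:

import json
from pathlib import Path

# Hypothetical mapping; the field names are examples, not harness requirements.
mapping = {
    "inputs": {
        "prompt": "input.yaml:prompt",            # top-level field in input.yaml
        "details": "input.yaml:context.details",  # dotted path into input.yaml
    },
    "expectations": {
        "reference": "reference.md:__file__",     # whole file as the value
    },
}

Path("tmp").mkdir(exist_ok=True)
Path("tmp/schema_mapping.json").write_text(json.dumps(mapping, indent=2))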
python3 ${CLAUDE_SKILL_DIR}/scripts/sync_dataset.py \
--config <config> \
--mapping tmp/schema_mapping.json
The script validates the mapping against the first case and prints a preview before syncing. If the preview looks wrong, adjust the mapping and re-run.
Log results (--action log-results or all)

Requires --run-id. Logs params, metrics, artifacts, and a per-case results table to an MLflow run.
python3 ${CLAUDE_SKILL_DIR}/scripts/log_results.py \
--run-id <id> \
--config <config>
This logs:
- mlflow.tags from eval.yaml

Push feedback (--action push-feedback or all)

Requires --run-id. Finds execution traces and attaches judge + human feedback.
python3 ${CLAUDE_SKILL_DIR}/scripts/attach_feedback.py \
--run-id <id> \
--config <config> \
--source all
This pushes:
- Judge scores (from summary.yaml): source_type=CODE, named {case_id}/{judge_name}
- Human feedback (from review.yaml, if it exists): source_type=HUMAN, named {case_id}/human_review

If no traces are found (tracing not enabled), the script reports 0 and succeeds — tracing is optional.
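A sketch of the shape of what gets pushed, following the naming above; the summary.yaml and review.yaml layouts here are assumed, and the actual MLflow calls live in attach_feedback.py:

import yaml
from pathlib import Path

def feedback_records(case_dir: Path) -> list[dict]:
    # Shapes only; the summary.yaml/review.yaml layouts are assumed, and the real
    # MLflow attachment happens inside attach_feedback.py.
    case_id = case_dir.name
    records = []
    summary = yaml.safe_load((case_dir / "summary.yaml").read_text()) or {}
    for judge_name, score in (summary.get("judges") or {}).items():
        records.append({"name": f"{case_id}/{judge_name}", "value": score, "source_type": "CODE"})
    review_path = case_dir / "review.yaml"
    if review_path.exists():
        review = yaml.safe_load(review_path.read_text()) or {}
        records.append({"name": f"{case_id}/human_review", "value": review, "source_type": "HUMAN"})
    return records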
Pull feedback (--action pull-feedback)

Requires --run-id. Pulls annotations added via the MLflow UI back into review.yaml for /eval-optimize to consume.
python3 ${CLAUDE_SKILL_DIR}/scripts/attach_feedback.py \
--run-id <id> \
--config <config> \
--action pull
Pulled annotations are saved to review.yaml under the mlflow_feedback section, separate from local human feedback. /eval-optimize reads both.
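A sketch of reading review.yaml after a pull; only the mlflow_feedback key comes from this document, the rest of the layout is assumed:

from pathlib import Path
import yaml

# Illustrative read of review.yaml after a pull; layout beyond mlflow_feedback is assumed.
review = yaml.safe_load(Path("review.yaml").read_text()) or {}
pulled = review.get("mlflow_feedback", [])
local = {k: v for k, v in review.items() if k != "mlflow_feedback"}
print(f"pulled MLflow annotations: {len(pulled)}; local feedback keys: {sorted(local)}")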
Print summary:
- Synced dataset: <name> (if sync ran)
- Logged to experiment <name>, run <run_id> (if log ran)
- MLflow UI: $MLFLOW_TRACKING_URI

Suggest next steps (include --config <config> if a non-default config was used):
- /eval-review --run-id <id> for human review
- /eval-optimize --model <model> for automated improvement

Notes:
- Use dataset.schema to build the mapping correctly. The mapping is the critical step — everything downstream depends on it.
- merge_records deduplicates, log_feedback overwrites.
- Run IDs come from /eval-run.

$ARGUMENTS