From lightdash-agentops
Orchestrate evaluation runs and test case management for Lightdash agents.
npx claudepluginhub yu-iskw/dbt-heroes --plugin lightdash-agentops
How this skill is triggered (by the user, by Claude, or both):
Slash command
/lightdash-agentops:run-lightdash-evals
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
Skill for managing and executing evaluations for Lightdash AI agents.
Skill for managing and executing evaluations for Lightdash AI agents.
Enables the "Eval-Driven Development" workflow by providing tools to create evaluation suites, append test cases (prompts), execute evaluation runs, and analyze the results.
Wraps the following MCP tools from the lightdash-tools server:
ldt__list_agent_evaluations
ldt__get_agent_evaluation
ldt__create_agent_evaluation
ldt__update_agent_evaluation
ldt__append_agent_evaluation_prompts
ldt__run_agent_evaluation
ldt__list_agent_evaluation_runs
ldt__get_agent_evaluation_run_results
ldt__delete_agent_evaluation
The tools fall into three groups:
list_agent_evaluations, get_agent_evaluation, list_agent_evaluation_runs, get_agent_evaluation_run_results.
create_agent_evaluation, update_agent_evaluation, append_agent_evaluation_prompts, run_agent_evaluation.
delete_agent_evaluation.
Use ldt__append_agent_evaluation_prompts to add 20-50 diverse test cases representing real-world user queries.
Start an evaluation run with ldt__run_agent_evaluation.
List the runs with ldt__list_agent_evaluation_runs.
Retrieve the results with ldt__get_agent_evaluation_run_results.
Use the agent-tuner sub-agent to automatically process evaluation results for improvement.
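The order of tool calls in this workflow can be sketched in Python as follows. This is only an illustration under stated assumptions: `call_tool` stands in for whatever MCP client function you use, and the argument names (`evaluation_id`, `prompts`, `run_id`) and the result shapes are hypothetical, not taken from the lightdash-tools server. Only the tool names come from the list above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical client signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def prompt_count_ok(prompts: List[str]) -> bool:
    """Check the suggested suite size of 20-50 test cases."""
    return 20 <= len(prompts) <= 50


def run_eval_cycle(
    call_tool: ToolCaller, evaluation_id: str, prompts: List[str]
) -> Dict[str, Any]:
    """Append prompts, start a run, list runs, then fetch that run's results."""
    # Argument names below are assumptions made for this sketch.
    call_tool(
        "ldt__append_agent_evaluation_prompts",
        {"evaluation_id": evaluation_id, "prompts": prompts},
    )
    run = call_tool("ldt__run_agent_evaluation", {"evaluation_id": evaluation_id})
    call_tool("ldt__list_agent_evaluation_runs", {"evaluation_id": evaluation_id})
    return call_tool(
        "ldt__get_agent_evaluation_run_results",
        {"evaluation_id": evaluation_id, "run_id": run["run_id"]},
    )


def make_fake_caller(log: List[str]) -> ToolCaller:
    """In-memory stand-in for an MCP client; records tool names in call order."""

    def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        log.append(name)
        if name == "ldt__run_agent_evaluation":
            return {"run_id": "run-1"}  # made-up result shape
        if name == "ldt__get_agent_evaluation_run_results":
            return {"run_id": args["run_id"], "results": []}
        return {}

    return call_tool


if __name__ == "__main__":
    calls: List[str] = []
    test_prompts = [f"example question {i}" for i in range(20)]
    print(prompt_count_ok(test_prompts))
    print(run_eval_cycle(make_fake_caller(calls), "eval-1", test_prompts))
    print(calls)
```

Passing the caller in as a parameter keeps the sketch independent of any particular MCP client, so the same sequence can be exercised against a fake before pointing it at a real server.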