Benchmark and optimize AI agents with AgentV evaluations: run benchmarks across providers like Anthropic and OpenAI, write and lint eval YAML files, analyze traces for regressions, failure patterns, costs, and latency, and bootstrap CLI setup in your workspace.
npx claudepluginhub entityprocess/agentv --plugin agentv-dev
Run AgentV evaluations and optimize agents through eval-driven iteration. Triggers: run evals, benchmark agents, optimize prompts/skills against evals, compare agent outputs across providers, analyze eval results, offline evaluation of recorded sessions. Not for: writing/editing eval YAML without running (use agentv-eval-writer), analyzing existing traces/JSONL without re-running (use agentv-trace-analyst).
Use when reviewing eval YAML files for quality issues, linting eval files before committing, checking eval schema compliance, or when asked to "review these evals", "check eval quality", "lint eval files", or "validate eval structure". Do NOT use for writing evals (use agentv-eval-writer) or running evals (use agentv-bench).
Write, edit, review, and validate AgentV EVAL.yaml / .eval.yaml evaluation files. Use when asked to create new eval files, update or fix existing ones, add or remove test cases, configure graders (`llm-grader`, `code-grader`, `rubrics`), review whether an eval is correct or complete, convert between EVAL.yaml and evals.json using `agentv convert`, or generate eval test cases from chat transcripts (markdown conversation or JSON messages). Do NOT use for creating SKILL.md files, writing skill definitions, or running evals; running and benchmarking belong to agentv-bench.
Bootstrap AgentV in the current workspace after plugin-manager install. Ensures CLI availability, runs workspace init, and verifies setup artifacts.
Analyze AgentV evaluation traces and result JSONL files using `agentv trace` and `agentv compare` CLI commands. Use when asked to inspect AgentV eval results, find regressions between AgentV evaluation runs, identify failure patterns in AgentV trace data, analyze tool trajectories, or compute cost/latency/score statistics from AgentV result files. Do NOT use for benchmarking skill trigger accuracy, analyzing skill-creator eval performance, or measuring skill description quality — those tasks belong to the skill-creator skill.
Evaluate AI agents from the terminal. No server. No signup.
npm install -g agentv
agentv init
agentv eval evals/example.yaml
That's it. Results in seconds, not minutes.
AgentV runs evaluation cases against your AI agents and scores them with deterministic code graders + customizable LLM graders. Everything lives in Git — YAML eval files, markdown judge prompts, JSONL results.
# evals/math.yaml
description: Math problem solving
tests:
  - id: addition
    input: What is 15 + 27?
    expected_output: "42"
    assertions:
      - type: contains
        value: "42"
agentv eval evals/math.yaml
1. Install and initialize:
npm install -g agentv
agentv init
2. Configure targets in .agentv/targets.yaml — point to your agent or LLM provider.
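The targets schema isn't spelled out here, so treat the following as an illustrative sketch only: the `targets`, `name`, `provider`, and `model` keys are assumptions, and the real field names live in the docs at agentv.dev/docs.
# .agentv/targets.yaml (hypothetical sketch; check the docs for the real schema)
targets:
  - name: claude          # label you reference when running and comparing
    provider: anthropic   # provider/model fields here are assumptions
    model: <model-id>
  - name: gpt
    provider: openai
    model: <model-id>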
3. Create an eval in evals/:
description: Code generation quality
tests:
  - id: fizzbuzz
    criteria: Write a correct FizzBuzz implementation
    input: Write FizzBuzz in Python
    assertions:
      - type: contains
        value: "fizz"
      - type: code-grader
        command: ./validators/check_syntax.py
      - type: llm-grader
        prompt: ./graders/correctness.md
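The `llm-grader` prompt is a plain markdown file. The exact contract the judge must follow (score format, pass/fail wording) depends on AgentV's grader schema, so this sketch of ./graders/correctness.md is purely illustrative:
# Correctness judge (illustrative sketch)
Grade the candidate FizzBuzz implementation.
Pass only if it prints "Fizz" for multiples of 3, "Buzz" for multiples of 5,
and "FizzBuzz" for multiples of both, for the numbers 1 through 100.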
4. Run it:
agentv eval evals/my-eval.yaml
5. Compare results across targets:
agentv compare .agentv/results/runs/<timestamp>/index.jsonl
Output formats:
agentv eval evals/my-eval.yaml # JSONL (default)
agentv eval evals/my-eval.yaml -o report.html # HTML dashboard
agentv eval evals/my-eval.yaml -o results.xml # JUnit XML for CI
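Because JUnit XML is a standard CI format, a pipeline can consume it directly. A minimal GitHub Actions sketch, assuming only the commands shown above (the workflow layout is illustrative, and real runs would also need provider credentials as secrets):
# .github/workflows/evals.yml (illustrative sketch)
name: evals
on: [push]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g agentv
      # Provider API keys (e.g. from repository secrets) omitted here
      - run: agentv eval evals/my-eval.yaml -o results.xml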
Use AgentV programmatically:
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
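Since `summary` exposes `passed` and `total`, one way to gate a script on eval results (a sketch, not a documented pattern) is:
if (summary.passed < summary.total) {
  process.exit(1); // non-zero exit fails the build when any eval fails
}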
Full docs at agentv.dev/docs.
git clone https://github.com/EntityProcess/agentv.git
cd agentv
bun install && bun run build
bun test
See AGENTS.md for development guidelines.
License: MIT