Evaluate and optimize AI agents
Evaluate AI agents from the terminal. No server. No signup.
npm install -g agentv
agentv init
agentv eval evals/example.yaml
That's it. Results in seconds, not minutes.
AgentV runs evaluation cases against your AI agents and scores them with deterministic code graders and customizable LLM graders. Everything lives in Git: YAML eval files, markdown judge prompts, JSONL results.
# evals/math.yaml
description: Math problem solving
tests:
  - id: addition
    input: What is 15 + 27?
    expected_output: "42"
    assertions:
      - type: contains
        value: "42"
agentv eval evals/math.yaml
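A deterministic code grader is just a script that checks the agent's output programmatically. As a minimal sketch, a grader for the eval above might test for the expected token; the function shape here is illustrative, since AgentV's actual grader contract (how the output is passed in and how the verdict is reported) may differ:

```typescript
// Hypothetical deterministic grader: passes only if the agent's
// answer contains the exact token "42". The signature is an
// assumption for illustration, not AgentV's documented contract.
function grade(output: string): boolean {
  // \b word boundaries avoid false positives like "420" or "142".
  return /\b42\b/.test(output);
}

console.log(grade('15 + 27 = 42')); // true
console.log(grade('15 + 27 = 43')); // false
```

Because the check is plain code, it runs instantly and gives the same verdict every time, unlike an LLM grader.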
1. Install and initialize:
npm install -g agentv
agentv init
2. Configure targets in .agentv/targets.yaml — point to your agent or LLM provider.
3. Create an eval in evals/:
description: Code generation quality
tests:
  - id: fizzbuzz
    criteria: Write a correct FizzBuzz implementation
    input: Write FizzBuzz in Python
    assertions:
      - type: contains
        value: "fizz"
      - type: code-grader
        command: ./validators/check_syntax.py
      - type: llm-grader
        prompt: ./graders/correctness.md
4. Run it:
agentv eval evals/my-eval.yaml
5. Compare results across targets:
agentv compare .agentv/results/runs/<timestamp>/index.jsonl
agentv eval evals/my-eval.yaml # JSONL (default)
agentv eval evals/my-eval.yaml -o report.html # HTML dashboard
agentv eval evals/my-eval.yaml -o results.xml # JUnit XML for CI
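Because the default output is plain JSONL, results are easy to post-process with a few lines of code. A minimal sketch, assuming each line is a JSON record with `id` and `passed` fields (the actual schema may differ; inspect a real run's `index.jsonl` first):

```typescript
// Parse a JSONL results file line by line. The record shape
// (id, passed) is an assumption made for illustration.
const sample = [
  '{"id":"addition","passed":true}',
  '{"id":"fizzbuzz","passed":false}',
].join('\n');

const records = sample
  .split('\n')
  .filter((line) => line.trim().length > 0) // skip blank lines
  .map((line) => JSON.parse(line));

const passed = records.filter((r) => r.passed).length;
console.log(`${passed}/${records.length} passed`); // 1/2 passed
```

The same loop works for real files: read the file, split on newlines, and parse each line independently.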
Use AgentV programmatically:
import { evaluate } from '@agentv/core';
const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});
console.log(`${summary.passed}/${summary.total} passed`);
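The `summary` object makes it straightforward to gate a CI pipeline. A small sketch, reusing the `passed`/`total` shape from the snippet above (treating any shortfall as a build failure is the only assumption added here):

```typescript
// Sketch: convert an evaluation summary into a CI exit code.
// The { passed, total } shape mirrors the evaluate() example.
interface Summary {
  passed: number;
  total: number;
}

function ciVerdict(summary: Summary): number {
  // Exit code 0 only when every test passed; 1 otherwise.
  return summary.passed === summary.total ? 0 : 1;
}

console.log(ciVerdict({ passed: 3, total: 3 })); // 0
console.log(ciVerdict({ passed: 2, total: 3 })); // 1
```

In a real pipeline you would call `process.exit(ciVerdict(summary))` after `evaluate()` resolves.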
Full docs at agentv.dev/docs.
git clone https://github.com/EntityProcess/agentv.git
cd agentv
bun install && bun run build
bun test
See AGENTS.md for development guidelines.
MIT