Design AI agent architectures by diagnosing task and project needs, selecting patterns like single-agent loops, multi-agent systems, or optimization loops, and defining structured workflows with phases and tools; review agent plugins in PRs via Python linting, YAML checks, skill audits, and inline comment generation.
npx claudepluginhub entityprocess/agentv --plugin agentic-engineering
Use when designing an AI agent system, selecting agentic design patterns, planning multi-phase workflows, choosing between single-agent and multi-agent architectures, or when asked "what kind of agent should I build", "how should I structure this automation", "design an agent for X", or "which agentic pattern fits this problem".
Use when reviewing an AI plugin pull request, auditing plugin quality before release, or when asked to "review a plugin PR", "review skills in this PR", "check plugin quality", or "review workflow architecture". Covers skill quality, structural linting, and workflow architecture review.
Evaluate AI agents from the terminal. No server. No signup.
npm install -g agentv
agentv init
agentv eval evals/example.yaml
That's it. Results in seconds, not minutes.
AgentV runs evaluation cases against your AI agents and scores them with deterministic code graders + customizable LLM graders. Everything lives in Git — YAML eval files, markdown judge prompts, JSONL results.
# evals/math.yaml
description: Math problem solving
tests:
  - id: addition
    input: What is 15 + 27?
    expected_output: "42"
    assertions:
      - type: contains
        value: "42"
agentv eval evals/math.yaml
1. Install and initialize:
npm install -g agentv
agentv init
2. Configure targets in .agentv/targets.yaml — point to your agent or LLM provider (an illustrative sketch of this file follows the list).
3. Create an eval in evals/ (a sketch of the referenced grader prompt also follows the list):
description: Code generation quality
tests:
  - id: fizzbuzz
    criteria: Write a correct FizzBuzz implementation
    input: Write FizzBuzz in Python
    assertions:
      - type: contains
        value: "fizz"
      - type: code-grader
        command: ./validators/check_syntax.py
      - type: llm-grader
        prompt: ./graders/correctness.md
4. Run it:
agentv eval evals/my-eval.yaml
5. Compare results across targets:
agentv compare .agentv/results/runs/<timestamp>/index.jsonl
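For step 2, the actual schema of .agentv/targets.yaml comes from agentv init and the docs at agentv.dev/docs; the sketch below is illustrative only, and the field names (name, provider, model) are assumptions rather than the confirmed format:
# .agentv/targets.yaml (illustrative sketch, not the confirmed schema)
targets:
  - name: baseline
    provider: anthropic     # assumed value; use whichever provider or agent you run
    model: <model-id>
  - name: candidate
    provider: openai
    model: <model-id>
For step 3, the llm-grader points at ./graders/correctness.md, a plain markdown prompt for the judge model. AgentV leaves the wording to you; something like the following is a reasonable starting point (the pass/fail instruction is an assumption, not a documented contract):
You are grading a response to the prompt "Write FizzBuzz in Python".
Pass the response only if it contains a complete Python implementation that prints "fizz" for multiples of 3, "buzz" for multiples of 5, and "fizzbuzz" for multiples of both; otherwise fail it and briefly explain why.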
agentv eval evals/my-eval.yaml # JSONL (default)
agentv eval evals/my-eval.yaml -o report.html # HTML dashboard
agentv eval evals/my-eval.yaml -o results.xml # JUnit XML for CI
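Taken together, the paths above imply a project layout roughly like this (only the files the commands actually reference; nothing else is required):
.agentv/
  targets.yaml
  results/runs/<timestamp>/index.jsonl
evals/
  math.yaml
  my-eval.yaml
graders/
  correctness.md
validators/
  check_syntax.py
The JUnit XML output slots straight into CI. A minimal GitHub Actions sketch, assuming credentials for whichever provider your targets use are passed as repository secrets (the workflow and secret name below are illustrative, not an official recipe):
# .github/workflows/evals.yml (illustrative sketch)
name: agent-evals
on: [pull_request]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g agentv
      - run: agentv eval evals/my-eval.yaml -o results.xml
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}   # assumed secret name; match your provider
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: eval-results
          path: results.xml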
Use AgentV programmatically:
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
Full docs at agentv.dev/docs.
git clone https://github.com/EntityProcess/agentv.git
cd agentv
bun install && bun run build
bun test
See AGENTS.md for development guidelines.
MIT
Development skills for building and optimizing AgentV evaluations
Share bugs, ideas, or general feedback.
A CLI tool for validating AI coding agents
Open-source testing and regression detection framework for AI agents. Golden baseline diffing, CI/CD integration, works with LangGraph, CrewAI, OpenAI, Anthropic Claude, HuggingFace, Ollama, and MCP.
Set up evaluation of AI agents with tool call validation, correctness checks, task completion, and tool reliability using Dokimos. Framework-agnostic — works with any agent framework.
Editorial "Agent Architect" bundle for Claude Code from Antigravity Awesome Skills.
Comprehensive context engineering skills for building production-grade AI agent systems — covering fundamentals, degradation patterns, compression, optimization, multi-agent coordination, memory systems, tool design, filesystem context, hosted agents, evaluation, project development, and cognitive architecture