# agent-knowledge
Evaluates Claude agent quality on library questions across three access modes: without BK, BK grep-only, and BK full. Answers are scored on accuracy, specificity, completeness, and source grounding, using predefined or custom queries.
Install:

```shell
npx claudepluginhub chris-xperimntl/agent-knowledge
```

This skill is limited to using the following tools:
Compare how well Claude answers library questions across three access levels:

- Without BK
- BK grep-only
- BK full
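A minimal sketch of how such a comparison might be aggregated. Only the dimension names (accuracy, specificity, completeness, source grounding) and mode names come from the description above; the 1-5 scale, the example scores, and the averaging are hypothetical, not the plugin's actual implementation.

```python
# Hypothetical aggregation: each answer is scored on four rubric
# dimensions, and the three access levels are compared on their means.

def average_score(scores: dict[str, int]) -> float:
    """Mean of the four rubric dimensions for one answer (assumed 1-5 scale)."""
    dimensions = ("accuracy", "specificity", "completeness", "source_grounding")
    return sum(scores[d] for d in dimensions) / len(dimensions)

# Illustrative scores only -- not measured results.
results = {
    "without_bk": {"accuracy": 3, "specificity": 2, "completeness": 3, "source_grounding": 1},
    "bk_grep_only": {"accuracy": 4, "specificity": 4, "completeness": 3, "source_grounding": 4},
    "bk_full": {"accuracy": 5, "specificity": 4, "completeness": 5, "source_grounding": 5},
}

for mode, scores in results.items():
    print(f"{mode}: {average_score(scores):.2f}")
```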
Parse $ARGUMENTS:
- `--predefined`: run all predefined queries
- `--predefined N`: run predefined query #N only

Then:

1. Execute with `{ command: "stores" }` to list stores. Abort if none.
2. Load queries from `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/queries/predefined.yaml`, or use an arbitrary query.
3. Fill the template under `$CLAUDE_PLUGIN_ROOT/evals/agent-quality/templates/` ({{QUESTION}}, {{STORES}}, {{STORE_PATHS}}).

Detailed procedures: references/procedures.md
Output format: references/output-format.md
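The template-filling step above can be sketched as simple placeholder substitution. The placeholder names ({{QUESTION}}, {{STORES}}, {{STORE_PATHS}}) come from the command description; the template text, values, and helper function are illustrative assumptions, not the plugin's actual code.

```python
# Hypothetical sketch: substitute {{NAME}} placeholders into an eval template.

def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{NAME}} placeholder with its value."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template

# Illustrative template and values only.
template = "Question: {{QUESTION}}\nStores: {{STORES}}\nPaths: {{STORE_PATHS}}"
prompt = fill_template(template, {
    "QUESTION": "How does the library handle retries?",
    "STORES": "docs-store",
    "STORE_PATHS": "/path/to/docs-store",
})
print(prompt)
```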