By whchoi98
Run 3-tier Claude Code project evaluations—quick checklist, standard static/dynamic analysis, or full multi-agent review—generating scored bilingual reports with history comparison and improvement roadmaps.
npx claudepluginhub whchoi98/harness-eval --plugin harness-eval
User-facing slash commands for evaluation. Each `.md` file in this directory is auto-discovered by Claude Code as a `/harness-eval:<name>` command.
Compare two harness evaluations side by side
Full harness evaluation — multi-agent comprehensive review (~5-10min)
Evaluate Claude Code harness engineering quality
Quick harness evaluation — checklist-based scoring (~30s)
Subagents for Full mode evaluation. Spawned in parallel by `skills/full.md` to perform qualitative analysis of target projects.
Scans project structure and collects harness artifacts for evaluation. Produces a structured project overview consumed by evaluator agents.
Evaluates harness actionability, testability, and contract-based testing. Assesses whether components are usable, tested, and have clear interfaces.
Evaluates harness architecture quality based on Anthropic's harness design patterns. Analyzes agent communication, context management, feedback loops, and evolvability.
Evaluates harness safety posture and cost efficiency. Deep analysis of tool permissions, deny lists, secret patterns, and model/tool cost optimization.
Compare harness evaluation history — shows score trends, per-tier deltas, diminishing returns detection, and next grade projection.
Full harness evaluation — multi-agent deep analysis across 12 dimensions (safety, completeness, design quality) with parallel evaluators and synthesized report. Takes 5-10 minutes. Produces a comprehensive scored report with executive summary and improvement roadmap.
Quick harness evaluation — checklist-based scoring in ~30 seconds. Runs deterministic checks against the target project and produces a score, grade, and improvement suggestions.
Standard harness evaluation — static analysis, dynamic testing, and checklist scoring in 2-3 minutes. Produces a detailed report with findings and improvement roadmap.
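The score-trend part of the history comparison can be illustrated with a small sketch. It assumes the scores are supplied one per line, oldest first; `score_deltas` is a hypothetical helper written for illustration, not the plugin's own implementation.

```shell
# Hypothetical helper (not part of harness-eval).
# Reads one score per line on stdin, oldest first, and prints the signed
# change between each pair of consecutive evaluations.
score_deltas() {
  awk 'NR > 1 { printf "%+.1f\n", $1 - prev } { prev = $1 }'
}

# Usage:
#   printf '6.5\n7.2\n7.4\n' | score_deltas
```

Deltas that shrink toward zero over successive runs are one simple signal of diminishing returns.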
Uses power tools
Uses Bash, Write, or Edit tools
Session harness plugin for Claude Code workflow automation
Skill evaluation and benchmarking - test skill effectiveness with behavioral eval cases, grade results, and track quality improvements
Universal quality control orchestrator and final authority for any software development project. Dynamically discovers and coordinates with available sub-agents, performs comprehensive multi-dimensional quality assessment, security validation, and deployment readiness verification. Adapts to any project type, programming language, or development framework while maintaining enterprise-grade quality standards. Examples: <example>Context: Code changes ready for review across any project. user: 'Please review this code before commit' assistant: 'I'll use the 1-ceo-quality-control-agent to orchestrate comprehensive quality validation, discover available specialists, and perform final security scanning before approval.' <commentary>Universal quality control requires comprehensive validation across all dimensions regardless of project type.</commentary></example> <example>Context: Multi-agent work completion needing validation. user: 'Several agents completed their tasks, need quality review' assistant: 'Let me engage the 1-ceo-quality-control-agent to coordinate comprehensive validation across all completed work and ensure quality standards.' <commentary>Multi-agent coordination and quality validation applies to any development project.</commentary></example>
Agents specialized in quality assurance, testing strategies, and test architecture. Focuses on ensuring code quality and reliability.
Harness Engineering framework - skills, agents, and commands for safe, reviewable, incremental agent-driven development. Includes RPEQ workflow (Research, Plan, Execute, QA), ast-grep setup, and codebase analysis tools.
Live codebase visualization and structural quality gate — 14 health dimensions graded A-F, dependency analysis, and architecture governance via MCP
36 on-demand AWS and cloud skills, slash commands, agents, and security hooks for Claude Code
A Claude Code plugin for systematic 3-tier evaluation of harness engineering quality
harness-eval is a Claude Code plugin that systematically evaluates the engineering quality of Claude Code harness configurations. It combines deterministic script-based quantitative checks with AI agent-powered qualitative reviews through a 3-tier evaluation system (Quick / Standard / Full).
The plugin scores projects across 6 dimensions — correctness, safety, completeness, actionability, consistency, and testability — producing structured reports with letter grades (A+ through F) and improvement roadmaps.
From the terminal (shell):
```shell
claude plugin marketplace add https://github.com/whchoi98/harness-eval
claude plugin install harness-eval@harness-eval
```
Or from inside a Claude Code session:
```
/plugin marketplace add https://github.com/whchoi98/harness-eval
/plugin install harness-eval@harness-eval
```
From the terminal:
```shell
claude plugin list
```
Or from inside a Claude Code session:
```
/plugin list
```
From the terminal:
```shell
claude plugin marketplace refresh
claude plugin install harness-eval@harness-eval
```
Or from inside a Claude Code session:
```
/plugin marketplace refresh
/plugin install harness-eval@harness-eval
```
From the terminal:
```shell
claude plugin remove harness-eval
claude plugin marketplace remove harness-eval
```
Or from inside a Claude Code session:
```
/plugin remove harness-eval
/plugin marketplace remove harness-eval
```
```shell
git clone https://github.com/whchoi98/harness-eval.git
cd harness-eval/plugins/harness-eval
bash scripts/setup.sh
```
Run evaluations from inside a Claude Code session:
```
/harness-eval:quick              # Checklist-based scoring (~30s)
/harness-eval:standard           # Static + dynamic analysis (~2-3min)
/harness-eval:full               # Multi-agent comprehensive review (~5-10min)
/harness-eval:compare            # Compare with previous evaluation
/harness-eval:harness-eval full  # Argument style also works
```
Run evaluation scripts directly:
```shell
# Score a target project
HARNESS_EVAL_ROOT=$(pwd) bash scripts/scoring.sh /path/to/target-project
# Output: {"score": 7.2, "grade": "B", "checks": [...]}

# Run static analysis
HARNESS_EVAL_ROOT=$(pwd) bash scripts/static-analysis.sh /path/to/target-project
# Output: {"summary": {"pass": 12, "warn": 1, "fail": 0, "total": 13}, ...}

# View evaluation history
HARNESS_EVAL_ROOT=$(pwd) bash scripts/history.sh list /path/to/target-project
# Output: [{"id": "eval-2026-04-06-001", "score": 7.2, ...}]

# Generate badge
bash scripts/badge.sh /path/to/target-project
# Output: Badge SVG/Markdown
```
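The JSON printed by `scripts/scoring.sh` lends itself to a simple quality gate, for example in CI. The following is a minimal sketch, assuming the output keeps the `{"score": 7.2, ...}` shape shown above. The function names `extract_score` and `meets_threshold` and the 7.0 threshold are hypothetical and not part of the plugin.

```shell
# Hypothetical helpers (not part of harness-eval) for gating on the score.

# Read scoring JSON on stdin and print the first "score" value found.
# Assumes the format shown above, e.g. {"score": 7.2, "grade": "B", ...}.
extract_score() {
  grep -o '"score"[[:space:]]*:[[:space:]]*[0-9][0-9.]*' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*//'
}

# Succeed (exit status 0) if score $1 is at least the minimum $2.
# awk handles the floating-point comparison, which plain `[ ]` cannot.
meets_threshold() {
  awk -v s="$1" -v m="$2" 'BEGIN { exit !(s + 0 >= m + 0) }'
}

# Usage (assumes HARNESS_EVAL_ROOT is set as in the commands above):
#   score=$(bash scripts/scoring.sh /path/to/target-project | extract_score)
#   meets_threshold "$score" 7.0 || { echo "score $score is below 7.0" >&2; exit 1; }
```

A dedicated JSON parser would be more robust than pattern matching if the output format ever changes; the text-based extraction here only avoids an extra dependency.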
| Variable | Description | Default |
|---|---|---|
| `HARNESS_EVAL_ROOT` | Path to the harness-eval plugin root directory | (required) |
| `CLAUDE_NOTIFY_WEBHOOK` | Webhook URL for evaluation completion notifications | (empty, disabled) |
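Because `HARNESS_EVAL_ROOT` is required, a wrapper script may want to check it before calling any of the evaluation scripts. This is a minimal sketch of such a pre-flight check; `require_harness_root` is a hypothetical helper, and the plugin's own scripts may validate the variable differently.

```shell
# Hypothetical pre-flight check (not part of harness-eval).
# Succeeds only if HARNESS_EVAL_ROOT is set and names an existing directory.
require_harness_root() {
  if [ -z "${HARNESS_EVAL_ROOT:-}" ]; then
    echo "HARNESS_EVAL_ROOT is not set" >&2
    return 1
  fi
  if [ ! -d "$HARNESS_EVAL_ROOT" ]; then
    echo "HARNESS_EVAL_ROOT is not a directory: $HARNESS_EVAL_ROOT" >&2
    return 1
  fi
}

# Usage:
#   export HARNESS_EVAL_ROOT=/path/to/harness-eval/plugins/harness-eval
#   require_harness_root || exit 1
```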