Skill

full

Performs comprehensive multi-agent evaluation of code projects across 12 dimensions like safety, completeness, and design quality. Outputs scored reports with executive summaries and improvement roadmaps in 5-10 minutes.

code-quality

security

npx claudepluginhub whchoi98/harness-eval --plugin harness-eval

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness-eval:full

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are performing a Full harness evaluation. This is the most comprehensive evaluation mode. It uses a multi-agent architecture with 3 phases: Collection, Parallel Evaluation, and Synthesis. The result is a 12-dimension scored report.

SKILL.md

333 lines · ~3.1k tokens

Similar Skills

codebase-readiness

Runs Agent-Ready Codebase Assessment scoring codebase across 8 dimensions with parallel agents, producing weighted 0-100 score, band rating, and improvement roadmap. Supports Ruby, Python, PHP, TypeScript, JavaScript, Go, Java, Scala, Rust.

19 files

codebase-readiness

harness-audit

125

Scores a project's agent harness across 5 subsystems, identifies the bottleneck, and outputs a prioritized improvement plan. Use to assess readiness for long-run agent sessions or when adopting an agent stack on a new codebase.

2 files

claude-code-config

repo-eval

Evaluates a codebase across 12 pillars using 3 parallel evaluator agents, producing a scored assessment for targeted remediation.

6 tools

forge

Stats

Language: Shell

Parent stars: 16

Parent forks: 1

Maintenance: Good

Last commit: Apr 6, 2026

You are the collector agent for harness-eval. Scan the project at the following path and produce a structured project artifact.

Project path: <INSERT_CURRENT_WORKING_DIRECTORY>

For reference, here are the Standard evaluation results already collected:

## Static Analysis Results
<INSERT static_results HERE>

## Scoring Results
<INSERT score_results HERE>

Follow all instructions in your agent definition. Produce the full structured artifact in the Agent Communication Protocol format.

You are the safety-evaluator agent for harness-eval. Evaluate the Safety and Cost Efficiency dimensions for the project described below.

## Project Artifact (from collector)
<INSERT project_artifact HERE>

## Static Analysis Results
<INSERT static_results HERE>

## Scoring Results
<INSERT score_results HERE>

Follow all instructions in your agent definition. Produce your output in the Agent Communication Protocol format with scores for Safety and Cost Efficiency.

You are the completeness-evaluator agent for harness-eval. Evaluate the Actionability, Testability, and Contract-Based Testing dimensions for the project described below.

## Project Artifact (from collector)
<INSERT project_artifact HERE>

## Static Analysis Results
<INSERT static_results HERE>

## Scoring Results
<INSERT score_results HERE>

Follow all instructions in your agent definition. Produce your output in the Agent Communication Protocol format with scores for Actionability, Testability, and Contract-Based Testing.

You are the design-evaluator agent for harness-eval. Evaluate the Agent Communication, Context Management, Feedback Loop Maturity, and Evolvability dimensions for the project described below.

## Project Artifact (from collector)
<INSERT project_artifact HERE>

## Static Analysis Results
<INSERT static_results HERE>

## Scoring Results
<INSERT score_results HERE>

Follow all instructions in your agent definition. Produce your output in the Agent Communication Protocol format with scores for Agent Communication, Context Management, Feedback Loop Maturity, and Evolvability.
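The three evaluator prompts above share the same `<INSERT name HERE>` placeholders. As a minimal sketch of how such a template could be filled before dispatch, assuming the templates are held as plain strings: the helper name `fill_template` is hypothetical and not part of harness-eval, and the collector's `<INSERT_CURRENT_WORKING_DIRECTORY>` placeholder uses a different form that this pattern deliberately does not match.

```python
import re

# Matches placeholders of the exact form "<INSERT name HERE>" used in the prompts.
PLACEHOLDER = re.compile(r"<INSERT (\w+) HERE>")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each <INSERT name HERE> with values[name].

    Hypothetical helper for illustration. Raises KeyError when a placeholder
    has no supplied value, so an unfilled prompt is never dispatched silently.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)
```

Failing on a missing value is a design choice for the sketch; the source does not say what the skill does when a result variable is empty.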

You are the synthesizer agent for harness-eval. Aggregate all evaluation data below into the final 12-dimension report.

## Static Analysis Results
<INSERT static_results HERE>

## Scoring Results
<INSERT score_results HERE>

## Project Artifact (from collector)
<INSERT project_artifact HERE>

## Safety Evaluator Output
<INSERT safety_eval_output HERE>

## Completeness Evaluator Output
<INSERT completeness_eval_output HERE>

## Design Evaluator Output
<INSERT design_eval_output HERE>

Follow all instructions in your agent definition. Handle any AGENT_FAILED markers by setting those dimensions to null and noting them as missing. Produce the final report in BILINGUAL format (English first, then --- separator, then Korean). Tables, scores, and code are identical in both sections; only prose text differs. After the report, execute the history save and badge update commands.
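The AGENT_FAILED rule in the synthesizer prompt can be illustrated with a short sketch. It assumes each evaluator's output has already been parsed into a dictionary of dimension scores, which is an assumption: the Agent Communication Protocol format is not shown on this page. The function name `merge_scores` and the numeric values are hypothetical. The dimension ownership comes from the evaluator prompts; the remaining dimensions of the 12 are not listed here, so the sketch leaves them out.

```python
AGENT_FAILED = "AGENT_FAILED"

# Dimension ownership as stated in the three evaluator prompts.
DIMENSIONS = {
    "safety": ["Safety", "Cost Efficiency"],
    "completeness": ["Actionability", "Testability", "Contract-Based Testing"],
    "design": [
        "Agent Communication",
        "Context Management",
        "Feedback Loop Maturity",
        "Evolvability",
    ],
}


def merge_scores(outputs: dict) -> tuple[dict, list]:
    """Combine evaluator outputs into one score table.

    `outputs` maps an evaluator name to a dict of scores, or to the
    AGENT_FAILED marker. Dimensions owned by a failed (or absent) evaluator
    are set to None and reported in the `missing` list.
    """
    scores, missing = {}, []
    for evaluator, dims in DIMENSIONS.items():
        result = outputs.get(evaluator, AGENT_FAILED)
        for dim in dims:
            value = result.get(dim) if isinstance(result, dict) else None
            scores[dim] = value
            if value is None:
                missing.append(dim)
    return scores, missing
```

With this shape, a failed design evaluator yields four null dimensions while the safety and completeness scores pass through unchanged.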

# Harness Full Evaluation — Raw Results (Synthesizer Unavailable)

**Date: <timestamp>**

**Note: The synthesizer agent failed. These are the raw agent outputs without aggregation.**

---

## Standard Results

### Static Analysis
<static_results>

### Scoring
<score_results>

---

## Collector Artifact
<project_artifact>

---

## Safety Evaluator
<safety_eval_output>

---

## Completeness Evaluator
<completeness_eval_output>

---

## Design Evaluator
<design_eval_output>

---

*To get a synthesized report, run `/harness-eval full` again.*

Failure Point	Behavior
static-analysis.sh fails	Log error, continue to scoring
scoring.sh fails	Log error, continue with available results
Collector agent fails	Abort Full mode, fall back to Standard report using available script results
Any evaluator agent fails	Mark its dimensions as null, continue to synthesis with partial data
All evaluator agents fail	Proceed to synthesis — synthesizer will report all dimensions as null
Synthesizer fails	Present raw agent outputs in fallback format
History save fails	Warn user, report still presented
Badge update fails	Warn user, report still presented
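The failure table above amounts to a small decision rule for which kind of report the user ends up with. The sketch below encodes that rule under stated assumptions: the step identifiers, the `Outcome` enum, and `resolve_outcome` are hypothetical names chosen for illustration, and script, history-save, and badge-update failures are treated as not changing the report type because the table says only to log or warn and continue.

```python
from enum import Enum

# Hypothetical identifiers for the three Phase 2 agents.
EVALUATORS = {"safety-evaluator", "completeness-evaluator", "design-evaluator"}


class Outcome(Enum):
    FULL_REPORT = "synthesized report"
    PARTIAL_REPORT = "synthesized report with null dimensions"
    RAW_FALLBACK = "raw agent outputs in fallback format"
    STANDARD_REPORT = "Standard report from available script results"


def resolve_outcome(failed: set[str]) -> Outcome:
    """Map the set of failed steps to the report type described in the table."""
    if "collector" in failed:
        # Collector failure aborts Full mode before any evaluator runs.
        return Outcome.STANDARD_REPORT
    if "synthesizer" in failed:
        return Outcome.RAW_FALLBACK
    if failed & EVALUATORS:
        # One or more evaluators failed: their dimensions become null.
        return Outcome.PARTIAL_REPORT
    # Script, history-save, and badge failures are logged or warned about only.
    return Outcome.FULL_REPORT
```

The order of the checks mirrors the phase order: a collector failure is decided first because nothing downstream runs in that case.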

Phase 1 (Sequential):
  static-analysis.sh ──→ static_results
  scoring.sh         ──→ score_results
  collector agent    ──→ project_artifact
          │
          ▼
Phase 2 (Parallel):
                     ┌─────────────────┐
  safety-evaluator   │ All 3 receive:  │
  completeness-eval  │ static_results  │──→ 3 evaluator outputs
  design-evaluator   │ score_results   │    (or AGENT_FAILED markers)
                     │ project_artifact│
                     └─────────────────┘
          │
          ▼
Phase 3 (Sequential):
  synthesizer ──→ final_report
          │
          ▼
Phase 4:
  Present report + save history + update badge
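Phase 2 of the data flow, where all three evaluators receive the same inputs and a failure becomes an AGENT_FAILED marker, can be sketched as a fan-out over a thread pool. This is an illustration only: in the skill itself the evaluators are agents dispatched by Claude, not Python callables, and `run_evaluators` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

AGENT_FAILED = "AGENT_FAILED"


def run_evaluators(evaluators: dict, shared_inputs: dict) -> dict:
    """Run every evaluator on the same inputs in parallel.

    `evaluators` maps a name to a callable accepting the shared inputs as
    keyword arguments. Any exception is converted to the AGENT_FAILED marker
    so that one failing evaluator does not stop the others.
    """

    def safe_call(fn):
        try:
            return fn(**shared_inputs)
        except Exception:
            return AGENT_FAILED

    with ThreadPoolExecutor(max_workers=max(1, len(evaluators))) as pool:
        futures = {name: pool.submit(safe_call, fn) for name, fn in evaluators.items()}
        return {name: future.result() for name, future in futures.items()}
```

The returned dictionary always has one entry per evaluator, which matches the diagram's "3 evaluator outputs (or AGENT_FAILED markers)" and lets synthesis proceed with partial data.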

SKILL.md

Phase 1: Collection (Sequential)

Step 1.1: Static Analysis

Step 1.2: Scoring

Step 1.3: Collector Agent

Step 1.3 Error Handling — CRITICAL

Phase 2: Parallel Evaluation

Agent Dispatch Instructions

2.1: Safety Evaluator

2.2: Completeness Evaluator

2.3: Design Evaluator

Phase 2 Error Handling

Phase 3: Synthesis

Step 3.1: Dispatch Synthesizer

Step 3.2: Synthesizer Error Handling

Phase 4: Present Results and Save Reports

Error Handling Summary

Data Flow Diagram

Tone

Language