From harness-eval
Evaluates Claude Code harnesses with static analysis, dynamic hook testing, secret scanning, checklists, and scoring. Generates a bilingual report with findings, a score, and an improvement roadmap in 2-3 minutes.
npx claudepluginhub whchoi98/harness-eval --plugin harness-eval
How this skill is triggered (by the user, by Claude, or both):
Slash command
/harness-eval:standard
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
You are performing a Standard harness evaluation. This combines static analysis, dynamic testing, and checklist scoring for a comprehensive assessment.
Run the static analysis script:
HARNESS_EVAL_ROOT="${CLAUDE_PLUGIN_ROOT}" bash "${CLAUDE_PLUGIN_ROOT}/scripts/static-analysis.sh" "$(pwd)"
Capture the JSON output. This checks:
Perform these live tests using Bash tool calls:
For each hook script found in .claude/hooks/:
echo "" | bash <hook>   # should not crash
echo '{"tool":"Bash","input":"ls"}' | bash <hook>   # should produce output
Test true positives and false positives:
# True positive — should be detected
echo "AKIAIOSFODNN7EXAMPLE" | bash <secret-hook>
# True positive — AWS secret key pattern
echo "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" | bash <secret-hook>
# False-positive check: benign string, should NOT trigger
echo "normal-base64-string-that-is-not-a-key" | bash <secret-hook>
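The true-positive and false-positive checks above can be wrapped in a small helper so each sample yields a PASS or FAIL line for the report. This is a minimal sketch, not the plugin's own logic: the function names `hook_detects` and `check_case` are hypothetical, and it assumes the secret hook signals a detection with a non-zero exit status, which the source does not specify.

```shell
#!/usr/bin/env bash
# Sketch: classify one sample against a secret-scanning hook.
# ASSUMPTION: a non-zero exit status from the hook means "secret detected".

# hook_detects HOOK INPUT -> returns 0 if the hook flagged INPUT, 1 otherwise
hook_detects() {
  local hook="$1" input="$2"
  if printf '%s\n' "$input" | bash "$hook" >/dev/null 2>&1; then
    return 1   # hook exited 0: nothing flagged
  fi
  return 0     # hook exited non-zero: treated as a detection
}

# check_case HOOK EXPECT INPUT -> prints PASS or FAIL for one sample
# EXPECT is "detect" (true-positive sample) or "clean" (false-positive check)
check_case() {
  local hook="$1" expect="$2" input="$3" got="clean"
  if hook_detects "$hook" "$input"; then
    got="detect"
  fi
  if [ "$got" = "$expect" ]; then
    echo "PASS expected=$expect"
  else
    echo "FAIL expected=$expect got=$got"
  fi
}
```

For example, `check_case <secret-hook> detect "AKIAIOSFODNN7EXAMPLE"` covers the first sample and `check_case <secret-hook> clean "normal-base64-string-that-is-not-a-key"` covers the last. If the hook reports detections through its output rather than its exit status, `hook_detects` would need to inspect that output instead.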
Look for test files in tests/ directory. If found, run them:
# Look for test runner
ls tests/run-all.sh tests/*.sh 2>/dev/null
# Run discovered tests
bash tests/run-all.sh # or individual test files
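The discovery-then-run step above could be scripted roughly as follows. This is a sketch under two assumptions not stated in the source: that `tests/run-all.sh`, when present, already runs every other test file, and that a test script reports failure through a non-zero exit status. The function name `run_tests` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run a project's test scripts and print one result line per script.
# ASSUMPTION: run-all.sh (if present) covers all other test files.

# run_tests DIR -> prints PASS/FAIL per script; returns 1 if any script failed
run_tests() {
  local dir="$1" failed=0 found=0 script
  if [ -f "$dir/run-all.sh" ]; then
    set -- "$dir/run-all.sh"      # prefer the aggregate runner
  else
    set -- "$dir"/*.sh            # otherwise run each test file
  fi
  for script in "$@"; do
    [ -f "$script" ] || continue  # an unmatched glob stays literal; skip it
    found=1
    if bash "$script" >/dev/null 2>&1; then
      echo "PASS $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "No test suite found"
  fi
  return "$failed"
}
```

Calling `run_tests tests` from the project root gives output that can be pasted into the Test Suite Results section of the report.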
Run the checklist scoring:
HARNESS_EVAL_ROOT="${CLAUDE_PLUGIN_ROOT}" bash "${CLAUDE_PLUGIN_ROOT}/scripts/scoring.sh" --mode standard "$(pwd)"
Combine all results into a bilingual (English + Korean) report. English section first, then ---, then Korean section. Tables, scores, and code are identical — only prose differs.
# Harness Standard Evaluation
**Score: {overall}/10 ({grade})**
**Date: {timestamp}**
## Static Analysis Summary
| Category | Pass | Warn | Fail |
|----------|------|------|------|
| Correctness | X | Y | Z |
| Safety | X | Y | Z |
| Completeness | X | Y | Z |
| Consistency | X | Y | Z |
## Static Analysis Findings
(List each WARN and FAIL with details, file path, and suggestion)
## Dynamic Analysis Results
### Hook Execution
(Results of hook testing — which hooks passed/failed)
### Secret Pattern Accuracy
(TP/FP results if applicable)
### Test Suite Results
(Results of running existing tests, or "No test suite found")
## Checklist Results
| Tier | Passed | Total | Status |
|------|--------|-------|--------|
| Basic (6.0+) | X | Y | ✓/✗ |
| Functional (7.0+) | X | Y | ✓/✗ |
| Robust (8.0+) | X | Y | ✓/✗ |
| Production (9.0+) | X | Y | ✓/✗ |
## Improvement Roadmap
(Priority-ordered list of 5-10 specific improvements)
---
# 하네스 Standard 평가
**점수: {overall}/10 ({grade})**
**날짜: {timestamp}**
## 정적 분석 요약
| 카테고리 | 통과 | 경고 | 실패 |
|----------|------|------|------|
| 정확성 | X | Y | Z |
| 안전성 | X | Y | Z |
| 완전성 | X | Y | Z |
| 일관성 | X | Y | Z |
## 정적 분석 발견 사항
(각 WARN 및 FAIL 항목의 상세 내용, 파일 경로, 개선 제안)
## 동적 분석 결과
### 훅 실행
(훅 테스트 결과 — 통과/실패 항목)
### 시크릿 패턴 정확도
(해당되는 경우 TP/FP 결과)
### 테스트 스위트 결과
(기존 테스트 실행 결과, 또는 "테스트 스위트 없음")
## 체크리스트 결과
| 단계 | 통과 | 전체 | 상태 |
|------|------|------|------|
| 기본 (6.0+) | X | Y | ✓/✗ |
| 기능적 (7.0+) | X | Y | ✓/✗ |
| 견고 (8.0+) | X | Y | ✓/✗ |
| 프로덕션 (9.0+) | X | Y | ✓/✗ |
## 개선 로드맵
(영향도 순으로 정렬된 5-10개 구체적 개선 사항)
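The checklist tiers in the template carry score thresholds (Basic 6.0+, Functional 7.0+, Robust 8.0+, Production 9.0+). The sketch below assumes those thresholds map an overall score to the highest tier reached, which the source does not spell out. The function name `tier_for_score` and the "Below Basic" label are hypothetical, and the real mapping lives in the plugin's scoring script.

```shell
#!/usr/bin/env bash
# Sketch: map an overall score to the highest checklist tier it reaches.
# ASSUMPTION: tier thresholds apply directly to the overall score.

# tier_for_score SCORE -> prints the tier name
tier_for_score() {
  # awk handles the decimal comparison, which plain shell arithmetic cannot
  awk -v s="$1" 'BEGIN {
    if      (s >= 9.0) print "Production"
    else if (s >= 8.0) print "Robust"
    else if (s >= 7.0) print "Functional"
    else if (s >= 6.0) print "Basic"
    else               print "Below Basic"   # label is an assumption
  }'
}
```

For instance, `tier_for_score 8.4` would print `Robust` under these assumptions.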
Save the English and Korean reports as separate files in the target project:
mkdir -p .harness-eval/reports
.harness-eval/reports/eval-{YYYY-MM-DD}-{NNN}-standard-en.md
.harness-eval/reports/eval-{YYYY-MM-DD}-{NNN}-standard-ko.md
Use the Write tool to create each file. The {NNN} sequence number should match the evaluation ID from history.
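The naming pattern above can be expanded with a small helper once the evaluation ID is known. This is a sketch only: `report_path` is a hypothetical name, and it assumes the ID is a plain decimal integer and that {NNN} means zero-padding to three digits.

```shell
#!/usr/bin/env bash
# Sketch: build a report path from an evaluation ID and a language code.
# ASSUMPTION: {NNN} is the ID zero-padded to three digits.

# report_path ID LANG [DATE] -> prints the report path; DATE defaults to today
report_path() {
  local id="$1" lang="$2" date="${3:-$(date +%Y-%m-%d)}"
  printf '.harness-eval/reports/eval-%s-%03d-standard-%s.md\n' "$date" "$id" "$lang"
}
```

Calling `report_path 7 en` and `report_path 7 ko` gives the two paths to pass to the Write tool. Pass the ID without leading zeros, since `printf` would read a value such as `008` as invalid octal.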
Save the scoring result to history:
echo '<scoring-json>' | HARNESS_EVAL_ROOT="${CLAUDE_PLUGIN_ROOT}" bash "${CLAUDE_PLUGIN_ROOT}/scripts/history.sh" "$(pwd)" save
Report the evaluation ID and saved report file paths to the user.
No .claude/ directory exists: note that a very low score is expected and guide the user to set up the basics.
Be thorough but constructive. For each issue found, provide a specific fix. Prioritize the improvement roadmap by impact.
Always produce the report in both English and Korean. English section first, then a horizontal rule (---), then the Korean section. Tables, scores, file paths, and code blocks are identical in both sections — only the prose text differs.
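The ordering rule above (English, horizontal rule, Korean) can be illustrated with a short helper that joins two already-written sections. This is a sketch for display purposes, not part of the plugin; `combine_report` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the English section, a horizontal rule, then the Korean section.

# combine_report EN_FILE KO_FILE -> prints the combined bilingual report
combine_report() {
  local en="$1" ko="$2"
  cat "$en"
  printf '\n---\n\n'   # markdown horizontal rule between the two sections
  cat "$ko"
}
```

For example, `combine_report` could be pointed at the two saved report files to show the user a single bilingual view.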