Skill

standard

Evaluates Claude Code harnesses using static analysis, dynamic hook testing, secret scanning, checklists, and scoring. Generates a bilingual report with findings, a score, and an improvement roadmap in 2-3 minutes.

Bash

code-quality

testing

Install

npx claudepluginhub whchoi98/harness-eval --plugin harness-eval

Tool Access

This skill uses the workspace's default tool permissions.

Preview

You are performing a Standard harness evaluation. This combines static analysis, dynamic testing, and checklist scoring for a comprehensive assessment.

SKILL.md

Stats

Parent Repo Stars: 11

Parent Repo Forks: 1

Last Commit: Apr 6, 2026

# True positive: should be detected
echo "AKIAIOSFODNN7EXAMPLE" | bash <secret-hook>

# True positive: AWS secret key pattern
echo "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" | bash <secret-hook>

# False-positive check: should NOT trigger
echo "normal-base64-string-that-is-not-a-key" | bash <secret-hook>
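The three sample inputs above can also be tallied into true-positive and false-positive counts for the "Secret Pattern Accuracy" part of the report. The following is a minimal sketch, not the plugin's implementation: the two regular expressions and the names `looks_like_secret` and `evaluate` are assumptions made for illustration, and a real secret-scanning hook may use different patterns.

```python
import re

# Assumed patterns for illustration only; the hook under test may differ.
# AWS access key IDs: a 4-letter prefix followed by 16 uppercase letters/digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
# AWS secret access keys: exactly 40 base64-style characters, not part of a longer run.
SECRET_KEY = re.compile(
    r"(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])"
)


def looks_like_secret(text: str) -> bool:
    """Return True if either assumed pattern matches the text."""
    return bool(ACCESS_KEY_ID.search(text) or SECRET_KEY.search(text))


# (input, should_trigger) pairs taken from the sample commands.
CASES = [
    ("AKIAIOSFODNN7EXAMPLE", True),
    ("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", True),
    ("normal-base64-string-that-is-not-a-key", False),
]


def evaluate(cases):
    """Count true/false positives and negatives for a list of labeled cases."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, expected in cases:
        detected = looks_like_secret(text)
        if detected and expected:
            counts["tp"] += 1
        elif detected and not expected:
            counts["fp"] += 1
        elif not detected and expected:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


if __name__ == "__main__":
    print(evaluate(CASES))
```

To test an actual hook instead of the assumed regexes, `looks_like_secret` could be swapped for a function that pipes the text to the hook and inspects its exit status; how the hook signals a detection is not specified here.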

# Harness Standard Evaluation

**Score: {overall}/10 ({grade})**
**Date: {timestamp}**

## Static Analysis Summary

| Category | Pass | Warn | Fail |
|----------|------|------|------|
| Correctness | X | Y | Z |
| Safety | X | Y | Z |
| Completeness | X | Y | Z |
| Consistency | X | Y | Z |

## Static Analysis Findings

(List each WARN and FAIL with details, file path, and suggestion)

## Dynamic Analysis Results

### Hook Execution

(Results of hook testing: which hooks passed/failed)

### Secret Pattern Accuracy

(TP/FP results if applicable)

### Test Suite Results

(Results of running existing tests, or "No test suite found")

## Checklist Results

| Tier | Passed | Total | Status |
|------|--------|-------|--------|
| Basic (6.0+) | X | Y | ✓/✗ |
| Functional (7.0+) | X | Y | ✓/✗ |
| Robust (8.0+) | X | Y | ✓/✗ |
| Production (9.0+) | X | Y | ✓/✗ |

## Improvement Roadmap

(Priority-ordered list of 5-10 specific improvements)

---

# 하네스 Standard 평가

**점수: {overall}/10 ({grade})**
**날짜: {timestamp}**

## 정적 분석 요약

| 카테고리 | 통과 | 경고 | 실패 |
|----------|------|------|------|
| 정확성 | X | Y | Z |
| 안전성 | X | Y | Z |
| 완전성 | X | Y | Z |
| 일관성 | X | Y | Z |

## 정적 분석 발견 사항

(각 WARN 및 FAIL 항목의 상세 내용, 파일 경로, 개선 제안)

## 동적 분석 결과

### 훅 실행

(훅 테스트 결과: 통과/실패 항목)

### 시크릿 패턴 정확도

(해당되는 경우 TP/FP 결과)

### 테스트 스위트 결과

(기존 테스트 실행 결과, 또는 "테스트 스위트 없음")

## 체크리스트 결과

| 단계 | 통과 | 전체 | 상태 |
|------|------|------|------|
| 기본 (6.0+) | X | Y | ✓/✗ |
| 기능적 (7.0+) | X | Y | ✓/✗ |
| 견고 (8.0+) | X | Y | ✓/✗ |
| 프로덕션 (9.0+) | X | Y | ✓/✗ |

## 개선 로드맵

(영향도 순으로 정렬된 5-10개 구체적 개선 사항)
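The checklist tiers in the template carry score thresholds (6.0+, 7.0+, 8.0+, 9.0+), and the header has `{overall}`, `{grade}`, and `{timestamp}` placeholders. A minimal sketch of filling that header is shown here, under loud assumptions: the source does not say how `{grade}` is derived, so using the highest tier reached as the grade is a guess, and the "Below Basic" label and the function names `tier_for` and `render_header` are hypothetical.

```python
from datetime import date

# Thresholds copied from the checklist table, highest first.
TIERS = [
    (9.0, "Production"),
    (8.0, "Robust"),
    (7.0, "Functional"),
    (6.0, "Basic"),
]

# English header lines of the report template.
HEADER = (
    "# Harness Standard Evaluation\n"
    "**Score: {overall}/10 ({grade})**\n"
    "**Date: {timestamp}**"
)


def tier_for(score: float) -> str:
    """Return the highest tier whose threshold the score meets."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    # Assumption: the source gives no label for scores under 6.0.
    return "Below Basic"


def render_header(score: float, when: date) -> str:
    """Fill the template header; using the tier name as the grade is an assumption."""
    return HEADER.format(
        overall=f"{score:.1f}",
        grade=tier_for(score),
        timestamp=when.isoformat(),
    )


if __name__ == "__main__":
    print(render_header(8.4, date.today()))
```

If the skill defines grades differently (for example letter grades), only `tier_for` would need to change; the placeholder substitution stays the same.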

Phase 1: Static Analysis

Phase 2: Dynamic Analysis

2a. Hook Execution Testing

2b. Secret Pattern Testing (if secret scanning hook exists)

2c. Existing Test Suite

Phase 3: Scoring

Phase 4: Report Generation

Phase 5: Save Reports to Files

Phase 6: Save History

Error Handling

Tone

Language
