From claude-code-config
Scores a project's agent harness across 5 subsystems, identifies the bottleneck, and outputs a prioritized improvement plan. Use to assess readiness for long-run agent sessions or when adopting an agent stack on a new codebase.
npx claudepluginhub anastasiyaw/claude-code-config
How this skill is triggered (by the user, by Claude, or both)
Slash command
/claude-code-config:harness-audit
When to use
Trigger on phrases like: "audit my harness", "evaluate my agent setup", "score my CLAUDE.md", "is my project ready for long-run", "5-subsystem assessment", "what's missing from my project setup", "/harness-audit". Run proactively when joining an unfamiliar codebase that has agent artifacts (CLAUDE.md, .claude/, AGENTS.md) but obvious gaps. Skip for single-file scripts and pure exploration.
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Score a project's agent harness across five subsystems and tell the user which one to fix first.
Source: Five-subsystem framework adapted from Learn Harness Engineering (walkinglabs, MIT). Adapted to our concrete stack: CLAUDE.md, .claude/rules/, PROBLEMS.md, feature_list.json, init.sh, hooks, handoffs, chronicles.
Given a project directory, produces a scorecard like this:
    === Harness Audit: project-xyz ===
    Instructions   4/5  ✓ CLAUDE.md present, modular rules in .claude/rules/
                        ✗ No project-level REVIEW.md for PR review guidance
    State          2/5  ✓ .claude/handoffs/ exists (3 files)
                        ✗ No PROBLEMS.md - issues scattered in handoffs
                        ✗ No feature_list.json - scope state not machine-readable
    Verification   3/5  ✓ Tests run, pytest configured
                        ✗ No init.sh - new sessions take 15+ min to bootstrap
                        ✗ 3-layer gate not documented in CLAUDE.md
    Scope          3/5  ✓ no-pre-existing-evasion principle in CLAUDE.md
                        ✗ No WIP=1 (no feature_list.json to enforce it)
                        ✗ Definition of Done not explicit
    Lifecycle      2/5  ✗ No SessionStart hook (no .claude/settings.json)
                        ✗ No Stop hook for clean-state check
                        ~ Manual cleanup convention exists but not enforced
    Bottleneck: State (2/5) - lack of structured progress tracking
    Top 3 improvements (in order):
      1. Create PROBLEMS.md (1h) ↗ State 2→4
         Template: claude-code-skills/templates/long-run-project/ has examples
      2. Create feature_list.json + init.sh (30min) ↗ State 2→5, Verification 3→4
         Drop-in: claude-code-skills/templates/long-run-project/
      3. Add Stop hook stop-test-gate.py (15min) ↗ Lifecycle 2→4
         Source: claude-code-skills/hooks/stop-test-gate.py
    After top 3: Instructions 4 + State 5 + Verification 4 + Scope 3 + Lifecycle 4 = 20/25 (was 14/25)
The skill does not make changes. It produces the scorecard. The user decides whether to apply recommendations.
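The projected total in the sample scorecard is plain arithmetic over the per-subsystem scores. A minimal sketch that reproduces it, using the numbers from the sample; the variable and function names are illustrative and not part of the skill:

```python
# Scores from the sample scorecard (each out of 5).
current = {
    "Instructions": 4,
    "State": 2,
    "Verification": 3,
    "Scope": 3,
    "Lifecycle": 2,
}

# Scores the sample expects once the top 3 improvements are applied.
projected_changes = {"State": 5, "Verification": 4, "Lifecycle": 4}


def total(scores: dict[str, int]) -> int:
    """Sum the per-subsystem scores into an overall total."""
    return sum(scores.values())


projected = {**current, **projected_changes}
max_total = 5 * len(current)

print(f"{total(projected)}/{max_total} (was {total(current)}/{max_total})")
# prints: 20/25 (was 14/25)
```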
| Subsystem | Concrete files/conventions in our stack |
|---|---|
| Instructions | CLAUDE.md (root + ~/.claude/), .claude/rules/*.md (project), ~/.claude/rules/*.md (global), optional REVIEW.md |
| State | PROBLEMS.md, feature_list.json, .claude/handoffs/, .claude/chronicles/ |
| Verification | init.sh, tests configured, 3-Layer Validation Gate referenced in CLAUDE.md, Proof Loop usage |
| Scope | no-pre-existing-evasion.md rule applied, WIP=1 enforced (one in-progress in feature_list.json), explicit Definition of Done |
| Lifecycle | SessionStart hooks, Stop hooks (stop-test-gate, check-problems-md), cleanup convention |
See references/checklist-per-subsystem.md for per-subsystem concrete checks.
See references/scoring-rubric.md for how to interpret 1-5 scores.
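Two of the checks in the table can be pictured as code. The sketch below is illustrative only. The feature_list.json schema (a `features` list whose entries carry a `status` field), the `in-progress` value, the reading of WIP=1 as "at most one in progress", and the hook command path are all assumptions; the skill itself does not define them. The `hooks` object keyed by event name follows the Claude Code settings layout.

```python
import json


def wip_count(feature_list_text: str) -> int:
    """Count features whose status is "in-progress" (hypothetical schema)."""
    data = json.loads(feature_list_text)
    features = data.get("features", [])
    return sum(1 for feature in features if feature.get("status") == "in-progress")


def has_hook(settings_text: str, event: str) -> bool:
    """Return True if the settings JSON registers at least one hook for `event`."""
    settings = json.loads(settings_text)
    return bool(settings.get("hooks", {}).get(event))


# Hypothetical feature_list.json content.
feature_list = json.dumps({
    "features": [
        {"name": "login", "status": "done"},
        {"name": "export", "status": "in-progress"},
        {"name": "billing", "status": "todo"},
    ]
})

# Hypothetical .claude/settings.json content with a Stop hook only.
settings = json.dumps({
    "hooks": {
        "Stop": [
            {"hooks": [{"type": "command", "command": "python3 hooks/stop-test-gate.py"}]}
        ]
    }
})

print("WIP=1 respected:", wip_count(feature_list) <= 1)
print("Stop hook:", has_hook(settings, "Stop"))
print("SessionStart hook:", has_hook(settings, "SessionStart"))
```

Both functions only read JSON text, which matches the audit's rule of checking existence without running or changing anything.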
Read these files in order (skip silently if missing):
1. CLAUDE.md in project root
2. AGENTS.md in project root (some projects use this name)
3. .claude/rules/*.md (project-level rules)
4. .claude/settings.json and .claude/settings.local.json (hooks config)
5. PROBLEMS.md in root
6. feature_list.json in root
7. init.sh in root (and Makefile / package.json scripts as fallback)
8. .claude/handoffs/ (count files, check INDEX.md existence)
9. .claude/chronicles/ (count files)
10. pytest.ini / package.json test script / Cargo.toml

Use Glob + Read. Don't grep across the entire codebase: this is a metadata audit, not a code review.
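The read order can be sketched outside the agent as a small Python helper. This is an illustration under assumptions: `PATTERNS` and `collect_artifacts` are hypothetical names, and the skill itself uses the Glob and Read tools rather than a script.

```python
from pathlib import Path

# Glob patterns for the artifacts listed above, in reading order.
PATTERNS = [
    "CLAUDE.md",
    "AGENTS.md",
    ".claude/rules/*.md",
    ".claude/settings.json",
    ".claude/settings.local.json",
    "PROBLEMS.md",
    "feature_list.json",
    "init.sh",
    "Makefile",
    "package.json",
    ".claude/handoffs/*",
    ".claude/chronicles/*",
    "pytest.ini",
    "Cargo.toml",
]


def collect_artifacts(root: Path) -> dict[str, list[Path]]:
    """Map each pattern to the files it matches; missing ones are skipped silently."""
    found: dict[str, list[Path]] = {}
    for pattern in PATTERNS:
        matches = sorted(path for path in root.glob(pattern) if path.is_file())
        if matches:
            found[pattern] = matches
    return found


if __name__ == "__main__":
    for pattern, paths in collect_artifacts(Path(".")).items():
        print(f"{pattern}: {len(paths)} file(s)")
```

The helper only lists metadata files at known locations and never walks the source tree, in keeping with the "metadata audit" scope.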
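For each subsystem, run the checks in references/checklist-per-subsystem.md. Each check is a binary pass/fail. Score each subsystem from 1 to 5 (see references/scoring-rubric.md).
For each subsystem, list the score and the findings behind it, marked ✓, ✗, or ~ as in the sample scorecard.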
The lowest-scoring subsystem is the bottleneck. Even if other subsystems are weaker by absolute count of checks, the lowest score is the one to fix first because it limits the value of the rest.
Tie-breaker (multiple subsystems at same low score): pick the one whose improvement unlocks progress in others. State usually wins ties because feature_list.json + PROBLEMS.md unlock Verification and Scope checks.
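The bottleneck rule and its tie-breaker can be written down in a few lines. In this sketch, State comes first in the tie-break order because the text says it usually wins ties; the order of the remaining subsystems is an assumption made for illustration, as are the names `TIE_BREAK_ORDER` and `find_bottleneck`.

```python
# State first, as the text suggests; the rest of the order is an assumption.
TIE_BREAK_ORDER = ["State", "Verification", "Scope", "Instructions", "Lifecycle"]


def find_bottleneck(scores: dict[str, int]) -> str:
    """Return the lowest-scoring subsystem, using TIE_BREAK_ORDER on ties."""
    lowest = min(scores.values())
    tied = [name for name, score in scores.items() if score == lowest]
    return min(tied, key=TIE_BREAK_ORDER.index)


# Scores from the sample scorecard: State and Lifecycle are tied at 2.
scores = {"Instructions": 4, "State": 2, "Verification": 3, "Scope": 3, "Lifecycle": 2}
print(find_bottleneck(scores))  # prints: State
```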
Output exactly 3 next steps in order, each with an action, a time estimate, the expected score change, and a template or source path in claude-code-skills/ if available.
The 3 steps must:
Do not give more than 3. Three is enough scope for one focused session.
Use the visual scorecard format shown at the top of this skill. Sections:
- === Harness Audit: <project-name> === (one line)

Keep the entire output under 50 lines. The user is scanning for next steps, not reading an essay. Detail goes into the per-subsystem checklist file, not the audit output.
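The length limit is easy to enforce mechanically. A minimal sketch, assuming a hypothetical `render_scorecard` helper and a simple mapping from subsystem name to score and findings; the column widths are a guess at the sample layout, not a specification:

```python
MAX_LINES = 50  # the output must stay under this many lines


def render_scorecard(project: str, subsystems: dict[str, tuple[int, list[str]]]) -> str:
    """Render the header and one aligned block per subsystem."""
    lines = [f"=== Harness Audit: {project} ==="]
    for name, (score, findings) in subsystems.items():
        label = f"{name:<14}{score}/5  "
        # The first finding shares a line with the label; the rest align under it.
        for index, finding in enumerate(findings or [""]):
            prefix = label if index == 0 else " " * len(label)
            lines.append((prefix + finding).rstrip())
    if len(lines) >= MAX_LINES:
        raise ValueError(f"scorecard has {len(lines)} lines; keep it under {MAX_LINES}")
    return "\n".join(lines)


report = render_scorecard(
    "project-xyz",
    {
        "State": (2, ["✓ .claude/handoffs/ exists (3 files)", "✗ No PROBLEMS.md"]),
        "Lifecycle": (2, ["✗ No SessionStart hook"]),
    },
)
print(report)
```

Raising on overflow rather than truncating keeps the choice of what to cut with the author of the audit.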
- … /security-review instead)
- … init.sh or tests, just checks existence
- … .claude/), translate concepts before scoring; don't fail the project on naming.
- templates/long-run-project/: drop-in files for fixing State + Verification gaps
- rules/long-run-harness.md: convention this audit checks against

This skill is itself a [LONG-RUN]-style artifact. To audit the audit:
- references/scoring-rubric.md (✓)
- references/example-audits.md (TODO if added)