From claude-commands
Analyzes failures and fixes the harness (instructions, skills, tests, CI) rather than just the symptom. Runs 5 Whys analysis on technical and agent-path causes, identifies gaps, and proposes or implements fixes.
Install: `npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands`

Trigger: slash command `/claude-commands:harness`
# /harness — Fix the harness, not just the symptom
## Related commands

- `/fix-issue` — Fixes a GitHub issue by number using parallel root cause analysis, hypothesis testing, similar issue detection, fixes, tests, and prevention recommendations.
- `/audit` — Assesses the current coding session for concrete failures like wrong paths or incorrect assumptions, then enumerates them with root causes and proposes prioritized fixes via hooks, rules, or CLAUDE.md updates.
- `/debug` — Runs structured debugging with parallel agents: traces bugs via phases, diagnoses root causes with evidence, proposes minimal fixes, and executes after approval. Tracks progress in `tasks/todo.md`.
- `/harness-audit` — Audits the repository's agent harness setup across tool coverage, context efficiency, quality gates, and more, producing an overall score out of 70, category breakdowns, findings, top 3 actions with file paths, and suggested ECC skills. Supports optional scopes and JSON output.
- `/retro` — Analyzes session history for manual fixes missed by skills and checkpoints, maps gaps, checks freshness, and creates PRs to fix skill repos.
**Scope:** **Project-level override.** This file is the repo-local copy at `.claude/commands/harness.md`. The user-level canonical copy lives at `~/.claude/commands/harness.md` and applies to **any** repository. **Collision rule:** when both exist, this repo-local file takes precedence and may add project-specific harness rules (for example the OpenClaw gateway). **Canonical copy in git:** [jleechanclaw `docs/harness/user-command-harness.md`](https://github.com/jleechanorg/jleechanclaw/blob/main/docs/harness/user-command-harness.md) (sync with `scripts/sync-harness-user-scope.sh` in that repo).
When this command is invoked, analyze the current situation for harness-level gaps and propose/implement fixes that prevent the same class of mistake from recurring.
Skill reference: `~/.claude/skills/harness-engineering/SKILL.md`
## Usage

- `/harness` — Analyze the most recent mistake or user correction in this conversation and propose harness fixes
- `/harness <description>` — Analyze a specific failure pattern and propose harness fixes
- `/harness --fix` — Analyze AND implement fixes without waiting for approval
- `/harness --audit` — Scan all instruction files for staleness, contradictions, or gaps

## Default mode (`/harness` or `/harness <description>`)

Read these files:

- `~/.claude/CLAUDE.md` (global instructions)
- `CLAUDE.md` (project instructions)
- `.claude/commands/harness.md` and `.claude/skills/*/` when present (repository overlay)
- `~/.codex/AGENTS.md` (Codex instructions)
- `~/.claude/skills/harness-engineering/SKILL.md` (the skill itself)
- `~/.claude/projects/*/memory/` (relevant memories)

## Fix mode (`--fix` flag, `/harness --fix`)

Same as default, but implement immediately after analysis. Still report what was changed.
## Audit mode (`/harness --audit`)

Scan all harness files for staleness, contradictions, and gaps.
Report findings as a table:
| Issue | File | Line | Recommendation |
|-------|------|------|----------------|
| Stale | ~/.claude/CLAUDE.md | 42 | Remove reference to deprecated tool X |
| Gap | repo CLAUDE.md | - | Add rule about Y (corrected 3x in memory) |
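The audit scan itself can be sketched with ordinary shell tools. Everything below (the throwaway fixture file and the `deprecated` pattern) is an illustrative assumption, not the command's actual implementation; it only shows how a grep hit maps onto a row of the report table:

```shell
#!/bin/sh
# Illustrative staleness scan over a throwaway fixture file.
set -eu
tmp=$(mktemp -d)
printf 'Always build with deprecated tool X\n' > "$tmp/CLAUDE.md"

# One audit-table row per matching line: "| Issue | File | Line | Recommendation |"
match=$(grep -ni 'deprecated' "$tmp/CLAUDE.md" | head -n 1)
lineno=${match%%:*}
echo "| Stale | CLAUDE.md | $lineno | remove reference to deprecated tool X |"

rm -rf "$tmp"
```

A real scan would loop over the harness files listed in the default mode and carry a per-project pattern list.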
## Harness Analysis
**Trigger**: <what happened — user correction, failed test, or description>
**Failure class**: <mislabeled artifact | wrong approach | missing validation | repeated manual fix | silent degradation | knowledge gap | LLM path error>
### 5 Whys — Technical failure
1. Why: <answer>
2. Why: <answer>
3. Why: <answer>
4. Why: <answer>
5. Why: <answer>
→ Root cause: <single sentence>
### 5 Whys — Agent path
1. Why did the agent not catch/prevent this?
2. Why did the agent reason or act that way?
3. Why didn't the agent's instructions prevent that reasoning?
4. Why wasn't there a skill, memory, or rule that redirected the agent?
5. Why was the harness incomplete for this class of agent behavior?
→ Agent root cause: <single sentence>
### Existing coverage
- [x] ~/.claude/CLAUDE.md — <relevant rule if exists>
- [ ] repo CLAUDE.md — <gap identified>
- [ ] ~/.claude/skills/ — <no skill covers this>
- [x] memory — <relevant memory if exists>
### Proposed fixes
1. **[Instructions]** `<file>` — <what to add/change>
2. **[Skill]** `<file>` — <what to create/update>
3. **[Test]** `<file>` — <what test to add>
### Verification
<How to confirm the fix works>
User says "don't mock the database in these tests":
→ Failure class: wrong approach
→ 5 Whys technical: mock used → no instructions prohibiting it → testing philosophy not documented → ...
→ 5 Whys agent: agent defaulted to mock → common pattern in training data → no skill redirecting to real tests → ...
→ Fix: add instruction to CLAUDE.md, save feedback memory

Test labeled "e2e" but only does unit-level work:
→ Failure class: mislabeled artifact
→ 5 Whys technical: E2E criteria not met → criteria not checked → no checklist for E2E → ...
→ 5 Whys agent: agent named it e2e without verifying → no skill mandating verification → ...
→ Fix: add/update test classification rules in CLAUDE.md + AGENTS.md, update /validate-e2e skill

Same code review comment given 3 conversations in a row:
→ Failure class: repeated manual fix → mandatory harness fix, no exceptions
→ Fix: add instruction to CLAUDE.md, save memory, consider a lint rule

Automation cleanup silently fails every cycle:
→ Failure class: silent degradation (harness layer present but broken)
→ 5 Whys technical: cleanup fn uses wrong grep key → porcelain format not verified → no test for cleanup path → ...
→ 5 Whys agent: agent said "cleanup present" without running it → assumed present = working → skill doesn't mandate verifying harness script correctness → ...
→ Fix: fix script, add verification step to skill, add integration test for cleanup path
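The silent-degradation example turns on a script that parses `git status --porcelain` with the wrong key. A minimal sketch of the verification step the fix demands, using a throwaway repo (the cleanup logic here is a hypothetical stand-in for the real script):

```shell
#!/bin/sh
# Verify the cleanup path against git's documented porcelain format
# ("?? path" for untracked files) instead of assuming the grep key is right.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
touch stray.tmp

# Correct parse: match the two-character status code exactly.
untracked=$(git status --porcelain | awk '$1 == "??" { print $2 }')
echo "untracked=$untracked"

cd / && rm -rf "$repo"
```

Running this as an integration test makes "cleanup present" mean "cleanup verified to work", which is the harness fix the example calls for.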
AO worker spawned on original PR branch instead of isolated clone:
→ Failure class: LLM path error — wrong abstraction level (agent acted at "spawn worker" level without verifying branch isolation)
→ 5 Whys technical: ao spawn reuses existing worktrees → --claim-pr only adds dashboard tracking → no clone created → worker lands on original branch → pushes commits directly to live PR
→ 5 Whys agent: request said "spawn for PR 6198" → agent assumed ao spawn would create isolation → flag name implies PR association but not branch isolation → no skill/instruction to redirect to clone-before-spawn → harness had no rule for this failure class
→ Fix: add clone-before-spawn rule to jleechanclaw CLAUDE.md, add verification to team-mini.md, add failure class to harness.md, save feedback memory
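The clone-before-spawn rule can be sketched as follows. Only the git steps are standard; the `ao spawn` line is left as a comment with a hypothetical `--cwd` flag, since the tool's real interface isn't shown here:

```shell
#!/bin/sh
# Create the isolation yourself before spawning: clone first, then point the
# worker at the clone. Do not rely on --claim-pr to create an isolated workspace.
set -eu
src=$(mktemp -d)
work=$(mktemp -d)
git init -q "$src/repo"

git clone -q "$src/repo" "$work/pr-clone" 2>/dev/null
# ao spawn --cwd "$work/pr-clone" ...   # hypothetical flag; verify before use
isolated=$(test -d "$work/pr-clone/.git" && echo yes || echo no)
echo "isolated=$isolated"

rm -rf "$src" "$work"
```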
General principle — tool semantic mismatch: Many tools have names that imply capabilities they don't actually provide. When a tool name suggests isolation, atomicity, or safety guarantees that the implementation doesn't enforce, agents will trust the name and skip verification. This is a recurring failure class: LLM path error driven by misleading tool semantics.
Common examples:
- `ao spawn --claim-pr` implies PR isolation — it provides only dashboard tracking
- `git checkout -b` implies a new branch — but it can be from HEAD, not an isolated PR clone
- `gh pr checkout` checks out the PR branch directly — not a clone

**Rule:** When a tool's name semantically promises isolation or safety guarantees, verify those guarantees exist in the implementation before relying on them. If the tool's name over-promises relative to its behavior, the harness gap is the misleading name — fix it in docs/skills rather than expecting agents to discover the gap.
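That rule can be mechanized as a guard: check the observable state before trusting a name-implied guarantee. A minimal sketch for the branch-isolation case (the branch name `pr-6198` comes from the example above; the git commands are standard):

```shell
#!/bin/sh
# Guard: refuse to proceed if the worker sits on the live PR branch.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -q -b pr-6198

current=$(git symbolic-ref --short HEAD)
if [ "$current" = "pr-6198" ]; then
  verdict="BLOCKED: on live PR branch, not an isolated clone"
else
  verdict="OK: isolated on $current"
fi
echo "$verdict"

cd / && rm -rf "$repo"
```

A guard like this belongs in the harness (skill or pre-push hook), not in the agent's judgment, so the check runs even when the tool's name suggests it is unnecessary.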
$ARGUMENTS