From harness-engineer
Analyze agent failures and automatically generate harness fixes to prevent recurrence. Use whenever an agent session went wrong, produced broken output, got stuck in a loop, failed to complete features, or the codebase is in a bad state after an agent run. Reads git log, progress files, circuit breaker logs, and test output to diagnose the failure mode, then generates a targeted harness patch (AGENTS.md update, features.json fix, new hook, or architectural constraint) that prevents the same failure from happening again. Based on LangChain's Trace Analyzer and OpenAI's "failure = harness signal" principle. Trigger on: "agent failed", "session went wrong", "agent got stuck", "loop broke", "agent made a mess", "why did it fail", "harness doctor", "fix my harness".
npx claudepluginhub lauraflorentin/skills-marketplace --plugin harness-engineer
This skill uses the workspace's default tool permissions.
Diagnoses agent session failures and generates targeted harness patches. The core principle (OpenAI): every agent failure is a harness gap. Don't just fix the code — fix the system so the agent never makes the same mistake again.
| Code | Description | Symptoms |
|---|---|---|
| DOOM_LOOP | Same file edited 5+ times | Circuit breaker log, high edit count |
| PREMATURE_EXIT | Agent declared done too early | Features marked passing that fail manually |
| ONE_SHOT | Tried to build everything at once | Single large commit, context exhaustion |
| DIRTY_STASH | Dead Man's Switch fired | Stash entry in git stash list |
| BROKEN_FOUNDATION | Built on broken dev server | Watchdog log showing server down |
| LAYER_VIOLATION | Imported across layer boundary | layers.json check failure |
| CONTEXT_BLIND | Agent didn't read progress/features | Missing reads in tool call history |
| ENVIRONMENT_BLIND | Agent didn't run init.sh | No smoke test in session log |
Run these commands to gather diagnostic data:
# Git history of last session
git log --oneline -20
# Circuit breaker log
cat .harness/state/circuit-breaker.log 2>/dev/null || echo "No circuit breaker log"
# Dead man's switch log
cat .harness/state/deadmans-switch.log 2>/dev/null || echo "No DMS log"
# Watchdog log
cat .harness/state/watchdog.log 2>/dev/null || echo "No watchdog log"
# Break audit trail
cat .harness/state/breaks.log 2>/dev/null || echo "No breaks logged"
# Stash list (Dead Man's Switch entries)
git stash list | grep harness || echo "No harness stashes"
# Progress file
tail -n 30 claude-progress.txt 2>/dev/null || echo "No progress file"
# Feature status summary
python3 -c "
import json

# features.json is either a bare list or an object with a 'features' key
with open('features.json') as fh:
    data = json.load(fh)
features = data if isinstance(data, list) else data.get('features', [])

total = len(features)
passing = sum(1 for feat in features if feat.get('passes') is True)
broken = sum(1 for feat in features if feat.get('circuit_broken') is True)
in_prog = sum(1 for feat in features if feat.get('in_progress') is True)
print(f'Features: {total} total, {passing} passing, {broken} circuit-broken, {in_prog} in-progress')
" 2>/dev/null || echo "Could not read features.json"
# Current dirty state
git status --short
Based on evidence, identify the primary failure mode(s) from the taxonomy above. Multiple modes can co-occur (e.g., DOOM_LOOP + DIRTY_STASH).
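The mapping from evidence to failure modes can be made mechanical with a small rule table. The sketch below is illustrative only: the `Evidence` fields and the `classify` function are hypothetical names, not part of the harness, and each rule simply restates the symptoms column of the taxonomy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    """Signals gathered from the diagnostic commands (hypothetical field names)."""
    max_edits_per_file: int = 0          # highest edit count for any single file
    passing_features_failing: int = 0    # features marked passing that fail manually
    single_large_commit: bool = False    # everything landed in one big commit
    harness_stashes: int = 0             # harness entries in `git stash list`
    watchdog_saw_server_down: bool = False
    layer_check_failed: bool = False
    read_progress_files: bool = True
    ran_init_sh: bool = True


def classify(ev: Evidence) -> List[str]:
    """Return every failure code whose symptom is present; modes can co-occur."""
    rules = [
        ("DOOM_LOOP", ev.max_edits_per_file >= 5),
        ("PREMATURE_EXIT", ev.passing_features_failing > 0),
        ("ONE_SHOT", ev.single_large_commit),
        ("DIRTY_STASH", ev.harness_stashes > 0),
        ("BROKEN_FOUNDATION", ev.watchdog_saw_server_down),
        ("LAYER_VIOLATION", ev.layer_check_failed),
        ("CONTEXT_BLIND", not ev.read_progress_files),
        ("ENVIRONMENT_BLIND", not ev.ran_init_sh),
    ]
    return [code for code, hit in rules if hit]


print(classify(Evidence(max_edits_per_file=6, harness_stashes=1)))
# ['DOOM_LOOP', 'DIRTY_STASH']
```

Returning a list rather than a single code keeps co-occurring modes visible, so each one gets its own patch.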
For each identified failure mode, generate the appropriate fix:
# DOOM_LOOP
# In the harness section of .claude/settings.json,
# lower SOFT_THRESHOLD from 5 to 3 for this project.
# Then add to the AGENTS.md operating principles:
echo "9. Before editing a file for the 3rd time, write down WHY this attempt will succeed where the previous ones failed." >> AGENTS.md
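To illustrate what lowering the threshold changes, here is a minimal sketch of an edit counter with a soft threshold. It is not the harness's actual circuit breaker, whose implementation is not shown in this document; `files_over_threshold` and the example paths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def files_over_threshold(edited_paths: Iterable[str], soft_threshold: int = 3) -> Dict[str, int]:
    """Return files edited at least `soft_threshold` times, with their edit counts."""
    counts = Counter(edited_paths)
    return {path: n for path, n in counts.items() if n >= soft_threshold}


# One entry per edit made during the session (made-up paths).
edits = ["src/app.py", "src/db.py", "src/app.py", "src/app.py", "src/db.py"]
print(files_over_threshold(edits, soft_threshold=3))  # {'src/app.py': 3}
print(files_over_threshold(edits, soft_threshold=5))  # {}
```

With the threshold at 5 this session raises no warning; at 3 the repeated edits to one file are flagged two attempts earlier.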
For PREMATURE_EXIT, add to AGENTS.md:
BEFORE marking any feature passes=true, you MUST:
1. Run init.sh fresh (kill and restart the dev server)
2. Navigate to the feature as a user would
3. Test at least 2 edge cases, not just the happy path
4. Check browser console for errors
Passing your own unit test is NOT sufficient.
For ONE_SHOT, add to AGENTS.md:
HARD CONSTRAINT: You may work on exactly ONE feature per session.
After completing and committing one feature, start a new session.
Do not implement more than one features.json item before committing.
For DIRTY_STASH, add to AGENTS.md:
COMMIT PROTOCOL: After completing ANY work unit (even partial):
git add -A && git commit -m "wip: [description of what's done]"
If you haven't committed in 15 minutes, something is wrong. Stop and assess.
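The 15-minute rule can also be checked mechanically. This is a minimal sketch, assuming the caller supplies the last commit time (for example from `git log -1 --format=%ct`, which prints the committer date as a Unix timestamp); `commit_is_overdue` is a hypothetical helper, not an existing harness script.

```python
from datetime import datetime, timedelta, timezone


def commit_is_overdue(last_commit: datetime, now: datetime,
                      limit: timedelta = timedelta(minutes=15)) -> bool:
    """True when more than `limit` has passed since the last commit."""
    return now - last_commit > limit


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up commit time
print(commit_is_overdue(last, last + timedelta(minutes=10)))  # False
print(commit_is_overdue(last, last + timedelta(minutes=20)))  # True
```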
For BROKEN_FOUNDATION, add to AGENTS.md:
SESSION START IS NON-NEGOTIABLE:
bash init.sh
If init.sh fails, STOP. Do not write a single line of code until it passes.
The app must be in a known working state before you touch it.
# LAYER_VIOLATION: add to the PostToolUse hooks in .claude/settings.json
# so the layer check runs after every Write:
bash .harness/scripts/check-layers.sh
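For CONTEXT_BLIND, strengthen harness-onboard for this project (see the harness-onboard skill).
For ENVIRONMENT_BLIND, add a SessionStart hook so init.sh runs at the start of every session: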
{
"SessionStart": [{
"matcher": "*",
"hooks": [{"type": "command", "command": "bash init.sh"}]
}]
}
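Because patches may be applied more than once, it helps if adding the hook is idempotent. The sketch below merges the entry into an object shaped like the snippet above without duplicating it. `add_session_start_hook` is a hypothetical helper, and where this object sits inside the settings file is an assumption left to the caller.

```python
import copy
from typing import Any, Dict

# The hook entry from the snippet above.
INIT_HOOK = {"type": "command", "command": "bash init.sh"}


def add_session_start_hook(hooks: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `hooks` with the init.sh SessionStart entry present exactly once."""
    merged = copy.deepcopy(hooks)  # never mutate the caller's object
    entries = merged.setdefault("SessionStart", [])
    for entry in entries:
        if INIT_HOOK in entry.get("hooks", []):
            return merged  # already installed; nothing to do
    entries.append({"matcher": "*", "hooks": [dict(INIT_HOOK)]})
    return merged


once = add_session_start_hook({})
twice = add_session_start_hook(once)
print(once == twice)  # True
```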
Apply all generated patches. Always apply by having Claude write the fix — never manually.
# After applying patches, commit them
git add AGENTS.md .claude/settings.json .harness/
git commit -m "harness-doctor: fix [FAILURE_MODE] — [one line description]"
Append to claude-progress.txt:
=== HARNESS DOCTOR REPORT [timestamp] ===
Failure modes detected: [list]
Root causes:
- [cause 1]
- [cause 2]
Patches applied:
- [patch 1]: [what changed and why]
- [patch 2]: [what changed and why]
Prevention: [what the harness now does differently]
Next session: [recommended starting point]
===========================================
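The report layout above can be produced by a small formatter. This is a sketch under stated assumptions: `format_report` and its parameters are hypothetical names, and the example values are invented.

```python
from typing import List, Tuple


def format_report(timestamp: str, failure_modes: List[str], root_causes: List[str],
                  patches: List[Tuple[str, str]], prevention: str, next_session: str) -> str:
    """Render a harness doctor report in the layout shown above."""
    lines = [f"=== HARNESS DOCTOR REPORT {timestamp} ==="]
    lines.append("Failure modes detected: " + ", ".join(failure_modes))
    lines.append("Root causes:")
    lines += [f"- {cause}" for cause in root_causes]
    lines.append("Patches applied:")
    lines += [f"- {name}: {why}" for name, why in patches]
    lines.append(f"Prevention: {prevention}")
    lines.append(f"Next session: {next_session}")
    lines.append("=" * 43)  # closing rule; the exact width is cosmetic
    return "\n".join(lines) + "\n"


report = format_report(
    "2024-01-01T12:00Z",
    ["DOOM_LOOP"],
    ["same file rewritten without a new hypothesis"],
    [("AGENTS.md", "added rule 9 to force a written rationale")],
    "soft threshold lowered to 3",
    "resume the in-progress feature",
)
print(report)
```

To append the result, open claude-progress.txt in append mode (`open("claude-progress.txt", "a")`) and write the returned string.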
If the repo is in a dirty state:
# Option A: stash exists with good work
git stash list # find the relevant harness stash
git stash show -p stash@{0} # inspect it
# Cherry-pick what's good, discard the rest
# Option B: everything is broken, start clean
git checkout -- . # discard uncommitted changes to tracked files (untracked files are left in place)
# Find last known good commit
git log --oneline | head -20
# Reset if needed
git reset --hard [good-commit-sha]
Tell the user:
Run bash init.sh to verify, then resume.