From harness-claude
Runs 7-phase code review pipeline with gatekeeping, mechanical checks (lint/typecheck/tests/security), graph-scoped context, parallel subagents, validation, deduplication, and structured technical output. Use for PRs or completed work.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

This skill uses the workspace's default tool permissions.
> Multi-phase code review pipeline — mechanical checks, graph-scoped context, parallel review agents, cross-agent deduplication, and structured output with technical rigor over social performance.
(on_pr / on_review)

When invoked by autopilot (or with explicit arguments), resolve paths before starting:

- session-slug argument provided: set {sessionDir} = .harness/sessions/<session-slug>/. Pass it to gather_context({ session: "<session-slug>" }). All handoff writes go to {sessionDir}/handoff.json.
- commit-range argument provided (e.g., abc123..HEAD): use it as the diff scope in Phase 2 MECHANICAL and Phase 7 OUTPUT. Otherwise, auto-detect the scope from the git environment.

When no arguments are provided (standalone invocation), the session slug is unknown: omit it from gather_context and fall back to the global .harness/ paths. The diff scope is auto-detected.
Review identifies issues. Review never fixes them.
A reviewer who applies fixes is no longer reviewing — they are editing with reviewer authority and no review. Suggest the fix in the finding. Do not apply it. If you catch yourself writing production code during review, STOP. You have crossed the boundary.
The review runs as a 7-phase pipeline. Each phase has a clear input, output, and exit condition.
Phase 1: GATE --> Phase 2: MECHANICAL --> Phase 3: CONTEXT --> Phase 4: FAN-OUT --> Phase 5: VALIDATE --> Phase 6: DEDUP+MERGE --> Phase 7: OUTPUT
| Phase | Tier | Purpose | Exit Condition |
|---|---|---|---|
| 1. GATE | fast | Skip ineligible PRs (CI only) | PR eligible, or exit with reason |
| 2. MECHANICAL | none | Lint, typecheck, test, sec scan | All pass -> continue; any fail -> report and stop |
| 3. CONTEXT | fast | Scope context per review domain | Context bundles assembled for each subagent |
| 4. FAN-OUT | mixed | Parallel review subagents | All subagents return ReviewFinding[] |
| 5. VALIDATE | none | Exclude mechanical dupes, verify | Unvalidated findings discarded |
| 6. DEDUP+MERGE | none | Group, merge, assign severity | Deduplicated finding list with merged evidence |
| 7. OUTPUT | none | Text output or GitHub comments | Review delivered, exit code set |
interface ReviewFinding {
id: string; // unique, for dedup
file: string; // file path
lineRange: [number, number]; // start, end
domain: 'compliance' | 'bug' | 'security' | 'architecture';
severity: 'critical' | 'important' | 'suggestion';
title: string; // one-line summary
rationale: string; // why this is an issue
suggestion?: string; // fix, if available
evidence: string[]; // supporting context from agent
validatedBy: 'mechanical' | 'graph' | 'heuristic';
}
| Flag | Effect |
|---|---|
| --comment | Post inline comments to GitHub PR via gh CLI or GitHub MCP |
| --deep | Pass --deep to harness-security-review for threat modeling |
| --no-mechanical | Skip mechanical checks (useful if already run in CI) |
| --ci | Enable eligibility gate, non-interactive output |
| --fast | Skip learnings, fast-tier agents for all fan-out slots |
| --thorough | Always load learnings, full roster + meta-judge, learnings in output |
The rigor level (fast, standard, or thorough) is set via the --fast or --thorough flags, or passed by autopilot. The default is standard.
| Phase | fast | standard (default) | thorough |
|---|---|---|---|
| 3. CONTEXT | Skip learnings entirely | Load learnings if file exists; score via filterByRelevance | Always load learnings; fail loudly if missing |
| 4. FAN-OUT | All agents at fast tier. Reduced focus areas. | Default tier assignments | Full roster + meta-judge confirms findings cited by multiple agents, flags contradictions, and surfaces cross-cutting concerns |
| 7. OUTPUT | Standard format | Standard format | Include "Learnings Applied" section with relevance scores |
Tiers are abstract labels resolved at runtime from project config. If no config exists, all phases use the current model.
| Tier | Default | Used By |
|---|---|---|
| fast | haiku-class | GATE, CONTEXT |
| standard | sonnet-class | Compliance agent, Architecture agent |
| strong | opus-class | Bug Detection agent, Security agent |
Before starting the pipeline, check for a project-specific calibration file. Behavior by rigor:
- fast: Skip entirely. Do not read or score learnings.
- standard: Read the file if it exists, then score and filter. If it is missing, use the default focus areas.
- thorough: Always read. If .harness/review-learnings.md is missing, log a warning.

If .harness/review-learnings.md exists (and rigor is not fast), load it and score it as follows.
When learnings are loaded (standard or thorough), score against diff context before applying:
Call filterByRelevance(learnings, diffContext, 0.7, 1000) from packages/core/src/state/learnings-relevance.ts. Only learnings scoring >= 0.7 are retained. They are sorted by score and truncated to a 1000-token budget.

After the review, consider suggesting that the team create .harness/review-learnings.md if you notice patterns that would benefit from calibration.
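The sketch below shows one way the scoring step could work. It is a hypothetical stand-in, not the code in packages/core/src/state/learnings-relevance.ts. It makes three assumptions:

- A score is the Jaccard similarity between lowercase word sets. The "Learnings Applied" output reports Jaccard scores.
- The token cost of a learning is approximated as characters / 4.
- `filterLearningsSketch` is an illustrative name.

```typescript
// Hypothetical sketch of learnings relevance filtering; it is not the
// implementation in packages/core/src/state/learnings-relevance.ts.
interface ScoredLearning {
  text: string;
  score: number;
}

// Lowercase word set; anything other than letters, digits, or "_" separates words.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9_]+/).filter((w) => w.length > 0));
}

// Jaccard similarity: size of intersection / size of union (0 for two empty sets).
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((word) => {
    if (b.has(word)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Assumption: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function filterLearningsSketch(
  learnings: string[],
  diffContext: string,
  threshold = 0.7,
  tokenBudget = 1000,
): ScoredLearning[] {
  const contextWords = tokenize(diffContext);
  const ranked = learnings
    .map((text) => ({ text, score: jaccard(tokenize(text), contextWords) }))
    .filter((learning) => learning.score >= threshold)
    .sort((x, y) => y.score - x.score);

  // Keep the highest-scoring learnings until the next one would exceed the budget.
  const kept: ScoredLearning[] = [];
  let used = 0;
  for (const learning of ranked) {
    const cost = estimateTokens(learning.text);
    if (used + cost > tokenBudget) break;
    kept.push(learning);
    used += cost;
  }
  return kept;
}
```

Plain Jaccard over word sets penalizes a short learning compared against a large diff. A production scorer may weight or normalize differently.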
Generate a self-review checklist via create_self_review to establish baseline expectations before deeper analysis. This checklist feeds into Phase 4 subagents as an additional input.
Tier: fast | Mode: CI only (--ci). Skip when invoked manually.
Checks whether the PR should be reviewed at all, preventing wasted compute in CI.
Checks:
- All changed files are .md? -> Skip: "Documentation-only change."

gh pr view --json state,isDraft,files
gh pr diff --name-only | grep -v '\.md$' | wc -l # 0 means docs-only
Exit: If any check triggers skip, output reason and exit 0. Otherwise continue to Phase 2.
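Here is a minimal sketch of the gate decision. It assumes the JSON shape returned by `gh pr view --json state,isDraft,files`: `state` is "OPEN", "CLOSED", or "MERGED", and each `files` entry carries a `path`.

Only the docs-only rule is stated above. The closed-PR and draft checks are assumptions inferred from the fields being queried. `gatePr` is a hypothetical helper name.

```typescript
// Subset of `gh pr view --json state,isDraft,files` output used here.
interface PrView {
  state: string;
  isDraft: boolean;
  files: { path: string }[];
}

type GateResult = { eligible: true } | { eligible: false; reason: string };

function gatePr(pr: PrView): GateResult {
  // Assumed checks: closed/merged and draft PRs are not reviewed.
  if (pr.state !== "OPEN") return { eligible: false, reason: `PR is ${pr.state.toLowerCase()}.` };
  if (pr.isDraft) return { eligible: false, reason: "Draft PR." };
  // Documented check: every changed file is Markdown.
  const nonDocs = pr.files.filter((f) => !f.path.endsWith(".md"));
  if (pr.files.length > 0 && nonDocs.length === 0) {
    return { eligible: false, reason: "Documentation-only change." };
  }
  return { eligible: true };
}
```

On a skip result, the caller prints `reason` and exits with status 0. An eligible result continues to Phase 2.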
Tier: none (no LLM) | Mode: Skipped with --no-mechanical.
Run mechanical checks to establish an exclusion boundary. Issues caught here are excluded from AI review (Phase 4).
Checks:
- assess_project({ path, checks: ["validate", "deps", "docs"], mode: "detailed" }) — runs checks in parallel.
- analyze_diff to get a structured breakdown of changes by file, category, and risk level.
- run_security_scan on changed files. Record findings with rule ID, file, line.
- tsc --noEmit. Record errors.

Output: Set of mechanical findings (file, line, tool, message) forming the exclusion list for Phase 5.
When sessionSlug is available, load evidence entries from session state and cross-reference with findings:
readSessionSection(projectRoot, sessionSlug, 'evidence')

[UNVERIFIED]

When no session is available, skip silently. Evidence checking enhances but does not gate reviews.
Exit: If harness validate, typecheck, or tests fail, report in Strengths/Issues/Assessment format and stop. Lint warnings and security findings do not stop the pipeline -- recorded for exclusion only.
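One way to represent the Phase 2 output is sketched below. `MechanicalFinding` mirrors the (file, line, tool, message) tuple described above. These parts are hypothetical:

- the `CheckName` values
- the `file:line` key format
- `summarizeMechanical`

The sketch builds the exclusion set consumed in Phase 5 and applies the stop rule. Validate, typecheck, and test failures stop the pipeline. Lint and security findings are only recorded.

```typescript
interface MechanicalFinding {
  file: string;
  line: number;
  tool: string; // e.g. "eslint", "tsc", "security-scan" (illustrative names)
  message: string;
}

// Hypothetical names for the Phase 2 checks.
type CheckName = "validate" | "typecheck" | "tests" | "lint" | "security";

interface CheckResult {
  check: CheckName;
  passed: boolean;
  findings: MechanicalFinding[];
}

// Only these failures halt the pipeline; everything else feeds the exclusion set.
const BLOCKING: ReadonlySet<CheckName> = new Set<CheckName>(["validate", "typecheck", "tests"]);

function exclusionKey(file: string, line: number): string {
  return `${file}:${line}`;
}

function summarizeMechanical(results: CheckResult[]): { stop: boolean; exclusions: Set<string> } {
  const stop = results.some((r) => !r.passed && BLOCKING.has(r.check));
  const exclusions = new Set<string>();
  for (const r of results) {
    for (const f of r.findings) exclusions.add(exclusionKey(f.file, f.line));
  }
  return { stop, exclusions };
}
```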
Tier: fast | Purpose: Assemble scoped context bundles per review domain. Each subagent receives only domain-relevant context.
Determine change type to shape review focus:
- feat: -> feature
- fix: -> bugfix
- refactor: -> refactor
- docs: -> docs
- only .md files changed -> docs

| Domain | With Graph | Without Graph (Fallback) |
|---|---|---|
| Compliance | Convention files + changed files | Same (no graph needed) |
| Bug Detection | Changed files + dependencies via query_graph | Changed files + imported files (grep import) |
| Security | Security paths + data flow via query_graph | Changed files + files with auth/crypto/SQL/shell patterns |
| Architecture | Layer boundaries + imports via query_graph/get_impact | Changed files + harness check-deps output |
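The change-type mapping above can be sketched as a small classifier. It makes the following assumptions:

- The input is the commit subject plus the changed file list.
- Conventional-commit scopes such as `fix(api):` and the `!` breaking-change marker are accepted.
- Anything unmatched returns "unknown", because no fallback is specified above.

`classifyChange` is a hypothetical name.

```typescript
type ChangeType = "feature" | "bugfix" | "refactor" | "docs" | "unknown";

// Prefix -> change type, as listed in the mapping.
const PREFIXES = new Map<string, ChangeType>([
  ["feat", "feature"],
  ["fix", "bugfix"],
  ["refactor", "refactor"],
  ["docs", "docs"],
]);

function classifyChange(commitSubject: string, changedFiles: string[]): ChangeType {
  // Conventional-commit prefix such as "feat:", "fix(api):" or "refactor!:".
  const match = /^(\w+)(\([^)]*\))?!?:/.exec(commitSubject.trim());
  const fromPrefix = match ? PREFIXES.get(match[1]) : undefined;
  if (fromPrefix) return fromPrefix;
  // Fallback: a Markdown-only change is treated as docs.
  if (changedFiles.length > 0 && changedFiles.every((f) => f.endsWith(".md"))) return "docs";
  return "unknown";
}
```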
Run compute_blast_radius on changed files to identify downstream modules that may be affected and should be included in context bundles.
When .harness/graph/ exists, use gather_context for efficient assembly:
gather_context({
path: "<project-root>",
intent: "Code review of <change description>",
skill: "harness-code-review",
session: "<session-slug-if-provided>",
tokenBudget: 8000,
include: ["graph", "learnings", "validation"]
})
Replaces manual query_graph + get_impact + find_context_for calls. Falls back gracefully when no graph exists. Supplement with targeted query_graph calls for domain-specific scoping.
git diff --stat HEAD~1 # measure diff size
git diff HEAD~1 -- <file> # per-file diff
grep -n "import\|require\|from " <file> # find imports
find . -name "*<module>*test*" -o -name "*<module>*spec*"
grep -rl "<component>" docs/changes/ docs/design-docs/ docs/plans/ # docs/changes/*/plans/ covered by docs/changes/; docs/plans/ kept for legacy
grep -rn "interface\|type\|schema" <changed-file> | head -20
git log --oneline -5 -- <affected-file>
Use to determine: Hotspot? (3+ changes in last 5 commits) Recently refactored? Multiple authors? Last change was bugfix? (yellow flag)
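The history signals above can be computed from commit metadata. This sketch rests on several assumptions:

- The `CommitInfo` shape (subject, author, changed files) is hypothetical and would be parsed from git log output.
- "Hotspot" is read as the file appearing in 3 or more of the 5 most recent commits passed in.
- Refactor and bugfix history is detected from conventional-commit prefixes.

```typescript
interface CommitInfo {
  subject: string;
  author: string;
  files: string[];
}

interface HistorySignals {
  hotspot: boolean;
  recentlyRefactored: boolean;
  multipleAuthors: boolean;
  lastChangeWasBugfix: boolean; // yellow flag
}

// `recent` is newest-first, e.g. the last 5 commits.
function historySignals(file: string, recent: CommitInfo[]): HistorySignals {
  const touching = recent.filter((c) => c.files.indexOf(file) !== -1);
  const authors = new Set(touching.map((c) => c.author));
  const last = touching[0]; // most recent commit that touched the file
  return {
    hotspot: touching.length >= 3,
    recentlyRefactored: touching.some((c) => c.subject.startsWith("refactor")),
    multipleAuthors: authors.size > 1,
    lastChangeWasBugfix: last !== undefined && last.subject.startsWith("fix"),
  };
}
```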
Exit: Context bundles assembled for all four domains. Continue to Phase 4.
Tier: mixed | Purpose: Run four parallel review subagents with domain-scoped context. Each produces ReviewFinding[].
Rigor overrides:
- fast: All agents at fast tier (haiku-class), reduced reasoning depth.
- standard: Default tiers per agent.
- thorough: Default tiers + meta-judge pass (strong tier) cross-validating findings, flagging contradictions, surfacing cross-cutting concerns.

Compliance agent: Reviews adherence to project conventions, standards, and documentation.
Input: Compliance context bundle (convention files + changed files + change type)
Focus by change type:
Feature:
Bugfix:
Refactor:
Docs:
Output: ReviewFinding[] with domain: 'compliance'
Bug Detection agent: Reviews for logic errors, edge cases, and correctness issues.
Input: Bug detection context bundle (changed files + dependencies)
Focus areas:
Output: ReviewFinding[] with domain: 'bug'
Security agent: Invokes harness-security-review in changed-files mode as the security fan-out slot.
Input: Security context bundle (security-relevant paths + data flows)
Invocation: Pipeline invokes harness-security-review with scope changed-files:
--deep was passed

Output: ReviewFinding[] with security fields (cweId, owaspCategory, confidence, remediation, references)

Focus areas:
Semantic security review (beyond mechanical scanners):
Stack-adaptive: Node.js (prototype pollution, ReDoS, path traversal), React (XSS, dangerouslySetInnerHTML), Go (race conditions, integer overflow, unsafe pointer), Python (pickle, SSTI, command injection)
Security posture alignment: Check get_security_trends to see if the changes align with or diverge from the project's security posture trajectory.
CWE/OWASP references: All findings include cweId, owaspCategory, remediation.
Confirmed vulnerabilities are always severity: 'critical'.
Dedup with mechanical scan: Phase 5 uses the exclusion set from Phase 2 to discard overlapping findings.
Output: ReviewFinding[] with domain: 'security'
Architecture agent: Reviews architectural violations, dependency direction, and design pattern compliance.
Input: Architecture context bundle (layer boundaries + import graph)
Focus areas:
Output: ReviewFinding[] with domain: 'architecture'
Exit: All four agents returned findings. Continue to Phase 5.
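The fan-out can be sketched with `Promise.all`. The following are hypothetical:

- the `ReviewAgent` interface
- passing each context bundle as a string
- `fanOut`

The sketch shows only the control flow. All agents run concurrently, and the phase completes once every agent has returned its ReviewFinding[].

```typescript
type Domain = "compliance" | "bug" | "security" | "architecture";

interface ReviewFinding {
  id: string;
  file: string;
  lineRange: [number, number];
  domain: Domain;
  severity: "critical" | "important" | "suggestion";
  title: string;
  rationale: string;
  suggestion?: string;
  evidence: string[];
  validatedBy: "mechanical" | "graph" | "heuristic";
}

// Hypothetical agent interface: one agent per domain, given only its own bundle.
interface ReviewAgent {
  domain: Domain;
  tier: "fast" | "standard" | "strong";
  review(bundle: string): Promise<ReviewFinding[]>;
}

async function fanOut(agents: ReviewAgent[], bundles: Map<Domain, string>): Promise<ReviewFinding[]> {
  const results = await Promise.all(
    agents.map((agent) => {
      const bundle = bundles.get(agent.domain);
      if (bundle === undefined) {
        return Promise.reject(new Error(`No context bundle for ${agent.domain}`));
      }
      return agent.review(bundle);
    }),
  );
  // Flatten per-agent results into one list for Phase 5.
  return results.reduce<ReviewFinding[]>((all, batch) => all.concat(batch), []);
}
```

If any agent rejects, `Promise.all` rejects. That matches an exit condition requiring all agents to return. A variant that tolerates agent failures could use `Promise.allSettled` instead.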
Tier: none (mechanical) | Purpose: Remove false positives via cross-referencing.
Steps:
- query_graph. Discard findings with invalid reachability claims.
- detect_stale_constraints to check if architectural constraints are still valid after the reviewed changes. Findings referencing invalidated constraints are downgraded.
- suggestion.

Exit: Validated finding set. Continue to Phase 6.
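The Phase 5 filtering can be sketched as a pure function. It assumes the following:

- Mechanical exclusions are `file:line` keys.
- A finding is a mechanical duplicate when any line in its `lineRange` appears in that set.
- The results of query_graph and detect_stale_constraints are precomputed and passed in as sets of finding ids. These sets are hypothetical inputs.
- "Downgraded" means reduced to suggestion.

Uncited findings are discarded, as the evidence rules require.

```typescript
interface ReviewFinding {
  id: string;
  file: string;
  lineRange: [number, number];
  domain: "compliance" | "bug" | "security" | "architecture";
  severity: "critical" | "important" | "suggestion";
  title: string;
  rationale: string;
  suggestion?: string;
  evidence: string[];
  validatedBy: "mechanical" | "graph" | "heuristic";
}

// Hypothetical, precomputed inputs to the validation step.
interface ValidationInputs {
  mechanicalExclusions: Set<string>; // "file:line" keys recorded in Phase 2
  unreachable: Set<string>; // ids of findings whose reachability claim failed
  staleConstraint: Set<string>; // ids of findings citing an invalidated constraint
}

function overlapsMechanical(f: ReviewFinding, exclusions: Set<string>): boolean {
  for (let line = f.lineRange[0]; line <= f.lineRange[1]; line++) {
    if (exclusions.has(`${f.file}:${line}`)) return true;
  }
  return false;
}

function validateFindings(findings: ReviewFinding[], inputs: ValidationInputs): ReviewFinding[] {
  const validated: ReviewFinding[] = [];
  for (const f of findings) {
    if (f.evidence.length === 0) continue; // uncited: discard
    if (inputs.unreachable.has(f.id)) continue; // invalid reachability claim: discard
    if (overlapsMechanical(f, inputs.mechanicalExclusions)) continue; // mechanical duplicate
    if (inputs.staleConstraint.has(f.id)) {
      // Assumption: "downgraded" means reduced to suggestion.
      const downgraded: ReviewFinding = { ...f, severity: "suggestion" };
      validated.push(downgraded);
    } else {
      validated.push(f);
    }
  }
  return validated;
}
```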
Tier: none (mechanical) | Purpose: Eliminate redundant findings across agents.
Steps:
- Group findings by file + overlapping lineRange (intersecting or within 3 lines).

Exit: Deduplicated, severity-assigned list. Continue to Phase 7.
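The grouping rule is sketched below. Findings in the same file are merged when their line ranges intersect or sit within 3 lines of each other.

How a group is reduced is an assumption here, because the text says only "merge, assign severity." The sketch does the following:

- keeps the most severe level
- widens the line range to cover the group
- concatenates evidence without duplicates
- keeps the first finding's title and rationale

```typescript
type Severity = "critical" | "important" | "suggestion";

interface ReviewFinding {
  id: string;
  file: string;
  lineRange: [number, number];
  domain: "compliance" | "bug" | "security" | "architecture";
  severity: Severity;
  title: string;
  rationale: string;
  suggestion?: string;
  evidence: string[];
  validatedBy: "mechanical" | "graph" | "heuristic";
}

const RANK: Record<Severity, number> = { critical: 0, important: 1, suggestion: 2 };
const PROXIMITY = 3; // lines

// Ranges intersect, or the gap between them is at most PROXIMITY lines.
function near(a: [number, number], b: [number, number]): boolean {
  return a[0] <= b[1] + PROXIMITY && b[0] <= a[1] + PROXIMITY;
}

function dedupAndMerge(findings: ReviewFinding[]): ReviewFinding[] {
  // Sort by file, then start line, so mergeable findings are adjacent.
  const sorted = findings.slice().sort((x, y) =>
    x.file === y.file ? x.lineRange[0] - y.lineRange[0] : x.file < y.file ? -1 : 1,
  );
  const merged: ReviewFinding[] = [];
  for (const f of sorted) {
    const prev = merged[merged.length - 1];
    if (prev && prev.file === f.file && near(prev.lineRange, f.lineRange)) {
      merged[merged.length - 1] = {
        ...prev,
        lineRange: [Math.min(prev.lineRange[0], f.lineRange[0]), Math.max(prev.lineRange[1], f.lineRange[1])],
        severity: RANK[f.severity] < RANK[prev.severity] ? f.severity : prev.severity,
        evidence: prev.evidence.concat(f.evidence.filter((e) => prev.evidence.indexOf(e) === -1)),
      };
    } else {
      merged.push(f);
    }
  }
  // Most severe first for the Issues section.
  return merged.sort((x, y) => RANK[x.severity] - RANK[y.severity]);
}
```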
Tier: none | Purpose: Deliver review in requested format.
**[STRENGTH]** Clean separation between route handler and service logic
**[CRITICAL]** api/routes/users.ts:12-15 -- Direct import from db/queries.ts bypasses service layer
**[IMPORTANT]** services/user-service.ts:45 -- createUser does not handle duplicate email
**[SUGGESTION]** Consider extracting validation into a shared utility
Strengths: What is done well. Be specific -- "Clean separation between X and Y" not "Looks good".
Issues: Grouped by severity (Critical / Important / Suggestion). Each includes: location, problem, rationale, suggested fix.
Assessment: Approve (no critical/important), Request Changes (critical/important present), or Comment (observations only).
Learnings Applied (thorough only): List learnings with Jaccard scores and how they influenced review. Omitted in fast/standard.
Exit code: 0 for Approve/Comment, 1 for Request Changes.
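The assessment and exit-code rules reduce to a few lines. This sketch takes severity counts as input.

The text does not say when Comment is chosen over Approve. The `observationsOnly` flag stands in for however a caller signals an observations-only review. Both it and `assess` are hypothetical. The assessment strings match the handoff payload.

```typescript
type Assessment = "approve" | "request-changes" | "comment";

interface FindingCount {
  critical: number;
  important: number;
  suggestion: number;
}

function assess(count: FindingCount, observationsOnly = false): { assessment: Assessment; exitCode: 0 | 1 } {
  // Any critical or important finding blocks the change.
  if (count.critical > 0 || count.important > 0) {
    return { assessment: "request-changes", exitCode: 1 };
  }
  // Suggestions alone never block.
  return { assessment: observationsOnly ? "comment" : "approve", exitCode: 0 };
}
```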
With --comment:

gh pr review --approve|--request-changes|--comment --body "<summary>"
gh api repos/{owner}/{repo}/pulls/<pr-number>/comments \
  --field body="<rationale>
\`\`\`suggestion
<fix>
\`\`\`" \
  --field commit_id="<head-commit-sha>" \
  --field path="<file>" --field line=<line>
emit_interaction({
path: "<project-root>",
type: "confirmation",
confirmation: {
text: "Review complete: <Assessment>. Accept review?",
context: "<N critical, N important, N suggestion findings>",
impact: "Accepting finalizes findings. Approve = ready for merge. Request-changes = fixes needed.",
risk: "<low if approve, high if critical>"
}
})
Write handoff to the session-scoped path when session slug is known, otherwise fall back to global:
- Session slug known: .harness/sessions/<session-slug>/handoff.json
- Otherwise: .harness/handoff.json

> [DEPRECATED] Writing to .harness/handoff.json is deprecated. In autopilot sessions, always write to .harness/sessions/<slug>/handoff.json.
{
"fromSkill": "harness-code-review",
"phase": "OUTPUT",
"summary": "<assessment summary>",
"assessment": "approve | request-changes | comment",
"findingCount": { "critical": 0, "important": 0, "suggestion": 0 },
"artifacts": ["<reviewed files>"]
}
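The sketch below resolves the handoff location and writes the payload, using Node's `fs` and `path` modules. `resolveHandoffPath` and `writeHandoff` are hypothetical helpers, not part of the harness API. The `Handoff` type mirrors the JSON above.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

interface Handoff {
  fromSkill: "harness-code-review";
  phase: "OUTPUT";
  summary: string;
  assessment: "approve" | "request-changes" | "comment";
  findingCount: { critical: number; important: number; suggestion: number };
  artifacts: string[];
}

// Session-scoped when the slug is known; the global path is the deprecated fallback.
function resolveHandoffPath(projectRoot: string, sessionSlug?: string): string {
  return sessionSlug
    ? join(projectRoot, ".harness", "sessions", sessionSlug, "handoff.json")
    : join(projectRoot, ".harness", "handoff.json");
}

function writeHandoff(projectRoot: string, handoff: Handoff, sessionSlug?: string): string {
  const target = resolveHandoffPath(projectRoot, sessionSlug);
  mkdirSync(dirname(target), { recursive: true }); // create the session directory if needed
  writeFileSync(target, JSON.stringify(handoff, null, 2) + "\n");
  return target;
}
```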
Write session summary (if session known):
writeSessionSummary(projectPath, sessionSlug, {
session: "<session-slug>",
lastActive: "<ISO timestamp>",
skill: "harness-code-review",
spec: "<spec path if known>",
status: "Review complete. Assessment: <type>. <N> findings.",
keyContext: "<1-2 sentences: review outcome, key findings>",
nextStep: "<Address findings / Ready to merge / Observations delivered>"
})
If "approve": Emit transition:
{
"type": "transition",
"transition": {
"completedPhase": "review",
"suggestedNext": "merge",
"reason": "Review approved with no blocking issues",
"artifacts": ["<reviewed files>"],
"requiresConfirmation": true,
"summary": "Review approved. <N> suggestions. Ready for PR/merge.",
"qualityGate": {
"checks": [
{ "name": "mechanical-checks", "passed": true },
{ "name": "no-critical-findings", "passed": true },
{ "name": "no-important-findings", "passed": true },
{ "name": "harness-validate", "passed": true }
],
"allPassed": true
}
}
}
If user confirms: proceed to create PR or merge. If user declines: stop.
If "request-changes": Do NOT emit transition. Surface critical/important findings for resolution. Re-run after fixes.
If "comment": Do NOT emit transition. Observations delivered, no further action implied.
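The branching above can be sketched as a function that returns the transition payload for approve and `null` otherwise. The payload fields mirror the transition JSON.

`buildTransition` is hypothetical. The quality-gate results are assumed to come from earlier phases.

```typescript
type Assessment = "approve" | "request-changes" | "comment";

interface GateCheck {
  name: string;
  passed: boolean;
}

interface Transition {
  type: "transition";
  transition: {
    completedPhase: "review";
    suggestedNext: "merge";
    reason: string;
    artifacts: string[];
    requiresConfirmation: true;
    summary: string;
    qualityGate: { checks: GateCheck[]; allPassed: boolean };
  };
}

function buildTransition(
  assessment: Assessment,
  reviewedFiles: string[],
  suggestionCount: number,
  checks: GateCheck[],
): Transition | null {
  // request-changes and comment never emit a transition.
  if (assessment !== "approve") return null;
  return {
    type: "transition",
    transition: {
      completedPhase: "review",
      suggestedNext: "merge",
      reason: "Review approved with no blocking issues",
      artifacts: reviewedFiles,
      requiresConfirmation: true,
      summary: `Review approved. ${suggestionCount} suggestions. Ready for PR/merge.`,
      qualityGate: { checks, allPassed: checks.every((c) => c.passed) },
    },
  };
}
```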
Not part of the pipeline. Documents the process for requesting reviews.
Not part of the pipeline. Documents the process for responding to feedback.
harness validate and harness check-deps, commit referencing feedback.

Every ReviewFinding.evidence array MUST include citations using one of:
- file:line format (e.g., src/api/routes/users.ts:12-15 -- "bypasses service layer")
- e.g., routes/users.ts:3 imports db/queries.ts
- e.g., AGENTS.md:45
- evidence session section via manage_state

When to cite: Phase 4 (each subagent populates evidence), Phase 5 (evidence verifies reachability), Phase 7 (every issue backed by evidence).
Uncited claims: Findings without evidence discarded in Phase 5. Observations without file:line references prefixed [UNVERIFIED] and downgraded to suggestion.
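Checking for file:line citations is mechanical. This sketch assumes a citation looks like `path.ext:line` or `path.ext:start-end`, as in src/api/routes/users.ts:12-15. The regular expression is an approximation and will miss paths without an extension.

`parseCitations` and `markUnverified` are hypothetical helpers.

```typescript
interface Citation {
  file: string;
  start: number;
  end: number;
}

// path with an alphabetic extension, ":" + line, optionally "-" + end line.
const CITATION = /([\w./-]+\.[A-Za-z]+):(\d+)(?:-(\d+))?/g;

function parseCitations(text: string): Citation[] {
  const citations: Citation[] = [];
  let m: RegExpExecArray | null;
  CITATION.lastIndex = 0; // the regex is global, so reset before scanning
  while ((m = CITATION.exec(text)) !== null) {
    const start = Number(m[2]);
    const end = m[3] !== undefined ? Number(m[3]) : start;
    citations.push({ file: m[1], start, end });
  }
  return citations;
}

// An observation without any file:line citation is tagged and downgraded.
function markUnverified(observation: string): { text: string; severity?: "suggestion" } {
  return parseCitations(observation).length > 0
    ? { text: observation }
    : { text: `[UNVERIFIED] ${observation}`, severity: "suggestion" };
}
```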
Review rubrics passed to subagents in Phase 4 MUST use compressed single-line format to minimize token consumption. Each rubric entry is one line with pipe-delimited fields:
domain|check-name|severity|one-sentence-criterion
Example (Compliance Agent rubric):
compliance|spec-alignment|critical|Implementation matches all behaviors specified in the approved spec
compliance|api-surface|important|New exports are minimal and well-named; internal symbols stay unexported
compliance|backward-compat|critical|No breaking changes to existing callers without documented migration path
compliance|naming|suggestion|Names follow project conventions (check AGENTS.md or .eslintrc)
Why: Verbose rubric prose inflates context by 2-5x without improving review accuracy. Dense single-line rubrics give the agent the same signal in fewer tokens, leaving more budget for actual code analysis.
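Because each rubric entry is a single pipe-delimited line, it can be parsed and validated cheaply before dispatch. The sketch below does that. `RubricEntry` and `parseRubric` are hypothetical. Any extra pipes are treated as part of the criterion, so a criterion may itself contain "|".

```typescript
type Severity = "critical" | "important" | "suggestion";

interface RubricEntry {
  domain: string;
  check: string;
  severity: Severity;
  criterion: string;
}

const SEVERITIES: readonly string[] = ["critical", "important", "suggestion"];

// Parses domain|check-name|severity|criterion lines; blank lines are ignored.
function parseRubric(text: string): RubricEntry[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [domain, check, severity, ...rest] = line.split("|");
      const criterion = rest.join("|").trim();
      if (!domain || !check || SEVERITIES.indexOf(severity) === -1 || !criterion) {
        throw new Error(`Malformed rubric entry: ${line}`);
      }
      return { domain, check, severity: severity as Severity, criterion };
    });
}
```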
Rules:
- assess_project — Phase 2: run validate/deps/docs in parallel. Failures are Critical and stop the pipeline.
- analyze_diff — Phase 2: structured breakdown of changes by file, category, and risk level.
- create_self_review — Pre-pipeline: generate a self-review checklist as baseline expectations.
- compute_blast_radius — Phase 3/5: run on changed files to identify downstream modules that may be affected by the changes.
- detect_stale_constraints — Phase 5: check if architectural constraints are still valid after the reviewed changes.
- get_security_trends — Phase 4 (Security Agent): check if changes align with or diverge from the project's security posture trajectory.
- gather_context — Phase 3: parallel context assembly. The session parameter scopes it to the session directory.
- harness cleanup — Optional Phase 2 check for entropy in changed files.
- emit_interaction — Post-review: suggest merge transition on APPROVE. Confirmed transition.
- --fast/--thorough control learnings, agent tiers, output. See Rigor Levels table.
- filterByRelevance — Phase 3 learnings scoring. Threshold 0.7, budget 1000 tokens.
- .harness/sessions/<slug>/ contains handoff.json, state.json, artifacts.json (spec/plan paths, reviewed file list). Write handoff to session scope when the slug is known. Global .harness/handoff.json is deprecated for session-aware invocations.

When a review subagent encounters ambiguity during analysis, classify it immediately:
suggestion and rationale explaining the ambiguity. Do not guess.

Do not suppress ambiguous findings. An ambiguous finding surfaced as a question is more valuable than a confident finding built on a wrong assumption.
--ci)

- --comment posts inline GitHub comments with committable suggestions
- --deep adds threat modeling to Security agent
- fast: learnings skipped, all agents at fast tier
- thorough: learnings always loaded/scored, meta-judge validates, "Learnings Applied" in output
- standard: learnings included if file exists, scored at 0.7 threshold

Phase 1 (GATE): Skipped -- manual invocation.
Phase 2 (MECHANICAL): harness validate passes. harness check-deps passes. Security scan clean. tsc --noEmit passes. Lint passes.
Phase 3 (CONTEXT): Change type: feature (prefix feat:). Bundles:
- Compliance: CLAUDE.md + changed files
- Bug Detection: api/routes/users.ts, services/user-service.ts, db/queries.ts
- Security: api/routes/users.ts (endpoint), services/user-service.ts (data flow)
- Architecture: routes -> services -> db layers

Phase 4 (FAN-OUT): Four agents in parallel:
Phase 5 (VALIDATE): No mechanical exclusions. Architecture finding validated by check-deps showing layer violation.
Phase 6 (DEDUP+MERGE): No overlaps -- 2 distinct findings in different files.
Phase 7 (OUTPUT):
Strengths:
Issues:
Critical:
- api/routes/users.ts:12-15 -- Direct import from db/queries.ts bypasses service layer. Must route through services/user-service.ts. (domain: architecture, validatedBy: heuristic)

Important:

- services/user-service.ts:45 -- createUser does not handle duplicate email. Database throws constraint violation surfacing as 500. Should catch and return 409. (domain: bug, validatedBy: heuristic)

Suggestion: (none)
Assessment: Request Changes -- one critical layer violation and one important missing error handler.
- --no-mechanical. If not run in CI or locally, they must run in Phase 2 before AI review.
- harness validate and harness check-deps must pass. Always Critical.
- If code is replaced by a comment (// removed, // TODO: re-add, // no longer needed), flag as Critical. Comments are not fixes. The code was either needed (removal is a bug) or not (remove it silently). A comment replacing code is technical debt disguised as a change.

These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
| Rationalization | Reality |
|---|---|
| "The tests pass, so the logic must be correct" | Tests can be incomplete. Review the logic independently of test results. |
| "This is how it was done elsewhere in the codebase" | Existing patterns can be wrong. Evaluate the pattern on its merits, not just its precedent. |
| "It's just a refactor, low risk" | Refactors change behavior surfaces. Review them with the same rigor as feature changes. |
| "The fix is trivial, I'll just apply it inline" | Trivial fixes still skip review when applied by the reviewer. Suggest the fix; let the author apply and re-review. Iron Law. |
| "The diff is small so I can approve without reading every file" | Small diffs can contain critical bugs. Read every changed file completely — size does not correlate with risk. A one-line auth bypass is a small diff. |
| "The author is experienced, so I can be less thorough" | Review rigor is based on the code, not the author. Experienced authors make mistakes too. Apply the same checklist regardless of who wrote it. |
.harness/review-learnings.md Noise / False Positives section.