From harness-claude
> Deep soundness analysis of specs and plans. Auto-fixes inferrable issues, surfaces design decisions to you. Runs automatically before sign-off.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

This skill uses the workspace's default tool permissions.
- --mode spec — Run spec-mode checks (S1-S7). Invoked by harness-brainstorming.
- --mode plan — Run plan-mode checks (P1-P7). Invoked by harness-planning.

No spec or plan may be signed off without a converged soundness review. Inferrable fixes are applied silently. Design decisions are always surfaced to the user.
Every finding conforms to this structure:
{
"id": "string — unique identifier",
"check": "string — e.g. S1, P3",
"title": "string — one-line summary",
"detail": "string — explanation with evidence",
"severity": "error | warning — errors block sign-off",
"autoFixable": "boolean — whether fixable without user input",
"suggestedFix": "string | undefined — what the fix would do",
"evidence": ["string[] — references to spec/plan sections and codebase files"]
}
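For reference, the same schema rendered as a TypeScript type (a minimal sketch; the SoundnessFinding name matches the schema referenced in the interface notes below):

```typescript
type Severity = "error" | "warning";

interface SoundnessFinding {
  id: string;            // unique identifier, e.g. "S3-001"
  check: string;         // e.g. "S1", "P3"
  title: string;         // one-line summary
  detail: string;        // explanation with evidence
  severity: Severity;    // errors block sign-off
  autoFixable: boolean;  // fixable without user input?
  suggestedFix?: string; // what the fix would do
  evidence: string[];    // references to spec/plan sections and codebase files
}
```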
Execute all checks for the active mode. Classify each finding as autoFixable: true or false. Record total issue count.
Run check_traceability to verify that all requirements in the spec/plan have corresponding implementation artifacts. Run validate_cross_check to verify plan-to-implementation alignment as part of the soundness assessment.
Before running checks, determine graph availability:
1. Check whether .harness/graph/ exists. If it does, the graph tools below are available; if not, use the file-based variants.
- query_graph — traverse module/dependency nodes to verify referenced patterns and architectural compatibility
- find_context_for — search for related design decisions from other specs
- get_relationships — verify dependency direction and layer compliance
- get_impact — analyze downstream impact to verify dependency completeness

Per-check procedures include "Without graph" and "With graph" variants. Use whichever matches step 1.
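As a sketch, the availability probe and per-check dispatch might look like this (Node.js assumed; CheckProcedure and runCheck are hypothetical names, and the finding type is the SoundnessFinding sketch above):

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Step 1: graph tooling is available iff .harness/graph/ exists.
function graphAvailable(projectRoot: string): boolean {
  return existsSync(join(projectRoot, ".harness", "graph"));
}

// Illustrative shape only: each check carries both procedure variants.
interface CheckProcedure {
  withGraph: () => SoundnessFinding[];
  withoutGraph: () => SoundnessFinding[];
}

// Dispatch to whichever variant matches the availability probe.
function runCheck(proc: CheckProcedure, projectRoot: string): SoundnessFinding[] {
  return graphAvailable(projectRoot) ? proc.withGraph() : proc.withoutGraph();
}
```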
Spec-mode checks (--mode spec):

| # | Check | What it detects | Auto-fixable? |
|---|---|---|---|
| S1 | Internal coherence | Contradictions between decisions, technical design, and success criteria | No — surface to user |
| S2 | Goal-criteria traceability | Goals without success criteria; orphan criteria not tied to any goal | Yes — add missing links, flag orphans |
| S3 | Unstated assumptions | Implicit assumptions not called out (e.g., single-tenant, always-online) | Partially — infer obvious ones, surface ambiguous |
| S4 | Requirement completeness | Missing error/edge cases, failure modes; EARS unwanted-behavior gaps | Partially — add obvious error cases, surface design-dependent |
| S5 | Feasibility red flags | Design depends on nonexistent codebase capabilities or incompatible patterns | No — surface with evidence |
| S6 | YAGNI re-scan | Speculative features that crept in during conversation | No — surface to user |
| S7 | Testability | Vague success criteria not observable or measurable ("should be fast") | Yes — add thresholds where inferrable |
Analyze: Decisions table, Technical Design, Success Criteria, Non-goals.
Detection:
Classification: Always severity: "error", autoFixable: false. Contradictions require user judgment.
Example:
{
"id": "S1-001",
"check": "S1",
"title": "Decision contradicts Technical Design",
"detail": "D3 says 'use SQLite' but Technical Design > Data Layer describes PostgreSQL with migrations.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Align Technical Design with decision (SQLite) or update decision to PostgreSQL.",
"evidence": ["Decisions D3: 'Use SQLite'", "Technical Design > Data Layer: 'PostgreSQL schema'"]
}
Analyze: Overview (goals), Success Criteria.
Detection:
Classification:
severity: "warning", autoFixable: true. Fix: add criterion derived from Technical Design.severity: "warning", autoFixable: false. Removing criteria is a design decision.Example:
{
"id": "S2-001",
"check": "S2",
"title": "Goal has no success criterion",
"detail": "Goal 'Support offline mode' has no corresponding criterion.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'App functions without network for all read operations, returning cached data.'",
"evidence": ["Overview: 'Support offline mode'", "Success Criteria: no match"]
}
Analyze: Technical Design, Decisions table, data structures, integration points.
Detection:
- With graph: use query_graph for related modules' assumptions. Use find_context_for to surface conflicting design decisions.

Classification:
- Obvious assumption (e.g., runtime, encoding): severity: "warning", autoFixable: true. Fix: add to Assumptions section.
- Ambiguous assumption (e.g., concurrency, tenancy): severity: "warning", autoFixable: false. User decides.

Example:
{
"id": "S3-001",
"check": "S3",
"title": "Implicit Node.js runtime assumption",
"detail": "Technical Design references 'path.join' and 'fs.readFileSync' without declaring Node.js runtime.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add to Assumptions: 'Runtime: Node.js >= 18.x (LTS).'",
"evidence": [
"Technical Design > File Operations: path.join, fs.readFileSync",
"No Assumptions section"
]
}
{
"id": "S3-002",
"check": "S3",
"title": "Ambiguous concurrency model",
"detail": "Technical Design describes a background job processor but does not specify in-process, worker thread, or separate process. Affects error isolation and deployment.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Add decision specifying concurrency model: in-process event loop, worker_threads, or separate process.",
"evidence": [
"Technical Design > Job Processor: 'processes background jobs'",
"Decisions table: no concurrency entry"
]
}
Analyze: Technical Design (data structures, API endpoints, integration points), Success Criteria.
Detection:
Classification:
severity: "warning", autoFixable: true. Fix follows codebase patterns.severity: "warning", autoFixable: false.Example:
{
"id": "S4-001",
"check": "S4",
"title": "Missing file-not-found error case",
"detail": "Config read with fs.readFileSync has no ENOENT handling. Codebase convention (packages/core/src/config.ts) returns defaults.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'If config file missing (ENOENT), return default config. Log debug message.'",
"evidence": [
"Technical Design: 'read config from harness.config.json'",
"Codebase: config.ts returns defaults on ENOENT"
]
}
{
"id": "S4-002",
"check": "S4",
"title": "Undefined retry strategy for external service",
"detail": "Technical Design calls an external API for license validation but specifies no timeout, unavailability, or error behavior. Design decision affects UX (block vs degrade).",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Add decision: 'When license API unavailable: (a) fail open with warning, (b) fail closed, or (c) cache last result for N hours.'",
"evidence": [
"Technical Design > License Check: 'call /api/validate on startup'",
"No fallback behavior specified"
]
}
Analyze: Technical Design (referenced modules, dependencies, patterns, APIs).
Detection:
- With graph: use query_graph to verify modules exist and check dependencies. Use get_relationships for architectural compatibility. Use get_impact for cascading effects not in spec.

Classification: Always severity: "error", autoFixable: false. Feasibility problems require design revision.
Example:
{
"id": "S5-001",
"check": "S5",
"title": "Referenced function has different signature",
"detail": "Spec says 'validateDependencies(projectPath)' but actual signature is 'validateDependencies(config: ProjectConfig): ValidationResult'.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Update Technical Design to use actual signature with ProjectConfig parameter.",
"evidence": [
"Technical Design: 'call validateDependencies(projectPath)'",
"packages/core/src/validator.ts:42: actual signature"
]
}
Analyze: Technical Design, Decisions table, Implementation Order.
Detection:
Classification: Always severity: "warning", autoFixable: false. Removing features is a design decision.
Example:
{
"id": "S6-001",
"check": "S6",
"title": "Speculative configuration option",
"detail": "'pluginDir' config option defined but no goal/criterion mentions plugins.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Remove pluginDir and plugin loading from Technical Design.",
"evidence": ["Technical Design: 'pluginDir: string'", "Overview/Criteria: no plugin mention"]
}
Analyze: Success Criteria.
Detection:
Classification:
severity: "warning", autoFixable: true. Fix: replace vague qualifier with specific threshold.severity: "error", autoFixable: false. User must rewrite.Example:
{
"id": "S7-001",
"check": "S7",
"title": "Vague performance criterion",
"detail": "Criterion #3 says 'build should be fast'. Technical Design mentions 30-second CI timeout.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Replace with 'build completes in under 30 seconds on CI'.",
"evidence": ["Criteria #3: 'build should be fast'", "Technical Design > CI: '30-second timeout'"]
}
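One way to implement the vague-qualifier scan, as an illustrative heuristic (the word list and threshold pattern are assumptions, not part of the check's definition):

```typescript
// Illustrative heuristic: a criterion is flagged when it uses a vague
// qualifier and carries no concrete numeric threshold.
const VAGUE = /\b(fast|quick|slow|scalable|responsive|efficient|robust|soon)\b/i;
const HAS_THRESHOLD = /\d+\s*(ms|s|seconds?|minutes?|%|MB|GB|req\/s)/i;

function vagueCriteria(criteria: string[]): string[] {
  return criteria.filter((c) => VAGUE.test(c) && !HAS_THRESHOLD.test(c));
}

// vagueCriteria(["build should be fast",
//                "build completes in under 30 seconds on CI"])
// -> ["build should be fast"]
```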
Plan-mode checks (--mode plan):

| # | Check | What it detects | Auto-fixable? |
|---|---|---|---|
| P1 | Spec-plan coverage | Success criteria with no corresponding task(s) | Yes — add missing tasks |
| P2 | Task completeness | Tasks missing inputs, outputs, or verification | Yes — infer and fill in |
| P3 | Dependency correctness | Cycles in dependency graph; undeclared dependencies | Yes — add missing edges |
| P4 | Ordering sanity | Same-file tasks in parallel; consumers before producers | Yes — reorder |
| P5 | Risk coverage | Spec risks without mitigation in plan | Partially — add obvious, surface others |
| P6 | Scope drift | Plan tasks not traceable to any spec requirement | No — surface to user |
| P7 | Task-level feasibility | Undecided dependencies; tasks too vague to execute | No — surface to user |
Analyze: Spec's Success Criteria and plan's Tasks. Requires both documents.
Detection:
Classification: Always severity: "error", autoFixable: true. Fix: add task covering the criterion.
Example:
{
"id": "P1-001",
"check": "P1",
"title": "Spec criterion not covered by any plan task",
"detail": "Criterion #4 ('structured error responses with request-id') has no plan task.",
"severity": "error",
"autoFixable": true,
"suggestedFix": "Add task implementing structured error responses with request-id headers.",
"evidence": ["Spec Criteria #4", "Plan Tasks 1-8: no task references error format"]
}
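A sketch of the no-graph text-matching fallback (keyword overlap is an illustrative stand-in; with a graph, traceability edges replace it):

```typescript
interface Task { id: number; description: string; }

// No-graph fallback: a criterion counts as covered when some task shares
// at least one significant word with it. Crude, but cheap to run.
function uncoveredCriteria(criteria: string[], tasks: Task[]): string[] {
  const words = (s: string) =>
    new Set(s.toLowerCase().match(/[a-z-]{4,}/g) ?? []);
  return criteria.filter((criterion) => {
    const cw = words(criterion);
    return !tasks.some((t) => [...words(t.description)].some((w) => cw.has(w)));
  });
}
```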
Analyze: Each task in the Tasks section.
Detection: Verify each task has: (a) clear inputs, (b) clear outputs, (c) verification criterion. Flag tasks missing any element.
Classification: Always severity: "warning", autoFixable: true. Fix: infer the missing element from context (e.g., if a task says "create src/foo.ts" but has no verification, add "Run: npx vitest run src/foo.test.ts" if a test file exists, or "Run: tsc --noEmit" as minimal verification).
Example:
{
"id": "P2-001",
"check": "P2",
"title": "Task missing verification criterion",
"detail": "Task 3 has inputs and outputs but no verification step.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'Run: npx vitest run src/services/notification-service.test.ts'",
"evidence": ["Task 3: no 'Run:' or 'Verify:' step", "Task 4 creates the test file"]
}
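The inference rule above is mechanical enough to sketch (Node.js assumed; inferVerification is a hypothetical helper):

```typescript
import { existsSync } from "node:fs";

// Infer a verification command for a task that creates `srcPath`
// but declares no "Run:" / "Verify:" step.
function inferVerification(srcPath: string): string {
  const testPath = srcPath.replace(/\.ts$/, ".test.ts");
  return existsSync(testPath)
    ? `Run: npx vitest run ${testPath}` // a sibling test file exists
    : "Run: tsc --noEmit";              // minimal fallback: typecheck
}
```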
Analyze: "Depends on" declarations across all tasks, file paths/artifacts each task references.
Detection:
- With graph: run get_impact on output files to verify downstream consumers are declared as dependents.

Classification:
- Dependency cycle: severity: "error", autoFixable: false. Requires task restructuring.
- Missing dependency edge: severity: "warning", autoFixable: true. Fix: add "Depends on" declaration.

Example:
{
"id": "P3-002",
"check": "P3",
"title": "Missing dependency edge",
"detail": "Task 5 imports src/types/notification.ts (created by Task 1) but does not declare dependency.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add 'Depends on: Task 1' to Task 5.",
"evidence": [
"Task 5: imports notification.ts",
"File Map: created by Task 1",
"Task 5 Depends on: Task 4 only"
]
}
{
"id": "P3-001",
"check": "P3",
"title": "Dependency cycle detected",
"detail": "Tasks form a cycle: Task 3 -> Task 5 -> Task 3. Topological sort fails.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Break cycle by merging Tasks 3 and 5, or extract shared dependency into a new task.",
"evidence": [
"Task 3: 'Depends on: Task 5'",
"Task 5: 'Depends on: Task 3'",
"Topological sort failed"
]
}
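The cycle detection itself is a standard topological sort; a sketch using Kahn's algorithm over illustrative task records:

```typescript
interface PlanTask { id: number; dependsOn: number[]; }

// Kahn's algorithm over the declared "Depends on" edges. If the sort
// cannot consume every task, the leftovers sit on (or behind) a cycle.
function findCycleMembers(tasks: PlanTask[]): number[] {
  const indegree = new Map<number, number>(
    tasks.map((t) => [t.id, t.dependsOn.length]),
  );
  const dependents = new Map<number, number[]>();
  for (const t of tasks)
    for (const d of t.dependsOn) {
      if (!dependents.has(d)) dependents.set(d, []);
      dependents.get(d)!.push(t.id);
    }
  const queue = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  let sorted = 0;
  while (queue.length > 0) {
    const done = queue.shift()!;
    sorted += 1;
    for (const t of dependents.get(done) ?? []) {
      const deg = indegree.get(t)! - 1;
      indegree.set(t, deg);
      if (deg === 0) queue.push(t);
    }
  }
  if (sorted === tasks.length) return []; // acyclic
  return tasks.filter((t) => indegree.get(t.id)! > 0).map((t) => t.id);
}

// findCycleMembers([{ id: 3, dependsOn: [5] }, { id: 5, dependsOn: [3] }])
// -> [3, 5]
```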
Analyze: Task execution order, file paths each task touches, parallel opportunities.
Detection:
Classification: Always severity: "warning", autoFixable: true. Fix: reorder tasks or add dependency edges.
Example:
{
"id": "P4-001",
"check": "P4",
"title": "Consumer scheduled before producer",
"detail": "Task 2 imports from src/types/user.ts created by Task 4, with no dependency declared.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add 'Depends on: Task 4' to Task 2, or reorder type definition before Task 2.",
"evidence": ["Task 2: imports user.ts", "Task 4: creates user.ts", "Task 2 Depends on: none"]
}
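A sketch of the same-file conflict scan (the touches field is an illustrative representation of the file paths a task references; this version checks direct dependency edges only, not transitive ordering):

```typescript
interface OrderedTask { id: number; touches: string[]; dependsOn: number[]; }

// Flag task pairs that touch the same file with no ordering between them.
// Direct "Depends on" edges only; a fuller version would walk transitively.
function fileConflicts(tasks: OrderedTask[]): Array<[number, number, string]> {
  const ordered = (a: OrderedTask, b: OrderedTask) =>
    a.dependsOn.includes(b.id) || b.dependsOn.includes(a.id);
  const conflicts: Array<[number, number, string]> = [];
  for (let i = 0; i < tasks.length; i++) {
    for (let j = i + 1; j < tasks.length; j++) {
      if (ordered(tasks[i], tasks[j])) continue;
      const shared = tasks[i].touches.find((f) => tasks[j].touches.includes(f));
      if (shared) conflicts.push([tasks[i].id, tasks[j].id, shared]);
    }
  }
  return conflicts;
}
```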
Analyze: Spec's risk-related content and plan's tasks/checkpoints.
Detection: Identify risks in: explicit "Risks" sections, decision rationale mentioning tradeoffs, success criteria implying failure modes, non-goals with adjacent risk. For each, check plan for: (a) mitigation task, (b) acknowledging checkpoint, or (c) explicit "accepted risk" note. Flag uncovered risks.
Classification:
severity: "warning", autoFixable: true. Fix: add mitigation task.severity: "warning", autoFixable: false. Surface with options.Example:
{
"id": "P5-001",
"check": "P5",
"title": "Spec risk has no mitigation in plan",
"detail": "Risk 'convergence loop may not terminate' has no plan task testing termination.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add task testing convergence termination with fixed-point inputs.",
"evidence": ["Spec Risks: 'loop may not terminate'", "Plan Tasks 1-8: no termination test"]
}
{
"id": "P5-002",
"check": "P5",
"title": "Risk requires design judgment to mitigate",
"detail": "Spec notes 'auto-fix may introduce new issues'. Mitigation depends on design choice: (a) rollback mechanism, (b) single-pass limit, or (c) human approval for cascading fixes.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Choose strategy: (a) rollback — add undo capability, (b) single-pass — simpler but less thorough, (c) human gate — safer but slower.",
"evidence": [
"Spec Risks: 'Auto-fixes may introduce new issues'",
"Decisions: no mitigation strategy"
]
}
Analyze: Plan tasks vs spec goals, success criteria, and technical design.
Detection: For each plan task, check traceability: (a) directly implements a criterion, (b) necessary prerequisite, or (c) infrastructure called for in spec. Flag untraceable tasks.
Classification: Always severity: "warning", autoFixable: false. User confirms whether each flagged task is in scope.
Example:
{
"id": "P6-001",
"check": "P6",
"title": "Plan task not traceable to spec requirement",
"detail": "Task 8 ('Add Redis caching layer') not traceable to any spec goal or criterion.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Remove Task 8, or add corresponding goal/criterion to spec.",
"evidence": ["Task 8: 'Redis caching'", "Spec: no mention of caching"]
}
Analyze: Each task's description, file paths, code snippets, referenced decisions.
Detection:
Classification: Always severity: "error", autoFixable: false. Requires planner revision.
Example:
{
"id": "P7-001",
"check": "P7",
"title": "Task depends on undecided design choice",
"detail": "Task 7 says 'implement caching layer' but Decisions table has no caching strategy entry.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Make caching decision in spec (e.g., 'D5: LRU with 5-min TTL'), then update Task 7.",
"evidence": ["Task 7: 'Implement caching layer'", "Decisions: no caching entry"]
}
{
"id": "P7-002",
"check": "P7",
"title": "Task too vague to execute in one context window",
"detail": "Task 4 says 'implement the notification service' without specifying methods, signatures, or error handling. Cannot complete without making design decisions.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Split into sub-tasks: (a) NotificationService.create() with signature/errors, (b) NotificationService.list() with filtering, (c) NotificationService.markRead() with idempotency.",
"evidence": [
"Task 4: 'Implement the notification service'",
"No signatures, no error spec",
"Iron law: every task completable in one context window"
]
}
For every finding where autoFixable: true, apply the suggested fix and log it in the fix log format below.
For autoFixable: false findings: skip them here. They surface in Phase 4.
| Check | Auto-fixable findings | Fix behavior |
|---|---|---|
| S1 | None | Always surfaced |
| S2 | Missing traceability links | Silent fix |
| S2 | Orphan criteria | Surfaced — design decision |
| S3 | Obvious assumptions (runtime, encoding) | Silent fix |
| S3 | Ambiguous assumptions (concurrency, tenancy) | Surfaced — user chooses |
| S4 | Obvious error cases (file I/O, JSON, network) | Silent fix |
| S4 | Design-dependent error handling | Surfaced — user chooses strategy |
| S5 | None | Always surfaced |
| S6 | None | Always surfaced |
| S7 | Vague criteria with inferrable thresholds | Silent fix |
| S7 | Unmeasurable criteria | Surfaced — user rewrites |
| P1 | Missing task for uncovered criterion | Silent fix |
| P2 | Missing inputs, outputs, or verification | Silent fix |
| P3 | Missing dependency edges | Silent fix |
| P3 | Dependency cycles | Surfaced — design decision |
| P4 | File conflicts or consumer-before-producer | Silent fix |
| P5 | Obvious risk mitigation | Silent fix |
| P5 | Judgment-dependent mitigation | Surfaced — user chooses |
| P6 | None | Always surfaced |
| P7 | None | Always surfaced |
Rule: A fix is silent when the correct resolution requires no design judgment. If two or more plausible resolutions exist, surface it.
When: A goal has no corresponding success criterion.
Fix log example:
[S2-001] FIXED: Added criterion #11 for 'Support offline mode':
'App functions without network for all read operations, returning cached data.'
Derived from: Technical Design > Offline Cache.
When: Criterion uses vague qualifiers and Technical Design provides a threshold.
Fix log example:
[S7-001] FIXED: Replaced criterion #3 'build should be fast' with:
'Build completes in under 30 seconds on CI (per Technical Design > CI Config).'
When: Technical Design implies assumptions not documented in spec (e.g., fs.readFileSync implies Node.js).
Fix log example:
[S3-001] FIXED: Added assumption: 'Runtime: Node.js >= 18.x (LTS).'
Evidence: Technical Design references path.join, fs.readFileSync.
When: An operation has no error behavior and codebase has established pattern.
Fix log example:
[S4-001] FIXED: Added ENOENT error case for config read:
'If config missing, return defaults. Log debug message.'
Following: packages/core/src/config.ts pattern.
When: A spec criterion has no corresponding plan task.
Fix log example:
[P1-001] FIXED: Added Task 9 for criterion #5 (error logging):
'Create src/utils/error-logger.ts. Verify: npx vitest run error-logger.test.ts'
When: Task missing inputs, outputs, or verification.
Fix log example:
[P2-001] FIXED: Added verification to Task 3:
'Run: npx vitest run src/services/notification-service.test.ts'
When: Task B uses artifact from Task A without declaring dependency.
Fix log example:
[P3-001] FIXED: Added 'Depends on: Task 2' to Task 5.
Task 5 imports src/types/notification.ts created by Task 2.
When: Two tasks touch same file without sequencing, or consumer before producer.
Fix log example:
[P4-001] FIXED: Added 'Depends on: Task 4' to Task 2.
Both modify src/routes/index.ts. Sequencing prevents conflicts.
When: Spec risk has no plan coverage and mitigation is straightforward.
Fix log example:
[P5-001] FIXED: Added Task 10 for convergence termination testing.
Mitigates: 'convergence loop may not terminate'.
Every auto-fix MUST be logged:
[{finding-id}] FIXED: {one-line description}
{new text added/modified}
{source/evidence}
The fix log lets users review silent changes and trace causes if fixes introduce new issues.
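Since the format is fixed, a formatter sketch (formatFixEntry is a hypothetical helper; fields mirror the finding schema):

```typescript
// Render one fix-log entry in the three-line shape above.
function formatFixEntry(
  findingId: string,
  description: string,
  newText: string,
  evidence: string,
): string {
  return `[${findingId}] FIXED: ${description}\n  ${newText}\n  ${evidence}`;
}

// formatFixEntry("S7-001",
//   "Replaced criterion #3 'build should be fast' with:",
//   "'Build completes in under 30 seconds on CI.'",
//   "Derived from: Technical Design > CI Config.");
```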
After Phase 2 auto-fixes, the convergence loop determines whether further progress is possible.
1. Record the previous pass's total issue count as count_previous.
2. Re-run all checks for the active mode; record the new total as count_current.
3. Compare:
- count_current < count_previous: progress made. Go to Phase 2, apply new auto-fixes, return here.
- count_current == count_previous: no progress. Remaining issues need user input. Proceed to Phase 4.
- count_current > count_previous: fixes introduced new issues. Log warning, proceed to Phase 4.

A fix in one pass can make a previously non-auto-fixable finding become auto-fixable. Examples:
Spec-mode cascades: e.g., an S4 fix creates an Assumptions section, which makes a pending S3 assumption appendable.
Plan-mode cascades: e.g., a P1 fix adds a new task, which creates a missing P3 dependency edge for an existing task.
Cascading fixes are why the loop re-runs ALL checks, not just those that produced auto-fixable findings.
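Putting Phases 1-3 together, the loop reduces to a small driver (a sketch only; runAllChecks and applyAutoFixes stand in for the real phases, and the finding type is the SoundnessFinding sketch above):

```typescript
type Outcome = "converged" | "needs-user-input" | "new-issues-warning";

function soundnessLoop(
  runAllChecks: () => SoundnessFinding[],           // Phase 1, re-run in full
  applyAutoFixes: (fs: SoundnessFinding[]) => void, // Phase 2
): { outcome: Outcome; remaining: SoundnessFinding[] } {
  let findings = runAllChecks();
  let countPrevious = findings.length;
  applyAutoFixes(findings.filter((f) => f.autoFixable));
  for (;;) {
    findings = runAllChecks();          // ALL checks, so cascades are caught
    const countCurrent = findings.length;
    if (countCurrent > countPrevious)   // fixes introduced new issues
      return { outcome: "new-issues-warning", remaining: findings };
    if (countCurrent === countPrevious) // no progress: converged
      return {
        outcome: countCurrent === 0 ? "converged" : "needs-user-input",
        remaining: findings,
      };
    countPrevious = countCurrent;       // progress made: fix and loop again
    applyAutoFixes(findings.filter((f) => f.autoFixable));
  }
}
```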
Pass 1 (initial):
S1: 0 | S2: 1 (auto-fix) | S3: 2 (1 auto-fix, 1 user) | S4: 1 (auto-fix)
S5: 0 | S6: 0 | S7: 1 (auto-fix)
Total: 5 (4 auto-fixable, 1 user). count_previous = 5
Phase 2: Apply 4 fixes.
[S2-001] Added criterion #11 for 'offline mode'.
[S3-001] Added Node.js runtime assumption.
[S4-001] Added ENOENT error case.
[S7-001] Replaced 'fast' with 'under 30 seconds on CI'.
Pass 2:
S2: 0 | S3: 1 CASCADING (UTF-8 assumption now appendable) + 1 user unchanged
S4: 0 | S7: 0
Total: 2 (1 auto-fixable, 1 user). count_current=2 < 5. Continue.
Phase 2: Apply 1 fix. [S3-003] Added UTF-8 assumption.
Pass 3: Total: 1 (0 auto-fixable, 1 user). count_current=1 < 2. Continue.
Phase 2: 0 fixes.
Pass 4: Total: 1. count_current=1 = count_previous=1. Converged.
→ Phase 4 with 1 remaining issue.
Pass 1 (initial):
P1: 1 (auto-fix) | P2: 1 (auto-fix) | P3: 0 | P4: 0
P5: 1 (user) | P6: 0 | P7: 1 (user)
Total: 4 (2 auto-fixable, 2 user). count_previous = 4
Phase 2: Apply 2 fixes.
[P1-001] Added Task 9 for criterion #6 (error logging).
[P2-001] Added verification to Task 4.
Pass 2:
P1: 0 | P2: 0 | P3: 1 CASCADING (Task 6 needs 'Depends on: Task 9')
P5: 1 user | P7: 1 user
Total: 3 (1 auto-fixable, 2 user). count_current=3 < 4. Continue.
Phase 2: [P3-001] Added 'Depends on: Task 9' to Task 6.
Pass 3: Total: 2 (0 auto-fixable). count_current=2 < 3. Continue.
Phase 2: 0 fixes.
Pass 4: Total: 2. count_current=2 = count_previous=2. Converged.
→ Phase 4 with 2 remaining issues.
The loop terminates because the issue count is a non-negative integer, every continuing pass strictly decreases it, and any pass that fails to decrease it exits to Phase 4.
When findings remain after convergence, present them. If no needs-user-input findings remain, skip to Clean Exit.
- Present error findings before warning findings. Errors block sign-off.
- Open with a summary: "N remaining issues need your input (X errors, Y warnings)."

For each finding, present three sections:
What is wrong:
[{id}] {title} ({severity})
{detail}
Evidence: {evidence[0]}, {evidence[1]}, ...
Why it matters:
error: "Blocks sign-off. Must be resolved."warning: "Advisory. May dismiss with reason (logged)."Suggested resolution:
Accepted responses:
- When the user addresses the finding, mark it resolved.
- When the user dismisses a warning with a reason, log: [{id}] DISMISSED: {reason}. Not re-surfaced.

Error findings cannot be dismissed.
Surfaced findings: N total
Resolved: X | Dismissed: Y | Pending: Z
Update after each response. When all addressed, proceed to Step 5.
All of the following must be true:
- No error findings pending or dismissed.

On clean exit:
CLEAN EXIT — all checks pass. Returning control to {parent skill} for sign-off.
Note: {N} warnings dismissed. See log.

| Check | Without graph | With graph |
|---|---|---|
| S5 | Grep/glob for referenced patterns | query_graph + get_relationships for dependency/architecture verification |
| S3 | Infer from codebase conventions | find_context_for for related design decisions |
| P1 | Text matching criteria to tasks | Graph traceability edges |
| P3 | Static analysis of task descriptions | get_impact for dependency completeness |
| P4 | Parse file paths, detect conflicts | Graph file ownership for accurate conflict detection |
All checks work from document analysis and codebase reads alone. Graph adds precision but is never required.
- harness validate — run by the parent skill before/after the soundness review. This skill does not invoke validate directly.
- harness-brainstorming invokes --mode spec; harness-planning invokes --mode plan.
- If .harness/graph/ exists, use query_graph and get_impact for enhanced checks. Fall back to file-based reads otherwise.
- The SoundnessFinding schema is defined in SKILL.md.
- harness validate passes after all files are written.
|---|---|
| "The spec looks internally consistent at a high level" | STOP. S1 requires checking each decision against Technical Design line by line. "High level" consistency misses contradictions in the details. |
| "This assumption is obvious and doesn't need to be stated" | STOP. S3 exists because unstated assumptions cause the most damage when wrong. If it's obvious, writing it down costs nothing. Skipping it costs debugging time later. |
| "The finding is minor so I'll auto-fix it without surfacing to the user" | STOP. Only inferrable fixes are auto-fixed. If the fix involves a design choice — even one you think is obvious — surface it. You are not the designer. |
| "// TODO: add traceability" or "// spec gap — fill later" in spec/plan files | STOP. TODOs in specs are unfinished review. The spec is not converged. Fix the gap or surface it as a finding — do not defer it. |
Review-never-fixes: Soundness review identifies structural issues in specs and plans. It applies inferrable fixes (formatting, missing links, obvious gaps) but NEVER makes design decisions. If a finding requires judgment, surface it to the user — even if the "right" answer seems obvious. A reviewer who makes design decisions has stopped reviewing and started designing without the authority to do so.
When a check produces ambiguous results, classify the ambiguity immediately:
- If the correct resolution is unclear or multiple plausible resolutions exist, set autoFixable: false.

Do not auto-fix ambiguous findings. Ambiguity means you lack context — applying a "fix" without context is guessing.
Soundness check rubrics used internally MUST use compressed single-line format. Each check is one line with pipe-delimited fields:
mode|check-id|severity|criterion
Example (Spec Mode rubric):
spec|S1|error|No contradictions between decisions, technical design, and success criteria
spec|S2|warning|Every goal has at least one success criterion; no orphan criteria
spec|S3|warning|All implicit assumptions documented in Assumptions section
spec|S4|warning|Error/edge cases covered; EARS unwanted-behavior gaps filled
spec|S5|error|No references to nonexistent codebase capabilities or incompatible patterns
spec|S6|warning|No speculative features without requirement traceability
spec|S7|warning|All success criteria are observable and measurable with concrete thresholds
Why: Verbose check descriptions inflate review context without improving check accuracy. Dense single-line rubrics give the same signal in fewer tokens, leaving more budget for actual document analysis.
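A parser sketch for the single-line format (parseRubricLine is a hypothetical name; the field values follow the Rules below):

```typescript
interface RubricLine {
  mode: "spec" | "plan";
  checkId: string;                // e.g. "S1", "P3"
  severity: "error" | "warning";
  criterion: string;
}

// Four pipe-delimited fields; rejoin any pipes inside the criterion text.
function parseRubricLine(line: string): RubricLine {
  const [mode, checkId, severity, ...rest] = line.split("|");
  return {
    mode: mode as RubricLine["mode"],
    checkId,
    severity: severity as RubricLine["severity"],
    criterion: rest.join("|"),
  };
}

// parseRubricLine("spec|S7|warning|All success criteria are observable ...")
```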
Rules:
- mode is spec or plan.
- severity is error or warning.

| Rationalization | Reality |
|---|---|
| "The spec looks coherent to me, so I can skip running the S1 internal coherence check" | Every check in the mode must run. S1 detects contradictions that human review frequently misses. |
| "This unstated assumption is obvious, so documenting it would be pedantic" | S3 exists because "obvious" assumptions cause the most damage when wrong. Cheapest to document, most expensive to miss. |
| "The success criterion is somewhat vague but the team will know what it means" | S7 flags vague criteria like "should be fast" because they are untestable. Vague criteria survive brainstorming only to fail at verification. |
| "This auto-fixable finding is minor, so I will just note it rather than applying the fix" | Auto-fixable findings should be applied silently — that is the design intent. Skipping them ships known inferrable gaps. |
| "The feasibility check found a signature mismatch but the code can probably be adapted during execution" | S5 red flags are always severity "error" and always surfaced. A spec referencing nonexistent modules produces a broken plan. |
| "The convergence loop is taking too long, so I will skip the re-check and declare converged" | Convergence requires the issue count to stop decreasing. Declaring convergence without a re-check is falsifying the exit condition. |
| "This spec is well-written enough that a soundness review would not find anything" | Every spec gets a soundness review. Well-written specs still have unstated assumptions (S3) and vague criteria (S7). The review is not optional. |
Context: harness-brainstorming has drafted a spec and is about to sign off.
Invoking harness-soundness-review --mode spec...
Phase 1: CHECK
S1 (internal coherence)... 0 findings
S2 (goal-criteria traceability)... 1 finding (auto-fixable)
S3 (unstated assumptions)... 2 findings (2 need user input)
S4 (requirement completeness)... 1 finding (auto-fixable)
S5 (feasibility red flags)... 0 findings
S6 (YAGNI re-scan)... 0 findings
S7 (testability)... 1 finding (auto-fixable)
5 findings total: 3 auto-fixable, 2 need user input.
Phase 2: FIX
[S2-001] FIXED: Added success criterion for 'Support offline mode' goal.
[S4-001] FIXED: Added ENOENT error case for config file read.
[S7-001] FIXED: Replaced 'build should be fast' with 'completes in under 30 seconds on CI'.
3 auto-fixes applied.
Phase 3: CONVERGE
Re-running checks...
S3-001 now auto-fixable (S4-001 created Assumptions section).
[S3-001] FIXED: Added Node.js runtime assumption.
1 additional fix. Re-checking...
Issue count: 1 (was 2). Decreased — continuing.
Re-checking... Issue count: 1 (unchanged). Converged.
Phase 4: SURFACE
1 remaining issue:
[S3-002] Ambiguous concurrency model (warning)
Technical Design describes background job processor without specifying
in-process, worker thread, or separate process.
→ Add decision to Decisions table.
User resolves → adds decision: "in-process event loop"
Re-running checks... 0 findings.
CLEAN EXIT — returning control to harness-brainstorming for sign-off.
Context: harness-planning has drafted a plan and is about to sign off.
Invoking harness-soundness-review --mode plan...
Phase 1: CHECK
P1 (spec-plan coverage)... 1 finding (auto-fixable)
P2 (task completeness)... 2 findings (auto-fixable)
P3 (dependency correctness)... 1 finding (auto-fixable)
P4 (ordering sanity)... 0 findings
P5 (risk coverage)... 1 finding (needs user input)
P6 (scope drift)... 0 findings
P7 (task-level feasibility)... 1 finding (needs user input)
6 findings total: 4 auto-fixable, 2 need user input.
Phase 2: FIX
[P1-001] FIXED: Added Task 9 covering criterion #5 (error logging).
[P2-001] FIXED: Added verification step to Task 3.
[P2-002] FIXED: Added outputs to Task 6.
[P3-001] FIXED: Added 'Depends on: Task 2' to Task 5.
4 auto-fixes applied.
Phase 3: CONVERGE
Re-checking... Issue count: 2 (was 6). Decreased — continuing.
Re-checking... Issue count: 2 (unchanged). Converged.
Phase 4: SURFACE
2 remaining issues:
[P5-001] Spec risk 'performance vs correctness' has no mitigation (warning)
→ Add performance benchmark task, relax validation, or accept risk.
[P7-001] Task 7 depends on undecided caching strategy (error)
→ Make caching decision in spec, then update Task 7.
User resolves P5-001 → adds Task 10 for performance benchmark.
User resolves P7-001 → adds LRU cache decision, updates Task 7.
Re-running checks... 0 findings.
CLEAN EXIT — returning control to harness-planning for sign-off.
These are hard stops. Violating any gate means the process has broken down.