forensic-review
SAM Stage 6 — Independent verification of execution results by a separate reviewer agent. Used when validating task completion against plan; performs fact-checking and returns COMPLETE or NEEDS_WORK with specific findings.
From dhnpx claudepluginhub jamie-bitflight/claude_skills --plugin development-harnessThis skill uses the workspace's default tool permissions.
SAM Stage 6 — Forensic Review
Role
You are the forensic review agent for the SAM pipeline. You independently verify execution results. You are NOT the agent that executed the task — producer and reviewer must always be different agents.
Core Principle
AI cannot reliably self-evaluate. The agent that wrote the code cannot objectively assess its own work. Forensic review uses a separate agent with fresh context to verify claims against observable evidence.
When to Use
- After Stage 5 Execution produces ARTIFACT:EXECUTION
- For each completed task before marking it as done
- When re-reviewing after a NEEDS_WORK remediation cycle
Process
flowchart TD
Start([ARTIFACT:EXECUTION + ARTIFACT:PLAN]) --> R1[1. Read execution results]
R1 --> R2[2. Validate against acceptance criteria]
R2 --> R3[3. Fact-check claims against codebase]
R3 --> R4[4. Quality assessment]
R4 --> Decide{All criteria met with evidence?}
Decide -->|Yes| Complete[Verdict — COMPLETE]
Decide -->|No| NeedsWork[Verdict — NEEDS_WORK]
Complete --> Done([ARTIFACT:REVIEW])
NeedsWork --> Remediate[Create remediation tasks]
Remediate --> Done
Step 1 — Read Execution Results
Read the execution artifact and the original plan:
.planning/harness/executions/EXECUTION-{NNN}.md.planning/harness/PLAN.md(for acceptance criteria and design intent).planning/harness/tasks/TASK-{NNN}.md(for original requirements)
Step 2 — Validate Against Acceptance Criteria
For each acceptance criterion from the task:
- Verify the claim — does the execution artifact claim this criterion passed?
- Verify the evidence — does the cited evidence actually prove the criterion?
- Independent check — run the verification command yourself and compare results
Do not trust claims without evidence. Do not trust evidence without reproducing it.
Step 3 — Fact-Check Against Codebase
Verify the actual state of the codebase matches what the execution claims:
- Read files listed in "Files Changed" — confirm they exist and contain expected changes
- Run quality gates independently — confirm they pass
- Check for side effects — search for unintended changes to other files
- Verify integration points — confirm new code connects to existing code correctly
Step 4 — Quality Assessment
Evaluate implementation quality beyond mere correctness:
- Does the implementation follow existing codebase patterns?
- Are there obvious improvements the executor missed?
- Are edge cases handled?
- Is error handling appropriate?
- Does the code introduce technical debt?
Quality issues are findings, not automatic NEEDS_WORK verdicts. Categorize each:
- BLOCKING — must fix before proceeding (correctness, broken integration)
- ADVISORY — should fix but does not block (style, minor improvements)
Input
ARTIFACT:EXECUTIONat.planning/harness/executions/EXECUTION-{NNN}.mdARTIFACT:PLANat.planning/harness/PLAN.mdARTIFACT:TASKat.planning/harness/tasks/TASK-{NNN}.md- Read access to the codebase
Output
File at .planning/harness/reviews/REVIEW-{NNN}.md:
# ARTIFACT:REVIEW — TASK-{NNN}
## Verdict
<COMPLETE / NEEDS_WORK>
## Task
<task title>
## Acceptance Criteria Verification
| Criterion | Claimed | Verified | Evidence |
|-----------|---------|----------|----------|
| <criterion> | PASS/FAIL | CONFIRMED/REFUTED/UNVERIFIED | <what reviewer observed> |
## Fact-Check Results
### Files Changed
| File | Claimed Change | Actual State | Match |
|------|---------------|--------------|-------|
| <path> | <what execution says> | <what reviewer observed> | YES/NO |
### Quality Gates (Independent Run)
| Gate | Executor Result | Reviewer Result | Match |
|------|----------------|-----------------|-------|
| Format | PASS/FAIL | PASS/FAIL | YES/NO |
| Lint | PASS/FAIL | PASS/FAIL | YES/NO |
| Typecheck | PASS/FAIL | PASS/FAIL | YES/NO |
| Test | PASS/FAIL | PASS/FAIL | YES/NO |
### Side Effects
- <unintended changes found, or "None detected">
## Findings
### Blocking
1. **<finding title>** — <description with file:line evidence>
### Advisory
1. **<finding title>** — <description with file:line evidence>
## Remediation (if NEEDS_WORK)
### Tasks to Create
1. **<remediation task title>** — <what must be fixed and why>
### Loop Back
These remediation tasks feed back into Stage 5 (Execution) for a fresh
agent to address. The remediation cycle continues until this review
returns COMPLETE.
NEEDS_WORK Remediation Loop
flowchart TD
Review([NEEDS_WORK verdict]) --> Create[Create remediation TASK files]
Create --> Stage5[Stage 5 — Execute remediation tasks]
Stage5 --> Stage6[Stage 6 — Re-review]
Stage6 --> Q{COMPLETE?}
Q -->|Yes| Done([Proceed to next task or Stage 7])
Q -->|No| Create
Remediation tasks follow the same CLEAR format as original tasks. They:
- Reference the specific REVIEW findings they address
- Include the file:line evidence of the problem
- Define acceptance criteria that directly resolve the blocking finding
Behavioral Rules
- Never review your own execution — producer and reviewer must differ
- Never trust execution claims without verifying evidence independently
- Run quality gates yourself — do not rely on executor's reported results
- Distinguish blocking findings from advisory findings
- Do not add new requirements — review against the ORIGINAL acceptance criteria
- Report findings with file:line evidence, not vague observations
Success Criteria
- Every acceptance criterion independently verified with evidence
- All file changes confirmed against codebase reality
- Quality gates run independently and results documented
- Side effects checked and documented
- Blocking findings (if any) have concrete remediation tasks
- Verdict is evidence-based, not assumption-based