# Deep Review Plugin
An independent Evaluator plugin for AI coding agents — cross-model code review with Codex integration and Sprint Contract support.
## The Problem
AI coding agents have a fundamental blind spot: they review their own work.
- The agent that wrote the code also judges it — self-approval bias is structural
- Opus writes 500 lines and then "reviews" them in the same context window
- Critical bugs, architectural drift, and entropy accumulate undetected
- Without separation between Generator and Evaluator, "review" is just narration
Generator-Evaluator separation is not optional. It's the only way to get an honest second opinion.
## The Solution
Deep Review spawns a separate Opus subagent with no knowledge of the original session context. It sees only the diff — not the reasoning, intentions, or assumptions behind the code. This is a structurally independent evaluation.
When Codex is installed, the review escalates to 3-way parallel verification: Claude Opus + `codex:review` + `codex:adversarial-review`. Findings are synthesized by confidence level.
## Role in Harness Engineering
`deep-review` is the independent evaluator in the Deep Suite ecosystem, implementing the Generator-Evaluator separation from the Harness Engineering framework.
In the 2×2 matrix:
- Inferential Sensor: Independent Opus subagent review with zero Generator context — the primary quality gate for semantic issues that computational sensors cannot catch
- 3-Way Cross-Model Verification: Opus + Codex standard + Codex adversarial — exceeds the framework's "LLM-as-judge" concept
- Fitness-Aware Review: Consumes `fitness.json` rules and `health_report` from deep-work for architecture-intent-aware evaluation
- Sprint Contract Verification: Structured success criteria checking
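For illustration, a Sprint Contract file might look like the following. This is a hypothetical sketch: apart from the `status:` field described later in this document, the field names and values here are invented, not the plugin's documented schema.

```yaml
# .deep-review/contracts/SLICE-001.yaml  (hypothetical example)
status: active
goal: "Add rate limiting to the public API"
success_criteria:
  - "All /api/* routes pass through the rate-limit middleware"
  - "429 responses include a Retry-After header"
```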
## Key Commands
| Command | Description |
|---|---|
| `/deep-review` | Review current changes with an independent Opus subagent |
| `/deep-review --contract` | Sprint Contract-based structural verification |
| `/deep-review --entropy` | Entropy scan (duplicates, pattern drift, naming mismatches) |
| `/deep-review init` | Initialize per-project review rules interactively |
## Review Pipeline
Deep Review runs a 4-stage pipeline on every invocation:
1. **Collect** — Detect environment, gather diff
2. **Contract** — Load Sprint Contract if present
3. **Deep Review** — Spawn Opus subagent in background (+ Codex if available)
4. **Verdict** — Synthesize findings, emit APPROVE / CONCERN / REQUEST_CHANGES
### Stage 1: Collect

An environment-detection script determines the git state and collects the appropriate diff:
- `non-git` — ask the user which files to review
- `initial` (zero commits) — review all files against the empty tree
- `clean` — `git diff {review_base}..HEAD`
- `staged` — `git diff --cached`
- `unstaged` — `git diff`
- `mixed` — `git diff HEAD`
- `untracked-only` — `git ls-files --others --exclude-standard`
Excluded from the diff: binaries, `vendor/`, `node_modules/`, `*.min.js`, `*.generated.*`, `*.lock`
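The state-to-command mapping and the exclusion list above can be sketched in Python. This is a minimal illustration, not the plugin's actual source; binary-file detection is omitted.

```python
import fnmatch

# Sketch of Stage 1's state -> diff-command mapping (illustrative only).
DIFF_COMMANDS = {
    "clean": "git diff {review_base}..HEAD",
    "staged": "git diff --cached",
    "unstaged": "git diff",
    "mixed": "git diff HEAD",
    "untracked-only": "git ls-files --others --exclude-standard",
}

# Path patterns excluded from the collected diff (binaries handled separately).
EXCLUDE_PATTERNS = ["vendor/*", "node_modules/*", "*.min.js", "*.generated.*", "*.lock"]

def is_excluded(path: str) -> bool:
    """Return True if a file path matches any exclusion pattern,
    whether it appears at the repo root or nested in a subdirectory."""
    return any(
        fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(path, f"*/{pattern}")
        for pattern in EXCLUDE_PATTERNS
    )
```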
### Stage 2: Contract Check
`--contract` flag behavior:

- `--contract SLICE-NNN` — load only `.deep-review/contracts/SLICE-NNN.yaml` (must be `status: active`)
- `--contract` — load all `status: active` contracts in `.deep-review/contracts/`
- No flag — if active contracts exist in `.deep-review/contracts/`, they are loaded automatically
- `status: archived` contracts are excluded from auto-loading; if one is explicitly specified, a warning is shown
- YAML parse errors — skip the contract and emit a warning
Each criterion is verified against the actual code changes.
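The selection rules above can be expressed as pure logic. The sketch below is a hypothetical illustration: it assumes contracts have already been parsed into dicts, with `None` marking a YAML parse error; the function name and signature are invented.

```python
from typing import Dict, List, Optional, Tuple

def select_contracts(contracts: Dict[str, Optional[dict]],
                     slice_id: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Sketch of Stage 2 contract selection.

    contracts: {name: parsed_yaml_or_None} for files in .deep-review/contracts/.
    Returns (selected_contract_names, warnings).
    """
    selected, warnings = [], []
    for name, data in contracts.items():
        if data is None:  # YAML parse error: skip with a warning
            warnings.append(f"{name}: parse error, skipped")
            continue
        status = data.get("status")
        if slice_id is not None:
            if name != slice_id:
                continue  # an explicit slice was requested; ignore others
            if status != "active":  # explicitly specified but not active: warn
                warnings.append(f"{name}: not active, skipped")
                continue
            selected.append(name)
        elif status == "active":  # bare --contract or auto-load: active only
            selected.append(name)
    return selected, warnings
```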
### Stage 3: Deep Review
An independent `code-reviewer` agent is spawned via the Agent tool with `model: opus` and `run_in_background: true`. Before spawning, the user is notified which reviewers will run (Opus-only or 3-way). It receives only the diff, rules, and contract — never the originating session context.
The agent evaluates 5 criteria:
| # | Criterion | Checks |
|---|---|---|
| 1 | Correctness | Logic bugs, edge cases, error handling |
| 2 | Architecture fit | `rules.yaml` violations, layer boundaries, dependency direction |
| 3 | Entropy | Duplicate code, pattern drift, ad-hoc helpers |
| 4 | Test coverage | Coverage relative to changes, missing scenarios |
| 5 | Readability | Will the next agent understand this on first read? |
### Stage 4: Verdict
| Finding | Verdict |
|---|---|
| Any 🔴 Critical | REQUEST_CHANGES |
| 🟡 Warnings, all reviewers agree | REQUEST_CHANGES |
| 🟡 Warnings, split opinion | CONCERN |
| All pass | APPROVE |
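The verdict table can be expressed as a small decision function. This is a sketch: the finding fields (`severity`, `unanimous`) are assumed names for illustration, not the plugin's actual schema.

```python
from typing import Dict, List

def synthesize_verdict(findings: List[Dict]) -> str:
    """Map reviewer findings to a final verdict per the table above.

    Each finding: {"severity": "critical" | "warning" | "pass",
                   "unanimous": bool}  # True if all reviewers agree on it
    """
    if any(f["severity"] == "critical" for f in findings):
        return "REQUEST_CHANGES"  # any critical finding blocks the change
    warnings = [f for f in findings if f["severity"] == "warning"]
    if warnings:
        # Consensus warnings block the change; split opinions only raise a concern.
        return "REQUEST_CHANGES" if all(f["unanimous"] for f in warnings) else "CONCERN"
    return "APPROVE"
```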
The report is saved to `.deep-review/reports/{YYYY-MM-DD}-review.md`.
## Cross-Model Verification