Review current phase implementation against spec, standards, and quality checklist
Read these files using the Read tool:
- .bee/STATE.md — if not found: NOT_INITIALIZED
- .bee/config.json — if not found: use {}

You are running /bee:review -- the code review pipeline for BeeDev. This command orchestrates a four-step pipeline: review code, validate findings, fix confirmed issues, and optionally re-review. Follow these steps in order.
Check these guards in order. Stop immediately if any fails:
NOT_INITIALIZED guard: If the dynamic context above contains "NOT_INITIALIZED" (meaning .bee/STATE.md does not exist), tell the user:
"BeeDev is not initialized. Run /bee:init first."
Do NOT proceed.
NO_SPEC guard: Read STATE.md from the dynamic context above. If no Current Spec Path exists or it shows "(none)", tell the user:
"No spec found. Run /bee:new-spec first."
Do NOT proceed.
Phase detection: Check $ARGUMENTS for a --phase N flag.
- If present, use phase N explicitly. Validate: if phase N does not exist in the Phases table, tell the user: "Phase {N} does not exist. Your spec has {M} phases." Do NOT proceed. If the explicit phase's Status is not "EXECUTED" or "REVIEWED", tell the user: "Phase {N} has status {status} -- expected EXECUTED or REVIEWED for review." Do NOT proceed.
- If --phase N is not present, read the Phases table from STATE.md. Find the last phase whose Status is "EXECUTED" or "REVIEWED". This allows both first-time reviews and re-reviews of already-reviewed phases. If no such phase exists, tell the user:
"No executed phases waiting for review. Run /bee:execute-phase N first."
Do NOT proceed.
Already reviewing guard: If the Status column for the detected phase shows "REVIEWING", warn the user: "Phase {N} review is in progress. Continue from where it left off?" Wait for explicit confirmation before proceeding. If the user declines, stop.
Phase directory: {spec-path}/phases/{NN}-*/ where NN is the zero-padded phase number. Matching on the number prefix avoids slug construction mismatches.
Gather inputs:
- {phase_directory}/TASKS.md
- {spec-path}/spec.md
- Whether .bee/false-positives.md exists (Step 3.9 extracts false positives before review agents)
- $ARGUMENTS for the --loop flag
- config.json from dynamic context for the review.loop setting
- Loop mode is on if --loop is in arguments OR config.review.loop is true
- Read the current .bee/STATE.md from disk (fresh read, not the cached dynamic context)
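The loop-mode rule above can be sketched as a small helper. This is a minimal illustration, assuming $ARGUMENTS arrives as a plain string and config follows the { review: { loop } } shape named in this document:

```javascript
// Loop mode is enabled by the --loop flag OR config.review.loop === true.
function loopEnabled(argsString, config) {
  return argsString.includes("--loop") || config?.review?.loop === true;
}
```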
3a. Archive previous REVIEW.md (re-review only):
If the detected phase has a Reviewed value of "Yes (N)" (i.e., it was previously reviewed):
- If {phase_directory}/REVIEW.md exists on disk, rename it to {phase_directory}/REVIEW-{N}.md, where N is the iteration number extracted from "Yes (N)" (e.g., "Yes (1)" -> archive as REVIEW-1.md, "Yes (2)" -> archive as REVIEW-2.md).
- If the phase has not been reviewed before (Reviewed column is empty), skip archival.
3b. Update STATE.md:
- Set the phase Status to REVIEWING and record /bee:review as the active command.
- Display to user: "Starting review of Phase {N}: {phase-name} (iteration {iteration_counter})..."
Build check (automatic, per-stack):
For each stack in config.stacks, scoped to its path:
- Check package.json for a build script within {stack.path} (run node -e "const p=require('./{stack.path}/package.json'); process.exit(p.scripts?.build ? 0 : 1)" via Bash). Also check composer.json if the stack is Laravel-based.
- If a build script exists, run cd {stack.path} && npm run build

Test check (user opt-in, per-stack):
Ask the user: "Run tests before review? (yes/no)"
If the user says yes:
For each stack in config.stacks, resolve its test runner: read stacks[i].testRunner first, fall back to root config.testRunner if absent, then "none". Run each stack's test runner scoped to its path. Report per-stack: "Tests: {stack.name} ({runner}): {result}".
For each stack:
"none", display "Tests: {stack.name}: skipped (no test runner configured)" and continue to the next stack.vitest: cd {stack.path} && npx vitest run (parallel by default via worker threads)jest: cd {stack.path} && npx jest (parallel by default via workers, use --maxWorkers=auto if not set)pest: cd {stack.path} && ./vendor/bin/pest --parallel (uses Paratest under the hood)If the user says no: display "Tests: skipped" and continue.
Before spawning review agents, extract documented false positives so each agent can exclude known non-issues:
- Read .bee/false-positives.md using the Read tool.
- Extract each ## FP-NNN entry with its finding description, reason, and file reference. Format the list as:
EXCLUDE these documented false positives from your findings:
- FP-001: {summary} ({file}, {reason})
- FP-002: {summary} ({file}, {reason})
...
"No documented false positives."Context Cache (read once, pass to all agents):
Before spawning any agents, read these files once and include their content in every agent's context packet:
- plugins/bee/skills/stacks/{stack}/SKILL.md
- .bee/CONTEXT.md
- .bee/false-positives.md
- .bee/user.md

Pass these as part of the agent's prompt context — agents should NOT re-read these files themselves.
Dependency Scan:
Before spawning review agents, expand the file scope:
- Parse each modified file's import/require/use statements to find its dependencies (files it imports)
- Search for files that import/require any modified file to find its consumers (files that import it)
- Look for test files matching: {name}.test.{ext}, {name}.spec.{ext}, tests/{name}.{ext}, __tests__/{name}.{ext}. Include discovered test file paths in the context packet.

Spawn specialized review agents. In a multi-stack project, bug-detector, pattern-reviewer, and stack-reviewer are spawned once per stack (3 per-stack agents), while plan-compliance-reviewer is spawned ONCE globally (stack-agnostic). Total agents: (3 x N) + 1 where N = number of stacks. For single-stack projects, N = 1 so exactly 4 agents are spawned (identical to original behavior). The command (not the agents) writes REVIEW.md after consolidating all findings from all stacks.
4.1a: Read stacks from config
Read config.stacks from config.json. Build the stack list:
- If config.stacks exists and is an array: use it as-is. Each entry has name and path.
- If config.stacks is absent but config.stack exists (legacy v2 config): create a single-entry list: [{ name: config.stack, path: "." }].

Also read config.implementation_mode (defaults to "quality" if absent).
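The normalization in 4.1a can be sketched as a small helper. A minimal sketch, assuming the config shapes named above (a v3 stacks array vs a legacy v2 stack string):

```javascript
// Normalize config into a stacks list: v3 array as-is, legacy string wrapped
// into a single root-path entry, otherwise an empty list.
function resolveStacks(config) {
  if (Array.isArray(config.stacks)) return config.stacks;
  if (config.stack) return [{ name: config.stack, path: "." }];
  return [];
}

// implementation_mode defaults to "quality" when absent.
const mode = (config) => config.implementation_mode ?? "quality";
```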
4.1b: Build shared context base
Build a shared context base for all agents:
- Spec: {spec.md path}
- TASKS.md: {TASKS.md path}
- Phase directory: {phase_directory}
- Phase number: {N}

4.1c: Build per-stack context packets
For each stack in the stacks list, build three agent-specific context packets. When the project has a single stack, this loop runs once and behavior is identical to the original four-agent approach.
Agent resolution (stack-specific fallback): For each per-stack agent, check if a stack-specific variant exists before using the generic agent. For each stack in the stacks list, resolve agents as follows:
- Check whether plugins/bee/agents/stacks/{stack.name}/bug-detector.md exists. If yes, use {stack.name}-bug-detector as the agent name. If no, fall back to the generic bee:bug-detector.
- Check whether plugins/bee/agents/stacks/{stack.name}/pattern-reviewer.md exists. If yes, use {stack.name}-pattern-reviewer. If no, fall back to the generic bee:pattern-reviewer.
- Check whether plugins/bee/agents/stacks/{stack.name}/stack-reviewer.md exists. If yes, use {stack.name}-stack-reviewer. If no, fall back to the generic bee:stack-reviewer.

Generic agents remain the default for any stack that does not have dedicated stack-specific agents in plugins/bee/agents/stacks/{stack.name}/.
Per-stack Agent: Bug Detector (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for bugs and security issues.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
Stack: {stack.name}
{false-positives list from Step 3.9}
Read TASKS.md to find the files created/modified by this phase. Scope your file search to files within the `{stack.path}` directory. Review those files for bugs, logic errors, null handling issues, race conditions, edge cases, and security vulnerabilities (OWASP). If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides (CLAUDE.md takes precedence over stack skill for project-specific conventions). Report only HIGH confidence findings in your standard output format.
Per-stack Agent: Pattern Reviewer (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for pattern deviations.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
Stack: {stack.name}
{false-positives list from Step 3.9}
Read TASKS.md to find the files created/modified by this phase. Scope your file search to files within the `{stack.path}` directory. For each file, find 2-3 similar existing files in the codebase, extract their patterns, and compare. If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides. Report only HIGH confidence deviations in your standard output format.
Per-stack Agent: Stack Reviewer (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for stack best practice violations.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
{false-positives list from Step 3.9}
The stack for this review pass is `{stack.name}`. Load the stack skill at `skills/stacks/{stack.name}/SKILL.md` and check all code within the `{stack.path}` directory against that stack's conventions. If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides (CLAUDE.md takes precedence over stack skill). Use Context7 to verify framework best practices. Report only HIGH confidence violations in your standard output format.
4.1d: Build global context packet (spawned ONCE, not per-stack)
Before building the packet, check if {spec-path}/requirements.md exists on disk. Set the requirements line:
- If it exists: Requirements: {spec-path}/requirements.md
- If not: Requirements: (not found -- skip requirement tracking)

Global Agent: Plan Compliance Reviewer (bee:plan-compliance-reviewer) -- model set in 4.2 by implementation_mode -- spawned ONCE globally
You are reviewing Phase {N} implementation in CODE REVIEW MODE (not plan review mode).
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Requirements: {spec-path}/requirements.md OR (not found -- skip requirement tracking)
Phase directory: {phase_directory}
Phase number: {N}
{false-positives list from Step 3.9}
Review mode: code review. Check implemented code against spec requirements and acceptance criteria. Verify every acceptance criterion in TASKS.md has corresponding implementation. Check for missing features, incorrect behavior, and over-scope additions. If phase > 1, also check cross-phase integration (imports, data contracts, workflow connections, shared state). If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides. Report findings in your standard code review mode output format.
The total number of agents is (3 x N) + 1 where N is the number of stacks. For a single-stack project this is exactly 4.
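The agent-count formula, as a one-liner (three per-stack reviewers per stack plus the single global plan-compliance-reviewer):

```javascript
// Total review agents for N stacks: 3 per-stack specialists + 1 global reviewer.
const agentCount = (stackCount) => 3 * stackCount + 1;
```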
Economy mode (implementation_mode: "economy"): Pass model: "sonnet" for all agents. Spawn agents sequentially per stack to reduce token usage:
model: "sonnet"). Wait for it to complete.model: "sonnet"). Wait for all three to complete before proceeding to the next stack.
In economy mode with a single stack, this results in the same 4 agents but spawned in two sequential batches instead of one parallel batch.Quality or Premium mode (default "quality", or "premium"): Spawn ALL agents (all per-stack agents + the global plan-compliance-reviewer) via Task tool calls in a SINGLE message (parallel execution). Omit the model parameter for all agents (they inherit the parent model) -- quality/premium mode uses the stronger model for deeper, more thorough review analysis. Wait for all agents to complete.
Wait for all agents to complete before proceeding.
After all agents complete, parse findings from each agent's final message. Each agent has a distinct output format -- normalize all findings into a unified list. Findings from all stacks are combined into a single consolidated list:
Bug Detector findings (from ## Bugs Detected section):
- Each `- **[Bug type]:** [Description] - file:line` entry becomes one finding

Pattern Reviewer findings (from ## Project Pattern Deviations section):
- Each `- **[Pattern type]:** [Deviation description] - file:line` entry becomes one finding

Plan Compliance Reviewer findings (from ## Plan Compliance Findings section):

Stack Reviewer findings (from ## Stack Best Practice Violations section):
- Each `- **[Rule category]:** [Violation description] - file:line` entry becomes one finding

If an agent reports no findings (e.g., "No bugs detected.", "No project pattern deviations found.", etc.), it contributes zero findings.
For each pair of findings from different agents, check if they reference the same file AND their line ranges overlap (within 5 lines of each other). If so, merge them:
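The overlap check above can be sketched as follows. The finding shape ({file, start, end}) is an assumption for illustration, not the pipeline's actual data model:

```javascript
// Two findings are duplicates when they reference the same file and their line
// ranges overlap or sit within 5 lines of each other.
function isDuplicate(a, b) {
  if (a.file !== b.file) return false;
  // Gap between the nearest edges; negative means the ranges overlap.
  const gap = Math.max(a.start, b.start) - Math.min(a.end, b.end);
  return gap <= 5;
}
```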
Write {phase_directory}/REVIEW.md using the review-report template (skills/core/templates/review-report.md):
- Each finding gets a ### F-NNN section with: Severity, Category, File, Lines, Description, Suggested Fix, Validation: pending, Fix Status: pending
- Count total findings, count by severity (critical, high, medium), and count by category.
If 0 findings after consolidation:
Display findings summary: "{N} findings from {agent_count} reviewers ({stack_count} stacks): {critical} critical, {high} high, {medium} medium" (for single-stack, omit the stacks part: "{N} findings from 4 reviewers: {critical} critical, {high} high, {medium} medium")
If more than 10 findings: present the list to user before proceeding: "The review found {N} findings (above typical range). Review the list in REVIEW.md and confirm you want to proceed with validation." Wait for user confirmation. If user declines, stop.
For each finding in REVIEW.md (parsed from the ### F-NNN sections):
- Determine the finding's source_agent (the specialist agent that originally produced the finding -- determined by category mapping: Bug/Security -> bug-detector, Pattern -> pattern-reviewer, Spec Gap -> plan-compliance-reviewer, Standards -> stack-reviewer)
- Spawn a finding-validator agent via Task tool with the finding context. Model selection: economy mode passes model: "sonnet"; quality or premium mode omits model (inherit parent) -- finding validation is critical classification work
- Collect classifications from each validator's final message (the ## Classification section with Finding, Verdict, Confidence, Source Agent, and Reason fields)
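The category-to-source-agent mapping above, as a lookup table (unknown categories return null rather than guessing):

```javascript
// Map a finding's Category field back to the specialist that produced it.
const SOURCE_AGENT = {
  Bug: "bug-detector",
  Security: "bug-detector",
  Pattern: "pattern-reviewer",
  "Spec Gap": "plan-compliance-reviewer",
  Standards: "stack-reviewer",
};
const sourceAgent = (category) => SOURCE_AGENT[category] ?? null;
```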
Escalate MEDIUM confidence classifications for a second opinion:
Spawn a fresh finding-validator agent for a second opinion (NOT the source specialist — specialist agents have SubagentStop hooks that expect their standard output format, not the escalation format). Spawn via Task tool. Model selection: economy mode passes model: "sonnet"; quality or premium mode omits model. Provide this context packet:
You are providing a second opinion on a review finding that received an uncertain classification.
## Original Finding
- **ID:** F-{NNN}
- **Severity:** {severity}
- **Category:** {category}
- **File:** {file_path}
- **Lines:** {line_range}
- **Description:** {description}
- **Suggested Fix:** {suggested_fix}
## Validator Classification
- **Verdict:** {verdict}
- **Confidence:** MEDIUM
- **Reason:** {validator_reason}
## Your Task
Provide a second opinion on whether this finding is valid. Read the file and surrounding context. Respond with your verdict: REAL BUG or FALSE POSITIVE, followed by your reasoning.
End your response with your standard classification format:
## Classification
- **Finding:** F-{NNN}
- **Verdict:** {REAL BUG | FALSE POSITIVE}
- **Confidence:** HIGH
- **Source Agent:** {source_agent from original finding}
- **Reason:** {your reasoning for this second opinion}
- Parse the ## Classification section from its final message.

Read the current REVIEW.md from disk (fresh read -- another validator batch may have been processed). Update REVIEW.md:
Handle FALSE POSITIVE findings (including those reclassified by specialist escalation):
- If .bee/false-positives.md does not exist, create it with a # False Positives header.
- If it exists, count the number of existing ## FP- headings and set the next FP number to count + 1.
- Append an entry:

## FP-{NNN}: {one-line summary}
- **Finding:** {original finding description from REVIEW.md}
- **Reason:** {validator's reason for FALSE POSITIVE classification}
- **Phase:** {phase number}
- **Date:** {current ISO 8601 date}
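The numbering rule can be sketched as a small helper. A minimal illustration that counts `## FP-` headings in the markdown and zero-pads the next ID:

```javascript
// Next FP id = count of existing "## FP-" headings + 1, zero-padded to 3 digits.
function nextFpId(markdown) {
  const count = (markdown.match(/^## FP-/gm) ?? []).length;
  return `FP-${String(count + 1).padStart(3, "0")}`;
}
```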
Handle STYLISTIC findings (user interaction):
- If the user chooses to ignore a stylistic finding, document it in .bee/false-positives.md (same format as step 5) and mark it as "False Positive" in REVIEW.md.

Build confirmed fix list: all REAL BUG findings (both HIGH confidence and specialist-confirmed) + user-approved STYLISTIC findings (those where the user chose option a). Exclude any findings reclassified as FALSE POSITIVE by specialist escalation.
Display validation summary: "{real_bug} real bugs, {false_positive} false positives, {stylistic} stylistic ({user_fix} to fix, {user_ignore} ignored), {escalated} escalated ({escalated_real_bug} confirmed, {escalated_false_positive} reclassified as FP)"
Fixer Parallelization Strategy:
Example: 6 findings on 3 files → 3 parallel fixer groups (instead of 6 sequential).
- Resolve each file's stack by matching its path against each path in config.stacks -- a file matches a stack if the file path starts with or is within the stack's path; "." matches everything. Pass the resolved stack name explicitly: "Stack: {resolved-stack-name}. Load the stack skill at skills/stacks/{resolved-stack-name}/SKILL.md." If only one stack is configured, use it directly.
- Spawn a fixer agent via Task tool with the context packet. Use the parent model (omit model parameter) -- fixers write production code and need full reasoning.

CRITICAL: Within the same file group, spawn fixers SEQUENTIALLY, one at a time. Never spawn multiple fixers for the same file in parallel. One fix may change the context for the next finding on that file. Cross-file fixer groups may run in parallel safely.
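The grouping step behind the "6 findings on 3 files → 3 parallel fixer groups" example can be sketched as follows (the finding shape is an assumption for illustration):

```javascript
// One fixer group per file: groups may run in parallel, but findings within a
// group must be fixed sequentially because each fix can shift the file's context.
function groupByFile(findings) {
  const groups = new Map();
  for (const f of findings) {
    if (!groups.has(f.file)) groups.set(f.file, []);
    groups.get(f.file).push(f);
  }
  return [...groups.values()];
}
```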
Set $LOOP_ITERATION = 1 on first entry to Step 7 (do NOT re-initialize on subsequent loops). Increment $LOOP_ITERATION on each re-entry. Also increment the cumulative iteration_counter (used for STATE.md and REVIEW.md naming).

Before the re-review overwrites REVIEW.md, archive the current one:
- Rename {phase_directory}/REVIEW.md to {phase_directory}/REVIEW-{previous_iteration}.md

Re-run the Step 3.9 false-positive extraction. The .bee/false-positives.md file now includes any FPs documented during the previous iteration's validation step:
- Read .bee/false-positives.md using the Read tool; if it does not exist, the list is "No documented false positives."

Apply the same multi-stack spawning logic as Step 4. Rebuild context packets using Step 4.1 (same stacks list, same per-stack and global agent structure) but with the refreshed false-positives list from Step 7.2. The agents review the updated code (including all fixes applied in previous iterations).
Spawn using the same economy/quality/premium mode logic as Step 4.2. Wait for all agents to complete.
Apply the same consolidation and deduplication logic as Steps 4.3 through 4.5:
- Write {phase_directory}/REVIEW.md using the review-report template, with the iteration number set to the current iteration counter in the Summary section.

After all steps complete (or early exit from clean review):
- Update STATE.md: set the phase Status to REVIEWED and record /bee:review as the last command. Then display:

Phase {N} reviewed!
Phase: {phase-name}
Findings: {total} total
- Real bugs: {confirmed} ({fixed} fixed, {failed} failed)
- False positives: {fp_count} (documented in .bee/false-positives.md)
- Stylistic: {stylistic} ({user_fixed} fixed, {user_ignored} ignored)
Iterations: {iteration_count}
Use AskUserQuestion to let the user choose:
AskUserQuestion(
question: "Review phase {N} complet. [X] findings: [F] fixed, [S] skipped, [FP] false positives.",
options: ["Re-review", "Accept", "Testing", "Custom"]
)
If the user chooses Testing, run /bee:test.

Design Notes (do not display to user):
- The --phase N argument targets a specific phase. Re-reviewing an already-reviewed phase is allowed -- the previous REVIEW.md is archived as REVIEW-{N}.md where N is the previous iteration number, and the iteration counter increments.
- (3 x N) + 1 agents where N = number of stacks. For single-stack projects this is exactly 4 agents, identical to the original behavior. Model tier depends on implementation_mode: quality/premium mode omits model (inherits parent for deeper analysis); economy mode passes model: "sonnet" and spawns agents sequentially per stack to reduce token usage.
- MEDIUM confidence classifications are escalated to a fresh bee:finding-validator (not the source specialist — specialist SubagentStop hooks expect their standard format, not second-opinion format). HIGH confidence classifications proceed unchanged -- only MEDIUM triggers escalation.
- .bee/false-positives.md is created on first use when the first false positive is documented. If no false positives exist yet, the file does not exist.
- Loop mode is controlled by the --loop flag or config.review.loop. No hardcoded iteration cap — the user decides when the code is clean via the interactive menu at Step 8. Re-review (Step 7) re-extracts false positives (Step 7.2), re-spawns all review agents in parallel (Step 7.3), and applies the same parse/deduplicate/consolidate pipeline (Step 7.4) before evaluating findings. The re-review agents see the updated code (post-fix) and the updated false-positives list.
- If interrupted, /bee:review detects the REVIEWING status and offers to resume. REVIEW.md on disk reflects the pipeline state at the time of interruption.
- Token cost is roughly (3N + 1)x that of the previous single-reviewer approach due to per-stack parallel sessions (where N = number of stacks). For single-stack projects this is 4x. The tradeoff is more focused, higher-quality findings from domain specialists. Economy mode reduces peak token usage by serializing per-stack batches.
- If plugins/bee/agents/stacks/{stack.name}/{role}.md exists, the stack-specific agent is used (e.g., laravel-inertia-vue-bug-detector); otherwise the generic bee:{role} agent is the fallback. This allows stacks to override review agents with domain-specific instructions while generic agents remain the default for stacks without dedicated agents.