Review current phase implementation against spec, standards, and quality checklist
Read these files using the Read tool:
- .bee/STATE.md — if not found: NOT_INITIALIZED
- .bee/config.json — if not found: use {}

You are running /bee:review -- the code review pipeline for BeeDev. This command orchestrates a four-step pipeline: review code, validate findings, fix confirmed issues, and optionally re-review. Follow these steps in order.
Check these guards in order. Stop immediately if any fails:
NOT_INITIALIZED guard: If the dynamic context above contains "NOT_INITIALIZED" (meaning .bee/STATE.md does not exist), tell the user:
"BeeDev is not initialized. Run /bee:init first."
Do NOT proceed.
NO_SPEC guard: Read STATE.md from the dynamic context above. If no Current Spec Path exists or it shows "(none)", tell the user:
"No spec found. Run /bee:new-spec first."
Do NOT proceed.
Phase detection: Check $ARGUMENTS for a --phase N flag.
- If present, use phase N explicitly. Validate: if phase N does not exist in the Phases table, tell the user: "Phase {N} does not exist. Your spec has {M} phases." Do NOT proceed. If the explicit phase's Status is not "EXECUTED" or "REVIEWED", tell the user: "Phase {N} has status {status} -- expected EXECUTED or REVIEWED for review." Do NOT proceed.
- If --phase N is not present, read the Phases table from STATE.md. Find the last phase whose Status is "EXECUTED" or "REVIEWED". This allows both first-time reviews and re-reviews of already-reviewed phases. If no such phase exists, tell the user:
"No executed phases waiting for review. Run /bee:execute-phase N first."
Do NOT proceed.
Already reviewing guard: If the Status column for the detected phase shows "REVIEWING", warn the user: "Phase {N} review is in progress. Continue from where it left off?" Wait for explicit confirmation before proceeding. If the user declines, stop.
Phase directory: {spec-path}/phases/{NN}-*/ where NN is the zero-padded phase number. Matching on the number prefix avoids slug construction mismatches.
Gather inputs:
- {phase_directory}/TASKS.md
- {spec-path}/spec.md
- Whether .bee/false-positives.md exists (Step 3.9 extracts false positives before review agents)
- $ARGUMENTS for the --loop flag
- config.json from dynamic context for the review.loop setting
- Loop mode is on if --loop is in arguments OR config.review.loop is true
- Read the current .bee/STATE.md from disk (fresh read, not the cached dynamic context)
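The loop-mode rule above can be sketched as a small helper. This is a minimal illustration, assuming $ARGUMENTS arrives as a plain string and config follows the { review: { loop } } shape named in this document:

```javascript
// Loop mode is enabled by the --loop flag OR config.review.loop === true.
function loopEnabled(argsString, config) {
  return argsString.includes("--loop") || config?.review?.loop === true;
}
```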
3a. Archive previous REVIEW.md (re-review only):
If the detected phase has a Reviewed value of "Yes (N)" (i.e., it was previously reviewed):
- If {phase_directory}/REVIEW.md exists on disk, rename it to {phase_directory}/REVIEW-{N}.md, where N is the iteration number extracted from "Yes (N)" (e.g., "Yes (1)" -> archive as REVIEW-1.md, "Yes (2)" -> archive as REVIEW-2.md).
- If the phase has not been reviewed before (Reviewed column is empty), skip archival.
3b. Update STATE.md:
- Set the phase Status to REVIEWING and record /bee:review as the active command.
- Display to user: "Starting review of Phase {N}: {phase-name} (iteration {iteration_counter})..."
Build check (automatic, per-stack):
For each stack in config.stacks, scoped to its path:
- Check package.json for a build script within {stack.path} (run node -e "const p=require('./{stack.path}/package.json'); process.exit(p.scripts?.build ? 0 : 1)" via Bash). Also check composer.json if the stack is Laravel-based.
- If a build script exists, run cd {stack.path} && npm run build

Test check (user opt-in, per-stack):
Ask the user: "Run tests before review? (yes/no)"
If the user says yes:
For each stack in config.stacks, resolve its test runner: read stacks[i].testRunner first, fall back to root config.testRunner if absent, then "none". Run each stack's test runner scoped to its path. Report per-stack: "Tests: {stack.name} ({runner}): {result}".
For each stack:
"none", display "Tests: {stack.name}: skipped (no test runner configured)" and continue to the next stack.vitest: cd {stack.path} && npx vitest run (parallel by default via worker threads)jest: cd {stack.path} && npx jest (parallel by default via workers, use --maxWorkers=auto if not set)pest: cd {stack.path} && ./vendor/bin/pest --parallel (uses Paratest under the hood)If the user says no: display "Tests: skipped" and continue.
Before spawning review agents, extract documented false positives so each agent can exclude known non-issues:
- Read .bee/false-positives.md using the Read tool.
- Extract each ## FP-NNN entry with its finding description, reason, and file reference. Format the list as:
EXCLUDE these documented false positives from your findings:
- FP-001: {summary} ({file}, {reason})
- FP-002: {summary} ({file}, {reason})
...
"No documented false positives."Context Cache (read once, pass to all agents):
Before spawning any agents, read these files once and include their content in every agent's context packet:
- plugins/bee/skills/stacks/{stack}/SKILL.md
- .bee/CONTEXT.md
- .bee/false-positives.md
- .bee/user.md

Pass these as part of the agent's prompt context — agents should NOT re-read these files themselves.
Dependency Scan:
Before spawning review agents, expand the file scope:
- Parse each modified file's import/require/use statements to find its dependencies (files it imports)
- Search for files that import/require any modified file to find its consumers (files that import it)
- Look for test files matching: {name}.test.{ext}, {name}.spec.{ext}, tests/{name}.{ext}, __tests__/{name}.{ext}. Include discovered test file paths in the context packet.

Spawn specialized review agents. In a multi-stack project, bug-detector, pattern-reviewer, and stack-reviewer are spawned once per stack (3 per-stack agents), while plan-compliance-reviewer is spawned ONCE globally (stack-agnostic). Total agents: (3 x N) + 1 where N = number of stacks. For single-stack projects, N = 1 so exactly 4 agents are spawned (identical to original behavior). The command (not the agents) writes REVIEW.md after consolidating all findings from all stacks.
4.1a: Read stacks from config
Read config.stacks from config.json. Build the stack list:
- If config.stacks exists and is an array: use it as-is. Each entry has name and path.
- If config.stacks is absent but config.stack exists (legacy v2 config): create a single-entry list: [{ name: config.stack, path: "." }].

Also read config.implementation_mode (defaults to "quality" if absent).
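The normalization in 4.1a can be sketched as a small helper. A minimal sketch, assuming the config shapes named above (a v3 stacks array vs a legacy v2 stack string):

```javascript
// Normalize config into a stacks list: v3 array as-is, legacy string wrapped
// into a single root-path entry, otherwise an empty list.
function resolveStacks(config) {
  if (Array.isArray(config.stacks)) return config.stacks;
  if (config.stack) return [{ name: config.stack, path: "." }];
  return [];
}

// implementation_mode defaults to "quality" when absent.
const mode = (config) => config.implementation_mode ?? "quality";
```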
4.1b: Build shared context base
Build a shared context base for all agents:
- Spec: {spec.md path}
- TASKS.md: {TASKS.md path}
- Phase directory: {phase_directory}
- Phase number: {N}

4.1c: Build per-stack context packets
For each stack in the stacks list, build three agent-specific context packets. When the project has a single stack, this loop runs once and behavior is identical to the original four-agent approach.
Agent resolution (stack-specific fallback): For each per-stack agent, check if a stack-specific variant exists before using the generic agent. For each stack in the stacks list, resolve agents as follows:
- Check whether plugins/bee/agents/stacks/{stack.name}/bug-detector.md exists. If yes, use {stack.name}-bug-detector as the agent name. If no, fall back to the generic bee:bug-detector.
- Check whether plugins/bee/agents/stacks/{stack.name}/pattern-reviewer.md exists. If yes, use {stack.name}-pattern-reviewer. If no, fall back to the generic bee:pattern-reviewer.
- Check whether plugins/bee/agents/stacks/{stack.name}/stack-reviewer.md exists. If yes, use {stack.name}-stack-reviewer. If no, fall back to the generic bee:stack-reviewer.

Generic agents remain the default for any stack that does not have dedicated stack-specific agents in plugins/bee/agents/stacks/{stack.name}/.
Per-stack Agent: Bug Detector (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for bugs and security issues.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
Stack: {stack.name}
{false-positives list from Step 3.9}
Read TASKS.md to find the files created/modified by this phase. Scope your file search to files within the `{stack.path}` directory. Review those files for bugs, logic errors, null handling issues, race conditions, edge cases, and security vulnerabilities (OWASP). If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides (CLAUDE.md takes precedence over stack skill for project-specific conventions). Report only HIGH confidence findings in your standard output format.
Per-stack Agent: Pattern Reviewer (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for pattern deviations.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
Stack: {stack.name}
{false-positives list from Step 3.9}
Read TASKS.md to find the files created/modified by this phase. Scope your file search to files within the `{stack.path}` directory. For each file, find 2-3 similar existing files in the codebase, extract their patterns, and compare. If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides. Report only HIGH confidence deviations in your standard output format.
Per-stack Agent: Stack Reviewer (resolved agent name -- see agent resolution above) -- model set in 4.2 by implementation_mode -- one per stack
You are reviewing Phase {N} implementation for stack best practice violations.
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Phase directory: {phase_directory}
Phase number: {N}
{false-positives list from Step 3.9}
The stack for this review pass is `{stack.name}`. Load the stack skill at `skills/stacks/{stack.name}/SKILL.md` and check all code within the `{stack.path}` directory against that stack's conventions. If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides (CLAUDE.md takes precedence over stack skill). Use Context7 to verify framework best practices. Report only HIGH confidence violations in your standard output format.
4.1d: Build global context packet (spawned ONCE, not per-stack)
Before building the packet, check if {spec-path}/requirements.md exists on disk. Set the requirements line:
- If it exists: Requirements: {spec-path}/requirements.md
- If not: Requirements: (not found -- skip requirement tracking)

Global Agent: Plan Compliance Reviewer (bee:plan-compliance-reviewer) -- model set in 4.2 by implementation_mode -- spawned ONCE globally
You are reviewing Phase {N} implementation in CODE REVIEW MODE (not plan review mode).
Spec: {spec.md path}
TASKS.md: {TASKS.md path}
Requirements: {spec-path}/requirements.md OR (not found -- skip requirement tracking)
Phase directory: {phase_directory}
Phase number: {N}
{false-positives list from Step 3.9}
Review mode: code review. Check implemented code against spec requirements and acceptance criteria. Verify every acceptance criterion in TASKS.md has corresponding implementation. Check for missing features, incorrect behavior, and over-scope additions. If phase > 1, also check cross-phase integration (imports, data contracts, workflow connections, shared state). If a project-level CLAUDE.md exists at the project root, read it for project-specific overrides. Report findings in your standard code review mode output format.
The total number of agents is (3 x N) + 1 where N is the number of stacks. For a single-stack project this is exactly 4.
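The agent-count formula, as a one-liner (three per-stack reviewers per stack plus the single global plan-compliance-reviewer):

```javascript
// Total review agents for N stacks: 3 per-stack specialists + 1 global reviewer.
const agentCount = (stackCount) => 3 * stackCount + 1;
```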
Economy mode (implementation_mode: "economy"): Pass model: "sonnet" for all agents. Spawn agents sequentially per stack to reduce token usage:
model: "sonnet"). Wait for it to complete.model: "sonnet"). Wait for all three to complete before proceeding to the next stack.
In economy mode with a single stack, this results in the same 4 agents but spawned in two sequential batches instead of one parallel batch.Quality or Premium mode (default "quality", or "premium"): Spawn ALL agents (all per-stack agents + the global plan-compliance-reviewer) via Task tool calls in a SINGLE message (parallel execution). Omit the model parameter for all agents (they inherit the parent model) -- quality/premium mode uses the stronger model for deeper, more thorough review analysis. Wait for all agents to complete.
Wait for all agents to complete before proceeding.
After all agents complete, parse findings from each agent's final message. Each agent has a distinct output format -- normalize all findings into a unified list. Findings from all stacks are combined into a single consolidated list:
Bug Detector findings (from ## Bugs Detected section):
- Each `- **[Bug type]:** [Description] - file:line` entry becomes one finding

Pattern Reviewer findings (from ## Project Pattern Deviations section):
- Each `- **[Pattern type]:** [Deviation description] - file:line` entry becomes one finding

Plan Compliance Reviewer findings (from ## Plan Compliance Findings section):

Stack Reviewer findings (from ## Stack Best Practice Violations section):
- Each `- **[Rule category]:** [Violation description] - file:line` entry becomes one finding

If an agent reports no findings (e.g., "No bugs detected.", "No project pattern deviations found.", etc.), it contributes zero findings.
For each pair of findings from different agents, check if they reference the same file AND their line ranges overlap (within 5 lines of each other). If so, merge them:
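The overlap check above can be sketched as follows. The finding shape ({file, start, end}) is an assumption for illustration, not the pipeline's actual data model:

```javascript
// Two findings are duplicates when they reference the same file and their line
// ranges overlap or sit within 5 lines of each other.
function isDuplicate(a, b) {
  if (a.file !== b.file) return false;
  // Gap between the nearest edges; negative means the ranges overlap.
  const gap = Math.max(a.start, b.start) - Math.min(a.end, b.end);
  return gap <= 5;
}
```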
Write {phase_directory}/REVIEW.md using the review-report template (skills/core/templates/review-report.md):
- Each finding gets a ### F-NNN section with: Severity, Category, File, Lines, Description, Suggested Fix, Validation: pending, Fix Status: pending
- Count total findings, count by severity (critical, high, medium), and count by category.
If 0 findings after consolidation:
Display findings summary: "{N} findings from {agent_count} reviewers ({stack_count} stacks): {critical} critical, {high} high, {medium} medium" (for single-stack, omit the stacks part: "{N} findings from 4 reviewers: {critical} critical, {high} high, {medium} medium")
If more than 10 findings: present the list to user before proceeding: "The review found {N} findings (above typical range). Review the list in REVIEW.md and confirm you want to proceed with validation." Wait for user confirmation. If user declines, stop.
For each finding in REVIEW.md (parsed from the ### F-NNN sections):
- Determine the finding's source_agent (the specialist agent that originally produced the finding -- determined by category mapping: Bug/Security -> bug-detector, Pattern -> pattern-reviewer, Spec Gap -> plan-compliance-reviewer, Standards -> stack-reviewer)
- Spawn a finding-validator agent via Task tool with the finding context. Model selection: economy mode passes model: "sonnet"; quality or premium mode omits model (inherit parent) -- finding validation is critical classification work
- Collect classifications from each validator's final message (the ## Classification section with Finding, Verdict, Confidence, Source Agent, and Reason fields)
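The category-to-source-agent mapping above, as a lookup table (unknown categories return null rather than guessing):

```javascript
// Map a finding's Category field back to the specialist that produced it.
const SOURCE_AGENT = {
  Bug: "bug-detector",
  Security: "bug-detector",
  Pattern: "pattern-reviewer",
  "Spec Gap": "plan-compliance-reviewer",
  Standards: "stack-reviewer",
};
const sourceAgent = (category) => SOURCE_AGENT[category] ?? null;
```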
Escalate MEDIUM confidence classifications for a second opinion:
Spawn a fresh finding-validator agent for a second opinion (NOT the source specialist — specialist agents have SubagentStop hooks that expect their standard output format, not the escalation format). Spawn via Task tool. Model selection: economy mode passes model: "sonnet"; quality or premium mode omits model. Provide this context packet:
You are providing a second opinion on a review finding that received an uncertain classification.
## Original Finding
- **ID:** F-{NNN}
- **Severity:** {severity}
- **Category:** {category}
- **File:** {file_path}
- **Lines:** {line_range}
- **Description:** {description}
- **Suggested Fix:** {suggested_fix}
## Validator Classification
- **Verdict:** {verdict}
- **Confidence:** MEDIUM
- **Reason:** {validator_reason}
## Your Task
Provide a second opinion on whether this finding is valid. Read the file and surrounding context. Respond with your verdict: REAL BUG or FALSE POSITIVE, followed by your reasoning.
End your response with your standard classification format:
## Classification
- **Finding:** F-{NNN}
- **Verdict:** {REAL BUG | FALSE POSITIVE}
- **Confidence:** HIGH
- **Source Agent:** {source_agent from original finding}
- **Reason:** {your reasoning for this second opinion}
- Parse the ## Classification section from its final message.

Read the current REVIEW.md from disk (fresh read -- another validator batch may have been processed). Update REVIEW.md:
Handle FALSE POSITIVE findings (including those reclassified by specialist escalation):
- If .bee/false-positives.md does not exist, create it with a # False Positives header.
- If it exists, count the number of existing ## FP- headings and set the next FP number to count + 1.
- Append an entry:

## FP-{NNN}: {one-line summary}
- **Finding:** {original finding description from REVIEW.md}
- **Reason:** {validator's reason for FALSE POSITIVE classification}
- **Phase:** {phase number}
- **Date:** {current ISO 8601 date}
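The numbering rule can be sketched as a small helper. A minimal illustration that counts `## FP-` headings in the markdown and zero-pads the next ID:

```javascript
// Next FP id = count of existing "## FP-" headings + 1, zero-padded to 3 digits.
function nextFpId(markdown) {
  const count = (markdown.match(/^## FP-/gm) ?? []).length;
  return `FP-${String(count + 1).padStart(3, "0")}`;
}
```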
Handle STYLISTIC findings (user interaction):
- If the user chooses to ignore a stylistic finding, document it in .bee/false-positives.md (same format as step 5) and mark it as "False Positive" in REVIEW.md.

Build confirmed fix list: all REAL BUG findings (both HIGH confidence and specialist-confirmed) + user-approved STYLISTIC findings (those where the user chose option a). Exclude any findings reclassified as FALSE POSITIVE by specialist escalation.
Display validation summary: "{real_bug} real bugs, {false_positive} false positives, {stylistic} stylistic ({user_fix} to fix, {user_ignore} ignored), {escalated} escalated ({escalated_real_bug} confirmed, {escalated_false_positive} reclassified as FP)"
Fixer Parallelization Strategy:
Example: 6 findings on 3 files → 3 parallel fixer groups (instead of 6 sequential).
- Resolve each file's stack by matching its path against each path in config.stacks -- a file matches a stack if the file path starts with or is within the stack's path; "." matches everything. Pass the resolved stack name explicitly: "Stack: {resolved-stack-name}. Load the stack skill at skills/stacks/{resolved-stack-name}/SKILL.md." If only one stack is configured, use it directly.
- Spawn a fixer agent via Task tool with the context packet. Use the parent model (omit model parameter) -- fixers write production code and need full reasoning.

CRITICAL: Within the same file group, spawn fixers SEQUENTIALLY, one at a time. Never spawn multiple fixers for the same file in parallel. One fix may change the context for the next finding on that file. Cross-file fixer groups may run in parallel safely.
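The grouping step behind the "6 findings on 3 files → 3 parallel fixer groups" example can be sketched as follows (the finding shape is an assumption for illustration):

```javascript
// One fixer group per file: groups may run in parallel, but findings within a
// group must be fixed sequentially because each fix can shift the file's context.
function groupByFile(findings) {
  const groups = new Map();
  for (const f of findings) {
    if (!groups.has(f.file)) groups.set(f.file, []);
    groups.get(f.file).push(f);
  }
  return [...groups.values()];
}
```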
Set $LOOP_ITERATION = 1 on first entry to Step 7 (do NOT re-initialize on subsequent loops). Increment $LOOP_ITERATION on each re-entry. Also increment the cumulative iteration_counter (used for STATE.md and REVIEW.md naming).

Before the re-review overwrites REVIEW.md, archive the current one:
- Rename {phase_directory}/REVIEW.md to {phase_directory}/REVIEW-{previous_iteration}.md

Re-run the Step 3.9 false-positive extraction. The .bee/false-positives.md file now includes any FPs documented during the previous iteration's validation step:
- Read .bee/false-positives.md using the Read tool; if it does not exist, the list is "No documented false positives."

Apply the same multi-stack spawning logic as Step 4. Rebuild context packets using Step 4.1 (same stacks list, same per-stack and global agent structure) but with the refreshed false-positives list from Step 7.2. The agents review the updated code (including all fixes applied in previous iterations).
Spawn using the same economy/quality/premium mode logic as Step 4.2. Wait for all agents to complete.
Apply the same consolidation and deduplication logic as Steps 4.3 through 4.5:
- Write {phase_directory}/REVIEW.md using the review-report template, with the iteration number set to the current iteration counter in the Summary section.

After all steps complete (or early exit from clean review):
- Update STATE.md: set the phase Status to REVIEWED and record /bee:review as the last command. Then display:

Phase {N} reviewed!
Phase: {phase-name}
Findings: {total} total
- Real bugs: {confirmed} ({fixed} fixed, {failed} failed)
- False positives: {fp_count} (documented in .bee/false-positives.md)
- Stylistic: {stylistic} ({user_fixed} fixed, {user_ignored} ignored)
Iterations: {iteration_count}
Use AskUserQuestion to let the user choose:
AskUserQuestion(
question: "Review phase {N} complet. [X] findings: [F] fixed, [S] skipped, [FP] false positives.",
options: ["Re-review", "Accept", "Testing", "Custom"]
)
If the user chooses Testing, run /bee:test.

Design Notes (do not display to user):
- The --phase N argument targets a specific phase. Re-reviewing an already-reviewed phase is allowed -- the previous REVIEW.md is archived as REVIEW-{N}.md where N is the previous iteration number, and the iteration counter increments.
- (3 x N) + 1 agents where N = number of stacks. For single-stack projects this is exactly 4 agents, identical to the original behavior. Model tier depends on implementation_mode: quality/premium mode omits model (inherits parent for deeper analysis); economy mode passes model: "sonnet" and spawns agents sequentially per stack to reduce token usage.
- MEDIUM confidence classifications are escalated to a fresh bee:finding-validator (not the source specialist — specialist SubagentStop hooks expect their standard format, not second-opinion format). HIGH confidence classifications proceed unchanged -- only MEDIUM triggers escalation.
- .bee/false-positives.md is created on first use when the first false positive is documented. If no false positives exist yet, the file does not exist.
- Loop mode is controlled by the --loop flag or config.review.loop. No hardcoded iteration cap — the user decides when the code is clean via the interactive menu at Step 8. Re-review (Step 7) re-extracts false positives (Step 7.2), re-spawns all review agents in parallel (Step 7.3), and applies the same parse/deduplicate/consolidate pipeline (Step 7.4) before evaluating findings. The re-review agents see the updated code (post-fix) and the updated false-positives list.
- If interrupted, /bee:review detects the REVIEWING status and offers to resume. REVIEW.md on disk reflects the pipeline state at the time of interruption.
- Token cost is roughly (3N + 1)x that of the previous single-reviewer approach due to per-stack parallel sessions (where N = number of stacks). For single-stack projects this is 4x. The tradeoff is more focused, higher-quality findings from domain specialists. Economy mode reduces peak token usage by serializing per-stack batches.
- If plugins/bee/agents/stacks/{stack.name}/{role}.md exists, the stack-specific agent is used (e.g., laravel-inertia-vue-bug-detector); otherwise the generic bee:{role} agent is the fallback. This allows stacks to override review agents with domain-specific instructions while generic agents remain the default for stacks without dedicated agents.