Verifies code implementations match specs, PRDs, epics, or tasks by checking completeness, acceptance criteria, edge cases, and scope creep. Use during or after implementation.
npx claudepluginhub xexr/marketplace --plugin review-implementation

This skill uses the workspace's default tool permissions.
Reviews generated/updated documentation (MD files, beads issues, specs, task lists) with multi-LLM parallel analysis for inconsistencies, codebase mismatches, gaps. Interactive synthesis and fixes.
Conducts single-pass code reviews enforcing test evidence gates (e.g., passing E2E tests), spec compliance, quality checks, and confidence-based issue reporting.
Dispatches parallel specialist agents to review code diffs, PRDs, or plans before PRs or as quality gates. Invoked via /review, code-review agent, or team-dev in swarm tiers.
Multi-LLM implementation review for verifying code matches spec, identifying gaps, and tracking completion state.
When this skill is invoked, check IMMEDIATELY if args were provided.
When invoked via Skill(skill="review-implementation", args="..."), the args string appears in your context. Look for any of these patterns:
--models opus,gpt,gemini --categories all --scope-spec "epic cgt-22" --scope-impl "changes since start" (the args parameter)

If args are detected, parse them BEFORE doing anything else:
# Pseudo-code for arg parsing
args_string = [the args string you detected]
# Extract each parameter
models = extract_after("--models", args_string) # e.g., "opus,gpt,gemini" -> ["opus", "gpt", "gemini"]
categories = extract_after("--categories", args_string) # e.g., "all" or "completeness,quality"
scope_spec = extract_quoted_after("--scope-spec", args_string) # e.g., "epic cgt-22"
scope_impl = extract_quoted_after("--scope-impl", args_string) # e.g., "changes since start"
run_preflight = contains("--run-preflight-checks", args_string) # boolean
skip_preflight = contains("--skip-preflight-checks", args_string) # boolean
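For illustration only, here is a rough shell-script equivalent of that flag handling; the skill itself parses the args string in context, and these variable names and defaults are just a sketch that mirrors the documented parameters:

```bash
# Illustrative shell equivalent of the arg-parsing pseudo-code above
models="opus"; categories="all"; scope_spec=""; scope_impl=""
run_preflight=false; skip_preflight=false
while [ $# -gt 0 ]; do
  case "$1" in
    --models)                models="$2"; shift 2 ;;
    --categories)            categories="$2"; shift 2 ;;
    --scope-spec)            scope_spec="$2"; shift 2 ;;
    --scope-impl)            scope_impl="$2"; shift 2 ;;
    --run-preflight-checks)  run_preflight=true; shift ;;
    --skip-preflight-checks) skip_preflight=true; shift ;;
    *)                       shift ;;
  esac
done
echo "models=$models categories=$categories spec=$scope_spec impl=$scope_impl preflight=$run_preflight"
```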
If args ARE present (preflight mode):
- Use --scope-spec and --scope-impl as the review scope
- Use --models and --categories as the review configuration
- Run CLI pre-flight checks only if --run-preflight-checks is present

If args are NOT present (interactive mode): ask the interactive questions in Step 1 below.
After parsing, confirm your understanding:
📋 Preflight Mode Detected
- Models: [list parsed models]
- Categories: [list parsed categories]
- Spec scope: [parsed scope-spec]
- Impl scope: [parsed scope-impl]
- Pre-flight checks: [skip/run]
Proceeding directly to Step 2...
CRITICAL: If you're unsure whether args were provided, default to INTERACTIVE mode and ask the user.
Sequential with review-documentation: Review docs first (ensure spec is accurate), then review implementation (ensure code matches spec).
When invoked with args, skip interactive questions and proceed directly to review.
/review-implementation --models <models> --categories <categories> --scope-spec "<spec>" --scope-impl "<impl>" [--run-preflight-checks]
Parameters:
- --models: Comma-separated list: opus, gpt, gemini (default: opus)
- --categories: all or comma-separated: completeness,quality,scope,standards (default: all)
- --scope-spec: Spec scope description (e.g., "epic cgt-22", "specs/auth.md")
- --scope-impl: Implementation scope description (e.g., "changes in this branch", "src/auth/")
- --run-preflight-checks: Optional. Run CLI pre-flight checks for GPT/Gemini before dispatch
- --skip-preflight-checks: Optional. Explicitly skip CLI pre-flight checks (default behavior, but explicit for clarity)

# Full automation - Opus only, all categories
/review-implementation --models opus --categories all --scope-spec "epic cgt-22" --scope-impl "changes since epic start"
# Multi-model with specific categories
/review-implementation --models opus,gpt --categories completeness,quality --scope-spec "specs/auth.md" --scope-impl "src/auth/"
# With pre-flight checks enabled
/review-implementation --models opus,gemini --categories all --scope-spec "epic cgt-15" --scope-impl "this branch" --run-preflight-checks
When args are provided:
- Use --scope-spec and --scope-impl instead of asking
- Use --models and --categories instead of asking
- Treat categories as all unless an explicit list is given
- Run pre-flight checks only if the --run-preflight-checks flag is present

Use case: Automation from other skills (e.g., implementing-beads-epic) that already know the review parameters.
flowchart TD
start([User invokes skill])
check_args{Args provided?}
parse_args[Parse preflight args]
ask_llms[Step 1a: LLMs + Scope description]
show_cats[Step 1b: Show categories + Review All vs Customize]
fast_track{Review All?}
ask_cats[Step 1c: 4 multiselect questions - review categories]
preflight_checks[Pre-flight checks IN PARALLEL]
check_gpt[Check Codex CLI available + logged in]
fail_gpt([Abort: tell user to fix Codex setup])
check_gemini[Check Gemini CLI available + logged in]
fail_gemini([Abort: tell user to fix Gemini setup])
haiku[Step 2: Haiku gathers spec paths + implementation paths]
build_brief[Build dynamic review brief]
dispatch[Step 3: Dispatch reviewers IN PARALLEL]
opus[Opus 4.6 sub-agent]
gpt[Codex CLI GPT sub-agent]
gemini[Gemini CLI sub-agent]
collect[Collect results via TaskOutput]
synthesize[Step 4: Compare, deduplicate, synthesize]
findings([Present findings with completion matrix])
gate[Step 5.1: Gate - wait for 'go']
show_findings[Step 5.2: Summarize + Address All vs Customize]
all_findings{Address All?}
select_findings[Step 5.2b: Select specific findings]
resolve_ambig[Step 5.3: Resolve ambiguities one-by-one]
select_actions[Step 5.4: Select actions]
execute[Step 5.5: Execute actions]
summary[Step 5.6: Summary of actions taken]
followup{Step 5.7: Issues created?}
ask_impl[Ask: Implement now?]
implement[Implement created issues]
done([Done])
start --> check_args
check_args -->|yes - preflight mode| parse_args
check_args -->|no - interactive mode| ask_llms
parse_args --> haiku
ask_llms --> show_cats
show_cats --> fast_track
fast_track -->|yes| preflight_checks
fast_track -->|no| ask_cats
ask_cats --> preflight_checks
preflight_checks -->|if GPT selected| check_gpt
preflight_checks -->|if Gemini selected| check_gemini
preflight_checks -->|neither selected| haiku
check_gpt -->|ok| haiku
check_gpt -->|fail| fail_gpt
check_gemini -->|ok| haiku
check_gemini -->|fail| fail_gemini
haiku --> build_brief
build_brief --> dispatch
dispatch -->|if selected| opus
dispatch -->|if selected| gpt
dispatch -->|if selected| gemini
opus --> collect
gpt --> collect
gemini --> collect
collect --> synthesize
synthesize --> findings
findings --> gate
gate --> show_findings
show_findings --> all_findings
all_findings -->|yes| resolve_ambig
all_findings -->|no| select_findings
select_findings --> resolve_ambig
resolve_ambig --> select_actions
select_actions --> execute
execute --> summary
summary --> followup
followup -->|yes| ask_impl
followup -->|no| done
ask_impl -->|yes| implement
ask_impl -->|no| done
implement --> done
Run pre-flight checks IN PARALLEL for all selected external CLI agents.
If the user selects GPT 5.4, verify Codex CLI is available before proceeding:
# Check Codex CLI is installed
command -v codex >/dev/null 2>&1 || { echo "Codex CLI not installed"; exit 1; }
# Test the actual model
codex exec -m "gpt-5.4" -c reasoning_effort="high" --sandbox read-only "Respond with only: READY" 2>&1
If check fails, STOP and tell the user:
"Codex CLI check failed. Select Opus 4.6 only, or fix your Codex setup first."
If the user selects Gemini 3 Pro, verify Gemini CLI is available before proceeding:
# Check Gemini CLI is installed
command -v gemini >/dev/null 2>&1 || { echo "Gemini CLI not installed"; exit 1; }
# Test the actual model
gemini "Respond with only: READY" --model gemini-3-pro-preview -y 2>&1
If check fails, STOP and tell the user:
"Gemini CLI check failed. Select Opus 4.6 only, or fix your Gemini setup first."
PREFLIGHT MODE CHECK: If args were provided (see "Preflight Mode" section above), skip this entire step. Use the provided --scope-spec, --scope-impl, --models, and --categories values and proceed directly to Step 2.
CRITICAL: Do NOT explore the codebase yourself. Just collect scope from user.
Ask the user for scope (free text):
"What should I review?
- Spec scope - what spec/requirements to review against? (e.g., "epic cgt-22", "specs/auth.md", "tasks 1-5")
- Implementation scope - what code to review? (e.g., "changes in this branch", "src/auth/", "all files since epic start")"
Batch all configuration into ONE AskUserQuestion call with 3 questions. Users can tab through efficiently:
questions: [
{
question: "Which models should perform this review?",
header: "Models",
multiSelect: true,
options: [
{ label: "Opus 4.6 (Recommended)", description: "Claude Opus 4.6 - strong reasoning, nuanced analysis" },
{ label: "GPT 5.4", description: "OpenAI's latest via Codex CLI - different perspective" },
{ label: "Gemini 3 Pro", description: "Google's latest via Gemini CLI - third perspective" }
]
},
{
question: "Review mode?",
header: "Categories",
multiSelect: false,
options: [
{ label: "Review All (Recommended)", description: "Check all categories: completeness, quality, scope, standards" },
{ label: "Customize", description: "Select specific categories to review" }
]
},
{
question: "Run CLI pre-flight checks for GPT/Gemini?",
header: "Pre-flight",
multiSelect: false,
options: [
{ label: "Skip (Recommended)", description: "Assume CLIs work - faster startup" },
{ label: "Run checks", description: "Verify Codex/Gemini CLIs are working before dispatch" }
]
}
]
This reduces LLM round-trips from multiple separate questions to 1 batched interaction.
Single Model Warning: If user selects only 1 model, show a follow-up warning:
question: "Only 1 model selected. Multi-model comparison provides better coverage. Proceed?"
header: "Single Model"
multiSelect: false
options:
- label: "Add another model (Recommended)"
description: "Go back and select additional models for comparison"
- label: "Continue with 1 model"
description: "Get review from single model (no comparison)"
Review Mode Handling:
"Review All": Skip to Step 2 with defaults:
"Customize": Show the 4 multiselect questions below.
Pre-flight Handling: if "Skip", assume the CLIs work and proceed without checks; if "Run checks", run the pre-flight checks above for each selected external CLI before dispatch.
Question 1: Completeness
question: "Which completeness checks should agents perform?"
header: "Completeness"
multiSelect: true
options:
- label: "Feature Completeness (Recommended)"
description: "Are all specified features/requirements implemented?"
- label: "Acceptance Criteria Met (Recommended)"
description: "Do implementations satisfy stated acceptance criteria?"
- label: "Task Status Accuracy"
description: "Are beads/task statuses accurate (closed = actually complete)?"
Question 2: Quality
question: "Which quality checks should agents perform?"
header: "Quality"
multiSelect: true
options:
- label: "Edge Cases Covered (Recommended)"
description: "Were documented edge cases handled in implementation?"
- label: "Error Handling"
description: "Are specified error scenarios handled correctly?"
- label: "Test Coverage"
description: "Are there tests for the implemented features?"
Question 3: Scope
question: "Which scope checks should agents perform?"
header: "Scope"
multiSelect: true
options:
- label: "No Scope Creep (Recommended)"
description: "Was ONLY what was specified implemented (no extras)?"
- label: "Approach Alignment"
description: "Does implementation follow specified approach (note deviations)?"
- label: "No Missing Dependencies"
description: "Were all prerequisite tasks completed before dependent ones?"
Question 4: Standards
question: "Which standards checks should agents perform?"
header: "Standards"
multiSelect: true
options:
- label: "Project Conventions"
description: "Does implementation follow CLAUDE.md and project standards?"
- label: "API Contract Match"
description: "Do implemented APIs match specified contracts/interfaces?"
Time budget for Step 1: Under 60 seconds total.
Dispatch a Haiku subagent to discover relevant paths.
CRITICAL: If scope description matches multiple epics (e.g., multiple "Phase 6" epics), Haiku MUST:
- Return the DISAMBIGUATION_NEEDED flag with the list of candidates if multiple matches exist

If disambiguation is needed, present candidates to the user:
question: "Multiple matching epics found. Which one should I review?"
header: "Epic"
multiSelect: false
options:
[Dynamically populate with epic ID + title for each match]
Task(
subagent_type="Explore",
model="haiku",
prompt="Find all relevant paths for implementation review.
SPEC SCOPE: [USER'S SPEC DESCRIPTION]
IMPLEMENTATION SCOPE: [USER'S IMPLEMENTATION DESCRIPTION]
Target codebase: [PATH]
Return ONLY structured lists of paths. Do NOT read file contents.
## DISAMBIGUATION CHECK (CRITICAL)
Before returning paths, check if multiple epics/specs match the scope description.
- Run `bd list` and search for matching issues
- If the scope mentions 'Phase X' or similar, check ALL epics that mention that phase
- If multiple matches exist, return DISAMBIGUATION_NEEDED: true with list of candidates
Look for SPEC paths in:
- .beads/ directory for matching issue IDs (use bd list to find relevant IDs)
- specs/ or docs/ directories for matching markdown files
- Any paths explicitly mentioned in scope description
Look for IMPLEMENTATION paths in:
- Source directories matching implementation scope
- Test files for implemented features
- Changed files (use git diff if branch specified)
Output format:
DISAMBIGUATION_NEEDED: true/false
CANDIDATE_EPICS: (if disambiguation needed)
- cgt-XX: [Epic Title 1]
- cgt-YY: [Epic Title 2]
SPEC_BEADS_IDS: cgt-22, cgt-11, ...
SPEC_FILE_PATHS: /path/to/spec.md, ...
IMPL_FILE_PATHS: /path/to/impl.ts, ...
TEST_FILE_PATHS: /path/to/impl.test.ts, ...
Be thorough but fast. Paths only, no content."
)
Always pre-read beads issues and include them in every selected agent's prompt (Opus, GPT, and Gemini).
This speeds up the review by providing spec context upfront, reducing tool calls during review.
# For each issue ID from Haiku:
bd show cgt-XX
bd show cgt-YY
# ... etc
Store the output and include it verbatim in each sub-agent prompt under ## Beads Issue Contents (Pre-Read).
Note: GPT can still explore additional issues (dependencies, linked issues) using bd commands if needed during review.
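A minimal sketch of that pre-read loop, assuming the issue IDs returned by Haiku are cgt-22 and cgt-11 (placeholders) and collecting the output into a temp file for pasting into the prompts:

```bash
# Collect the full text of each spec issue so it can be pasted into the reviewer prompts
for id in cgt-22 cgt-11; do
  echo "### $id"
  bd show "$id"
  echo
done > /tmp/beads_preread.txt
```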
CRITICAL: All agents MUST use run_in_background: true for true parallelism.
Special Note on Gemini: Gemini requires a TWO-STEP approach to avoid JSON serialization issues:
See "Dispatching Gemini Sub-Agent" section below for details.
# Dispatch sequence:
# Step 1: Write Gemini prompt to temp file (sync, fast)
Bash(command="cat > /tmp/gemini_review_prompt.txt <<'EOF' ... EOF")
# Step 2: In ONE message, dispatch ALL reviewers with run_in_background: true:
Task(
subagent_type="general-purpose",
model="opus",
run_in_background=true, # <-- REQUIRED
prompt="..."
)
Bash(
command="codex exec ...",
run_in_background=true, # <-- REQUIRED
timeout=900000
)
Bash(
command="gemini \"$(cat /tmp/gemini_prompt.txt)\" ...",
run_in_background=true, # <-- REQUIRED
timeout=900000
)
If you forget run_in_background: true on any call, they will NOT run in parallel.
Build based on user selections from Step 1b.
Template:
You are reviewing IMPLEMENTATION against a SPECIFICATION.
Your job is to verify the code matches what was specified, identify gaps, and assess completion state.
## IMPORTANT: Review Only - No Changes
**DO NOT modify any files during this review phase.**
This is a READ-ONLY review for context gathering and gap identification. Your role is to:
- Read and analyze code
- Compare implementation against spec
- Identify gaps and issues
- Report findings
Any actual code changes will be made in a later phase after findings are reviewed.
Do NOT use Edit, Write, or any file modification tools.
## Review Mode: Approach-Aware
- If the outcome matches the spec, implementation is correct even if approach differs
- Note approach deviations as INFORMATIONAL, not failures
- Only flag as issues if the OUTCOME doesn't match spec
## Verification Requirement
**CRITICAL: Do NOT make assumptions about how the codebase works. VERIFY by reading code.**
Before stating how something works (e.g., "the CLI is workspace-agnostic"):
1. Read the actual code that implements it
2. Quote specific lines/functions that prove your assertion
3. If you cannot verify by reading code, mark as UNVERIFIED and flag for human review
Common assumption mistakes to avoid:
- Assuming a component hasn't changed since last review
- Inferring behavior from naming conventions without reading implementation
- Stating capabilities without checking the actual code
## Later-Phase Awareness
**CRITICAL: Before recommending improvements, check if they're already scoped for later.**
When reviewing against an issue that's part of a larger epic or brief:
1. Examine the full epic/parent issue to see all planned phases
2. Check sibling issues for work already scoped in later phases
3. **Do NOT recommend bringing forward work that's already properly scoped for a later phase**
If an improvement IS already scoped later, note it as:
> "Future-scoped: [description] is covered by [issue-id] in a later phase"
Only recommend additions/improvements if they:
- Are genuinely missing from all phases
- Are blocking current phase completion
- Represent a gap in the overall plan
## DO CHECK - Review these categories:
[Include ONLY categories user selected]
### Feature Completeness (if selected)
- Is each specified feature/requirement implemented?
- Create a checklist: requirement → implemented? → where?
### Acceptance Criteria Met (if selected)
- For each acceptance criterion, is it satisfied?
- Quote the criterion and explain how implementation meets it (or doesn't)
### Task Status Accuracy (if selected)
- Are closed tasks actually complete?
- Are open tasks actually incomplete?
- Flag any status mismatches
### Edge Cases Covered (if selected)
- For each documented edge case, is it handled?
- Where in the code is it handled?
### Error Handling (if selected)
- Are specified error scenarios handled?
- Do error messages match spec?
### Test Coverage (if selected)
- Are there tests for implemented features?
- Do tests cover specified scenarios?
### No Scope Creep (if selected)
- Is there code that wasn't specified?
- Flag additions that weren't in spec (INFORMATIONAL unless problematic)
### Approach Alignment (if selected)
- Does implementation follow specified approach?
- Note deviations as INFORMATIONAL with brief explanation
### No Missing Dependencies (if selected)
- Were prerequisite tasks completed?
- Are there incomplete dependencies?
### Project Conventions (if selected)
- Does code follow CLAUDE.md conventions?
- Naming, structure, patterns match project style?
### API Contract Match (if selected)
- Do implemented APIs match specified interfaces?
- Request/response shapes, error codes, etc.
## DO NOT CHECK - Skip these categories:
[List categories user did NOT select]
## Output Format
### Completion Matrix
| Requirement | Status | Location | Notes |
|-------------|--------|----------|-------|
| [From spec] | DONE/PARTIAL/MISSING/DEVIATED | file:line | [Details] |
### Issues (Risk-Weighted)
For each issue:
- **[SEVERITY] Issue Title**
- Requirement: What was specified
- Actual: What was implemented (or not)
- Location: File and line
- Recommendation: How to resolve
### Summary Stats
- Requirements: X total, Y complete, Z partial, W missing
- Scope creep items: N (if checking)
- Approach deviations: N (if checking)
Task(
subagent_type="general-purpose",
model="opus",
run_in_background=true,
prompt="[Sub-agent brief above]
SPECIFICATION to review against:
- Spec files: [SPEC_FILE_PATHS from Haiku]
## Beads Issue Contents (Pre-Read)
[PASTE VERBATIM OUTPUT FROM `bd show` FOR EACH ISSUE HERE]
IMPLEMENTATION to review:
- Implementation files: [IMPL_FILE_PATHS from Haiku]
- Test files: [TEST_FILE_PATHS from Haiku]
Target codebase: /path/to/project
## How to explore additional beads issues
The main spec issues are provided above. If you need to explore dependencies or linked issues:
- bd show <id> - View issue details
- bd dep show <id> - View issue dependencies
- bd list --status=open - List open issues
Use Read tool for all source files.
Compare spec requirements against actual implementation.
REMINDER: This is a READ-ONLY review. Do NOT modify any files - no Edit or Write tool calls."
)
Pre-read beads issues are included in the prompt for faster review. GPT can explore additional issues if needed.
# Bash tool call with run_in_background: true, timeout: 900000 (15 minutes)
codex exec -m "gpt-5.4" -c reasoning_effort="high" --sandbox workspace-write "$(cat <<'PROMPT'
[Sub-agent brief above]
SPECIFICATION to review against:
- Spec files: [SPEC_FILE_PATHS]
## Beads Issue Contents (Pre-Read)
[PASTE VERBATIM OUTPUT FROM `bd show` FOR EACH ISSUE HERE]
IMPLEMENTATION to review:
- Implementation files: [IMPL_FILE_PATHS]
- Test files: [TEST_FILE_PATHS]
## How to explore additional beads issues
The main spec issues are provided above. If you need to explore dependencies or linked issues:
- bd show <id> - View issue details
- bd dep show <id> - View issue dependencies
- bd list --status=open - List open issues
## Shell Command Best Practices
**CRITICAL: Path quoting**
Always wrap file paths containing special characters in single quotes:
- Parentheses: `(dashboard)`
- Brackets: `[slug]`
- Spaces
Example:
```bash
# WRONG - will fail
sed -n '1,100p' apps/web/app/(dashboard)/workspace/[slug]/page.tsx
# CORRECT
sed -n '1,100p' 'apps/web/app/(dashboard)/workspace/[slug]/page.tsx'
```

**Avoid login shell flag**

When running shell commands, prefer a non-login shell:

/bin/zsh -c 'command' # Preferred - non-login shell

Avoid the login shell (-l flag) as it may trigger profile errors in sandbox environments.
This review may take 10-15 minutes. Take your time.
Don't rush - thoroughness is more important than speed.
REMINDER: This is a READ-ONLY review. Do NOT modify any files.
PROMPT
)"
### Dispatching Gemini Sub-Agent via Gemini CLI
**CRITICAL: Use the TWO-STEP temp file approach to avoid JSON serialization issues with complex prompts.**
**WHY:** When the executing LLM constructs a Bash tool call containing a long HEREDOC with markdown, code blocks, and special characters, the JSON serialization can break (resulting in "Invalid tool parameters"). Using a temp file separates the complex prompt content from the shell command.
**Pre-read beads issues are included in the prompt for faster review. Gemini can explore additional issues if needed.**
**IMPORTANT: Gemini runs without sandbox (docker/podman not required). The prompt explicitly instructs Gemini NOT to modify files during review phase.**
**Step 1: Write prompt to temp file (first Bash call)**
```bash
# Bash tool call parameters:
# command: "cat > /tmp/gemini_review_prompt.txt <<'GEMINI_PROMPT' ... "
# run_in_background: false <-- This is fast, no need for background
# timeout: 30000 <-- 30 seconds is plenty
cat > /tmp/gemini_review_prompt.txt <<'GEMINI_PROMPT'
[Sub-agent brief above]
SPECIFICATION to review against:
- Spec files: [SPEC_FILE_PATHS]
## Beads Issue Contents (Pre-Read)
[PASTE VERBATIM OUTPUT FROM `bd show` FOR EACH ISSUE HERE]
IMPLEMENTATION to review:
- Implementation files: [IMPL_FILE_PATHS]
- Test files: [TEST_FILE_PATHS]
## How to explore additional beads issues
The main spec issues are provided above. If you need to explore dependencies or linked issues:
- bd show <id> - View issue details
- bd dep show <id> - View issue dependencies
- bd list --status=open - List open issues
## Shell Command Best Practices
**CRITICAL: Path quoting**
Always wrap file paths containing special characters in single quotes:
- Parentheses: `(dashboard)`
- Brackets: `[slug]`
- Spaces
Example:
```bash
# WRONG - will fail
cat apps/web/app/(dashboard)/workspace/[slug]/page.tsx
# CORRECT
cat 'apps/web/app/(dashboard)/workspace/[slug]/page.tsx'
```

This review may take 10-15 minutes. Take your time.
Don't rush - thoroughness is more important than speed.

REMINDER: This is a READ-ONLY review. Do NOT modify any files.
GEMINI_PROMPT
```
**Step 2: Call Gemini reading from temp file (second Bash call, IN PARALLEL with Opus/GPT)**
```bash
# Bash tool call parameters:
# command: "gemini \"$(cat /tmp/gemini_review_prompt.txt)\" ..."
# run_in_background: true <-- REQUIRED FOR PARALLEL EXECUTION
# timeout: 900000 <-- 15 minutes
gemini "$(cat /tmp/gemini_review_prompt.txt)" --model gemini-3-pro-preview -y && rm /tmp/gemini_review_prompt.txt
Notes on Gemini CLI:
- Uses model gemini-3-pro-preview
- 15-minute timeout (timeout: 900000 ms)
- Set run_in_background: true on the Bash call - THIS IS CRITICAL
- The && rm cleanup runs after Gemini completes

CRITICAL: Call ALL TaskOutput calls IN PARALLEL in ONE message.
If you call them sequentially, the main thread blocks on each one, wasting time. Call them all in a single message:
# In ONE message, call all TaskOutput tools in parallel:
TaskOutput(task_id="<opus_task_id>", block=true, timeout=900000)
TaskOutput(task_id="<codex_task_id>", block=true, timeout=900000)
TaskOutput(task_id="<gemini_task_id>", block=true, timeout=900000)
Timeouts: Use 900000ms (15 minutes) for all agents. Reviews take time - don't give up early.
Do NOT timeout prematurely. When waiting for results:
- Use timeout: 900000 (15 minutes) for all agents

If you get a timeout and suspect the agent is still running:
- Call TaskOutput with block: false to check status

Known Sandbox Errors (Ignorable):
These stderr messages can be safely ignored - they don't affect command execution:
- /opt/homebrew/Library/Homebrew/help.sh: cannot create temp file - Homebrew shell integration failing in sandbox

Only treat non-zero exit codes as actual failures (see the sketch below).
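A minimal sketch of judging an agent command by exit status rather than stderr; the output paths are illustrative and the test command is the pre-flight check shown earlier:

```bash
# Judge success by the exit status, not by stderr noise
codex exec -m "gpt-5.4" --sandbox read-only "Respond with only: READY" >/tmp/codex_out.txt 2>/tmp/codex_err.txt
status=$?
if [ "$status" -ne 0 ]; then
  echo "Codex failed with exit code $status - treat as a real failure"
else
  echo "Codex succeeded - Homebrew/profile noise in /tmp/codex_err.txt is ignorable"
fi
```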
If GPT fails (rate limit, timeout, or error):
Notify the user explicitly:
"GPT review failed due to [rate limit/timeout/error]. Proceeding with remaining agents."
Continue with other agent results - don't abort the entire review
Note in final output:
- GPT: Failed ([reason])

If Gemini fails (rate limit, timeout, or error):
Notify the user explicitly:
"Gemini review failed due to [rate limit/timeout/error]. Proceeding with remaining agents."
Continue with other agent results - don't abort the entire review
Note in final output:
- Gemini: Failed ([reason])

If Opus fails: Ask user if they want to retry or proceed with remaining agents.
If all selected agents fail: Report the errors and ask user how to proceed.
Once all TaskOutput calls return (or timeout/fail):
Create unified matrix showing all agents' assessments (include only columns for agents that were selected/succeeded):
| Requirement | Opus | GPT | Gemini | Consensus | Location |
|---|---|---|---|---|---|
| [Req] | DONE | DONE | DONE | DONE | file:line |
| [Req] | PARTIAL | DONE | PARTIAL | REVIEW | file:line |
Table Formatting Guidelines: keep matrix cells concise (under roughly 30 characters) and use status shorthand so columns stay readable.
Merge issues, noting agreement/disagreement.
For disagreements, explain which assessment is correct and why.
List unresolved questions needing human input.
# Implementation Review: [Spec/Epic Name]
## Review Configuration
- **LLMs Used:** [List agents used, e.g., "Opus + GPT + Gemini" / "Opus only" / "Opus + Gemini"]
- **Spec Reviewed:** [List of specs/issues]
- **Implementation Reviewed:** [List of impl files]
- **Categories Checked:** [List]
- **Categories Skipped:** [List]
## Completion Matrix
| Requirement | Status | Location | Notes |
|-------------|--------|----------|-------|
| ... | ... | ... | ... |
## Summary Stats
- **Total Requirements:** X
- **Complete:** Y (Z%)
- **Partial:** A
- **Missing:** B
- **Scope Creep Items:** C
- **Approach Deviations:** D (informational)
## Issues (Risk-Weighted)
### CRITICAL
- [Issues that block completion]
### HIGH
- [Significant gaps]
### MEDIUM
- [Minor gaps or quality issues]
### LOW / INFORMATIONAL
- [Scope creep, approach deviations]
## Agent Comparison (if multi-agent)
| # | Requirement | Opus | GPT | Gemini | Agree? |
|---|-------------|------|-----|--------|--------|
| ... |
(Include only columns for agents that were selected/succeeded)
## Reasoning (if multi-agent)
[Explain disagreements and final assessment]
## Remaining Ambiguities
- [Unresolved questions]
- [Items needing human input]
"Review complete. Say 'go' when you're ready to discuss next steps."
First, summarize the findings:
"Found X issues: Y critical, Z high, W medium, V low/informational."
Then ask (single choice):
question: "How would you like to proceed?"
header: "Findings"
multiSelect: false
options:
- label: "Address All Issues (Recommended)"
description: "Work through all identified issues"
- label: "Customize"
description: "Select specific issues to address"
If "Address All": Proceed to Step 5.3 with all findings selected.
If "Customize": Show the multiselect of individual findings:
question: "Which findings do you want to address?"
header: "Select Issues"
multiSelect: true
options:
[Dynamically populate from identified issues]
For each ambiguity in selected findings, present one at a time with options.
question: "What actions do you want to take?"
header: "Actions"
multiSelect: true
options:
- label: "Create beads issues for gaps"
description: "Track missing/incomplete items as new issues"
- label: "Update spec to match implementation"
description: "If implementation is intentionally different, update spec"
- label: "Save review to markdown"
description: "Save full review to docs/reviews/[name].md"
- label: "Mark tasks complete in beads"
description: "Close issues verified as complete"
- label: "Fix implementation gaps"
description: "Implement missing requirements"
Ask structure preference (epic with tasks vs individual issues), then create via bd create.
Best practices:
- Link new issues to the parent epic with --parent at create time, or bd dep add <new-id> <parent-id> --type parent-child (see the sketch below)
- Run bd sync after creating issues to persist them to the sync branch

Update markdown files or beads issues via bd update to reflect intentional deviations.
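A minimal sketch of that linking-and-sync best practice, with placeholder issue IDs:

```bash
# Link a newly created gap issue to its parent epic, then persist to the sync branch
bd dep add cgt-201 cgt-22 --type parent-child   # cgt-201: new gap issue (placeholder), cgt-22: parent epic
bd sync
```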
Ask for name, save to docs/reviews/[name].md.
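A minimal sketch of saving the report, assuming the user chose the name auth-epic-review (the name and body are illustrative):

```bash
mkdir -p docs/reviews
cat > docs/reviews/auth-epic-review.md <<'EOF'
# Implementation Review: [Spec/Epic Name]
...full synthesized report from Step 4...
EOF
```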
For each verified-complete task, run bd update <id> --status=closed.
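A minimal sketch, assuming cgt-123 and cgt-124 were verified complete (placeholder IDs):

```bash
# Close each verified-complete issue, then persist
for id in cgt-123 cgt-124; do
  bd update "$id" --status=closed
done
bd sync
```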
Work through missing requirements, implementing each one.
After applying any changes (spec updates, implementation fixes, beads updates), verify they were applied correctly:
Example verification:
Verifying changes...
- packages/auth/service.ts: ✓ Added missing validation (lines 45-52)
- specs/auth.md: ✓ Updated to reflect implementation approach
- .beads/cgt-123.md: ✓ Status updated to closed
If any change failed to apply, note it in the summary and suggest manual intervention.
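A minimal spot-check sketch; the paths, line range, and issue ID are illustrative:

```bash
# Confirm the edits actually landed before summarizing
git diff --stat
sed -n '40,60p' packages/auth/service.ts   # re-read the region where the validation was added
bd show cgt-123 | grep -i status           # confirm the issue status was updated
```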
## Actions Taken
- **Ambiguities resolved:** X of Y
- **Beads issues created:** [list IDs]
- **Spec updates applied:** X
- **Review saved to:** [path]
- **Tasks marked complete:** [list IDs]
- **Implementation fixes:** X requirements addressed
- **Deferred:** [list any skipped items]
If beads issues were created in Step 5.5, ask if user wants to implement them now:
question: "Beads issues created for gaps. Want to implement them now?"
header: "Implement"
multiSelect: false
options:
- label: "Yes - implement now"
description: "Work through the newly created issues immediately"
- label: "No - defer to later"
description: "Issues are tracked, implement in a future session"
If "Yes": Work through each created issue, implementing the requirements. After each implementation:
- Run bd update <id> --status=closed
- Run bd sync to persist

If "No": End the skill. Issues remain open for future work.
Use TodoWrite to track each step:
- Pre-read beads issues with bd show for inclusion in agent prompts
- Dispatch all reviewers with run_in_background: true in ONE message

| Mistake | Correction |
|---|---|
| Exploring codebase yourself | Haiku gathers paths - you pass scope descriptions |
| Missing run_in_background | BOTH Task and Bash calls need run_in_background: true |
| Sequential dispatch | ONE message with multiple tool calls for parallelism |
| Failing approach deviations | Approach deviations are INFORMATIONAL if outcome matches |
| Binary pass/fail | Use completion matrix with DONE/PARTIAL/MISSING/DEVIATED |
| Ignoring WIP state | Report current state accurately, don't require 100% completion |
| Skipping synthesis | Raw agent output isn't the deliverable - synthesize |
| Ending after beads creation | If issues created, MUST ask if user wants to implement now |
| Forgetting bd sync | Always sync after creating/updating beads issues |
| Recommending later-phase work | Check epic/sibling issues before recommending - don't bring forward work already scoped later |
| Skipping Review All option | Always offer fast-track first - only show detailed questions if user picks Customize |
| Not disambiguating epics | If multiple epics match (e.g., multiple "Phase 6"), ask user which one |
| Not pre-reading beads issues | Always pre-read with bd show and include in all agent prompts |
| Making technical assumptions | Don't assume how code works - READ it and quote specific lines |
| Silent rate limit fallback | If any agent fails, NOTIFY user explicitly before proceeding with remaining agents |
| Modifying files during review | Review phase is READ-ONLY - no Edit/Write calls, changes come later |
| Premature timeout | Use 900000ms (15 min) timeout for all agents - don't give up after 2-5 minutes |
| Unquoted special paths | Always single-quote paths with (), [], or spaces |
| Treating sandbox stderr as errors | Homebrew/profile errors in stderr are ignorable - check exit codes instead |
| Sequential pre-flight checks | Run GPT and Gemini pre-flight checks IN PARALLEL, not sequentially |
| Gemini file modifications | Gemini runs unsandboxed - prompt explicitly prohibits file modifications during review |
| Inline Gemini HEREDOC | Use TWO-STEP temp file approach - write prompt to file first, then call gemini |
| Sequential TaskOutput calls | Call ALL TaskOutput in ONE message for parallel collection |
| Extra exploration after collection | Proceed IMMEDIATELY to synthesis - don't supplement with your own reads |
| Skipping change verification | After applying changes, re-read files and summarize what changed |
| Table cell overflow | Keep completion matrix cells under 30 chars - use status shorthand |