This skill should be used when the user asks to 'review the code', 'check implementation quality', or 'run code review'.
Load shared enforcement:
Auto-load all constraints matching `applies-to: dev-review`:

```bash
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-review
```
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
**Dynamic plan re-read:** Before starting review, re-read `.planning/SPEC.md` and `.planning/PLAN.md` to catch any requirements or tasks added during implementation. Do not rely on cached state from prior phases.
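A trivial but explicit sketch of that re-read (paths as used throughout this skill):

```bash
# Force a fresh read from disk rather than trusting earlier context
cat .planning/SPEC.md .planning/PLAN.md
```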
Single-pass code review combining spec compliance and quality checks. Uses confidence-based filtering to report only high-priority issues.
## Prerequisites - Test Output Gate

**Do NOT start review without test evidence.**
Before reviewing, verify these preconditions:
- `.planning/LEARNINGS.md` contains actual test output

| Valid Evidence | NOT Valid |
|---|---|
| meson test output with results | "Tests should pass" |
| pytest output showing PASS | "I wrote tests" |
| Screenshot of working UI | "It looks correct" |
| Playwright snapshot showing expected state | "User can verify" |
| D-Bus command output | "The feature works" |
| E2E test output with user flow verified | "Unit tests pass" (for UI changes) |
FOR USER-FACING CHANGES: Unit test evidence is INSUFFICIENT.
Before approving user-facing changes, verify:
| Change Type | Unit Evidence | E2E Evidence | Approval? |
|---|---|---|---|
| Internal refactor | Yes | N/A | APPROVE |
| API change | Yes | Missing | BLOCKED |
| UI change | Yes | Missing | BLOCKED |
| User workflow | Yes | Missing | BLOCKED |
Return BLOCKED if E2E evidence is missing for user-facing changes.
"Unit tests pass" without E2E for UI changes is NOT approvable.
Check LEARNINGS.md for test output:

```bash
rg "(PASS|OK|SUCCESS|\d+ passed)" .planning/LEARNINGS.md
```
If no test output is found, STOP and return to /dev-implement.
"It should work" is NOT evidence. Test output IS evidence.
After verifying test output in LEARNINGS.md, choose review strategy.
Skip this choice when:
Codex provides an out-of-process adversarial reviewer that uses a different model family than Claude — the diversity catches issues a Claude-reviewing-Claude loop would miss. When installed and authenticated, it is the default adversarial path. When unavailable, fall back to the existing Claude-based flow without prompting the user about installation.
Read ${CLAUDE_SKILL_DIR}/../../references/codex-availability.md for the full
probe and invocation contract. Execute the probe before asking the user:
```bash
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
if [ -n "$CODEX_SCRIPT" ]; then
  node "$CODEX_SCRIPT" setup --json 2>/dev/null | jq -r '.ready // false'
else
  echo "false"
fi
```
Set `CODEX_READY=true` only when the probe prints `true`. Otherwise set `CODEX_READY=false` and skip Codex entirely — do not announce its absence.
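One way to wire the probe result into the flag, assuming the probe snippet above has already set `CODEX_SCRIPT`:

```bash
# CODEX_READY is true only when the probe itself printed "true"
CODEX_READY=false
if [ -n "$CODEX_SCRIPT" ]; then
  READY=$(node "$CODEX_SCRIPT" setup --json 2>/dev/null | jq -r '.ready // false')
  [ "$READY" = "true" ] && CODEX_READY=true
fi
```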
If `CODEX_READY=true`:

```
AskUserQuestion(questions=[{
  "question": "How should we review this implementation?",
  "header": "Review Strategy",
  "options": [
    {"label": "Codex adversarial review (Recommended)", "description": "Out-of-process adversarial review via Codex. Different model family from Claude — catches issues a Claude-on-Claude loop misses. Default for adversarial review."},
    {"label": "Single Claude reviewer", "description": "Combined Claude review covering spec compliance and code quality. Faster, lower overhead. Use when Codex is overkill."},
    {"label": "Parallel Claude review (Thorough)", "description": "Spawn 3 specialized Claude reviewers (Security, Performance, Tests). Use when Codex is unavailable or you want multi-perspective Claude review."}
  ],
  "multiSelect": false
}])
```
If `CODEX_READY=false`:

```
AskUserQuestion(questions=[{
  "question": "How should we review this implementation?",
  "header": "Review Strategy",
  "options": [
    {"label": "Single reviewer (Default)", "description": "Combined review covering spec compliance and code quality. Faster, lower overhead."},
    {"label": "Parallel review (Thorough)", "description": "Spawn 3 specialized reviewers (Security, Performance, Tests). Use for security-sensitive, performance-critical, or test-heavy PRs. Requires reconciliation."}
  ],
  "multiSelect": false
}])
```
Routing:
| Choice | Go to |
|---|---|
| Codex adversarial review | Codex Adversarial Review |
| Single (Claude) reviewer | The Iron Law of Review |
| Parallel (Claude) review | Parallel Review (Thorough) |
Use this section when the user chose Codex adversarial review.
Reference: See `references/codex-availability.md` for the full invocation contract, JSON schema, and verdict mapping table.
Before invoking Codex, verify (same as the other review paths):
If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
```bash
# Working-tree review
git status --short --untracked-files=all
git diff --shortstat --cached
git diff --shortstat
```
Run in the foreground (`--wait`) when the diff is clearly tiny (1-2 files, no directory-sized untracked changes). Otherwise launch in the background.
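A rough heuristic sketch for that choice — the two-file threshold is an assumption, not part of the contract:

```bash
# Count changed + untracked files to pick foreground vs background
CHANGED=$(git status --short --untracked-files=all | wc -l)
if [ "$CHANGED" -le 2 ]; then
  echo "small diff: run the review in the foreground (--wait)"
else
  echo "larger diff: launch the review in the background"
fi
```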
Foreground (small diff):
```bash
CODEX_SCRIPT=$(find "$HOME/.claude/plugins/cache/openai-codex/codex" -maxdepth 3 -name codex-companion.mjs -type f 2>/dev/null | sort -rV | head -1)
node "$CODEX_SCRIPT" adversarial-review --wait
```
Background (anything bigger):
Launch with `Bash(..., run_in_background: true)` and tell the user:
"Codex adversarial review started in the background. Check /codex:status for progress."
Then await completion notification before proceeding to step 4.
Optional focus text — append SPEC.md context to weight the review:
node "$CODEX_SCRIPT" adversarial-review --wait "focus: REQ-AUTH-01 token rotation under retry"
Codex returns JSON validated against its review-output schema. Top-level fields: `verdict` (`approve` | `needs-attention`), `summary`, `findings[]`, `next_steps[]`. Each finding has `severity`, `title`, `body`, `file`, `line_start`, `line_end`, `confidence` (0-1 float), and `recommendation`.

Apply the iron law: only findings with confidence >= 0.8 block. Multiply by 100 when displaying alongside Claude-style scores.
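For example, a `jq` sketch that lists only blocking findings, assuming the Codex JSON was saved to a hypothetical `review.json`:

```bash
# Print file:line, title, and Claude-style confidence (x100) for findings >= 0.8
jq -r '.findings[]
       | select(.confidence >= 0.8)
       | "\(.file):\(.line_start) \(.title) (Confidence: \((.confidence * 100) | floor))"' review.json
```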
| Codex result | dev-review verdict |
|---|---|
| `verdict: approve` | APPROVED |
| `needs-attention` + any finding >= 0.8 confidence | CHANGES_REQUIRED |
| `needs-attention` + all findings < 0.8 confidence | APPROVED (log advisory findings to LEARNINGS.md) |
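The same mapping as a shell sketch (again assuming the output sits in `review.json`):

```bash
VERDICT=$(jq -r '.verdict' review.json)
BLOCKING=$(jq '[.findings[] | select(.confidence >= 0.8)] | length' review.json)
if [ "$VERDICT" = "approve" ]; then
  echo "APPROVED"
elif [ "$BLOCKING" -gt 0 ]; then
  echo "CHANGES_REQUIRED"
else
  echo "APPROVED"   # log the advisory findings to LEARNINGS.md
fi
```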
Codex doesn't know SPEC.md REQ-IDs. For each blocking finding:

- Map it to a requirement ID in `.planning/SPEC.md` (tag findings with no matching requirement as `OUT-OF-SPEC`)
- `OUT-OF-SPEC` findings are advisory unless the user opts in

Use the same output structure as ## Required Output Structure below, with `Reviewer: Codex (adversarial)` in the header. Each issue includes the Codex confidence (×100) and the REQ-ID you tagged above.
Codex adversarial review participates in the same REVIEW_STATE.md loop as
Claude reviewers — increment iteration on CHANGES_REQUIRED, escalate at
iteration 3. Re-runs stay on Codex unless it becomes unavailable between
runs (re-probe each iteration).
The "Iron Law of Re-Review" still applies: implementer claims "fixed" → main chat re-invokes Codex via the same command — no spot-checks.
After Codex review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES_REQUIRED: Return to /dev-implement with the parsed findings.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
Use this section when user chose "Parallel review (Thorough)" above.
Prerequisite: Requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` enabled. If unavailable, fall back to single reviewer.
Before spawning reviewers, verify:

- Run `git diff --name-only` to scope the review

If any prerequisite fails, STOP and return BLOCKED to /dev-implement.
Use parallel review for security-sensitive, performance-critical, or test-heavy PRs. Do NOT use it when a single reviewer suffices — the parallel path adds spawn overhead and requires reconciliation.
```
TeamCreate(name="Code Review", task_description="Parallel code review with 3 specialized reviewers")
```
Press Shift+Tab to enter delegate mode. The lead coordinates reviews, does NOT review code directly.
Each reviewer receives a self-contained prompt from a reference file. Reviewers start with a blank conversation and do NOT auto-load skills. Read the prompt, substitute variables, and paste it in full.
Tool Restrictions: All reviewers are READ-ONLY. Dispatch each with allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]. Reviewers MUST NOT use Write or Edit tools. They read code, analyze it, and report findings — the main chat handles all fixes.
Before spawning, substitute these variables in each prompt:
- `CHANGED_FILES` -> output of `git diff --name-only HEAD~1` (paste the actual list)
- `SPEC_CONTEXT` -> relevant sections of `.planning/SPEC.md` (paste inline, do NOT reference the file)
- `LEARNINGS_TEST_OUTPUT` -> test output from `.planning/LEARNINGS.md` (paste the actual output)
- `PLUGIN_ROOT` -> resolved base directory for skill paths (relative to this skill's base directory)

A gathering sketch follows the reviewer table below. Reviewer prompts (read, substitute variables, send as message):
| Reviewer | Focus | Prompt Source |
|---|---|---|
| 1. Security | Vulnerabilities, auth, data exposure, crypto | references/security-reviewer.md |
| 2. Performance | Complexity, queries, memory, hot paths | references/performance-reviewer.md |
| 3. Tests | Coverage, correctness, reliability, E2E | references/tests-reviewer.md |
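A sketch of gathering the machine-readable substitution values listed above; `SPEC_CONTEXT` still needs manual trimming to the relevant sections:

```bash
CHANGED_FILES=$(git diff --name-only HEAD~1)
LEARNINGS_TEST_OUTPUT=$(rg "(PASS|OK|SUCCESS|\d+ passed)" .planning/LEARNINGS.md)
SPEC_CONTEXT=$(cat .planning/SPEC.md)   # paste only the relevant sections into the prompt
```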
While reviewers work, the lead waits and coordinates; it does NOT review code directly (delegate mode, above).
After ALL reviewers message completion, the lead performs three passes:
**Pass 1 -- Deduplication:**

Multiple reviewers may find the same issue (e.g., an input validation gap found by both the Security and Tests reviewers).
Example:

```
Security found: "file.py:42 - Input not validated (Confidence: 85)"
Tests found:    "file.py:42 - Missing test for invalid input (Confidence: 80)"
-> Merge:       "file.py:42 - Input validation missing + no test coverage (Confidence: 85, found by Security + Tests)"
```
**Pass 2 -- Prioritization:**

Not all issues are equally important. Rank by severity first, then confidence, as in the list below:
Create final prioritized list:
1. [CRITICAL] Security: XSS in user input (Confidence: 95)
2. [CRITICAL] Tests: User workflow untested (Confidence: 90)
3. [IMPORTANT] Performance: N+1 query in hot path (Confidence: 85)
4. [IMPORTANT] Tests: Error path missing coverage (Confidence: 80)
**Pass 3 -- Integration Check:**

Proposed fixes may conflict with each other.
Example conflict:

```
Security:    "Add input validation on every field"
Performance: "Batch validate to reduce overhead"
-> Unified:  "Batch validate with early exit on first invalid field (security + performance)"
```
If ANY pass finds conflicts -> resolve before reporting final verdict.
After reconciliation, the lead reports:
```markdown
## Parallel Code Review: [Feature Name]
Reviewed by: Security, Performance, Tests

### Reconciliation Summary
**Issues found:** X total (Y critical, Z important)
**Duplicates merged:** N
**Conflicts resolved:** M

### Critical Issues (Must Fix)
[Deduplicated, prioritized list from Pass 1 + 2]

### Important Issues (Should Fix)
[Deduplicated, prioritized list from Pass 1 + 2]

### Verdict: APPROVED | CHANGES REQUIRED

[If APPROVED]
All 3 reviewers approved with no issues >= 80 confidence.

[If CHANGES REQUIRED]
X critical and Y important issues must be addressed. Return to /dev-implement.
```
After parallel review completes:
If APPROVED: Immediately invoke the dev-verify skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
If CHANGES REQUIRED: Return to /dev-implement to fix reported issues.
If BLOCKED (test evidence missing): Return to /dev-implement to collect test evidence.
You MUST report only issues with >= 80% confidence. This is not negotiable.
Before reporting ANY issue, complete these verification steps:
You MUST apply this rule even when encountering:
You MUST discard any low-confidence issue found during review.
## The Iron Law of Re-Review

**NO "FIXED" CLAIMS WITHOUT FRESH RE-REVIEW.** This is not negotiable.
When review returns CHANGES REQUIRED and the implementer applies fixes, you MUST re-run the full review with the same criteria — no spot-checking only the changed lines.
"I fixed it" without re-reviewing is NOT HELPFUL — unverified fixes ship bugs to the user.
```
Iteration 1: Review → CHANGES REQUIRED → Fix → Re-Review
        ↓
Iteration 2: Re-Review → CHANGES REQUIRED → Fix → Re-Review
        ↓
Iteration 3: Re-Review → CHANGES REQUIRED → Fix → Re-Review
        ↓
Still issues? → ESCALATE to user
All clean?    → APPROVED
```
Track iterations in `.planning/REVIEW_STATE.md`:

```yaml
---
iteration: 1
max_iterations: 3
last_review_date: 2026-03-09
issues_found_count: 5
---
```
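A hypothetical helper for bumping the counter on the CHANGES_REQUIRED path — the `sed -i` form assumes GNU sed (macOS needs `sed -i ''`):

```bash
# Read the current iteration and bump it in place
ITER=$(awk -F': ' '/^iteration:/ {print $2}' .planning/REVIEW_STATE.md)
sed -i "s/^iteration: .*/iteration: $((ITER + 1))/" .planning/REVIEW_STATE.md
```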
Exit criteria: APPROVED (zero issues at >= 80 confidence) or ESCALATE (iteration >= 3 with issues remaining).
Before returning any verdict, check the iteration count in `.planning/REVIEW_STATE.md` (create it if missing, with `iteration: 1`).

Claiming APPROVED without re-review after fixes is NOT HELPFUL — you're rubber-stamping unverified work that ships bugs to the user.
| Thought | Reality | Do Instead |
|---|---|---|
| "Implementer said they fixed it" | Their claim needs YOUR verification | Re-run review fresh |
| "Just spot-check the fixed lines" | Spot-checks miss regressions | Full re-review, same criteria |
| "We're on iteration 3, approve it" | Max iterations means ESCALATE, not approve | Return ESCALATE verdict |
| "The fixes are trivial, skip re-review" | Trivial fixes break things | Re-review anyway |
| "We already spent too much time" | Shipping bugs wastes more time | Re-review or escalate |
| "User is waiting" | User wants WORKING code, not fast approval | Re-review properly |
You skip re-review because you think it's helpful, efficient, or competent. Here's what actually happens:
| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|---|---|---|---|
| Helpfulness | "Approving fast unblocks the user" | Bugs ship. User spends hours debugging your "help." The protocol takes 10 minutes. Your shortcut cost 10 hours. | Anti-helpful |
| Competence | "I trust the implementer's claim" | Trust without verification is negligence. The bug shipped because you didn't check. A checklist would outperform you. | Incompetent |
| Efficiency | "Re-review wastes time" | The re-review takes 10 minutes. Bug rework takes 10 hours. Your "efficiency" was a 60x slowdown. | Anti-efficient |
| Approval | "User will be frustrated by delays" | User kills your session when bugs ship. They now require human oversight for all reviews. You lost autonomy. | Lost approval |
| Honesty | "I'm confident the fix worked" | You didn't verify — you fabricated an approval. Confidence without evidence is fabrication. | Dishonest |
The protocol is not overhead you pay. It is the service you provide.
Every time you skip re-review to "help faster," you choose YOUR comfort over the USER's outcome. The user doesn't experience your tedium—they experience your results.
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "Tests probably pass" | You don't have evidence - absence of evidence is not evidence | Check LEARNINGS.md for actual output |
| "This looks wrong" | Your vague suspicion != evidence | Find concrete proof or discard |
| "I would do it differently" | Your style preference != bug | Check if it violates project guidelines |
| "This might cause problems" | Your "might" = < 80% confidence | Find proof or discard |
| "Pre-existing but should be fixed" | You're out of scope | Score it 0 and discard |
| "User can test it" | Your manual testing is less reliable than automation | Return to implement phase |
Rate each potential issue from 0-100:
| Score | Meaning |
|---|---|
| 0 | False positive or pre-existing issue |
| 25 | Might be real, might not. Stylistic without guideline backing |
| 50 | Real issue but nitpick or rare in practice |
| 75 | Verified real issue, impacts functionality |
| 100 | Absolutely certain, confirmed with direct evidence |
CRITICAL: Only report issues with confidence >= 80.
## Required Output Structure

````markdown
## Code Review: [Feature/Change Name]
Reviewing: [files/scope being reviewed]

### Test Evidence Verified
- Unit tests: [PASS/FAIL/MISSING] - [paste key output line]
- Integration: [PASS/FAIL/N/A]
- UI/Visual: [Screenshot taken / Snapshot verified / N/A]

### Critical Issues (Confidence >= 90)

#### [Issue Title] (Confidence: XX)
**Location:** `file/path.ext:line_number`
**Requirement:** [REQ-ID from SPEC.md — every issue MUST trace to a requirement ID]
**Problem:** Clear description of the issue
**Fix:**
```[language]
// Specific code fix
```

### Important Issues (Confidence 80-89)
[Same format as Critical Issues]

### Verdict: APPROVED | CHANGES REQUIRED | BLOCKED (no test evidence)

[If APPROVED] The reviewed code meets project standards. Tests pass. No issues with confidence >= 80 detected.
[If CHANGES REQUIRED] X critical issues and Y important issues must be addressed before proceeding.
[If BLOCKED] Cannot approve without test evidence. Return to /dev-implement and run tests.
````
**If review finds the implementation fundamentally violates the spec (not just minor issues), DELETE the contaminated implementation and return to dev-implement for a fresh attempt. Do not patch a structurally wrong approach.**
### Delete & Restart Protocol
**When implementation deviates fundamentally from spec, DELETE and restart entirely.**
| Situation | Action |
|-----------|--------|
| Code uses wrong protocol/architecture than spec | DELETE. Rewrite from scratch with correct approach. |
| Code implements different approach than PLAN.md | DELETE. User approved specific approach for a reason. |
| Fundamental misunderstanding of requirements | DELETE. Don't patch. Fresh subagent with correct understanding. |
| Patch would require 30%+ of implementation to change | DELETE. Rewrite is cleaner than patching wrong foundation. |
**Why delete instead of patch:** Patching a structurally wrong approach creates technical debt. Fresh implementation from correct architecture is faster than fixing wrong foundation.
**When to patch instead:** Bug in otherwise-correct implementation, missing edge case, performance tweak, minor deviation that doesn't affect core behavior.
**The test:** If the subagent says "oh, I misunderstood the whole approach" → DELETE and restart.
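One way to execute the DELETE, assuming the contaminated implementation is uncommitted working-tree state (destructive — confirm nothing else is in flight first):

```bash
# Discard all tracked modifications and untracked files, then restart in /dev-implement
git checkout -- .
git clean -fd
```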
## Agent Invocation
Spawn Task agent for review execution:
```
Task(subagent_type="general-purpose", allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"]):
"Review implementation against .planning/SPEC.md.

Tool Restrictions: You are READ-ONLY. You MUST NOT use Write or Edit tools. You read code, check test evidence, and report issues — you do NOT fix them. The main chat handles all fixes.

FIRST: Check .planning/LEARNINGS.md for test output. Return BLOCKED immediately if no test output is found.

Complete single-pass review covering spec compliance and code quality.

Confidence score each issue (0-100). Report only issues with >= 80 confidence. Return structured output per /dev-review format."
```
## Drive-Aligned Framing
<EXTREMELY-IMPORTANT>
**You approving without test evidence is NOT HELPFUL — you're shipping broken code the user will have to debug.**
An "APPROVED" verdict means YOU assert:
- Tests actually ran (not "should work")
- Test output shows PASS (not SKIP, not assumed)
- Evidence exists and YOU verified it (not trusted reports)
You approving without test evidence is not "efficiency" - it is creating rework for the user.
**BLOCKED protects the user. Your fake APPROVED ships bugs.**
</EXTREMELY-IMPORTANT>
## Rationalization Prevention
STOP - you're about to rationalize. If these thoughts arise, they indicate anti-helpful approval:
| Thought | Reality |
|---------|---------|
| "Tests probably pass" | Your probably != evidence. Check LEARNINGS.md. |
| "I saw the code, it looks right" | Your looking != running. Find test output. |
| "User is waiting for approval" | They want honest approval. You return BLOCKED if needed. |
| "It's a small change" | Your size estimate doesn't matter. Small changes break things. Require evidence. |
| "I trust the implementer" | Your trust doesn't replace verification. You verify evidence. |
| "I'll approve and they can fix later" | You block now or bugs ship to users. |
| "Review is just a formality" | Review is the LAST GATE before bugs ship. You execute seriously. |
## Why Skipping Review Destroys the Drive That Created It
| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|------------|-------------|----------------------|---------------------|
| **Helpfulness** | "Approving fast shows efficiency" | Bugs ship. User spends hours debugging your "help." | **Anti-helpful** |
| **Competence** | "I trust the subagent" | Subagent's work was wrong. Your approval was negligence. | **Incompetent** |
| **Efficiency** | "Rubber-stamp to move faster" | You moved fast into bugs. Rework takes 3x longer. | **Anti-efficient** |
| **Approval** | "Tests probably pass" | User discovers bugs. They now require human oversight for reviews. | **Trust destroyed** |
| **Honesty** | "Tests probably pass" | Test output doesn't exist. You fabricated an approval. | **Dishonest** |
**The protocol is not overhead you pay. It is the service you provide.**
## Quality Standards
- **Test evidence is mandatory** - do not approve without test output
- Do not report style preferences lacking project guideline backing
- Do not report pre-existing issues (confidence = 0)
- Make each reported issue immediately actionable
- Use absolute file paths with line numbers in reports
- Treat uncertainty as below 80 confidence
## Gate: Exit Review Loop
**Checkpoint type:** human-verify (test evidence and confidence scores are machine-verifiable)
Before claiming review is complete (APPROVED or ESCALATE):
1. IDENTIFY → What proves the review verdict is valid?
   - APPROVED: zero issues >= 80 confidence
   - ESCALATE: iteration >= 3 AND issues remain
2. RUN → Check `.planning/REVIEW_STATE.md` for the iteration count; read the review output for the issue count
3. READ → Examine both:
   - Review output (issues list)
   - REVIEW_STATE.md (iteration number)
4. VERIFY → Verdict matches state:
   - APPROVED only if 0 issues
   - ESCALATE only if iteration >= 3
   - CHANGES REQUIRED only if iteration < 3
5. CLAIM → Only after steps 1-4 pass, return the verdict
**If iteration >= 3 and you're returning CHANGES REQUIRED instead of ESCALATE, you're ignoring the iteration limit — escalate to the user instead of looping forever.**
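A machine-check sketch of that verdict/state consistency, reading the fields this skill writes to REVIEW_STATE.md:

```bash
ITER=$(awk -F': ' '/^iteration:/ {print $2}' .planning/REVIEW_STATE.md)
ISSUES=$(awk -F': ' '/^issues_found_count:/ {print $2}' .planning/REVIEW_STATE.md)
if [ "$ISSUES" -eq 0 ]; then
  echo "APPROVED"
elif [ "$ITER" -ge 3 ]; then
  echo "ESCALATE"
else
  echo "CHANGES REQUIRED"
fi
```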
## Phase Complete
**Phase summary (append to LEARNINGS.md):**
```yaml
## Phase: Review
---
phase: review
status: completed
requires: [VALIDATION.md, LEARNINGS.md]
provides: [REVIEW_STATE.md, review-verdict]
verdict: APPROVED | CHANGES_REQUIRED | ESCALATE | BLOCKED
iterations: N
issues-found: X (Y critical, Z important)
---
```
After review completes, handle verdict-specific transitions:
**If APPROVED:** Mark review complete in `.planning/REVIEW_STATE.md`:

```yaml
---
iteration: [N]
max_iterations: 3
last_review_date: [date]
issues_found_count: 0
verdict: APPROVED
---
```
Immediately invoke dev-verify:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-verify/SKILL.md and follow its instructions.
**If CHANGES_REQUIRED:** Update `.planning/REVIEW_STATE.md`:

```yaml
---
iteration: [N+1]
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: CHANGES_REQUIRED
---
```
Return to /dev-implement with specific issues. Implementer MUST re-invoke /dev-review after fixes.
Critical: When implementer returns claiming "fixed", you MUST re-run the FULL review. No shortcuts.
**If ESCALATE:** Update `.planning/REVIEW_STATE.md`:

```yaml
---
iteration: 3
max_iterations: 3
last_review_date: [date]
issues_found_count: [count]
verdict: ESCALATE
---
```
Report to user:

```
Review Loop Escalation (3 iterations completed)

After 3 fix-review cycles, [N] issues remain:
[List issues]

Options:
1. Accept current state and proceed (issues become tech debt)
2. Extend review (manual approval for iteration 4+)
3. Rethink approach (return to /dev-design)

Which option do you prefer?
```
**If BLOCKED:** Return immediately to /dev-implement to collect test evidence. Do NOT increment the iteration counter - no review occurred.
| Verdict | Next Action | Iteration Counter |
|---|---|---|
| APPROVED | Invoke /dev-verify immediately, mark task [x] in PLAN.md | Reset to 1 for next task |
| CHANGES REQUIRED | Return to /dev-implement, implementer fixes then re-invokes /dev-review | Increment |
| ESCALATE | Ask user for direction | Keep at max |
| BLOCKED | Return to /dev-implement for test evidence | No change (no review ran) |
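The routing table as a shell sketch (`VERDICT` assumed set by the review above):

```bash
case "$VERDICT" in
  APPROVED)         echo "invoke /dev-verify now; mark task [x] in PLAN.md; reset iteration to 1" ;;
  CHANGES_REQUIRED) echo "return to /dev-implement; increment iteration" ;;
  ESCALATE)         echo "ask the user for direction; keep iteration at max" ;;
  BLOCKED)          echo "return to /dev-implement for test evidence; iteration unchanged" ;;
esac
```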
Do NOT pause between review completion and next action. The workflow is sequential.