```
npx claudepluginhub fortunto2/solo-factory --plugin solo
```
This skill is self-contained — follow the instructions below instead of delegating to external review skills (superpowers, etc.) or spawning Task subagents. Run all checks directly.
Final quality gate before shipping. Runs tests, checks security, verifies acceptance criteria from spec.md, audits code quality, and generates a ship-ready report with go/no-go verdict.
```
git branch --show-current 2>/dev/null
git diff --stat HEAD~3..HEAD 2>/dev/null | tail -5
```

Run after /deploy (or /build if deploying manually). This is the quality gate.
Pipeline: /deploy → /review
Can also be used standalone: /review on any project to audit code quality.
- `session_search(query)` — find past review patterns and common issues
- `project_code_search(query, project)` — find similar code patterns across projects
- `codegraph_query(query)` — check dependencies, imports, unused code

If MCP tools are not available, fall back to Glob + Grep + Read.
codegraph_explain(project="{project name}")
Returns: stack, languages, directory layers, key patterns, top dependencies, hub files. Use this to detect stack and understand project structure.
- `CLAUDE.md` — architecture, Do/Don't rules
- `docs/plan/*/spec.md` — acceptance criteria to verify (REQUIRED)
- `docs/plan/*/plan.md` — task completion status (REQUIRED)
- `docs/workflow.md` — TDD policy, quality standards, integration testing commands (if exists)

Do NOT read source code at this stage. Only docs.
Use stack from codegraph_explain response (or CLAUDE.md if no MCP) to choose tools:
- Next.js / Node: `npm run build`, `npm test`, `npx next lint`
- Python: `uv run pytest`, `uv run ruff check`
- Swift: `swift test`, `swiftlint`
- Kotlin: `./gradlew test`, `./gradlew lint`

Do NOT read random source files. Use the graph to find the most important code:
codegraph_query("MATCH (f:File {project: '{name}'})-[e]-() RETURN f.path, COUNT(e) AS edges ORDER BY edges DESC LIMIT 5")
Read only the top 3-5 hub files (most connected = most impactful). For security checks, use Grep with narrow patterns (sk_live, password\s*=) — not full file reads.
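When the code graph is unavailable, a rough fallback is an import-frequency count. This is a sketch under a TS/JS assumption (`from '...'` import syntax); the `hub_files` helper name is made up for illustration:

```shell
# Fallback "hub file" heuristic without codegraph MCP: count how often each
# module specifier is imported. Crude, but the most-imported modules are
# usually the most connected ones.
hub_files() {
  grep -rhoE "from ['\"][^'\"]+['\"]" "$@" 2>/dev/null \
    | sort | uniq -c | sort -rn | head -5
}

# Example: hub_files src/ app/ lib/
```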
Makefile convention: If Makefile exists in project root, always prefer make targets over raw commands. Use make test instead of npm test, make lint instead of pnpm lint, make build instead of pnpm build. Run make help (or read Makefile) to discover available targets including integration tests.
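The Makefile-first rule can be sketched as a small chooser. The function name and the non-make fallbacks are assumptions; extend per stack:

```shell
# Prefer a make target when a Makefile defines it, otherwise fall back
# to the stack's native runner.
pick_test_command() {
  if [ -f Makefile ] && grep -qE '^test:' Makefile; then
    echo "make test"
  elif [ -f package.json ]; then
    echo "npm test"
  elif [ -f pyproject.toml ]; then
    echo "uv run pytest"
  else
    echo "unknown"
  fi
}
```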
Run all 15 dimensions in sequence (dimension 4, the Pre-Landing Checklist, is new). Report findings per dimension.
Run the full test suite (prefer make test if Makefile exists):
```
# If Makefile exists — use it
make test 2>&1 || true

# Fallback: Next.js / Node
npm test -- --coverage 2>&1 || true

# Python
uv run pytest --tb=short -q 2>&1 || true

# Swift
swift test 2>&1 || true
```
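For the report, the pass/fail numbers can be pulled out of the captured output. A sketch assuming a pytest-style summary line; `parse_test_summary` is a made-up helper name:

```shell
# Extract "N passed" / "N failed" / "N skipped" tokens from a
# pytest-style summary line.
parse_test_summary() {
  printf '%s\n' "$1" | grep -oE '[0-9]+ (passed|failed|skipped)'
}
```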
Report:
Integration tests — if docs/workflow.md has an "Integration Testing" section, run the specified commands:
```
# Next.js
pnpm lint 2>&1 || true
pnpm tsc --noEmit 2>&1 || true

# Python
uv run ruff check . 2>&1 || true
uv run ty check . 2>&1 || true

# Swift
swiftlint lint --strict 2>&1 || true

# Kotlin
./gradlew detekt 2>&1 || true
./gradlew ktlintCheck 2>&1 || true
```
Report: warnings count, errors count, top issues.
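The warning/error tally can be sketched as a keyword count over the captured output. This is an approximation; real linters have structured output modes (JSON reporters) that are more reliable:

```shell
# Rough error/warning tally over captured linter output.
count_lint() {
  errors=$(printf '%s\n' "$1" | grep -ci 'error' || true)
  warnings=$(printf '%s\n' "$1" | grep -ci 'warning' || true)
  echo "errors=$errors warnings=$warnings"
}
```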
```
# Next.js
npm run build 2>&1 || true

# Python
uv run python -m py_compile src/**/*.py 2>&1 || true

# Astro
npm run build 2>&1 || true
```
Report: build success/failure, any warnings.
Run the structured two-pass check from references/pre-landing-checklist.md:
Pass 1 — CRITICAL (blocks ship):
Pass 2 — INFORMATIONAL (report only):
Suppressions: Don't flag harmless redundancy, "add explanatory comment", consistency-only changes, or anything already fixed in the diff. See references/pre-landing-checklist.md for full suppressions list.
Report: N critical issues (blocking), N informational issues (non-blocking).
Dependency vulnerabilities:
```
# Node
npm audit --audit-level=moderate 2>&1 || true

# Python
uv run pip-audit 2>&1 || true
```
Code-level checks (Grep for common issues):
```
grep -rn "sk_live\|sk_test\|password\s*=\s*['\"]" src/ app/ lib/
```

- `dangerouslySetInnerHTML` without sanitization
- `.gitignore` includes `.env*`

Report: vulnerabilities found, severity levels.
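The narrow secret patterns above can be wrapped into one helper. The patterns are illustrative only, not an exhaustive secret scanner:

```shell
# Narrow, fast secret scan: fixed patterns, no full-file reads.
scan_secrets() {
  grep -rnE "sk_(live|test)_|password[[:space:]]*=[[:space:]]*['\"]" "$@" 2>/dev/null
}

# Example: scan_secrets src/ app/ lib/  (any output means a finding)
```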
Dimensions 7-15 renumbered (+1) after adding Pre-Landing Checklist as dimension 4.
Read docs/plan/*/spec.md and check each acceptance criterion:
For each `- [ ]` criterion in spec.md:

- If it names a runnable command or measurable target (`make task`, `cargo test`, `npm test`, benchmark commands, "passes N/N", score targets) → RUN the command and check the output. Do NOT mark it as "unverifiable from code" — run it.

CRITICAL: Acceptance criteria with commands MUST be executed. "Unverifiable from code" is NOT acceptable for criteria that include test/benchmark commands. Run them. If they fail → FIX FIRST verdict, not SHIP.
Update spec.md checkboxes. After verifying each criterion, use the Edit tool to change `- [ ]` to `- [x]` in spec.md. Leaving verified criteria unchecked causes staleness across pipeline runs — check them off as you go.
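If the Edit tool is unavailable, the flip can be done with a targeted `sed`. A sketch that assumes the criterion text is unique in the file and free of sed metacharacters; `check_off` is a made-up name:

```shell
# Flip "- [ ] <criterion>" to "- [x] <criterion>" in spec.md after verifying it.
check_off() {
  file="$1"; text="$2"
  # -i.bak keeps this portable across GNU and BSD sed
  sed -i.bak "s/- \[ \] ${text}/- [x] ${text}/" "$file" && rm -f "${file}.bak"
}
```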
Acceptance Criteria:
- [x] User can sign up with email — found in app/auth/signup/page.tsx + test
- [x] Dashboard shows project list — found in app/dashboard/page.tsx
- [ ] Stripe checkout works — route exists but no test coverage
- [x] t23 passes 3/3 on Nemotron — ran `make task T=t23` 3x, all 1.00
- [ ] t23 passes 3/3 on GPT-5.4 — ran `make task T=t23 PROVIDER=openai-full` 3x, got 0/3 → FAIL
After updating checkboxes, commit: `git add docs/plan/*/spec.md && git commit -m "docs: update spec checkboxes (verified by review)"`
Read 3-5 key files (entry points, API routes, main components):
Report specific file:line references for any issues found.
Read docs/plan/*/plan.md:
- `[x]` vs total tasks
- `[ ]` or `[~]` tasks still remaining

If the project has been deployed (deploy URL in CLAUDE.md, or `.solo/states/deploy` exists if the pipeline state directory is present), check production logs for runtime errors.
Read the logs field from the stack YAML (templates/stacks/{stack}.yaml) to get platform-specific commands.
Vercel (Next.js):
vercel logs --output=short 2>&1 | tail -50
Look for: Error, FUNCTION_INVOCATION_FAILED, 504, unhandled rejections, hydration mismatches.
Cloudflare Workers:
wrangler tail --format=pretty 2>&1 | head -50
Look for: uncaught exceptions, D1 errors, R2 access failures.
Docker/Hetzner (Python API):
ssh user@host 'docker logs {container} --tail=50'
Look for: ERROR, CRITICAL, OOM, connection refused, unhealthy instances.
Supabase Edge Functions:
supabase functions logs --scroll 2>&1 | tail -30
iOS (TestFlight):
```
log stream --predicate 'subsystem == "com.{org}.{name}"'
```

Android:
adb logcat '*:E' --format=time 2>&1 | tail -30
If no deploy yet: skip this dimension, note in report as "N/A — not deployed".
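Across platforms, triaging the captured log output reduces to the same shape. A sketch: the severity keywords follow the per-platform hints above, and the status names mirror the report template, but the exact keyword list is an assumption:

```shell
# Classify captured production logs into CLEAN / WARN / ERRORS.
log_status() {
  if printf '%s\n' "$1" | grep -qE 'ERROR|CRITICAL|FUNCTION_INVOCATION_FAILED|uncaught'; then
    echo "ERRORS"
  elif printf '%s\n' "$1" | grep -qi 'warn'; then
    echo "WARN"
  else
    echo "CLEAN"
  fi
}
```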
If logs show errors:
Report:
Check adherence to dev principles. Look for templates/principles/dev-principles.md (bundled with this skill), or check CLAUDE.md or project docs for architecture and coding conventions. Also check for dev-principles-my.md (personal extensions) if it exists in the project or KB.
Read the dev principles file, then spot-check 3-5 key source files for violations:
SOLID:
- `new ConcreteService()` inside business logic instead of dependency injection.

DRY vs Rule of Three:
KISS:
Schemas-First (SGR):
- Structured data validated against schemas (no raw `any` / `dict`)?

Clean Architecture:
Error Handling:
Report:
Check git history for the current track/feature:
git log --oneline --since="1 week ago" 2>&1 | head -30
Conventional commits format:
- `<type>(<scope>): <description>` pattern
- Allowed types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`, `perf`, `style`

Atomicity:
- Can you `git revert` a single commit without side effects?

SHAs in plan.md:
- `<!-- sha:abc1234 -->` comments
- `<!-- checkpoint:abc1234 -->` markers

```
grep -c "sha:" docs/plan/*/plan.md 2>/dev/null || echo "No SHAs found"
```
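The conventional-commit check for this dimension can be scripted over `git log` subjects. A sketch: the regex approximates the `<type>(<scope>): <description>` rule, and `conventional_ratio` is a made-up name:

```shell
# Count how many recent commit subjects follow conventional-commit format.
# Usage: git log --since="1 week ago" --pretty=%s | conventional_ratio
conventional_ratio() {
  awk '/^(feat|fix|refactor|test|docs|chore|perf|style)(\(.+\))?!?: / {c++} {t++} END {print c+0 "/" t+0}'
}
```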
Pre-commit hooks:
Read the stack YAML pre_commit field to know what system is expected (husky/pre-commit/lefthook) and what it should run (linter + formatter + type-checker). Then verify:
```
# Detect what's configured
if [ -f .husky/pre-commit ]; then echo "husky"
elif [ -f .pre-commit-config.yaml ]; then echo "pre-commit"
elif [ -f lefthook.yml ]; then echo "lefthook"
else echo "none"; fi
```
- Hooks actually installed? (`core.hooksPath` for husky, `.git/hooks/pre-commit` for pre-commit/lefthook)
- Detected system matches the stack YAML `pre_commit` field. Flag mismatch.
- `--no-verify` bypasses? Check if recent commits show signs of skipped hooks (e.g., lint violations that should've been caught). Flag as WARN.
- Stack YAML expects `{pre_commit}` but nothing found? Flag it.

Report:
Check that project documentation is up-to-date with the code.
Required files check:
ls -la CLAUDE.md README.md docs/prd.md docs/workflow.md 2>&1
CLAUDE.md:
# Check that files mentioned in CLAUDE.md actually exist
grep -oP '`[a-zA-Z0-9_./-]+\.(ts|py|swift|kt|md)`' CLAUDE.md | while read f; do [ ! -f "$f" ] && echo "MISSING: $f"; done
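`grep -oP` requires GNU grep and fails on macOS/BSD. A portable `-E` variant of the same check might look like this (`check_mentioned_files` is an assumed helper name):

```shell
# Portable check that files mentioned in backticks in CLAUDE.md exist.
check_mentioned_files() {
  grep -oE '`[a-zA-Z0-9_./-]+\.(ts|py|swift|kt|md)`' "$1" | tr -d '`' \
    | while read -r f; do [ -f "$f" ] || echo "MISSING: $f"; done
}
```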
README.md:
docs/prd.md:
AICODE- comments:
grep -rn "AICODE-TODO" src/ app/ lib/ 2>/dev/null | head -10
grep -rn "AICODE-ASK" src/ app/ lib/ 2>/dev/null | head -10
- `AICODE-TODO` items that were completed but not cleaned up
- Open `AICODE-ASK` questions
- `AICODE-NOTE` coverage on complex/non-obvious logic

Dead code check:
- If knip is available (Next.js): `pnpm knip 2>&1 | head -30`

Report:
Verify that the project is set up for efficient future agent sessions:
Plan handoff quality:
- `## Context Handoff` section with Intent/Key Files/Decisions/Risks?
- `[x]` tasks annotated with `<!-- sha:... -->`?

CLAUDE.md signal-to-noise:
- `wc -c CLAUDE.md` — warn if >40,000 chars (attention dilution)

Scratch files cleanup:
- If a `scratch/` directory exists, check whether it has stale files from the build phase

Report:
Renumbered: was #12 before Context Quality Check was inserted.
If the project is about to ship (verdict heading toward SHIP), run a quick pre-mortem:
Imagine it's 3 months after launch and the product failed. What went wrong?
Classify risks using Tigers/Paper Tigers/Elephants:
| Risk | Type | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| ... | Tiger (real + scary) | High | High | ... |
| ... | Paper Tiger (scary but unlikely) | Low | High | Accept |
| ... | Elephant (obvious but ignored) | High | Medium | ... |
Focus on the risk areas listed in `templates/principles/manifest.md`.

Include in the review report under a "### Pre-mortem" section. If any Tiger risk has no mitigation, add it as a FIX FIRST task.
Use references/qa-issue-taxonomy.md for severity classification and per-page exploration checklist.
If browser tools or device tools are available, run a visual smoke test.
Web projects (Playwright MCP or browser tools):
- Start the dev server (`dev_server.command` from the stack YAML, e.g. `pnpm dev`)

iOS projects (simulator):
xcodebuild -scheme {Name} -sdk iphonesimulator buildAndroid projects (emulator):
```
./gradlew assembleDebug
adb logcat '*:E' --format=time -d 2>&1 | tail -20
```

If tools are not available: skip this dimension and note it as "N/A — no browser/device tools" in the report. Visual testing is never a blocker for a SHIP verdict on its own.
Report:
Generate the final report:
# Code Review: {project-name}
Date: {YYYY-MM-DD}
## Verdict: {SHIP / FIX FIRST / BLOCK}
### Summary
{1-2 sentence overall assessment}
### Tests
- Total: {N} | Pass: {N} | Fail: {N} | Skip: {N}
- Coverage: {N}%
- Status: {PASS / FAIL}
### Linter
- Errors: {N} | Warnings: {N}
- Status: {PASS / WARN / FAIL}
### Build
- Status: {PASS / FAIL}
- Warnings: {N}
### Security
- Vulnerabilities: {N} (critical: {N}, high: {N}, moderate: {N})
- Hardcoded secrets: {NONE / FOUND}
- Status: {PASS / WARN / FAIL}
### Acceptance Criteria
- Verified: {N}/{M}
- Missing: {list}
- Status: {PASS / PARTIAL / FAIL}
### Plan Progress
- Tasks: {N}/{M} complete
- Phases: {N}/{M} complete
- Status: {COMPLETE / IN PROGRESS}
### Production Logs
- Platform: {Vercel / Cloudflare / Hetzner / N/A}
- Errors: {N} | Warnings: {N}
- Status: {CLEAN / WARN / ERRORS / N/A}
### Dev Principles
- SOLID: {PASS / violations found}
- Schemas-first: {YES / raw data found}
- Error handling: {PASS / issues found}
- Status: {PASS / WARN / FAIL}
### Commits
- Total: {N} | Conventional: {N}/{M}
- Atomic: {YES / NO}
- Plan SHAs: {N}/{M}
- Status: {PASS / WARN / FAIL}
### Documentation
- CLAUDE.md: {CURRENT / STALE / MISSING}
- README.md: {CURRENT / STALE / MISSING}
- AICODE-TODO unresolved: {N}
- Dead code: {NONE / found}
- Status: {PASS / WARN / FAIL}
### Context Quality
- Plan handoff: {GOOD / INCOMPLETE / MISSING}
- CLAUDE.md size: {N} chars — {OK / WARN / BLOATED}
- Scratch cleanup: {CLEAN / NEEDS CLEANUP / N/A}
- Status: {PASS / WARN}
### Pre-mortem
- Tigers: {N} (real risks with mitigation needed)
- Paper Tigers: {N} (scary but acceptable)
- Elephants: {N} (obvious, being addressed)
- Unmitigated Tigers: {list — these are FIX FIRST}
- Status: {CLEAR / RISKS IDENTIFIED / BLOCKERS}
### Visual Testing
- Platform: {browser / simulator / emulator / N/A}
- Pages/screens: {N}
- Console errors: {N}
- Visual issues: {NONE / list}
- Status: {PASS / WARN / FAIL / N/A}
### Issues Found
1. [{severity}] {description} — {file:line}
2. [{severity}] {description} — {file:line}
### Recommendations
- {actionable recommendation}
- {actionable recommendation}
Verdict logic:
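One plausible mapping, shown only as a hedged sketch: the inputs and thresholds below are assumptions drawn from this document (build failure is a BLOCK per Troubleshooting; failing acceptance commands are FIX FIRST per the Acceptance Criteria dimension):

```shell
# Hedged verdict sketch: inputs are findings gathered by the dimensions above.
verdict() {
  build="$1"; tests_failed="$2"; critical_issues="$3"
  if [ "$build" != "PASS" ] || [ "$critical_issues" -gt 0 ]; then
    echo "BLOCK"
  elif [ "$tests_failed" -gt 0 ]; then
    echo "FIX FIRST"
  else
    echo "SHIP"
  fi
}
```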
After the verdict report, revise the project's CLAUDE.md to keep it lean and useful for future agents.
- Check size: `wc -c CLAUDE.md`
- Commit: `git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-review)"`

Output the pipeline signal ONLY if the pipeline state directory (`.solo/states/`) exists.
Output the signal tag ONCE and ONLY ONCE. Do not repeat it. The pipeline detects the first occurrence.
If SHIP: output this exact line (once):
<solo:done/>
If FIX FIRST or BLOCK:
- Add fix tasks to plan.md (one `- [ ]` task per issue found)
- Downgrade the affected phase from `[x]` Complete to `[~]` In Progress
- Commit: `git add docs/plan/ && git commit -m "fix: add review fix tasks"`
- Output this exact line (once): `<solo:redo/>`
The pipeline reads these tags and handles all marker files automatically. You do NOT need to create or delete any marker files yourself.
Cause: Missing dependencies or test config.
Fix: Run `npm install` / `uv sync`, and check that a test config exists (`jest.config`, `pytest.ini`).
Cause: No linter config file found. Fix: Note as a recommendation in the report, not a blocker.
Cause: Type errors, import issues, missing env vars. Fix: Report specific errors. This is a BLOCK verdict — must fix before shipping.
When reviewing significant work, use two stages:
Stage 1 — Spec Compliance:
Stage 2 — Code Quality:
No verdict without fresh evidence.
Before writing any verdict (SHIP/FIX/BLOCK):
Never write "tests should pass" — run them and show the output.
| Thought | Reality |
|---|---|
| "Tests were passing earlier" | Run them NOW. Code changed since then. |
| "It's just a warning" | Warnings become bugs. Report them. |
| "The build worked locally" | Check the platform too. Environment differences matter. |
| "Security scan is overkill" | One missed secret = data breach. Always scan. |
| "Good enough to ship" | Quantify "good enough". Show the numbers. |
| "I already checked this" | Fresh evidence only. Stale checks are worthless. |
Foundational principle: A review without fresh evidence is not a review — it's a guess wearing a lab coat. Run every command. Read every output. No shortcuts.
Hand issues to /build to fix them. Review only modifies plan.md, never source code.