From the `develop` plugin.
Multi-agent code review of local files, directories, or the current git diff covering architecture, tests, performance, docs, lint, security, and API design.
```
npx claudepluginhub borda/ai-rig --plugin develop
```

This skill is limited to using the following tools:
<objective>
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle two or more tasks, such as separate test failures or subsystems, that share no state or dependencies.
Perform a comprehensive code review of local files or the current working-tree diff by spawning specialized sub-agents in parallel and consolidating their findings into structured feedback with severity levels.
`git diff HEAD` — staged + unstaged changes vs HEAD.

Foundry plugin check: run

```bash
ls ~/.claude/plugins/cache/ 2>/dev/null | grep -q foundry
```

(exit 0 = installed). If the check fails or you are uncertain, proceed as if foundry is available — it is the common case; only fall back if an agent dispatch explicitly fails.
When foundry is not installed, substitute foundry:X references with general-purpose and prepend the role description plus model: <model> to the spawn call:
| foundry agent | Fallback | Model | Role description prefix |
|---|---|---|---|
| `foundry:sw-engineer` | general-purpose | opus | You are a senior Python software engineer. Write production-quality, type-safe code following SOLID principles. |
| `foundry:qa-specialist` | general-purpose | opus | You are a QA specialist. Write deterministic, parametrized pytest tests covering edge cases and regressions. |
| `foundry:perf-optimizer` | general-purpose | opus | You are a performance engineer. Profile before changing. Focus on CPU/GPU/memory/IO bottlenecks in Python/ML workloads. |
| `foundry:doc-scribe` | general-purpose | sonnet | You are a documentation specialist. Write Google-style docstrings and keep README content accurate and concise. |
| `foundry:linting-expert` | general-purpose | haiku | You are a static analysis specialist. Fix ruff/mypy violations, add missing type annotations, configure pre-commit hooks. |
| `foundry:solution-architect` | general-purpose | opus | You are a system design specialist. Produce ADRs, interface specs, and API contracts — read code, produce specs only. |
Skills with --team mode: team spawning with fallback agents still works but produces lower-quality output.
Task hygiene: Before creating tasks, call TaskList. For each found task:
- `completed` if the work is clearly done
- `deleted` if orphaned / no longer relevant
- `in_progress` only if genuinely continuing

Task tracking: per CLAUDE.md, create tasks (`TaskCreate`) for each major phase. Mark `in_progress`/`completed` throughout. On loop retry or scope change, create a new task.
```bash
if [ -n "$ARGUMENTS" ]; then
  # Path given directly — collect Python files under it
  TARGET="$ARGUMENTS"
  echo "Reviewing: $TARGET"
else
  # No argument — review current working-tree diff vs HEAD
  git diff HEAD --name-only  # timeout: 3000
fi
```
Filter to Python files only. If no Python files are found in the target, report "no Python files to review" and stop.
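The collection-and-bail step above can be sketched as follows; the helper name and demo paths are illustrative, not part of the skill:

```shell
# Collect Python files either from an explicit path argument or, when no
# argument is given, from the working-tree diff vs HEAD.
collect_py_files() {
  if [ -n "$1" ]; then
    find "$1" -type f -name '*.py'
  else
    git diff HEAD --name-only 2>/dev/null | grep '\.py$'
  fi
}

# Self-contained demo target so the sketch runs anywhere:
mkdir -p demo/sub
: > demo/sub/a.py
: > demo/readme.txt

PY_FILES=$(collect_py_files demo)
[ -n "$PY_FILES" ] && echo "Reviewing: $PY_FILES" || echo "no Python files to review"
```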
Before spawning agents, classify the diff:
Use classification to skip optional agents:
```bash
PROJ=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null) || PROJ=$(basename "$PWD")
if command -v scan-query >/dev/null 2>&1 && [ -f ".cache/scan/${PROJ}.json" ]; then
  CHANGED_MODS=$(git diff HEAD --name-only | grep '\.py$' | sed 's|^src/||;s|\.py$||;s|/|.|g' | grep -v '__init__$')  # timeout: 3000
  scan-query central --top 5 2>/dev/null  # timeout: 5000
  for mod in $CHANGED_MODS; do scan-query rdeps "$mod" 2>/dev/null; done  # timeout: 5000
fi
```
If codemap returns results: prepend a ## Structural Context (codemap) block to the Agent 1 (foundry:sw-engineer) spawn prompt. Include:
- `rdep_count` — label as high risk (>20), moderate (5–20), or low (<5)
- `central --top 5` for project-wide blast-radius reference

Agent 1 uses this to prioritize: modules with high `rdep_count` warrant deeper scrutiny on API compatibility, error handling, and behavioural correctness — downstream callers outside the diff are not otherwise visible to the reviewer. If codemap is not installed or the index is absent, skip silently.
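The risk labeling can be sketched as a small helper (thresholds taken from the rule above; the function name is illustrative):

```shell
# Map a reverse-dependency count to the risk label used in the
# Structural Context block: >20 high risk, 5–20 moderate, <5 low.
risk_label() {
  if [ "$1" -gt 20 ]; then
    echo "high risk"
  elif [ "$1" -ge 5 ]; then
    echo "moderate"
  else
    echo "low"
  fi
}

risk_label 27   # high risk
risk_label 12   # moderate
risk_label 3    # low
```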
Set up the run directory:
```bash
TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
RUN_DIR=".reports/review/$TIMESTAMP"
mkdir -p "$RUN_DIR"  # timeout: 5000
```
Check availability:
```bash
claude plugin list 2>/dev/null | grep -q 'codex@openai-codex' && echo "codex (openai-codex) available" || echo "⚠ codex (openai-codex) not found — skipping co-review"  # timeout: 15000
```
If Codex is available:
```
CODEX_OUT="$RUN_DIR/codex.md"

Agent(subagent_type="codex:codex-rescue", prompt="Adversarial review of $TARGET: look for bugs, missed edge cases, incorrect logic, and inconsistencies with existing code patterns. Read-only: do not apply fixes. Write findings to $RUN_DIR/codex.md.")
```
After Codex writes $RUN_DIR/codex.md, extract a compact seed list (≤10 items, [{"loc":"file:line","note":"..."}]) to inject into agent prompts in Step 3 as pre-flagged issues to verify or dismiss. If Codex was skipped or found nothing, proceed with an empty seed.
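One possible seed extraction, assuming Codex reports findings as `path:line: note` lines — that line format is an assumption, not something codex.md guarantees, so adapt the pattern to the real layout:

```shell
# Build the ≤10-item seed list [{"loc":"file:line","note":"..."}] from codex.md.
# Sample input stands in for a real codex.md so the sketch is self-contained.
printf '%s\n' \
  'src/app.py:42: off-by-one in pagination loop' \
  'src/db.py:7: connection not closed on error path' > codex.md

# Keep only "path:line: note" lines, cap at 10, and convert to JSON with jq.
grep -oE '^[^:]+:[0-9]+: .+' codex.md | head -10 |
  jq -R -s 'split("\n")
            | map(select(length > 0)
                  | capture("(?<loc>[^:]+:[0-9]+): (?<note>.+)"))' > seed.json

jq -c . seed.json
```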
File-based handoff: read .claude/skills/_shared/file-handoff-protocol.md. The run directory was created in Step 2 ($RUN_DIR).
Replace $RUN_DIR in the spawn prompt below with the actual path from Step 2.
Resolve the oss:review checklist path (version-agnostic):
```bash
OSS_ROOT=$(jq -r 'to_entries[] | select(.key | test("oss@")) | .value.installPath' ~/.claude/plugins/installed_plugins.json 2>/dev/null | head -1)  # timeout: 5000
REVIEW_CHECKLIST="${OSS_ROOT}/skills/review/checklist.md"
[ -f "$REVIEW_CHECKLIST" ] && echo "Checklist: $REVIEW_CHECKLIST" || echo "⚠ oss:review checklist not found — Agent 1 will skip checklist patterns"  # timeout: 5000
```
Replace $REVIEW_CHECKLIST in the Agent 1 and consolidator spawn prompts below with the resolved path.
Launch agents simultaneously with the Agent tool (security augmentation is folded into Agent 1 — not a separate spawn; Agent 6 is optional). Every agent prompt must end with:
"Write your FULL findings (all sections, Confidence block) to `$RUN_DIR/<agent-name>.md` using the Write tool — where `<agent-name>` is e.g. `foundry:sw-engineer`, `foundry:qa-specialist`, `foundry:perf-optimizer`, `foundry:doc-scribe`, `foundry:linting-expert`, `foundry:solution-architect`. Then return to the caller ONLY a compact JSON envelope on your final line — nothing else after it: `{\"status\":\"done\",\"findings\":N,\"severity\":{\"critical\":0,\"high\":1,\"medium\":2},\"file\":\"$RUN_DIR/<agent-name>.md\",\"confidence\":0.88}`"
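The caller can then read fields out of the returned envelope with jq, for example (the sample values are illustrative):

```shell
# Parse the compact JSON envelope an agent returns on its final line.
ENVELOPE='{"status":"done","findings":3,"severity":{"critical":0,"high":1,"medium":2},"file":".reports/review/run/sw-engineer.md","confidence":0.88}'

STATUS=$(printf '%s' "$ENVELOPE" | jq -r '.status')
CONFIDENCE=$(printf '%s' "$ENVELOPE" | jq -r '.confidence')
CRITICAL=$(printf '%s' "$ENVELOPE" | jq -r '.severity.critical')

echo "status=$STATUS confidence=$CONFIDENCE critical=$CRITICAL"
```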
Agent 1 — foundry:sw-engineer: Review architecture, SOLID adherence, type safety, error handling, and code structure. Check for Python anti-patterns (bare except:, import *, mutable defaults). Flag blocking issues vs suggestions.
Error path analysis (for new/changed code in the diff): For each error-handling path introduced or modified, produce a table:
| Location | Exception/Error | Caught? | Action if caught | User-visible? |
|---|---|---|---|---|
Flag rules:
- `pass` or bare `except` → MEDIUM (swallowed error)

Read the review checklist (use the Read tool to read `$REVIEW_CHECKLIST`) — apply CRITICAL/HIGH patterns as severity anchors. Respect the suppressions list.
Agent 2 — foundry:qa-specialist: Audit test coverage. Identify untested code paths, missing edge cases, and test quality issues. Check for ML-specific issues (non-deterministic tests, missing seed pinning). List the top 5 tests that should be added. Also check explicitly for missing tests in these patterns (these are GT-level findings, not afterthoughts):
- lifecycle ordering violations (e.g. `log()` before `start()`)
- parsers of external input (`int()`, `float()`, `datetime`): test with inputs that are near-valid (float strings for int parsers, empty strings, very large values, None) — these are common omissions.

Consolidation rule: Report each test gap as one finding with a concise list of test scenarios, not as separate findings per scenario. Format: "Missing tests for `parse_numeric()`: empty string, None, very large integers, float-string for int parser." This keeps the test coverage section actionable and prevents the section from exceeding 5 items.
Agent 3 — foundry:perf-optimizer: Analyze code for performance issues. Look for algorithmic complexity issues, Python loops that should be NumPy/torch ops, repeated computation, unnecessary I/O. For ML code: check DataLoader config, mixed precision usage. Prioritize by impact.
Agent 4 — foundry:doc-scribe: Check documentation completeness. Find public APIs without docstrings, missing Google style sections, outdated README sections, and CHANGELOG gaps. Verify examples actually run.
- deprecated APIs (e.g. `datetime.utcnow()` deprecated in 3.12, `os.path` vs `pathlib`). Flag deprecated stdlib usage as MEDIUM with the replacement.

Agent 5 — foundry:linting-expert: Static analysis audit. Check that ruff and mypy would pass. Identify type annotation gaps on public APIs, suppressed violations without explanation, and any missing pre-commit hooks. Flag a mismatched target Python version.
Security augmentation (conditional — fold into Agent 1 prompt, not a separate spawn): If the target touches authentication, user input handling, dependency updates, or serialization — add to the foundry:sw-engineer agent prompt (Agent 1 above): check for SQL injection, XSS, insecure deserialization, hardcoded secrets, and missing input validation. Run pip-audit if dependency files changed. Skip if the change is purely internal refactoring.
Agent 6 — foundry:solution-architect (optional, for changes touching public API boundaries): If the target touches __init__.py exports, adds/modifies Protocols or ABCs, changes module structure, or introduces new public classes — evaluate API design quality, coupling impact, and backward compatibility. Skip if changes are internal implementation only.
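A hedged trigger check for Agent 6 — the path patterns below are illustrative, not an exhaustive definition of "public API boundary":

```shell
# Decide whether the diff touches public API boundaries: package exports,
# Protocol/ABC definitions, or similarly named interface files.
touches_api_boundary() {
  grep -qE '(__init__\.py|protocols?\.py|interfaces?\.py)$'
}

if git diff HEAD --name-only 2>/dev/null | touches_api_boundary; then
  echo "spawn foundry:solution-architect"
else
  echo "skip Agent 6"
fi
```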
Health monitoring (CLAUDE.md §8): Agent calls are synchronous — Claude awaits each response natively; no Bash checkpoint polling is available. If any agent does not return within $HARD_CUTOFF seconds, use the Read tool to surface any partial results already written to $RUN_DIR and continue with what was found; mark timed-out agents with ⏱ in the final report. Grant one $EXTENSION extension if the output file tail explains the delay. Never silently omit timed-out agents.
Read and follow the cross-validation protocol from .claude/skills/_shared/cross-validation-protocol.md. If that file is not present, skip Step 4.
Skill-specific: use the same agent type that raised the finding as the verifier (e.g., foundry:sw-engineer verifies foundry:sw-engineer's critical finding).
Before constructing the output path, extract the current branch and date:

```bash
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-' || echo 'main')
DATE=$(date +%Y-%m-%d)
```
Spawn a foundry:sw-engineer consolidator agent with this prompt:
"Read all finding files in `$RUN_DIR/` (agent files: `sw-engineer.md`, `qa-specialist.md`, `perf-optimizer.md`, `doc-scribe.md`, `linting-expert.md`, `solution-architect.md`, and `codex.md` if present — skip any that are missing). Read `$REVIEW_CHECKLIST` using the Read tool and apply the consolidation rules (signal-to-noise filter, annotation completeness, section caps). Apply the precision gate: only include findings with a concrete, actionable location (function, line range, or variable name). Apply the finding density rule: for modules under 100 lines, aim for ≤10 total findings. Rank findings within each section by impact (blocking > critical > high > medium > low). For `codex.md`: include its unique findings under a `### Codex Co-Review` section; deduplicate against agent findings (same file:line raised by both → keep the agent version, mark as 'also flagged by Codex'). Parse each agent's `confidence` from its envelope; assign `codex` a fixed confidence of 0.75. Write the consolidated report to `.temp/output-review-$BRANCH-$DATE.md` using the Write tool. Return ONLY a one-line summary: `verdict=<APPROVE|REQUEST_CHANGES|NEEDS_WORK> | findings=N | critical=N | high=N | file=.temp/output-review-$BRANCH-$DATE.md`"
Main context receives only the one-liner verdict.
## Code Review: [target]
### [blocking] Critical (must fix before merge)
- [bugs, security issues, data corruption risks]
- Severity: CRITICAL / HIGH
### Architecture & Quality
- [sw-engineer findings]
- [blocking] issues marked explicitly
- [nit] suggestions marked explicitly
### Test Coverage Gaps
- [qa-specialist findings — top 5 missing tests]
- For ML code: non-determinism or missing seed issues
### Performance Concerns
- [perf-optimizer findings — ranked by impact]
- Include: current behavior vs expected improvement
### Documentation Gaps
- [doc-scribe findings]
- Public API without docstrings listed explicitly
### Static Analysis
- [linting-expert findings — ruff violations, mypy errors, annotation gaps]
### API Design (if applicable)
- [solution-architect findings — coupling, API surface, backward compat]
- Public API changes: [intentional / accidental leak]
- Deprecation path: [provided / missing]
### Codex Co-Review
(omit section if Codex was unavailable or found no unique issues)
- [unique findings from codex.md not already captured by agents above]
- Duplicate findings (same location as agent finding): omitted — see agent section
### Recommended Next Steps
1. [most important action]
2. [second most important]
3. [third]
### Review Confidence
| Agent | Score | Label | Gaps |
|-------|-------|-------|------|
<!-- Replace with actual agent scores for this review -->
**Aggregate**: min 0.N / median 0.N
After parsing confidence scores: if any agent scored < 0.7, prepend ⚠ LOW CONFIDENCE to that agent's findings section and explicitly state the gap. Do not silently drop uncertain findings.
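Assuming the parsed per-agent scores have been collected into a JSON array, the aggregate line can be computed as follows (a sketch; the collection step itself is left to the consolidator):

```shell
# Compute min and median of the parsed agent confidence scores.
SCORES='[0.88, 0.92, 0.75, 0.81, 0.95]'

MIN=$(printf '%s' "$SCORES" | jq 'min')
MEDIAN=$(printf '%s' "$SCORES" | jq 'sort
  | if length % 2 == 1 then .[(length - 1) / 2]
    else (.[length / 2 - 1] + .[length / 2]) / 2 end')

echo "**Aggregate**: min $MIN / median $MEDIAN"
```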
Read the compact terminal summary template from .claude/skills/_shared/terminal-summaries.md — use the PR Summary template. Replace [entity-line] with Review — [target] and replace [skill-specific path] with .temp/output-review-$BRANCH-$DATE.md. Print this block to the terminal.
After printing to the terminal, also prepend the same compact block to the top of the report file using the Edit tool.
After consolidating findings, identify tasks from the review that Codex can implement directly — not style violations (those are handled by pre-commit hooks), but work that requires writing meaningful code or documentation grounded in the actual implementation.
Delegate to Codex when you can write an accurate, specific brief:
Do not delegate — these require human judgment:
Read .claude/skills/_shared/codex-delegation.md and apply the delegation criteria defined there.
Only print a ### Codex Delegation section to the terminal when tasks were actually delegated — omit entirely if nothing was delegated.
End your response with a ## Confidence block per CLAUDE.md output standards.
- `[nit]` findings go in a dedicated "Minor Observations" section rather than being elevated to the same tier as high-severity findings.
- `[blocking]` bugs or regressions → `/develop:fix` to reproduce with a test and apply a targeted fix
- `/develop:refactor` for test-first improvements
- `pip-audit` for dependency CVEs; address OWASP issues inline via `/develop:fix`
- `/codex:codex-rescue <task>` to delegate
- For PR-based review, use `/oss:review <PR#>` instead