From oss
Multi-agent code review of GitHub Pull Requests covering architecture, tests, performance, docs, lint, security, and API design.
npx claudepluginhub borda/ai-rig --plugin oss

This skill is limited to using the following tools:
<objective>
Perform a comprehensive code review by spawning specialized sub-agents in parallel and consolidating their findings into structured feedback with severity levels.
- <PR#> (e.g. 42): review the PR diff
- --reply: after review, spawn oss:shepherd to draft a contributor-facing PR comment from the findings. When the argument is a path ending in .md, spawns oss:shepherd directly from that report without running a new review.
- Use /develop:review to review local files or the current git diff without a GitHub PR.

Foundry plugin check: run

ls ~/.claude/plugins/cache/ 2>/dev/null | grep -q foundry

(exit 0 = installed). If the check fails or you are uncertain, proceed as if foundry is available — it is the common case; only fall back if an agent dispatch explicitly fails.
When foundry is not installed, substitute foundry:X references with general-purpose and prepend the role description plus model: <model> to the spawn call:
| foundry agent | Fallback | Model | Role description prefix |
|---|---|---|---|
| foundry:sw-engineer | general-purpose | opus | You are a senior Python software engineer. Write production-quality, type-safe code following SOLID principles. |
| foundry:qa-specialist | general-purpose | opus | You are a QA specialist. Write deterministic, parametrized pytest tests covering edge cases and regressions. |
| foundry:perf-optimizer | general-purpose | opus | You are a performance engineer. Profile before changing. Focus on CPU/GPU/memory/IO bottlenecks in Python/ML workloads. |
| foundry:doc-scribe | general-purpose | sonnet | You are a documentation specialist. Write Google-style docstrings and keep README content accurate and concise. |
| foundry:linting-expert | general-purpose | haiku | You are a static analysis specialist. Fix ruff/mypy violations, add missing type annotations, configure pre-commit hooks. |
| foundry:solution-architect | general-purpose | opus | You are a system design specialist. Produce ADRs, interface specs, and API contracts — read code, produce specs only. |
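As a sketch of that substitution (illustrative only; the exact Agent tool signature may vary, and the placeholder task text is hypothetical), a fallback dispatch for foundry:sw-engineer would prepend the table's role description and model line to the prompt:

```
Agent(subagent_type="general-purpose",
      prompt="You are a senior Python software engineer. Write production-quality,
              type-safe code following SOLID principles. model: opus

              <original foundry:sw-engineer task prompt follows>")
```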
Skills with --team mode: team spawning with fallback agents still works but produces lower-quality output.
Task hygiene: Before creating tasks, call TaskList. For each found task:
- completed if the work is clearly done
- deleted if orphaned / no longer relevant
- in_progress only if genuinely continuing

Task tracking: per CLAUDE.md, create tasks (TaskCreate) for each major phase. Mark in_progress/completed throughout. On loop retry or scope change, create a new task.
# Parse --reply flag — must run before any gh calls
REPLY_MODE=false
CLEAN_ARGS=$ARGUMENTS
if echo "$ARGUMENTS" | grep -q -- '--reply'; then
REPLY_MODE=true
CLEAN_ARGS=$(echo "$ARGUMENTS" | sed 's/--reply//g' | xargs)
fi
DIRECT_PATH_MODE=false
if echo "$CLEAN_ARGS" | grep -qE '\.md$'; then
DIRECT_PATH_MODE=true
REVIEW_FILE="$CLEAN_ARGS"
fi
# $CLEAN_ARGS must be a PR number — run all four in parallel:
gh pr diff $CLEAN_ARGS --name-only # files changed in PR # timeout: 6000
gh pr view $CLEAN_ARGS # PR description and metadata # timeout: 6000
gh pr checks $CLEAN_ARGS # CI status — don't review if CI is red # timeout: 15000
gh pr view $CLEAN_ARGS --json reviews,labels,milestone # timeout: 6000
If Continuous Integration (CI) is red, report that without full review.
Before spawning agents, classify the diff:
Use classification to skip optional agents:
PROJ=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null) || PROJ=$(basename "$PWD")
if command -v scan-query >/dev/null 2>&1 && [ -f ".cache/scan/${PROJ}.json" ]; then
# Use file list from the gh pr diff already fetched above
CHANGED_MODS=$(gh pr diff $CLEAN_ARGS --name-only 2>/dev/null | grep '\.py$' | sed 's|^src/||;s|\.py$||;s|/|.|g' | grep -v '__init__$') # timeout: 6000
scan-query central --top 5 2>/dev/null # timeout: 5000
for mod in $CHANGED_MODS; do scan-query rdeps "$mod" 2>/dev/null; done # timeout: 5000
fi
If codemap returns results: prepend a ## Structural Context (codemap) block to the Agent 1 (foundry:sw-engineer) spawn prompt. Include:
- rdep_count — label as high risk (>20), moderate (5–20), or low (<5)
- central --top 5 for project-wide blast-radius reference

Agent 1 uses this to prioritize: modules with high rdep_count warrant deeper scrutiny on API compatibility, error handling, and behavioural correctness — downstream callers outside the diff are not otherwise visible to the reviewer. If codemap is not installed or index absent, skip silently.
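A minimal sketch of the risk labeling under the thresholds above (this helper is hypothetical; the skill applies the labels inline when building the Structural Context block):

```shell
# Hypothetical helper: maps an rdep_count to the risk label
# used in the Structural Context (codemap) block.
risk_label() {
  if [ "$1" -gt 20 ]; then
    echo "high risk"
  elif [ "$1" -ge 5 ]; then
    echo "moderate"
  else
    echo "low"
  fi
}
risk_label 27   # prints "high risk"
```

Note the boundary behavior: a count of exactly 20 falls in the moderate tier, and exactly 5 does as well.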
Parse the PR body (from gh pr view $CLEAN_ARGS) for issue references (Closes #N, Fixes #N, Resolves #N, refs #N — case-insensitive). Extract all referenced issue numbers into ISSUE_NUMS (list). Cap at 3 issues maximum.
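The extraction above can be sketched in shell as follows (illustrative only; PR_BODY and its sample text are stand-ins for the body field returned by gh pr view):

```shell
# Stand-in for the PR description fetched earlier via `gh pr view`.
PR_BODY='This PR fixes #12 and Closes #7. Also refs #12.'

# Case-insensitive match on the reference keywords, then keep just the
# numbers, deduplicate, and cap at 3 issues.
ISSUE_NUMS=$(printf '%s\n' "$PR_BODY" \
  | grep -oiE '(closes|fixes|resolves|refs) #[0-9]+' \
  | grep -oE '[0-9]+' \
  | sort -un \
  | head -3)
echo $ISSUE_NUMS   # prints "7 12"
```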
If ISSUE_NUMS is non-empty, spawn one foundry:sw-engineer agent per issue at the start of Step 2 (in parallel with Codex co-review — both are independent of each other). Each issue agent should:
- Run gh issue view <N> --json title,body,comments,state,labels and gh issue view <N> --comments
- Follow the /oss:analyse-style output format: Summary, Root Cause Hypotheses table (top 3), Code Evidence for top hypothesis
- Write the analysis to $RUN_DIR/issue-<N>.md (file-handoff protocol)
- Return only the envelope: {"status":"done","issue":N,"root_cause":"<one-line summary>","file":"$RUN_DIR/issue-<N>.md","confidence":0.N}

If ISSUE_NUMS is empty, skip all issue-related checks in downstream steps.
If DIRECT_PATH_MODE=true:
- REPLY_MODE=false → print Error: --reply is required when passing a .md report path and stop.
- REPLY_MODE=true and [ ! -f "$REVIEW_FILE" ] → print Error: report not found: $REVIEW_FILE and stop.
- REPLY_MODE=true and the file exists → print [direct] using $REVIEW_FILE and skip immediately to Step 9. Do not run Steps 2–8.

Set up the run directory (shared by Codex and all agent spawns in Step 3):
TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
RUN_DIR=".reports/review/$TIMESTAMP"
mkdir -p "$RUN_DIR" # timeout: 5000
Check availability:
claude plugin list 2>/dev/null | grep -q 'codex@openai-codex' && echo "codex (openai-codex) available" || echo "⚠ codex (openai-codex) not found — skipping co-review" # timeout: 15000
If Codex is available, run a comprehensive review on the diff:
CODEX_OUT="$RUN_DIR/codex.md"
Agent(subagent_type="codex:codex-rescue", prompt="Adversarial review: look for bugs, missed edge cases, incorrect logic, and inconsistencies with existing code patterns. Read-only: do not apply fixes. Write findings to $RUN_DIR/codex.md.")
After Codex writes $RUN_DIR/codex.md, extract a compact seed list (≤10 items, [{"loc":"file:line","note":"..."}]) to inject into agent prompts in Step 3 as pre-flagged issues to verify or dismiss. If Codex was skipped or found nothing, proceed with an empty seed.
File-based handoff: read .claude/skills/_shared/file-handoff-protocol.md (if file not found, skip). The run directory was created in Step 2 ($RUN_DIR).
Replace $RUN_DIR in the spawn prompt below with the actual path from Step 2.
Launch agents simultaneously with the Agent tool (security augmentation is folded into Agent 1 — not a separate spawn; Agent 6 is optional). Every agent prompt must end with:
"Write your FULL findings (all sections, Confidence block) to $RUN_DIR/<agent-name>.md using the Write tool — where <agent-name> is e.g. foundry:sw-engineer, foundry:qa-specialist, foundry:perf-optimizer, foundry:doc-scribe, foundry:linting-expert, foundry:solution-architect. Then return to the caller ONLY a compact JSON envelope on your final line — nothing else after it: {\"status\":\"done\",\"findings\":N,\"severity\":{\"critical\":0,\"high\":1,\"medium\":2},\"file\":\"$RUN_DIR/<agent-name>.md\",\"confidence\":0.88}"
Agent 1 — foundry:sw-engineer: Review architecture, SOLID adherence, type safety, error handling, and code structure. Check for Python anti-patterns (bare except:, import *, mutable defaults). Flag blocking issues vs suggestions.
Error path analysis (for new/changed code in the diff): For each error-handling path introduced or modified, produce a table:
| Location | Exception/Error | Caught? | Action if caught | User-visible? |
|---|---|---|---|---|
Flag rules:
- pass or bare except → MEDIUM (swallowed error)

Read the review checklist (use the Read tool to read plugins/oss/skills/review/checklist.md) — apply CRITICAL/HIGH patterns as severity anchors. Respect the suppressions list.
If ISSUE_NUMS is non-empty, linked issue analysis files exist at $RUN_DIR/issue-*.md. Read them. Evaluate whether the code changes address the root cause identified in each linked issue — not just the symptom or the PR description. If the PR addresses only a symptom while the root cause remains unfixed, flag as [blocking] HIGH — root cause misalignment. If the PR description diverges from the issue's stated problem (solving something different than what was reported), flag as HIGH — PR/issue scope divergence.
Agent 2 — foundry:qa-specialist: Audit test coverage. Identify untested code paths, missing edge cases, and test quality issues. Check for Machine Learning (ML)-specific issues (non-deterministic tests, missing seed pinning). List the top 5 tests that should be added. Also check explicitly for missing tests in these patterns (these are Ground Truth (GT)-level findings, not afterthoughts):
- log() before start()

Consolidation rule: Report each test gap as one finding with a concise list of test scenarios, not as separate findings per scenario. Format: "Missing tests for parse_numeric(): empty string, None, very large integers, float-string for int parser." This keeps the test coverage section actionable and prevents the section from exceeding 5 items.
If ISSUE_NUMS is non-empty, linked issue analysis files exist at $RUN_DIR/issue-*.md. Read them. Check that tests cover the specific reproduction scenario described in the linked issue. If the issue includes a minimal reproduction or error trace that is not covered by new or existing tests, flag as HIGH — issue reproduction not tested.
Agent 3 — foundry:perf-optimizer: Analyze code for performance issues. Look for algorithmic complexity issues, Python loops that should be NumPy/torch ops, repeated computation, unnecessary Input/Output (I/O). For ML code: check DataLoader config, mixed precision usage. Prioritize by impact.
Agent 4 — foundry:doc-scribe: Check documentation completeness. Find public APIs without docstrings, missing Google style sections, outdated README sections, and CHANGELOG gaps. Verify examples actually run.
Also check for deprecated standard-library usage (e.g. datetime.utcnow() deprecated in 3.12, os.path vs pathlib). Flag deprecated stdlib usage as MEDIUM with the replacement. This is a frequent omission in general review but reliably caught by doc-scribe with this explicit trigger.

Agent 5 — foundry:linting-expert: Static analysis audit. Check ruff and mypy would pass. Identify type annotation gaps on public APIs, suppressed violations without explanation, and any missing pre-commit hooks. Flag mismatched target Python version.
Security augmentation (conditional — fold into Agent 1 prompt, not a separate spawn): If the diff touches authentication, user input handling, dependency updates, or serialization — add to the foundry:sw-engineer agent prompt (Agent 1 above): check for Structured Query Language (SQL) injection, Cross-Site Scripting (XSS), insecure deserialization, hardcoded secrets, and missing input validation. Run pip-audit if dependency files changed. Skip if the PR is purely internal refactoring.
Agent 6 — foundry:solution-architect (optional, for PRs touching public API boundaries): If the diff touches __init__.py exports, adds/modifies Protocols or Abstract Base Classes (ABCs), changes module structure, or introduces new public classes — evaluate API design quality, coupling impact, and backward compatibility. Skip if changes are internal implementation only.
Health monitoring (CLAUDE.md §8): Agent calls are synchronous — Claude awaits each response natively; no Bash checkpoint polling is available. If any agent does not return within $HARD_CUTOFF seconds, use the Read tool to surface any partial results already written to $RUN_DIR and continue with what was found; mark timed-out agents with ⏱ in the final report. Grant one $EXTENSION extension if the output file tail explains the delay. Never silently omit timed-out agents.
While agents from Step 3 are completing, run these two independent checks simultaneously:
TRUNK=$(git remote show origin 2>/dev/null | grep 'HEAD branch' | awk '{print $NF}') # timeout: 6000 # shared by 4a and 4b
# Check if changed APIs are used by downstream projects
# Rate-limit guard: if gh api returns HTTP 429, wait 10 seconds and retry once.
# If still rate-limited, log "rate-limited — downstream search may be incomplete" and continue.
# --paginate is available for large result sets but increases rate-limit exposure; omit unless completeness is critical.
CHANGED_EXPORTS=$(git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD -- "src/**/__init__.py" | grep "^[-+]" | grep -v "^[-+][-+]" | grep -oP '\w+' | sort -u) # timeout: 3000
for export in $CHANGED_EXPORTS; do
echo "=== $export ==="
gh api "search/code" --field "q=$export language:python" --jq '.items[:5] | .[].repository.full_name' 2>/dev/null # timeout: 30000
# Note: GitHub code search API is rate-limited (~30 req/min); empty results may indicate rate limiting, not absence of usage
done
# Check if deprecated APIs have migration guides
git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD | grep -A2 "deprecated" # timeout: 3000
# Check for new dependencies — license compatibility
git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD -- pyproject.toml requirements*.txt # timeout: 3000
# Check for secrets accidentally committed
git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD | grep -iE "(password|secret|api_key|token)\s*=\s*['\"][^'\"]{8,}" # timeout: 3000
# Check for API stability: are public APIs being removed without deprecation?
git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD -- "src/**/__init__.py" # timeout: 3000
# Check CHANGELOG was updated
git diff $(git merge-base HEAD origin/${TRUNK:-main}) HEAD -- CHANGELOG.md CHANGES.md # timeout: 3000
Read and follow the cross-validation protocol from .claude/skills/_shared/cross-validation-protocol.md. If .claude/skills/_shared/cross-validation-protocol.md is not present, skip Step 5.
Skill-specific: use the same agent type that raised the finding as the verifier (e.g., foundry:sw-engineer verifies foundry:sw-engineer's critical finding).
Before constructing the output path, extract the current branch and date components:
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-' || echo 'main')
YYYY=$(date +%Y); MM=$(date +%m); DATE=$(date +%Y-%m-%d)
Spawn a foundry:sw-engineer consolidator agent with this prompt:
"Read all finding files in $RUN_DIR/ (agent files: sw-engineer.md, qa-specialist.md, perf-optimizer.md, doc-scribe.md, linting-expert.md, solution-architect.md, and codex.md if present — skip any that are missing). Read plugins/oss/skills/review/checklist.md using the Read tool and apply the consolidation rules (signal-to-noise filter, annotation completeness, section caps). Apply the precision gate: only include findings with a concrete, actionable location (function, line range, or variable name). Apply the finding density rule: for modules under 100 lines, aim for ≤10 total findings. Rank findings within each section by impact (blocking > critical > high > medium > low). For codex.md: include its unique findings under a ### Codex Co-Review section; deduplicate against agent findings (same file:line raised by both → keep the agent version, mark as 'also flagged by Codex'). If issue-*.md files exist in $RUN_DIR, include a ### Issue Root Cause Alignment section placed immediately after ### [blocking] Critical. For each linked issue: state the root cause hypothesis, whether the PR addresses it (yes / partially / no), whether the PR description diverges from the issue's stated problem, and whether the reproduction scenario is tested. Any root cause misalignment or scope divergence finding is at least HIGH severity. Parse each agent's confidence from its envelope; assign codex a fixed confidence of 0.75 (moderate — static analysis, no runtime context). Write the consolidated report to .temp/output-review-$BRANCH-$DATE.md using the Write tool. Return ONLY a one-line summary: verdict=<APPROVE|REQUEST_CHANGES|NEEDS_WORK> | findings=N | critical=N | high=N | file=.temp/output-review-$BRANCH-$DATE.md"
Main context receives only the one-liner verdict. Proceed with that summary for terminal output.
## Code Review: [target]
### [blocking] Critical (must fix before merge)
- [bugs, security issues, data corruption risks]
- Severity: CRITICAL / HIGH
### Issue Root Cause Alignment
(omit if no linked issues)
- Issue #N: [title] — [root cause hypothesis from analysis]
- Root cause addressed: [yes / partially / no — explanation]
- PR/issue scope alignment: [aligned / diverged — what differs]
- Reproduction tested: [yes / no — what's missing]
### Architecture & Quality
- [sw-engineer findings]
- [blocking] issues marked explicitly
- [nit] suggestions marked explicitly
### Test Coverage Gaps
- [qa-specialist findings — top 5 missing tests]
- For ML code: non-determinism or missing seed issues
### Performance Concerns
- [perf-optimizer findings — ranked by impact]
- Include: current behavior vs expected improvement
### Documentation Gaps
- [doc-scribe findings]
- Public API without docstrings listed explicitly
### Static Analysis
- [linting-expert findings — ruff violations, mypy errors, annotation gaps]
### API Design (if applicable)
- [solution-architect findings — coupling, API surface, backward compat]
- Public API changes: [intentional / accidental leak]
- Deprecation path: [provided / missing]
### OSS Checks
- New dependencies: [list, license status]
- API stability: [any public API removed without deprecation?]
- CHANGELOG: [updated / not updated]
- Secrets scan: [clean / found: file:line]
### Codex Co-Review
(omit section if Codex was unavailable or found no unique issues)
- [unique findings from codex.md not already captured by agents above]
- Duplicate findings (same location as agent finding): omitted — see agent section
### Recommended Next Steps
1. [most important action]
2. [second most important]
3. [third]
### Review Confidence
| Agent | Score | Label | Gaps |
|-------|-------|-------|------|
**Aggregate**: min 0.65 / median 0.N
[⚠ LOW CONFIDENCE: qa-specialist could not verify test execution — treat coverage findings as indicative, not conclusive]
After parsing confidence scores: if any agent scored < 0.7, prepend ⚠ LOW CONFIDENCE to that agent's findings section and explicitly state the gap. Do not silently drop uncertain findings — flag them so the reviewer can decide whether to investigate further.
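A minimal sketch of how the Aggregate line could be computed once scores are parsed (SCORES and its values are hypothetical):

```shell
# Stand-in for the confidence values parsed from each agent's envelope.
SCORES="0.88 0.91 0.65 0.82 0.75"

# Minimum score across agents.
MIN=$(printf '%s\n' $SCORES | sort -n | head -1)

# Median: middle value for odd counts, mean of the two middle values otherwise.
MEDIAN=$(printf '%s\n' $SCORES | sort -n \
  | awk '{a[NR]=$1} END {print (NR % 2 ? a[(NR+1)/2] : (a[NR/2] + a[NR/2+1]) / 2)}')

echo "**Aggregate**: min $MIN / median $MEDIAN"   # prints "**Aggregate**: min 0.65 / median 0.82"
```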
Read the compact terminal summary template from .claude/skills/_shared/terminal-summaries.md — use the PR Summary template with the Extended Fields (review only) addendum. Replace [entity-line] with Review — [target] and replace [skill-specific path] with .temp/output-review-$BRANCH-$DATE.md. The rendered terminal block must follow this exact structure: opening --- on its own line, followed by the entity line on the next line (never concatenated as ---Review...); the → saved to .temp/output-review-$BRANCH-$DATE.md line must be present after Confidence:; closing --- must follow the → saved to line. Print this block to the terminal.
After printing to the terminal, also prepend the same compact block to the top of the report file using the Edit tool — insert it at line 1 so the file begins with the compact summary followed by a blank line, then the existing ## Code Review: [target] content.
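For reference, a rendered block following that structure might look like this (PR number, confidence value, and filename are illustrative; the elided middle lines come from the shared terminal-summaries template):

```text
---
Review — PR #42
...
Confidence: 0.84
→ saved to .temp/output-review-main-2025-01-15.md
---
```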
After consolidating findings, identify tasks from the review that Codex can implement directly — not style violations (those are handled by pre-commit hooks), but work that requires writing meaningful code or documentation grounded in the actual implementation.
Delegate to Codex when you can write an accurate, specific brief:
Do not delegate — these require human judgment:
Read .claude/skills/_shared/codex-delegation.md and apply the delegation criteria defined there (if file not found, skip Step 7 delegation entirely).
Example prompt: "Add a test for StreamReader.read_chunk() in tests/test_reader.py — the method should raise ValueError when called after close(), currently no test covers this path."
Only print a ### Codex Delegation section to the terminal when tasks were actually delegated — omit it entirely if nothing was delegated. Do not re-write the output file.
Run this step before the Confidence block regardless of --reply mode.
If REPLY_MODE=true: your response is incomplete until you have executed Step 9 below and written the reply file. Do not add a Confidence block or end your response here — proceed to Step 9 immediately.
If REPLY_MODE=false: skip Step 9 and end with the Confidence block now.
If REPLY_MODE is not set, skip this step.
Pre-compute the reply date: SPAWN_DATE="$(date -u +%Y-%m-%d)"
Spawn the oss:shepherd agent with:
<path>. Write a two-part contributor reply: Part 1 — Reply summary (always present, always complete on its own): (a) acknowledgement + praise naming what is genuinely good — technique, structural decisions, test quality — 1–3 concrete observations, not generic; (b) thematic areas needing improvement — no counts, no itemisation, no 'see below'; name the concern areas concretely enough that the contributor knows what to look at without Part 2; (c) optional closing sentence only when Part 2 follows (e.g. 'I've left inline suggestions with specifics.'). Part 2 — Inline suggestions (optional; single unified table, all findings in one place — no separate prose paragraphs): | Importance | Confidence | File | Line | Comment | — Importance and Confidence as the two leftmost columns; high → medium → low, then most confident first within tier; 1–2 sentences per row for high items; include all high/medium/low findings in one table. No column-width line-wrapping in prose. Write your full output to .temp/output-reply-<PR#>-$SPAWN_DATE.md using the Write tool. Return ONLY a one-line summary: part1=done | part2=N_rows | → .temp/output-reply-<PR#>-<date>.md"

Print compact terminal summary:
Part 1 — reply summary (complete standalone)
Part 2 — N inline suggestions
Reply: .temp/output-reply-<PR#>-<date>.md
End your response with a ## Confidence block per CLAUDE.md output standards. For static analysis of complete, self-contained code (no missing imports needed to reason about the findings), a baseline confidence of 0.88+ is appropriate; reserve scores below 0.80 for cases where runtime behaviour, external dependencies, or execution traces are genuinely needed to validate a finding. This is always the very last thing, whether or not --reply was used.
- Keep [nit] items in a dedicated "Minor Observations" section rather than elevating them to the same tier as high-severity findings. The goal is that the first 3 findings a reader sees are always the most impactful.
- Use the [blocking] prefix so the author knows what must change.
- For [blocking] bugs or regressions → /develop:fix to reproduce with a test and apply a targeted fix.
- /develop:refactor for test-first improvements.
- pip-audit for dependency Common Vulnerabilities and Exposures (CVEs); address Open Web Application Security Project (OWASP) issues inline via /develop:fix.
- /codex:codex-rescue <task> to delegate additional tasks, or /codex:codex-rescue <task description> per finding to delegate to Codex.
- --reply to auto-draft via oss:shepherd; or invoke oss:shepherd manually for custom framing.