# staff-code-review
Staff-engineer-level code review that goes beyond correctness to evaluate architectural alignment, system-level implications, failure modes, performance, scalability, backward compatibility, observability, security, and cross-team impact. Use when reviewing a PR (URL or diff), analyzing code changes for architectural fitness, or when the user asks for a thorough/staff-level/senior review of code changes. Triggers on "review this PR", "review these changes", "staff review", "thorough code review", sharing a GitHub PR URL for review, or asking about the architectural impact of changes.
```bash
npx claudepluginhub smykla-skalski/sai --plugin staff-code-review
```

This skill is limited to using the following tools:
Review code as a staff engineer. Go beyond correctness — evaluate whether the change fits the system, handles failure gracefully, and won't create problems at scale or across teams.
The mental model shift: seniors ask "does this code work?" Staff engineers ask "should this code exist? does it fit our system? who else does this affect? what breaks in 18 months?"
Accept any of:
- A GitHub PR URL: use `gh pr diff <url>` and `gh pr view <url>` to fetch
- A local diff: use `git diff` for changes

Before reading code line-by-line, answer these six questions (from Tanya Reilly's staff engineer mental model):
Also check:
If triage reveals fundamental issues (wrong approach, missing design doc, scope problems), stop and report the triage findings before doing a deep review. Don't waste time line-reviewing code that needs rethinking.
### Pass 1.5: Research Brief (single agent)
After triage passes, spawn a single Agent (subagent_type: "Explore") to build a Research Brief grounding the review in actual codebase context. This runs before Pass 2 so all review agents share the same research — no redundant exploration.
Time budget: 30-60 seconds. Scale scope by PR size:
Research dimensions (ordered by value):
- Run `git log --oneline -10` on changed files. Note area volatility and recent refactors.

Output format — Research Brief:
<example>
Input: PR modifying `functionA()` in `pkg/api` with new retry logic.
Output:
```markdown
## Research Brief

### Callers & Consumers
- `functionA()` — called by 47 files (list top 5)
- `InterfaceB` — implemented by 3 services
- `internalHelper()` — no callers found

### Existing Patterns
- `pkg/retry/` — PR diverges by not using shared package
- Naming: codebase uses `HandleX`, PR uses `ProcessX`

### Related Tests
- `foo_test.go` covers `functionA()` happy path only — no error path tests
- No tests cover `newEndpoint()`

### Git History
- `service/auth/` had 12 commits in last 30 days (high volatility)

### Architecture Context
- PR imports `internal/billing` into `pkg/api` (crosses boundary)
```
</example>
### Pass 2: Deep review (parallel)
Spawn parallel Agent subagents (subagent_type: "general-purpose") for each dimension group. Each agent gets the diff/files **and the Research Brief from Pass 1.5**. This parallelism is important for speed on large PRs.
Read [references/review-dimensions.md](references/review-dimensions.md) before distributing work to agents — it contains detailed checklists for each dimension.
**Agent 1: Architecture & Design**
- Architectural alignment: does this introduce a pattern that will be copied? Does it violate existing conventions?
- API design: deprecation path, error contract stability, surface area minimization
- Data model evolution: schema impact on existing records, unbounded growth, denormalization
- Cross-team impact: which teams consume affected APIs? Paved road drift?
- **Use Research Brief:** Check "Existing Patterns" for convention violations. Use "Callers & Consumers" to assess cross-team impact with real caller counts. Reference "Architecture Context" for ADR compliance.
**Agent 2: Reliability & Operations**
- Failure mode analysis: what happens when downstream is unavailable? Blast radius? Retry idempotency? Race conditions?
- Observability: can you diagnose a 3am outage? Structured logging with correlation IDs? Metrics for new paths? Alertability?
- Migration/rollback safety: backward-compatible with current version? Rollback procedure? Feature flags for disable without rollback? Expand-and-contract for schema changes? Progressive rollout path?
- **Use Research Brief:** Use "Callers & Consumers" to quantify blast radius (e.g., "47 callers affected" vs "unused internal helper"). Check "Related Tests" for coverage gaps on critical paths. Use "Git History" to calibrate risk — high-volatility areas deserve more scrutiny.
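The failure-mode questions above can be sketched in code. A minimal example, assuming a hypothetical downstream `client` with a `get(path, timeout=...)` method: every failure path is explicit, so an unavailable dependency degrades instead of cascading.

```python
import time

def fetch_profile(client, user_id, retries=2, timeout=1.0):
    """Bounded retries, a timeout on every call, and an explicit fallback.

    `client.get` is a hypothetical interface used for illustration; the point
    is that the retry budget, per-call timeout, and degraded default are all
    visible in the code, not left implicit.
    """
    for attempt in range(retries + 1):
        try:
            return client.get(f"/profiles/{user_id}", timeout=timeout)
        except TimeoutError:
            if attempt == retries:
                break  # retry budget exhausted
            time.sleep(0.1 * (2 ** attempt))  # brief backoff between attempts
    # Degrade instead of crashing: callers see an explicit degraded result.
    return {"user_id": user_id, "profile": None, "degraded": True}
```

A reviewer asking "what happens when downstream is unavailable?" should be able to point at the fallback line, not guess.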
**Agent 3: Security & Dependencies**
- Security: input validation, auth on all endpoints, secrets in code/logs, PII exposure, injection risks, crypto choices
- Dependency management: license, CVEs, maintenance status, transitive footprint, supply chain risk
- Could this be done without the new dependency?
- **Use Research Brief:** Use "Callers & Consumers" to trace data flow through changed functions — identify where untrusted input enters. Check "Existing Patterns" for security practice consistency (e.g., does similar code validate inputs?). Reference "Architecture Context" for security-related ADRs.
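For the input-validation item, the pattern to look for is allow-list validation at the boundary where untrusted input enters. A sketch with a hypothetical username policy:

```python
import re

# Allow-list, not deny-list: the policy here (lowercase alphanumerics and
# underscore, 3-32 chars) is a hypothetical example, not a prescription.
USERNAME_RE = re.compile(r"^[a-z0-9_]{3,32}$")

def parse_username(raw) -> str:
    """Validate untrusted input at the boundary it enters.

    Rejecting early keeps injection-prone values out of queries and logs
    downstream, which is what the checklist items above are probing for.
    """
    if not isinstance(raw, str) or not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw
```

When reviewing, grep for whether similar boundaries in the codebase validate the same way; inconsistency is itself a finding.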
**Agent 4: Performance & Scalability**
Read [references/performance-scalability.md](references/performance-scalability.md) before starting this analysis.
Evaluate the PR through three lenses: resource efficiency, scalability under growth, and operational readiness.
- **Data access**: N+1 queries, unbounded fetching, missing pagination/LIMIT, SELECT *, client-side filtering/aggregation, missing indexes for new query patterns, offset pagination on large datasets, transaction scope too wide
- **Resource management**: connection/file/socket leaks, goroutine/thread leaks without cancellation path, unbounded in-memory collections, missing cleanup in error paths (defer/finally), connection pool exhaustion risks
- **Caching**: missing cache for expensive repeated computation, cache stampede risk (no singleflight/locking), TTLs without jitter, unbounded cache growth, cache failure mode (fail vs degrade)
- **Concurrency**: race conditions on shared mutable state, lock granularity too coarse, deadlock risk from inconsistent lock ordering, unbounded goroutine/thread spawning, queues/buffers without max size (backpressure failure)
- **Compute efficiency**: string concatenation in loops (Go/Java), nested loops where hash lookup suffices, JSON serialization on hot paths where binary format is viable, regex compilation inside loops, sequential awaits that could be parallel
- **Network & I/O**: chatty inter-service calls (3+ sequential per request), missing batching, connection reuse disabled, synchronous blocking I/O on hot path, retry without exponential backoff + jitter, missing timeouts on all outbound calls
- **Scalability questions**: does this hold at 10x data volume? At 100x request rate? Is there backpressure? Fan-out bounded? Hot spots distributed?
- **Instrumentation**: new code paths missing latency/error/throughput metrics, no benchmark results for perf-sensitive changes, missing circuit breaker for downstream dependencies
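The "retry without exponential backoff + jitter" item has a compact canonical fix. A sketch of full jitter (parameter defaults are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Exponential backoff with full jitter.

    The delay is uniform in [0, min(cap, base * 2**attempt)], so retries from
    many clients spread out instead of hammering a recovering dependency in
    synchronized waves (a retry storm).
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

A PR that retries with a fixed sleep, or exponential backoff without jitter, is flagged under this checklist item.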
Apply analysis frameworks contextually:
- **USE** (Utilization/Saturation/Errors) for resource-bound code (pools, queues, locks)
- **RED** (Rate/Errors/Duration) for new or modified service endpoints
- **Four Golden Signals** for production readiness assessment
Severity calibration:
- **blocking:** resource leaks, N+1 queries, missing timeouts, race conditions, unbounded fetching, retry storms — these cause outages at scale
- **issue:** missing indexes, coarse locks, no caching strategy, chatty APIs, verbose logging on hot paths — will degrade under growth
- **suggestion:** missing instrumentation, suboptimal algorithm at current scale, pool sizing undocumented, no graceful degradation strategy
- **Use Research Brief:** Use "Callers & Consumers" to determine if changed code is on a hot path (high caller count = higher severity). Check "Related Tests" for perf test coverage. Use "Git History" to identify recently optimized areas where the PR may regress. Reference "Existing Patterns" to check if the codebase has established patterns for caching, batching, or connection management that the PR should follow.
**Agent 5: Backward Compatibility**
Read [references/backward-compatibility.md](references/backward-compatibility.md) before starting this analysis.
Evaluate the PR for breaking changes across three compatibility dimensions: source, binary/wire, and behavioral.
- **API surface**: removed/renamed endpoints, fields, methods, types, parameters. Type changes (even widening). Required fields added to existing requests. Response envelope changes. Error code/format changes.
- **Wire format**: serialization format changes, JSON key casing, float/date representation, encoding, protobuf field number reuse/changes, streaming mode changes.
- **Behavioral/semantic**: default value changes, algorithm changes, error handling changes, side effect changes (sync→async), idempotency changes, event ordering changes.
- **Hyrum's Law exposure**: JSON key ordering, array element ordering, error message text, timing characteristics, numeric precision, response size patterns — any implicit contract a consumer may depend on.
- **Database schema**: column removals/renames/type changes, NOT NULL additions without defaults, constraint additions that reject existing data. Check for expand-and-contract pattern.
- **Configuration**: removed/renamed env vars, config keys, CLI flags. Changed defaults. Changed precedence. Feature flag default flips. Format changes without dual support.
- **Dependency impact**: minimum SDK/runtime version increases, diamond dependency conflicts, peer dependency narrowing, dropped platform support.
- **Rollback safety**: can the previous version still work after this deploys? Reversible migrations? Old code handles new schema/cache/queue formats?
- **Migration path**: if breaking, is expand-and-contract used? Versioned endpoints? Deprecation notices with sunset dates? Migration guide?
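The expand phase of expand-and-contract is often just a dual-read. A sketch with hypothetical field names, showing new code that accepts both the old and new schema so rows written by the previous version (or after a rollback) keep working:

```python
def read_quantity(record: dict) -> int:
    """Expand phase: read the new field, but still honor the old one.

    Field names are hypothetical. Once all writers emit "quantity" and old
    rows are backfilled, the contract phase deletes the "qty" fallback.
    """
    if "quantity" in record:          # new schema
        return int(record["quantity"])
    return int(record.get("qty", 0))  # old schema, honored during migration
```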
Severity calibration:
- **blocking:** existing consumers will fail, no migration path provided — removals, type changes, required field additions, wire format changes, schema destructive changes
- **issue:** breaking change with incomplete or undocumented migration path
- **question:** potentially breaking but need consumer usage data to assess
- **suggestion:** non-breaking but creates future compatibility surface without guardrails
- **Use Research Brief:** Use "Callers & Consumers" to count affected consumers and quantify blast radius — high caller count + breaking change = blocking, zero callers = downgrade severity. Check "Existing Patterns" for versioning/deprecation conventions the PR should follow. Reference "Architecture Context" for API contracts and compatibility policies. Use "Git History" to identify recent breaking changes that may compound with this one.
**Agent 6: Convention Conformance & Code Reuse**
Read [references/convention-conformance.md](references/convention-conformance.md) before starting this analysis.
Evaluate whether the PR follows repository conventions and reuses existing code instead of reimplementing.
- **Naming conventions**: do new functions, types, variables, files, and packages follow the naming patterns established in surrounding code? Check casing, prefixes/suffixes, verb choices (e.g., `Handle` vs `Process`, `Get` vs `Fetch`).
- **Structural patterns**: does the PR follow the project's established patterns for error handling, logging, configuration, initialization, and request/response flow? Grep for how similar operations are done elsewhere.
- **Existing utilities**: does the codebase already have helpers, shared packages, or internal libraries that solve what the PR implements from scratch? Check `pkg/`, `internal/`, `lib/`, `utils/`, `common/`, `shared/` directories.
- **Code duplication**: does the PR duplicate logic that already exists elsewhere? Look for near-identical implementations of the same algorithm, validation, transformation, or I/O pattern.
- **Test patterns**: do new tests follow the project's testing conventions (table-driven, fixtures, test helpers, assertion style, naming)?
- **Config & constants**: does the PR hardcode values that the project typically externalizes? Does it use the established config loading pattern?
- **Error handling style**: does error wrapping, sentinel errors, error types, and error message format match the codebase conventions?
- **Project structure**: are new files placed in the correct directories following the project's module/package organization? Do import paths follow established patterns?
Severity calibration:
- **blocking:** reimplements critical shared infrastructure (auth, retry, circuit breaker, validation) that exists as a maintained internal package — creates divergence and maintenance burden
- **issue:** inconsistent patterns that will confuse future contributors or cause bugs when mixed with existing code (e.g., different error handling approach in the same package)
- **suggestion:** naming divergence, minor style inconsistencies, opportunities to use existing helpers for non-critical code
- **nit:** trivial naming preferences within valid alternatives
- **Use Research Brief:** Use "Existing Patterns" as primary input — this is the core data for convention analysis. Grep for function/type names similar to what the PR introduces. Check "Callers & Consumers" to understand if the PR's new code will be called alongside existing code with different conventions (inconsistency risk). Reference "Architecture Context" for documented conventions in READMEs or ADRs.
**Agent 7: Dead Code Detection**
Read [references/dead-code.md](references/dead-code.md) before starting this analysis.
Identify code the PR introduces as dead or makes dead by orphaning existing code.
- **Newly introduced dead code**: functions, types, exports, variables, parameters, branches added by the PR that have zero callers/readers in the codebase. Grep for every new symbol.
- **Code orphaned by the PR**: the PR changes call sites, removes features, or refactors logic — check if old functions, constants, imports, types, or error handlers lost their last consumer.
- **Commented-out code**: blocks commented out instead of deleted. Version control exists for history.
- **Unreachable code**: code after unconditional return/break/panic/throw, impossible branches, exhaustive switches with redundant default.
- **Test dead code**: tests covering removed functionality, unused test helpers/fixtures, stale mocks for changed interfaces.
Before flagging, verify indirect reachability — interface implementations, reflection, serialization tags, framework conventions, plugin registration, and public API surface are NOT dead code. When uncertain, use `question:` severity.
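The cross-referencing logic above can be sketched mechanically. Inputs here are hypothetical; in practice the caller counts come from grepping the codebase and the exclusion sets from the reachability checks just described:

```python
def flag_dead_symbols(caller_counts: dict, exported: set, indirect: set) -> list:
    """Flag symbols with zero callers, excluding legitimate zero-caller cases.

    A symbol is flagged only if it is neither exported (public API surface)
    nor indirectly reachable (interface implementations, reflection,
    framework registration, serialization tags).
    """
    return sorted(
        sym for sym, count in caller_counts.items()
        if count == 0 and sym not in exported and sym not in indirect
    )
```

Anything caught by the exclusion sets but still suspicious goes out as a `question:`, not an `issue:`.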
Severity calibration:
- **issue:** entire functions, types, or modules with zero callers introduced or orphaned by the PR
- **suggestion:** small dead code — unused parameter, single orphaned constant, commented-out block
- **nit:** arguable cases with plausible indirect reachability
- **thought:** pre-existing dead code adjacent to the PR's changes (not the PR's fault)
- **Use Research Brief:** Use "Callers & Consumers" as primary input — cross-reference every new/modified symbol against its caller count. Zero callers on a non-public, non-interface symbol = dead code. Check "Existing Patterns" for framework conventions that create indirect reachability. Use "Git History" to identify recently removed features whose cleanup may be incomplete.
Each agent returns findings as conventional comments (see format below).
### Synthesis
After all agents return, merge findings and:
1. Deduplicate overlapping concerns
2. Rank by severity (blocking first)
3. Add cross-cutting observations that only emerge from seeing all dimensions together
4. Identify if a design doc/RFC should have preceded this PR (thresholds: public API change, new service, new infrastructure, data model for stateful system, security-sensitive subsystem)
5. **Note where research changed finding severity** — e.g., "47 callers makes this breaking change blocking" or "no callers found, downgrading to suggestion". Cite specific codebase evidence (file paths, caller counts, pattern matches) that grounded the assessment.
6. **Validate blocking findings** — for each blocking issue, verify the evidence by re-reading the relevant code. If evidence is ambiguous or based on assumptions, downgrade to `question:` and request clarification from the author. Repeat until every blocking issue has verified codebase evidence.
### Humanize Comments
After synthesis, humanize the prose in all review comments to remove AI writing patterns. This makes review comments sound like they were written by a human staff engineer, not generated by a model.
Spawn a single `Agent` (subagent_type: "general-purpose") with this prompt:
You are a prose editor. Your job: rewrite review comment messages to remove AI writing patterns while preserving technical accuracy and the exact structural format.
PATTERN CATALOG: Read the file at `${CLAUDE_SKILL_DIR}/../../../humanize/skills/humanize/references/patterns.md`
COMPOSITION GUIDE: Read the file at `${CLAUDE_SKILL_DIR}/../../../humanize/skills/humanize/references/elements-of-style.md`
REVIEW TEXT: <review> {paste the full synthesized review markdown here} </review>
RULES:
- Keep every *Location:* `path/to/file:line` line exactly as-is
Replace the review text with the agent's output. If the agent fails or returns malformed output, proceed with the original unhumanized text.
### Save Review
After humanizing completes, persist the full review markdown for reference.
1. Resolve the persistent data directory:
```bash
DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/sai/staff-code-review/reviews"
mkdir -p "$DATA_DIR"
```

2. Derive `{pr_number}` from the PR URL (the `/pull/{number}` segment); for a local diff, use `"local"` as the prefix
3. Generate a UTC timestamp with `date -u +%Y%m%dT%H%M%S`
4. Write the review to `${DATA_DIR}/{pr_number}-{timestamp}.md` using the Write tool

### Post Draft Review to GitHub

Skip this phase entirely if the input was NOT a GitHub PR URL.
Create a PENDING (draft) review on the PR with inline comments. The review is invisible to the PR author until manually submitted from GitHub's UI — you can edit, delete, or add comments before publishing.
Parse owner, repo, and PR number from the URL (e.g., https://github.com/{owner}/{repo}/pull/{number})
Extract inline comments from the saved review file using the parsing script:
```bash
COMMENTS_JSON=$(python3 "${CLAUDE_SKILL_DIR}/scripts/parse_review_comments.py" "$REVIEW_FILE")
```
The script extracts comments with *Location:* annotations and filters to: all blockers, all issues, up to 5 suggestions, and up to 3 praises. Questions, thoughts, and nits are excluded from inline comments.
Write a concise top-level review body (2-4 sentences max). Do NOT mention the tool, "staff code review", or that this was AI-generated. Write it as a human reviewer would:
Example: "Looks good overall — clean separation of concerns and solid test coverage. The missing timeout on the HTTP client (line 42) needs fixing before merge. 1 blocker, 3 suggestions."
Build the API payload and submit:
```bash
python3 -c "
import json, sys
body = sys.argv[1]
comments = json.loads(sys.argv[2])
for c in comments:
    c.setdefault('side', 'RIGHT')
payload = {'body': body, 'comments': comments}
print(json.dumps(payload))
" "$REVIEW_BODY" "$COMMENTS_JSON" | gh api "repos/${OWNER}/${REPO}/pulls/${PR_NUMBER}/reviews" --input -
```
The payload omits the `event` field, which makes the API create a PENDING/draft review.
Report to the user:
If the API call fails:
Do NOT re-run the API call after it exits 0. The review creation is atomic — all comments are attached in a single request.
Use conventional comments with explicit severity. Every comment MUST follow this exact two-line structure — the parsing script depends on it for GitHub review posting:
**{label}:** {message}
*Location:* `{path/to/file}:{line_number}`
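For instance, a well-formed comment (file path, line, and finding are hypothetical):

```
**issue:** `SaveOrder` swallows the write error, so a failed insert looks like success to the caller. Return or log the error with the order ID.
*Location:* `pkg/orders/store.go:87`
```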
Rules:
- **{label}:** on the first line, followed by the message on the same line (may wrap to additional lines)
- *Location:* on its own line immediately after the message, with backtick-wrapped `path:line`
- Do not omit the *Location:* line — comments without location cannot be posted as inline GitHub review comments
- Findings that have no single *Location:* go in the review body instead

Labels and when to use them:
| Label | Blocking? | When to use |
|---|---|---|
| `blocking:` | Yes | Security vulnerabilities, architectural violations that spread, breaking API contracts without migration, race conditions, missing observability on critical paths |
| `issue:` | Yes | Correctness bugs, missing error handling on critical paths |
| `question:` | Soft | Need clarification before deciding — "why was this approach chosen?" |
| `suggestion:` | No | Alternative approach with rationale, author decides |
| `thought:` | No | Non-blocking idea, educational — pattern that matters at scale |
| `nit:` | No | Style/naming beyond linter scope, trivial polish |
| `praise:` | No | Acknowledge good work — especially important for teaching and culture |
The threshold for blocking: only block when code will degrade system health, create security risk, or break contracts. Google's standard: approve once the change "definitely improves overall code health" — not when perfect.
<example>
```markdown
## Triage

[Mental model questions answered — 2-3 sentences on overall fit]
[Flag if design doc/RFC should have preceded this]
[Flag if PR should be split]

## Research Brief

[Structured findings from codebase exploration — callers, patterns, tests, history, architecture context]

## Verdict

Verdict: [APPROVE | APPROVE_WITH_COMMENTS | REQUEST_CHANGES]
Blocking issues: [count]
Non-blocking suggestions: [count]

## Blocking Issues

[Only if any exist. These must be resolved before merge.]

## Architecture & Design

[Findings from Agent 1]

## Reliability & Operations

[Findings from Agent 2]

## Security & Dependencies

[Findings from Agent 3]

## Performance & Scalability

[Findings from Agent 4]

## Backward Compatibility

[Findings from Agent 5]

## Convention Conformance & Code Reuse

[Findings from Agent 6]

## Dead Code Detection

[Findings from Agent 7]

## Cross-Cutting Observations

[Insights that emerge from seeing all dimensions together]

## Praise

[Explicit praise — call out good patterns, especially ones others should learn from]
```
</example>
## Calibration guidance
These principles prevent common staff-review anti-patterns:
- **Don't block on preference.** If the author's approach is valid and yours is also valid, approve with a suggestion. "Multiple valid approaches should defer to the author's choice if equally defensible" (Google).
- **Don't scope-creep the review.** Review what's in the PR, not what you wish was in the PR. Pre-existing issues get `thought:` comments, not blocks.
- **Don't nit-bomb.** If you have >5 nits, that's a signal to invest in linter rules, not block a PR.
- **Explain the why.** "This is wrong" is not a review comment. "This creates a cascading failure when service X is unavailable because..." is.
- **Praise explicitly.** Good patterns reinforced in review get adopted by others. Call out well-designed error handling, clean abstractions, good test coverage.
- **Suggest taking it offline** if a discussion exceeds 2-3 back-and-forths on one point. Document resolution in PR.
## When to recommend a design doc/RFC before code
Flag (don't block, but strongly recommend) when the PR:
- Changes a public API or cross-team contract
- Introduces a new service or major decomposition
- Adds new infrastructure or third-party dependency
- Affects data model for a stateful system
- Touches security-sensitive subsystem (auth, payment, encryption)
- Creates a pattern that will be used as a template by others
## Reference
- Review dimensions overview: [references/review-dimensions.md](references/review-dimensions.md)
- Performance & scalability deep reference: [references/performance-scalability.md](references/performance-scalability.md)
- Backward compatibility deep reference: [references/backward-compatibility.md](references/backward-compatibility.md)
- Convention conformance deep reference: [references/convention-conformance.md](references/convention-conformance.md)
- Dead code detection deep reference: [references/dead-code.md](references/dead-code.md)