# staff-code-review
Staff-engineer-level code review that goes beyond correctness to evaluate architectural alignment, system-level implications, failure modes, performance, scalability, backward compatibility, observability, security, and cross-team impact. Use when reviewing a PR (URL or diff), analyzing code changes for architectural fitness, or when the user asks for a thorough/staff-level/senior review of code changes. Triggers on "review this PR", "review these changes", "staff review", "thorough code review", sharing a GitHub PR URL for review, or asking about the architectural impact of changes.
```bash
npx claudepluginhub smykla-skalski/sai --plugin staff-code-review
```

This skill is limited to using the following tools:
Review code as a staff engineer. Go beyond correctness — evaluate whether the change fits the system, handles failure gracefully, and won't create problems at scale or across teams.
The mental model shift: seniors ask "does this code work?" Staff engineers ask "should this code exist? does it fit our system? who else does this affect? what breaks in 18 months?"
Accept any of:
- A GitHub PR URL: use `gh pr diff <url>` and `gh pr view <url>` to fetch
- A local diff: use `git diff` for changes

Before reading code line-by-line, answer these six questions (from Tanya Reilly's staff engineer mental model):
Also check:
If triage reveals fundamental issues (wrong approach, missing design doc, scope problems), stop and report the triage findings before doing a deep review. Don't waste time line-reviewing code that needs rethinking.
### Pass 1.5: Research Brief (single agent)
After triage passes, spawn a single Agent (subagent_type: "Explore") to build a Research Brief grounding the review in actual codebase context. This runs before Pass 2 so all review agents share the same research — no redundant exploration.
Time budget: 30-60 seconds. Scale scope by PR size:
Research dimensions (ordered by value):
- Run `git log --oneline -10` on changed files. Note area volatility and recent refactors.

Output format — Research Brief:
<example>
Input: PR modifying `functionA()` in `pkg/api` with new retry logic.
Output:
```markdown
## Research Brief

### Callers & Consumers
- `functionA()` — called by 47 files (list top 5)
- `InterfaceB` — implemented by 3 services
- `internalHelper()` — no callers found

### Existing Patterns
- `pkg/retry/` — PR diverges by not using shared package
- Naming: codebase uses `HandleX`, PR uses `ProcessX`

### Related Tests
- `foo_test.go` covers `functionA()` happy path only — no error path tests
- No tests cover `newEndpoint()`

### Git History
- `service/auth/` had 12 commits in last 30 days (high volatility)

### Architecture Context
- PR imports `internal/billing` into `pkg/api` (crosses boundary)
```
</example>
### Pass 2: Deep review (parallel)
Spawn parallel Agent subagents (subagent_type: "general-purpose") for each dimension group. Each agent gets the diff/files **and the Research Brief from Pass 1.5**. This parallelism is important for speed on large PRs.
Read [references/review-dimensions.md](references/review-dimensions.md) before distributing work to agents — it contains detailed checklists for each dimension.
**Agent 1: Architecture & Design**
- Architectural alignment: does this introduce a pattern that will be copied? Does it violate existing conventions?
- API design: deprecation path, error contract stability, surface area minimization
- Data model evolution: schema impact on existing records, unbounded growth, denormalization
- Cross-team impact: which teams consume affected APIs? Paved road drift?
- **Use Research Brief:** Check "Existing Patterns" for convention violations. Use "Callers & Consumers" to assess cross-team impact with real caller counts. Reference "Architecture Context" for ADR compliance.
**Agent 2: Reliability & Operations**
- Failure mode analysis: what happens when downstream is unavailable? Blast radius? Retry idempotency? Race conditions?
- Observability: can you diagnose a 3am outage? Structured logging with correlation IDs? Metrics for new paths? Alertability?
- Migration/rollback safety: backward-compatible with current version? Rollback procedure? Feature flags for disable without rollback? Expand-and-contract for schema changes? Progressive rollout path?
- **Use Research Brief:** Use "Callers & Consumers" to quantify blast radius (e.g., "47 callers affected" vs "unused internal helper"). Check "Related Tests" for coverage gaps on critical paths. Use "Git History" to calibrate risk — high-volatility areas deserve more scrutiny.
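The failure-mode questions above can be sketched in code. A minimal example, assuming a hypothetical downstream `client` with a `get(path, timeout=...)` method: every failure path is explicit, so an unavailable dependency degrades instead of cascading.

```python
import time

def fetch_profile(client, user_id, retries=2, timeout=1.0):
    """Bounded retries, a timeout on every call, and an explicit fallback.

    `client.get` is a hypothetical interface used for illustration; the point
    is that the retry budget, per-call timeout, and degraded default are all
    visible in the code, not left implicit.
    """
    for attempt in range(retries + 1):
        try:
            return client.get(f"/profiles/{user_id}", timeout=timeout)
        except TimeoutError:
            if attempt == retries:
                break  # retry budget exhausted
            time.sleep(0.1 * (2 ** attempt))  # brief backoff between attempts
    # Degrade instead of crashing: callers see an explicit degraded result.
    return {"user_id": user_id, "profile": None, "degraded": True}
```

A reviewer asking "what happens when downstream is unavailable?" should be able to point at the fallback line, not guess.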
**Agent 3: Security & Dependencies**
- Security: input validation, auth on all endpoints, secrets in code/logs, PII exposure, injection risks, crypto choices
- Dependency management: license, CVEs, maintenance status, transitive footprint, supply chain risk
- Could this be done without the new dependency?
- **Use Research Brief:** Use "Callers & Consumers" to trace data flow through changed functions — identify where untrusted input enters. Check "Existing Patterns" for security practice consistency (e.g., does similar code validate inputs?). Reference "Architecture Context" for security-related ADRs.
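For the input-validation item, the pattern to look for is allow-list validation at the boundary where untrusted input enters. A sketch with a hypothetical username policy:

```python
import re

# Allow-list, not deny-list: the policy here (lowercase alphanumerics and
# underscore, 3-32 chars) is a hypothetical example, not a prescription.
USERNAME_RE = re.compile(r"^[a-z0-9_]{3,32}$")

def parse_username(raw) -> str:
    """Validate untrusted input at the boundary it enters.

    Rejecting early keeps injection-prone values out of queries and logs
    downstream, which is what the checklist items above are probing for.
    """
    if not isinstance(raw, str) or not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw
```

When reviewing, grep for whether similar boundaries in the codebase validate the same way; inconsistency is itself a finding.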
**Agent 4: Performance & Scalability**
Read [references/performance-scalability.md](references/performance-scalability.md) before starting this analysis.
Evaluate the PR through three lenses: resource efficiency, scalability under growth, and operational readiness.
- **Data access**: N+1 queries, unbounded fetching, missing pagination/LIMIT, SELECT *, client-side filtering/aggregation, missing indexes for new query patterns, offset pagination on large datasets, transaction scope too wide
- **Resource management**: connection/file/socket leaks, goroutine/thread leaks without cancellation path, unbounded in-memory collections, missing cleanup in error paths (defer/finally), connection pool exhaustion risks
- **Caching**: missing cache for expensive repeated computation, cache stampede risk (no singleflight/locking), TTLs without jitter, unbounded cache growth, cache failure mode (fail vs degrade)
- **Concurrency**: race conditions on shared mutable state, lock granularity too coarse, deadlock risk from inconsistent lock ordering, unbounded goroutine/thread spawning, queues/buffers without max size (backpressure failure)
- **Compute efficiency**: string concatenation in loops (Go/Java), nested loops where hash lookup suffices, JSON serialization on hot paths where binary format is viable, regex compilation inside loops, sequential awaits that could be parallel
- **Network & I/O**: chatty inter-service calls (3+ sequential per request), missing batching, connection reuse disabled, synchronous blocking I/O on hot path, retry without exponential backoff + jitter, missing timeouts on all outbound calls
- **Scalability questions**: does this hold at 10x data volume? At 100x request rate? Is there backpressure? Fan-out bounded? Hot spots distributed?
- **Instrumentation**: new code paths missing latency/error/throughput metrics, no benchmark results for perf-sensitive changes, missing circuit breaker for downstream dependencies
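The "retry without exponential backoff + jitter" item has a compact canonical fix. A sketch of full jitter (parameter defaults are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Exponential backoff with full jitter.

    The delay is uniform in [0, min(cap, base * 2**attempt)], so retries from
    many clients spread out instead of hammering a recovering dependency in
    synchronized waves (a retry storm).
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

A PR that retries with a fixed sleep, or exponential backoff without jitter, is flagged under this checklist item.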
Apply analysis frameworks contextually:
- **USE** (Utilization/Saturation/Errors) for resource-bound code (pools, queues, locks)
- **RED** (Rate/Errors/Duration) for new or modified service endpoints
- **Four Golden Signals** for production readiness assessment
Severity calibration:
- **blocking:** resource leaks, N+1 queries, missing timeouts, race conditions, unbounded fetching, retry storms — these cause outages at scale
- **issue:** missing indexes, coarse locks, no caching strategy, chatty APIs, verbose logging on hot paths — will degrade under growth
- **suggestion:** missing instrumentation, suboptimal algorithm at current scale, pool sizing undocumented, no graceful degradation strategy
- **Use Research Brief:** Use "Callers & Consumers" to determine if changed code is on a hot path (high caller count = higher severity). Check "Related Tests" for perf test coverage. Use "Git History" to identify recently optimized areas where the PR may regress. Reference "Existing Patterns" to check if the codebase has established patterns for caching, batching, or connection management that the PR should follow.
**Agent 5: Backward Compatibility**
Read [references/backward-compatibility.md](references/backward-compatibility.md) before starting this analysis.
Evaluate the PR for breaking changes across three compatibility dimensions: source, binary/wire, and behavioral.
- **API surface**: removed/renamed endpoints, fields, methods, types, parameters. Type changes (even widening). Required fields added to existing requests. Response envelope changes. Error code/format changes.
- **Wire format**: serialization format changes, JSON key casing, float/date representation, encoding, protobuf field number reuse/changes, streaming mode changes.
- **Behavioral/semantic**: default value changes, algorithm changes, error handling changes, side effect changes (sync→async), idempotency changes, event ordering changes.
- **Hyrum's Law exposure**: JSON key ordering, array element ordering, error message text, timing characteristics, numeric precision, response size patterns — any implicit contract a consumer may depend on.
- **Database schema**: column removals/renames/type changes, NOT NULL additions without defaults, constraint additions that reject existing data. Check for expand-and-contract pattern.
- **Configuration**: removed/renamed env vars, config keys, CLI flags. Changed defaults. Changed precedence. Feature flag default flips. Format changes without dual support.
- **Dependency impact**: minimum SDK/runtime version increases, diamond dependency conflicts, peer dependency narrowing, dropped platform support.
- **Rollback safety**: can the previous version still work after this deploys? Reversible migrations? Old code handles new schema/cache/queue formats?
- **Migration path**: if breaking, is expand-and-contract used? Versioned endpoints? Deprecation notices with sunset dates? Migration guide?
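The expand phase of expand-and-contract is often just a dual-read. A sketch with hypothetical field names, showing new code that accepts both the old and new schema so rows written by the previous version (or after a rollback) keep working:

```python
def read_quantity(record: dict) -> int:
    """Expand phase: read the new field, but still honor the old one.

    Field names are hypothetical. Once all writers emit "quantity" and old
    rows are backfilled, the contract phase deletes the "qty" fallback.
    """
    if "quantity" in record:          # new schema
        return int(record["quantity"])
    return int(record.get("qty", 0))  # old schema, honored during migration
```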
Severity calibration:
- **blocking:** existing consumers will fail, no migration path provided — removals, type changes, required field additions, wire format changes, schema destructive changes
- **issue:** breaking change with incomplete or undocumented migration path
- **question:** potentially breaking but need consumer usage data to assess
- **suggestion:** non-breaking but creates future compatibility surface without guardrails
- **Use Research Brief:** Use "Callers & Consumers" to count affected consumers and quantify blast radius — high caller count + breaking change = blocking, zero callers = downgrade severity. Check "Existing Patterns" for versioning/deprecation conventions the PR should follow. Reference "Architecture Context" for API contracts and compatibility policies. Use "Git History" to identify recent breaking changes that may compound with this one.
**Agent 6: Convention Conformance & Code Reuse**
Read [references/convention-conformance.md](references/convention-conformance.md) before starting this analysis.
Evaluate whether the PR follows repository conventions and reuses existing code instead of reimplementing.
- **Naming conventions**: do new functions, types, variables, files, and packages follow the naming patterns established in surrounding code? Check casing, prefixes/suffixes, verb choices (e.g., `Handle` vs `Process`, `Get` vs `Fetch`).
- **Structural patterns**: does the PR follow the project's established patterns for error handling, logging, configuration, initialization, and request/response flow? Grep for how similar operations are done elsewhere.
- **Existing utilities**: does the codebase already have helpers, shared packages, or internal libraries that solve what the PR implements from scratch? Check `pkg/`, `internal/`, `lib/`, `utils/`, `common/`, `shared/` directories.
- **Code duplication**: does the PR duplicate logic that already exists elsewhere? Look for near-identical implementations of the same algorithm, validation, transformation, or I/O pattern.
- **Test patterns**: do new tests follow the project's testing conventions (table-driven, fixtures, test helpers, assertion style, naming)?
- **Config & constants**: does the PR hardcode values that the project typically externalizes? Does it use the established config loading pattern?
- **Error handling style**: does error wrapping, sentinel errors, error types, and error message format match the codebase conventions?
- **Project structure**: are new files placed in the correct directories following the project's module/package organization? Do import paths follow established patterns?
Severity calibration:
- **blocking:** reimplements critical shared infrastructure (auth, retry, circuit breaker, validation) that exists as a maintained internal package — creates divergence and maintenance burden
- **issue:** inconsistent patterns that will confuse future contributors or cause bugs when mixed with existing code (e.g., different error handling approach in the same package)
- **suggestion:** naming divergence, minor style inconsistencies, opportunities to use existing helpers for non-critical code
- **nit:** trivial naming preferences within valid alternatives
- **Use Research Brief:** Use "Existing Patterns" as primary input — this is the core data for convention analysis. Grep for function/type names similar to what the PR introduces. Check "Callers & Consumers" to understand if the PR's new code will be called alongside existing code with different conventions (inconsistency risk). Reference "Architecture Context" for documented conventions in READMEs or ADRs.
**Agent 7: Dead Code Detection**
Read [references/dead-code.md](references/dead-code.md) before starting this analysis.
Identify code the PR introduces as dead or makes dead by orphaning existing code.
- **Newly introduced dead code**: functions, types, exports, variables, parameters, branches added by the PR that have zero callers/readers in the codebase. Grep for every new symbol.
- **Code orphaned by the PR**: the PR changes call sites, removes features, or refactors logic — check if old functions, constants, imports, types, or error handlers lost their last consumer.
- **Commented-out code**: blocks commented out instead of deleted. Version control exists for history.
- **Unreachable code**: code after unconditional return/break/panic/throw, impossible branches, exhaustive switches with redundant default.
- **Test dead code**: tests covering removed functionality, unused test helpers/fixtures, stale mocks for changed interfaces.
Before flagging, verify indirect reachability — interface implementations, reflection, serialization tags, framework conventions, plugin registration, and public API surface are NOT dead code. When uncertain, use `question:` severity.
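The cross-referencing logic above can be sketched mechanically. Inputs here are hypothetical; in practice the caller counts come from grepping the codebase and the exclusion sets from the reachability checks just described:

```python
def flag_dead_symbols(caller_counts: dict, exported: set, indirect: set) -> list:
    """Flag symbols with zero callers, excluding legitimate zero-caller cases.

    A symbol is flagged only if it is neither exported (public API surface)
    nor indirectly reachable (interface implementations, reflection,
    framework registration, serialization tags).
    """
    return sorted(
        sym for sym, count in caller_counts.items()
        if count == 0 and sym not in exported and sym not in indirect
    )
```

Anything caught by the exclusion sets but still suspicious goes out as a `question:`, not an `issue:`.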
Severity calibration:
- **issue:** entire functions, types, or modules with zero callers introduced or orphaned by the PR
- **suggestion:** small dead code — unused parameter, single orphaned constant, commented-out block
- **nit:** arguable cases with plausible indirect reachability
- **thought:** pre-existing dead code adjacent to the PR's changes (not the PR's fault)
- **Use Research Brief:** Use "Callers & Consumers" as primary input — cross-reference every new/modified symbol against its caller count. Zero callers on a non-public, non-interface symbol = dead code. Check "Existing Patterns" for framework conventions that create indirect reachability. Use "Git History" to identify recently removed features whose cleanup may be incomplete.
Each agent returns findings as conventional comments (see format below).
### Synthesis
After all agents return, merge findings and:
1. Deduplicate overlapping concerns
2. Rank by severity (blocking first)
3. Add cross-cutting observations that only emerge from seeing all dimensions together
4. Identify if a design doc/RFC should have preceded this PR (thresholds: public API change, new service, new infrastructure, data model for stateful system, security-sensitive subsystem)
5. **Note where research changed finding severity** — e.g., "47 callers makes this breaking change blocking" or "no callers found, downgrading to suggestion". Cite specific codebase evidence (file paths, caller counts, pattern matches) that grounded the assessment.
6. **Validate blocking findings** — for each blocking issue, verify the evidence by re-reading the relevant code. If evidence is ambiguous or based on assumptions, downgrade to `question:` and request clarification from the author. Repeat until every blocking issue has verified codebase evidence.
### Humanize Comments
After synthesis, humanize the prose in all review comments to remove AI writing patterns. This makes review comments sound like they were written by a human staff engineer, not generated by a model.
Spawn a single `Agent` (subagent_type: "general-purpose") with this prompt:
You are a prose editor. Your job: rewrite review comment messages to remove AI writing patterns while preserving technical accuracy and the exact structural format.
PATTERN CATALOG: Read the file at `${CLAUDE_SKILL_DIR}/../../../humanize/skills/humanize/references/patterns.md`
COMPOSITION GUIDE: Read the file at `${CLAUDE_SKILL_DIR}/../../../humanize/skills/humanize/references/elements-of-style.md`
REVIEW TEXT: <review> {paste the full synthesized review markdown here} </review>
RULES:
- Keep every *Location:* `path/to/file:line` line exactly as-is
Replace the review text with the agent's output. If the agent fails or returns malformed output, proceed with the original unhumanized text.
### Save Review
After humanizing completes, persist the full review markdown for reference.
1. Resolve the persistent data directory:
```bash
DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/sai/staff-code-review/reviews"
mkdir -p "$DATA_DIR"
```

2. Derive `{pr_number}` from the PR URL (the `/pull/{number}` segment); for a local diff, use `"local"` as the prefix
3. Generate a UTC timestamp with `date -u +%Y%m%dT%H%M%S`
4. Write the review to `${DATA_DIR}/{pr_number}-{timestamp}.md` using the Write tool

### Post Draft Review to GitHub

Skip this phase entirely if the input was NOT a GitHub PR URL.
Create a PENDING (draft) review on the PR with inline comments. The review is invisible to the PR author until manually submitted from GitHub's UI — you can edit, delete, or add comments before publishing.
Parse owner, repo, and PR number from the URL (e.g., https://github.com/{owner}/{repo}/pull/{number})
Extract inline comments from the saved review file using the parsing script:
```bash
COMMENTS_JSON=$(python3 "${CLAUDE_SKILL_DIR}/scripts/parse_review_comments.py" "$REVIEW_FILE")
```
The script extracts comments with *Location:* annotations and filters to: all blockers, all issues, up to 5 suggestions, and up to 3 praises. Questions, thoughts, and nits are excluded from inline comments.
Write a concise top-level review body (2-4 sentences max). Do NOT mention the tool, "staff code review", or that this was AI-generated. Write it as a human reviewer would:
Example: "Looks good overall — clean separation of concerns and solid test coverage. The missing timeout on the HTTP client (line 42) needs fixing before merge. 1 blocker, 3 suggestions."
Build the API payload and submit:
```bash
python3 -c "
import json, sys
body = sys.argv[1]
comments = json.loads(sys.argv[2])
for c in comments:
    c.setdefault('side', 'RIGHT')
payload = {'body': body, 'comments': comments}
print(json.dumps(payload))
" "$REVIEW_BODY" "$COMMENTS_JSON" | gh api "repos/${OWNER}/${REPO}/pulls/${PR_NUMBER}/reviews" --input -
```
The payload omits the `event` field, which makes the API create a PENDING/draft review.
Report to the user:
If the API call fails:
Do NOT re-run the API call after it exits 0. The review creation is atomic — all comments are attached in a single request.
Use conventional comments with explicit severity. Every comment MUST follow this exact two-line structure — the parsing script depends on it for GitHub review posting:
**{label}:** {message}
*Location:* `{path/to/file}:{line_number}`
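For instance, a well-formed comment (file path, line, and finding are hypothetical):

```
**issue:** `SaveOrder` swallows the write error, so a failed insert looks like success to the caller. Return or log the error with the order ID.
*Location:* `pkg/orders/store.go:87`
```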
Rules:
- **{label}:** on the first line, followed by the message on the same line (may wrap to additional lines)
- *Location:* on its own line immediately after the message, with backtick-wrapped `path:line`
- Do not omit the *Location:* line — comments without location cannot be posted as inline GitHub review comments
- Findings that have no single *Location:* go in the review body instead

Labels and when to use them:
| Label | Blocking? | When to use |
|---|---|---|
| `blocking:` | Yes | Security vulnerabilities, architectural violations that spread, breaking API contracts without migration, race conditions, missing observability on critical paths |
| `issue:` | Yes | Correctness bugs, missing error handling on critical paths |
| `question:` | Soft | Need clarification before deciding — "why was this approach chosen?" |
| `suggestion:` | No | Alternative approach with rationale, author decides |
| `thought:` | No | Non-blocking idea, educational — pattern that matters at scale |
| `nit:` | No | Style/naming beyond linter scope, trivial polish |
| `praise:` | No | Acknowledge good work — especially important for teaching and culture |
The threshold for blocking: only block when code will degrade system health, create security risk, or break contracts. Google's standard: approve once the change "definitely improves overall code health" — not when perfect.
<example>
```markdown
## Triage

[Mental model questions answered — 2-3 sentences on overall fit]
[Flag if design doc/RFC should have preceded this]
[Flag if PR should be split]

## Research Brief

[Structured findings from codebase exploration — callers, patterns, tests, history, architecture context]

## Verdict

Verdict: [APPROVE | APPROVE_WITH_COMMENTS | REQUEST_CHANGES]
Blocking issues: [count]
Non-blocking suggestions: [count]

## Blocking Issues

[Only if any exist. These must be resolved before merge.]

## Architecture & Design

[Findings from Agent 1]

## Reliability & Operations

[Findings from Agent 2]

## Security & Dependencies

[Findings from Agent 3]

## Performance & Scalability

[Findings from Agent 4]

## Backward Compatibility

[Findings from Agent 5]

## Convention Conformance & Code Reuse

[Findings from Agent 6]

## Dead Code Detection

[Findings from Agent 7]

## Cross-Cutting Observations

[Insights that emerge from seeing all dimensions together]

## Praise

[Explicit praise — call out good patterns, especially ones others should learn from]
```
</example>
## Calibration guidance
These principles prevent common staff-review anti-patterns:
- **Don't block on preference.** If the author's approach is valid and yours is also valid, approve with a suggestion. "Multiple valid approaches should defer to the author's choice if equally defensible" (Google).
- **Don't scope-creep the review.** Review what's in the PR, not what you wish was in the PR. Pre-existing issues get `thought:` comments, not blocks.
- **Don't nit-bomb.** If you have >5 nits, that's a signal to invest in linter rules, not block a PR.
- **Explain the why.** "This is wrong" is not a review comment. "This creates a cascading failure when service X is unavailable because..." is.
- **Praise explicitly.** Good patterns reinforced in review get adopted by others. Call out well-designed error handling, clean abstractions, good test coverage.
- **Suggest taking it offline** if a discussion exceeds 2-3 back-and-forths on one point. Document resolution in PR.
## When to recommend a design doc/RFC before code
Flag (don't block, but strongly recommend) when the PR:
- Changes a public API or cross-team contract
- Introduces a new service or major decomposition
- Adds new infrastructure or third-party dependency
- Affects data model for a stateful system
- Touches security-sensitive subsystem (auth, payment, encryption)
- Creates a pattern that will be used as a template by others
## Reference
- Review dimensions overview: [references/review-dimensions.md](references/review-dimensions.md)
- Performance & scalability deep reference: [references/performance-scalability.md](references/performance-scalability.md)
- Backward compatibility deep reference: [references/backward-compatibility.md](references/backward-compatibility.md)
- Convention conformance deep reference: [references/convention-conformance.md](references/convention-conformance.md)
- Dead code detection deep reference: [references/dead-code.md](references/dead-code.md)