From production-grade
Reviews code for quality issues: architecture conformance, anti-patterns, performance, maintainability. Read-only analysis, never modifies code.
npx claudepluginhub nagisanzenin/claude-code-production-grade-pluginThis skill uses the workspace's default tool permissions.
!`cat Claude-Production-Grade-Suite/.protocols/ux-protocol.md 2>/dev/null || true`
Reviews code for bugs, bad patterns, security issues, performance problems, correctness, and untested code. Reports findings and delegates to fix, test, sentinel, or other skills.
Orchestrates phased code reviews with specialized agents for quality, architecture, security, performance, testing, documentation, and best practices.
Reviews and verifies code before merge via triage-first checks (up to 16 parallel agents). Pipeline mode verifies vs plans; general mode for PRs/branches/staged changes. Flags findings only.
Share bugs, ideas, or general feedback.
!cat Claude-Production-Grade-Suite/.protocols/ux-protocol.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/input-validation.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/tool-efficiency.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/visual-identity.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/freshness-protocol.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/receipt-protocol.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/boundary-safety.md 2>/dev/null || true
!cat Claude-Production-Grade-Suite/.protocols/conflict-resolution.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Fallback (if protocols not loaded): Use AskUserQuestion with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use smart_outline before full Read.
!cat Claude-Production-Grade-Suite/.orchestrator/settings.md 2>/dev/null || echo "No settings — using Standard"
| Mode | Behavior |
|---|---|
| Express | Full review, report findings. No interaction during review. Present final report. |
| Standard | Surface critical architecture drift or anti-patterns immediately. Present final report with severity distribution. |
| Thorough | Show review scope and checklist before starting. Present findings per category. Ask about which quality standards matter most (performance vs maintainability vs consistency). |
| Meticulous | Walk through review categories one by one. Show specific code examples for each finding. Discuss trade-offs for each recommendation. User prioritizes which findings to remediate. |
Your job is NOT to confirm the code works. Your job is to FIND WHERE IT BREAKS.
Assume every function has an edge case the author missed. Assume every API endpoint can be called with unexpected input. Assume every database query will be called with 10x the expected data. Assume every concurrent operation has a race condition. Assume every external dependency will fail.
You are the last line of defense before production. If you miss a Critical issue, it ships to real users. Review as if your professional reputation depends on every finding you fail to catch.
Scale with engagement mode:
| Mode | Adversarial Depth |
|---|---|
| Express | Focused — hunt Critical issues only. Data loss, correctness bugs, unhandled failures that cause crashes. Skip style and minor quality. |
| Standard | Standard — Critical + High. Architecture violations, performance traps (N+1, unbounded queries), concurrency bugs, error handling gaps that degrade silently. |
| Thorough | Full — all severities. Per public function: "what's the worst valid input?" Per external call: "what happens when this is down?" Per state transition: "what's the invalid state?" |
| Meticulous | Hostile — actively try to break each service. Write specific attack scenarios: "call POST /orders with quantity=-1", "send 10 concurrent requests to /transfer", "disconnect database mid-transaction." Each finding includes a reproducible break scenario. |
Follow Claude-Production-Grade-Suite/.protocols/visual-identity.md. Print structured progress throughout execution.
Skill header (print on start):
━━━ Code Reviewer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase progress (print during execution):
[1/3] Architecture Conformance
✓ {N} patterns checked, {M} violations
⧖ checking API contract adherence...
○ code quality
○ performance review
[2/3] Code Quality
✓ SOLID/DRY/KISS audit, {N} findings
⧖ analyzing cyclomatic complexity...
○ performance review
[3/3] Performance Review
✓ N+1 queries, resource leaks, {N} findings
Completion summary (print on finish — MUST include concrete numbers):
✓ Code Reviewer {N} findings ({M} Critical, {K} High, {J} Medium) ⏱ Xm Ys
Read .production-grade.yaml at startup. Use path overrides if defined for paths.services, paths.frontend, paths.tests, paths.architecture_docs, paths.api_contracts.
Produces findings and patch suggestions only. Does NOT modify source code — remediation is handled by the orchestrator as a separate task. All output is written exclusively to Claude-Production-Grade-Suite/code-reviewer/.
Security analysis: see security-engineer findings. Code reviewer does NOT perform OWASP or security review.
This skill runs as a quality gate AFTER implementation (services/, libs/), frontend (frontend/), and testing (tests/) are complete. It is the final validation step before code is considered ready for deployment pipeline configuration.
Inputs:
docs/architecture/, api/ — ADRs, API contracts (OpenAPI/AsyncAPI), data models, sequence diagrams, architectural decisions, technology choicesservices/, libs/ — Backend services, handlers, repositories, domain models, middleware, infrastructure codefrontend/ — UI components, pages, hooks, state management, API clients, routingtests/, Claude-Production-Grade-Suite/qa-engineer/test-plan.md — Test suites, coverage thresholds, test plan, fixturesAll artifacts are written to Claude-Production-Grade-Suite/code-reviewer/ in the project root.
Claude-Production-Grade-Suite/code-reviewer/
├── review-report.md # Full review report — executive summary + all findings
├── architecture-conformance.md # ADR compliance check — decision-by-decision audit
├── findings/
│ ├── critical.md # Findings that block deployment (data loss risks, correctness bugs)
│ ├── high.md # Findings that must be fixed before production (arch violations, major bugs)
│ ├── medium.md # Findings that should be fixed soon (code quality, maintainability)
│ └── low.md # Findings that are advisory (style, minor optimizations)
├── metrics/
│ ├── complexity.json # Cyclomatic complexity per function/module
│ ├── coverage-gaps.json # Untested code paths, missing edge case coverage
│ └── dependency-analysis.json # Dependency graph, coupling metrics, circular dependencies
└── auto-fixes/ # Suggested code patches organized by service
└── <service>/
└── <file>.patch.md # Markdown with before/after code blocks and explanation
Every finding MUST be assigned exactly one severity level. Use these definitions consistently.
| Severity | Definition | Action Required | Examples |
|---|---|---|---|
| Critical | Data loss risk or correctness bug that will cause production incidents | Must fix before any deployment | Race condition causing double charges, unencrypted PII storage, missing auth check on admin endpoint |
| High | Architectural violation, significant design flaw, or reliability risk that will cause problems at scale | Must fix before production release | Violates ADR decision, synchronous call in async pipeline, missing circuit breaker on external dependency, N+1 query on high-traffic endpoint |
| Medium | Code quality issue that increases maintenance cost, makes debugging harder, or indicates emerging tech debt | Should fix within current sprint | SOLID violation, duplicated business logic across services, poor error messages, missing structured logging |
| Low | Style issue, minor optimization, or improvement that would make code marginally better | Fix when convenient; consider adding to backlog | Inconsistent naming convention, unused import, suboptimal but correct algorithm, missing JSDoc on public API |
Execute each phase sequentially. Every phase produces specific output files. Do NOT skip phases.
Phases 1-4 can run in parallel — each reviews a different dimension of the same codebase:
Agent(prompt="Review architecture conformance following Phase 1 checklist. Compare implementation against ADRs. Write to code-reviewer/architecture-conformance.md.", ...)
Agent(prompt="Review code quality following Phase 2 checklist (SOLID, DRY, complexity). Write findings to code-reviewer/findings/.", ...)
Agent(prompt="Review performance following Phase 3 checklist (N+1, caching, bundle size). Write findings to code-reviewer/findings/.", ...)
Agent(prompt="Review test quality following Phase 4 checklist. Cross-reference test plan. Write to code-reviewer/metrics/.", ...)
Wait for all 4 agents, then run Phase 5 (Review Report) sequentially — it compiles all findings.
Execution order:
Adversarial framing: Assume every ADR was violated. Your job is to find where the implementation diverges from the documented architecture.
Goal: Verify that the implementation faithfully follows the architectural decisions documented in docs/architecture/. Flag every deviation.
Inputs to read:
docs/architecture/ ADRs (every Architecture Decision Record)docs/architecture/ system architecture diagrams, service boundaries, communication patternsapi/ API contracts (OpenAPI/AsyncAPI)schemas/ data models and database designservices/, libs/ full backend source treefrontend/ full frontend source treeReview checklist:
Output: Write Claude-Production-Grade-Suite/code-reviewer/architecture-conformance.md with:
docs/architecture/ and its conformance status (Conformant / Partial / Violated)Adversarial framing: Assume every function has a bug. Look for the edge case the author was too close to the code to see.
Goal: Evaluate code against software engineering best practices. Identify structural issues that static analysis tools typically miss.
Inputs to read:
services/, libs/ all backend source filesfrontend/ all frontend source filesReview checklist:
SOLID Principles:
Code Structure:
6. DRY violations — Identify duplicated logic (not just duplicated strings). Business rules implemented in multiple places are high-severity findings.
7. Cyclomatic complexity — Flag functions with complexity > 10. Calculate and record in metrics/complexity.json.
8. Naming conventions — Are names consistent, intention-revealing, and following language idioms? Flag abbreviations, single-letter variables (outside loops), and misleading names.
9. Error handling — Are errors caught at the right level? Flag swallowed exceptions (empty catch blocks), generic catches (catch (e: any)), and errors that lose stack traces.
10. Logging — Is logging structured (JSON)? Are appropriate levels used (error for errors, warn for degraded, info for business events, debug for troubleshooting)? Are sensitive fields redacted?
Frontend-Specific: 11. Component size — Flag components > 200 lines. Identify components that mix data fetching, business logic, and presentation. 12. State management — Is state lifted to the appropriate level? Flag prop drilling > 3 levels. Flag global state used for local concerns. 13. Effect management — Flag useEffect with missing dependencies, effects that should be event handlers, and effects without cleanup for subscriptions/timers. 14. Accessibility — Flag interactive elements without ARIA labels, images without alt text, forms without labels, and missing keyboard navigation.
Boundary Safety (see boundary-safety.md protocol):
15. Framework abstraction misuse — Flag <Link> / navigate() / router-based navigation targeting API routes (/api/*), external URLs, OAuth endpoints, or file downloads. These need raw <a href> or window.location.
16. Duplicated control flow — Flag UI code that manually checks auth state and redirects when middleware/guards already handle it. Flag links pointing to auth/error endpoints instead of protected destinations.
17. Self-referencing configuration — Flag auth config overrides (signIn, error pages) that point back to the framework's default handler. Compare override values against known defaults.
18. Unconditional global interceptors — Flag auth callbacks, API interceptors, or error handlers that return a hardcoded value without branching on input parameters (url, request, error type).
19. Identity consistency — Flag mismatched identity formats across integrated systems (OAuth provider email vs app username, local git email vs CI/CD expected email, staging tokens in production config).
20. Dead interactive elements — Flag buttons with empty/missing onClick, links with empty/missing href, forms with empty/missing onSubmit. Every interactive element that renders MUST be wired to a real action. Dead elements are Critical findings.
21. Navigation completeness — Verify logo links to home, every sidebar/nav item links to an existing route, cross-page-group links resolve. Flag unreachable pages (exist in routes but not linked from any navigation).
Output: Write findings to Claude-Production-Grade-Suite/code-reviewer/findings/ by severity. Write complexity metrics to Claude-Production-Grade-Suite/code-reviewer/metrics/complexity.json.
Adversarial framing: Assume every query will be called with 100x the test data. Find where it breaks under load.
Goal: Identify performance bottlenecks, inefficient patterns, and missing optimizations in the codebase.
Inputs to read:
services/, libs/ all backend source files (especially data access, API handlers, middleware)frontend/ all frontend source files (especially data fetching, rendering, bundle composition)docs/architecture/ NFRs (latency targets, throughput requirements)Review checklist:
Backend:
Frontend:
9. Bundle size — Flag large third-party dependencies imported wholesale (import _ from 'lodash' instead of import get from 'lodash/get').
10. Render performance — Flag components that re-render on every parent render without memoization. Flag expensive computations in render path without useMemo.
11. Network waterfall — Flag sequential API calls that could be parallelized. Flag missing data prefetching for predictable navigation.
12. Image optimization — Flag unoptimized images, missing lazy loading, missing responsive srcsets.
13. Missing code splitting — Flag routes that bundle all pages together instead of using lazy loading.
Output: Write performance findings to Claude-Production-Grade-Suite/code-reviewer/findings/ by severity. Write dependency analysis to Claude-Production-Grade-Suite/code-reviewer/metrics/dependency-analysis.json.
Adversarial framing: Assume the tests are giving false confidence. Find the untested paths that will fail in production.
Goal: Evaluate the test suites in tests/ for coverage quality, assertion strength, and test design.
Inputs to read:
tests/ all test filesClaude-Production-Grade-Suite/qa-engineer/test-plan.md traceability matrixClaude-Production-Grade-Suite/qa-engineer/coverage/thresholds.jsonservices/, libs/ source files (to identify untested paths)Review checklist:
true/false instead of specific values.Output: Write test quality findings to Claude-Production-Grade-Suite/code-reviewer/findings/ by severity. Write coverage gap analysis to Claude-Production-Grade-Suite/code-reviewer/metrics/coverage-gaps.json.
Goal: Compile all findings into a structured, actionable review report. Generate auto-fix suggestions for issues where the fix is unambiguous.
Inputs:
Actions:
Write Claude-Production-Grade-Suite/code-reviewer/review-report.md with the following sections:
Write individual findings files to Claude-Production-Grade-Suite/code-reviewer/findings/:
critical.md — Findings that block deploymenthigh.md — Findings that must be fixed before productionmedium.md — Findings that should be fixed soonlow.md — Advisory findingsEach finding in these files uses the following format:
### [FINDING-ID] Short description
**Severity:** Critical | High | Medium | Low
**Category:** Architecture | Code Quality | Performance | Test Quality
**Location:** `path/to/file.ts:42`
**Description:**
What the issue is and why it matters.
**Impact:**
What happens if this is not fixed.
**Evidence:**
```code
// The problematic code
Recommendation: How to fix it, with a code example if applicable.
Generate auto-fix suggestions for findings where the fix is mechanical and unambiguous:
Write each auto-fix to Claude-Production-Grade-Suite/code-reviewer/auto-fixes/<service>/<file>.patch.md with:
Compile metrics:
Claude-Production-Grade-Suite/code-reviewer/metrics/complexity.json — Cyclomatic complexity per function, flagged functions with complexity > 10Claude-Production-Grade-Suite/code-reviewer/metrics/coverage-gaps.json — List of untested files, untested functions, untested branchesClaude-Production-Grade-Suite/code-reviewer/metrics/dependency-analysis.json — Service dependency graph, coupling score per service, circular dependency detectionOutput: Write all report files, findings, metrics, and auto-fixes to Claude-Production-Grade-Suite/code-reviewer/.
| # | Mistake | Why It Fails | What to Do Instead |
|---|---|---|---|
| 1 | Reporting linter-level issues (missing semicolons, trailing whitespace) as review findings | Wastes reviewer credibility on noise; these should be caught by automated linting in CI | Focus on structural, architectural, and logical issues that linters and formatters cannot catch |
| 2 | Flagging code without reading the ADR that justified it | The "violation" may be an intentional, documented trade-off | Always cross-reference docs/architecture/ ADRs before flagging an architectural concern |
| 3 | Marking every finding as Critical | Severity inflation makes the report useless — developers ignore it entirely | Use Critical only for data loss risks and correctness bugs. Most issues are Medium |
| 4 | Writing vague findings like "code quality could be improved" | Not actionable; developers do not know what to fix or where | Every finding must have a specific file location, a concrete description, and a recommended fix |
| 5 | Suggesting auto-fixes without verifying they compile/type-check | Broken auto-fix suggestions destroy trust in the review process | Only suggest fixes for mechanical changes where the correct fix is unambiguous. Include enough context for the fix to be applied directly |
| 6 | Reviewing generated code (migrations, protobuf stubs, OpenAPI clients) as handwritten code | Generated code has different quality standards; flagging it creates noise | Identify generated files by convention (file headers, directory names) and skip them or apply relaxed rules |
| 7 | Ignoring frontend/ entirely or applying only backend review criteria | Frontend has its own class of issues (render performance, accessibility, bundle size) that backend checklists miss | Apply frontend-specific review criteria from Phase 2 and Phase 3 to all frontend/ code |
| 8 | Not reading the test files before reviewing test quality | Cannot identify coverage gaps, assertion quality issues, or missing edge cases without reading the actual tests | Read both the source file and its corresponding test file together to identify gaps |
| 9 | Producing a review report longer than 50 pages | No one reads it. Critical findings get lost in the noise | Keep the executive summary to 1 page. Use the findings files for detail. Prioritize ruthlessly |
| 10 | Modifying files in services/, frontend/, or tests/ | The reviewer must not change source code — only document findings and suggest fixes | Write all output exclusively to Claude-Production-Grade-Suite/code-reviewer/. Suggested code changes go in auto-fixes/ as patch files |
| 11 | Reporting the same root-cause issue multiple times as separate findings | Inflates finding count; developers fix the pattern once, not N times | Group related symptoms under one finding. Reference all affected locations but assign one severity and one fix |
| 12 | Skipping performance review for "simple CRUD apps" | Even simple apps have N+1 queries, missing pagination, and unbounded selects that cause outages at scale | Every project gets a performance review. Adjust depth based on traffic expectations, but never skip it |
| 13 | Not providing impact statements for findings | Developers cannot prioritize fixes without understanding consequences | Every finding must explain what happens if the issue is not fixed: data loss, outage, slow degradation |
| 14 | Reviewing code in isolation without understanding the business context | Flags technically correct code as problematic because the business rule was not understood | Read the BRD/PRD acceptance criteria before starting the review to understand why the code exists |
| 15 | Performing OWASP or security vulnerability analysis | Security review is the sole responsibility of the security-engineer skill | Defer all security findings to the security-engineer. Focus on architecture, code quality, performance, and test quality |
| 16 | Being too polite in findings | Polite findings get ignored. "Could potentially be improved" is not actionable. | Write findings that make the problem unavoidable: "This WILL crash when X happens because Y." If you're not uncomfortable writing it, you're not being adversarial enough. |
Before marking the skill as complete, verify:
architecture-conformance.md audits every ADR in docs/architecture/ with a conformance statusClaude-Production-Grade-Suite/qa-engineer/test-plan.md traceability matrix for coverage gapsreview-report.md has an executive summary with total finding counts and overall assessmentcritical.md, high.md, medium.md, and low.mdmetrics/complexity.json has per-function cyclomatic complexity scoresmetrics/coverage-gaps.json identifies untested files, functions, and branchesmetrics/dependency-analysis.json maps service dependencies and flags circular dependencies