From code-reviewer
Code-review methodology for detecting over-engineering, gold-plating, and scope creep — places where the implementation does meaningfully more than the linked task, bug, or PR title actually required. Compares delivered diff to stated intent, not raw complexity in isolation. Particularly catches what AI- and LLM-generated PRs tend to produce: speculative abstractions for hypothetical futures, single-use helper extractions, defensive code for impossible scenarios, drive-by refactors bundled with bug fixes, premature optimization without measurement, unused configuration hooks, excessive logging, tutorial-style comments, unrequested features, and duplicate code paths added next to existing ones. Use whenever a reviewer says or thinks the diff feels too big for what was asked — e.g., "this bug fix introduces a new interface for one impl", "the AI added five helpers used once each", "fix login bug PR has 14 files", "is this YAGNI?", or any audit of AI-generated code against an original prompt. NOT for general code quality, performance, security, exception handling, duplicate-finding, EUII, architecture, test generation, logging setup, or implementing work items — those have dedicated tools.
How this skill is triggered — by the user, by Claude, or both
Slash command
/code-reviewer:over-engineering-reviewThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill is the working methodology used by the `over-engineering-review` agent and any
This skill is the working methodology used by the over-engineering-review agent and any
other reviewer that needs to compare delivered code to requested scope. It catalogues the
ten patterns of LLM and developer over-achievement, with detection signals, severity
guidance, and explicit "what NOT to flag" rules for each.
The cheapest line of code is the one you don't write. Every line that lands in the codebase imposes a perpetual tax on future readers, refactorers, and reviewers. Over-engineering is the failure mode where extra code lands without a current need — its tax is paid every day, its benefit is hypothetical and often never realized.
A good implementation matches the task's actual shape. Smaller, more focused PRs land faster, review faster, regress less, and roll back more cleanly. When a reviewer flags over-engineering they are not being pedantic — they are protecting the team from a slow accumulation of unjustified complexity.
Invoke this methodology when:
Each category includes: detection signals, what the LLM/dev was probably trying to do, why it's a problem, severity guidance, and "do NOT flag" exclusions.
A bug fix or small feature PR that also reformats, renames, or restructures unrelated code in passing. "While I was in there..."
Detection signals:
Why it's a problem:
Severity: MEDIUM by default. HIGH if the drive-by changes touch shared infrastructure or files used by many other callers (regression blast radius). LOW if the drive-by is purely local to a file already being meaningfully changed for the task.
Recommendation pattern: "Move this rename/reformat to its own follow-up PR (or commit) so the bug fix can be reviewed and reverted independently."
Do NOT flag:
using
statement, threading a parameter through an existing chain).A new interface, abstract class, generic type, or strategy/factory pattern introduced for a hypothetical future need that the current task does not require.
Detection signals:
// Other foos can extend this in the future.<T> generic where every existing call site uses the same concrete type and there is no
documented case for a second.Why it's a problem:
Severity: MEDIUM by default. HIGH when the abstraction is exposed across module boundaries (other modules now have to depend on the interface, locking the design in). LOW when the abstraction is purely internal to a class or file (easily refactored later).
Recommendation pattern: "Inline the concrete implementation. When a second use case appears, the right abstraction will be obvious from the two concrete cases — and may differ from this one. Three similar lines of code is better than a premature abstraction."
Do NOT flag:
Null checks, try/catch blocks, validation, retry logic, or input sanitization for scenarios that cannot occur given the call graph.
Detection signals:
if (foo == null) throw ... on a parameter whose only callers pass it directly from a
constructor or a non-nullable type.try { ... } catch (Exception ex) { _logger.LogError(...); throw; } blocks that swallow
no exceptions and add no recovery — pure noise.Why it's a problem:
Severity: MEDIUM by default. LOW when the defensive code is at a public API or trust-boundary entry point (validation at boundaries is correct — flag only that it's duplicating boundary validation done elsewhere).
Recommendation pattern: "Remove the null check / try-catch / retry — the only callers [name them] cannot trigger this branch. If a future caller does, the call chain should fail visibly so we discover the real coupling rather than silently swallowing it."
Do NOT flag:
ArgumentNullException
on every public method) — flag only deviations from convention, not adherence.Caching layers, batching, parallelization, custom data structures, or algorithmic tricks introduced without a measured performance need.
Detection signals:
IMemoryCache or similar caching layer in a code path that handles low-frequency
operations.foreach to Parallel.ForEach or await Task.WhenAll without the
workload size justifying it.// optimize this later if it becomes a bottleneck next to code that is
already optimized.Why it's a problem:
Severity: MEDIUM by default. HIGH when the optimization adds significant code complexity
(custom data structures, threading) without measurement. LOW when it's a one-line idiomatic
choice (e.g., HashSet instead of List for membership tests) consistent with the codebase.
Recommendation pattern: "Revert to the straightforward implementation. If a profiler or production metric later shows this hot path is expensive, the right optimization will be informed by real data. Performance-tune by measurement, not by anticipation."
Do NOT flag:
Any() instead of Count() > 0,
preferring StringBuilder for repeated concatenation in a hot loop).The PR delivers functionality that wasn't asked for in the task description.
Detection signals:
Why it's a problem:
Severity: HIGH or BLOCKER for new public API surface (endpoints, exported functions, schema changes, CLI flags). MEDIUM for internal features. LOW only when the addition is genuinely tiny and the reviewer is confident it won't matter.
Recommendation pattern: "Move the [unrequested feature] to its own PR linked to its own ticket. The current PR should ship only the work the task asked for. If the feature is genuinely valuable, file a ticket so it can be prioritized and reviewed on its own merits."
Do NOT flag:
Logging statements, metrics, or telemetry events added at every step of a code path when the task didn't request observability work and the surrounding code is sparingly logged.
Detection signals:
_logger.LogInformation at entry, between every step, and at exit — for
a method that is purely a pass-through.metrics.Increment(...)) added next to logic that didn't have any
before, in a codebase that doesn't generally instrument that layer.Information or Debug level for events that are not actionable.Why it's a problem:
Severity: MEDIUM by default. HIGH when the new logs include EUII (handed to the
euii-leak-detector agent for that aspect; you flag the volume aspect). LOW when the new
logs are at the right level and just slightly redundant.
Recommendation pattern: "Pare back to just the entry/exit log of the public boundary, or to the specific failure paths that need tracing. Verbose logging is a debugging tool, not a default — it should be added when there's a known reason to trace this path, not preemptively."
Do NOT flag:
Comments that explain what well-named code already says, written as if for a reader who doesn't know the language.
Detection signals:
// Increment the counter above counter++.// Loop through the items above foreach (var item in items).// Calculate the total cost of the order above decimal CalculateOrderTotalCost(...).Why it's a problem:
Severity: LOW by default — these don't change behavior. MEDIUM only when comments are dense enough to actually obscure the code (40%+ of the lines are comments saying nothing).
Recommendation pattern: "Delete these comments. Well-named identifiers and small functions are the documentation. Reserve comments for the why of non-obvious choices: a workaround for a specific bug, a hidden invariant, a constraint that would surprise a reader."
Do NOT flag:
comment-analyzer.)A method, function, or local helper extracted from code that is called from only one place, where inlining would be clearer.
Detection signals:
Why it's a problem:
Severity: LOW by default. MEDIUM when the extracted helper is in a separate file or class (which makes the indirection more painful) or when the extraction split closely-related logic across multiple methods that have to be read together to make sense.
Recommendation pattern: "Inline this helper at the single call site. If a second use case appears, extract it then — the right shape will be informed by both concrete uses."
Do NOT flag:
var result = ComputeRiskAdjustedScore(input);
is documenting intent, not just decomposing).New configuration options, feature flags, environment variables, or options classes that nothing in the codebase reads.
Detection signals:
IOptions<T> class that no consumer references.if (_features.IsEnabled("foo"))) where the flag's other branch
is unreachable or returns the same result.appsettings.json and the options class — no
code path actually consumes it.Why it's a problem:
Severity: MEDIUM by default. HIGH when the hook is part of a public API or schema (it becomes external-facing and harder to remove). LOW when the hook is purely internal and the PR clearly intends to wire it up in a follow-up (verify by checking the work item).
Recommendation pattern: "Remove the unused hook. Add it back when the consumer is also landing — preferably in the same PR so the hook and its consumer can be reviewed together."
Do NOT flag:
The PR adds a new method, class, or code path next to an existing one with similar behavior, instead of extending the existing one.
Detection signals:
GetUserByIdV2(...) next to existing GetUserById(...) where the difference
is small and the existing method has only one or two callers.OrderProcessorWithFoo next to OrderProcessor where the difference is one
feature toggle.if (request.IsNewFlow) NewFlow(); else OldFlow();) where
both flows share substantial implementation.Why it's a problem:
Severity: HIGH by default — duplicate paths create lasting bifurcation. MEDIUM when the duplication is purely additive and the original path has no callers being modified (the new path is genuinely a new feature with coincidental similarity).
Recommendation pattern: "Extend the existing method/class instead of adding a parallel one. If the existing implementation can't accommodate the new requirement cleanly, refactor it first (as its own PR or commit) and then add the new behavior on top of the refactored version."
Do NOT flag:
User repository and an
Order repository — same shape, different domains, correctly separate)./v1/users and /v2/users exist for
contract compatibility).When reviewing, walk the diff once for each of the ten categories — most reviews touch only two or three categories per PR. For each finding, populate the agent's output template with:
If you find yourself unable to pick a category, you may be looking at a different concern.
Double-check it isn't owned by code-simplifier (block-level complexity), class-design-simplifier
(class-level abstract complexity), architecture-review (system-level structure), or
duplicate-code-detector (duplicate code). The over-engineering lens is specifically about
delivered scope vs. stated scope.
Before publishing your findings, sanity-check the anchor:
| Anchor source | Confidence | Posture |
|---|---|---|
| Work item with detailed acceptance criteria | HIGH | Confidently flag deviations from the criteria |
| Work item title only | MEDIUM | Flag obvious overruns; emit [QUESTION] for borderline cases |
| PR title + description | MEDIUM | Same as above |
| Commit messages only | LOW | Flag only egregious overruns; lean on YAGNI lens |
| No anchor | LOW | YAGNI-only review; explicitly note the missing anchor |
Always state the anchor and your confidence level in the summary block. The reviewer needs this metadata to decide how seriously to weight the findings.
IFooStrategy interface has
one implementation, the work item describes a single fixed strategy, and there is no second
variant on the roadmap — inline the concrete DefaultFooStrategy until a second use case
appears" is useful.[QUESTION] for borderline cases; reserve findings for
cases you can defend with the anchor.npx claudepluginhub gautam-achieveai/claudeplugins --plugin code-reviewerProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.