By devdanzin
Codebase exploration and analysis agents for existing code: architecture mapping, git history analysis, fix propagation detection, churn-based risk assessment, consistency auditing, complexity analysis, test coverage, error handling, documentation, project documentation accuracy, type design, dead code detection, tech debt inventory, pattern consistency, and API surface review
npx claudepluginhub devdanzin/code-review-toolkit --plugin code-review-toolkit

Comprehensive codebase exploration and analysis using specialized agents
Quick health dashboard — all agents in summary mode
Find cleanup targets — complexity hotspots, dead code, and tech debt
Quick architecture mapping — understand project structure and dependencies
Use this agent to review the public API surface of a Python project — naming consistency, parameter conventions, return type patterns, and whether the API is predictable and learnable from a user's perspective. Especially valuable for tools and libraries that have both a CLI interface and a programmatic API. This agent evaluates whether a user who learns one part of the API can predict how other parts work. <example> Context: The user is preparing a release and wants the API to be polished. user: "Before the release, can you review our public API for consistency?" assistant: "I'll use the api-surface-reviewer to evaluate naming, parameter conventions, and API predictability." </example> <example> Context: The user wants to understand how intuitive the API is. user: "If someone new picks up this library, will the API make sense?" assistant: "I'll use the api-surface-reviewer to assess learnability and consistency from a newcomer's perspective." </example>
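The naming-consistency portion of such a review can be sketched as a majority-style check: classify each public name by casing convention, then flag outliers. This is a simplified illustration, not the agent's implementation; `naming_outliers` and the style patterns are hypothetical.

```python
import re

# Two common Python API casing conventions to classify against.
STYLES = {
    "snake_case": re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$"),
    "camelCase": re.compile(r"^[a-z]+([A-Z][a-z0-9]*)+$"),
}

def naming_outliers(names):
    """Classify each name by style, then flag names diverging from the majority."""
    classified = {}
    for name in names:
        for style, pattern in STYLES.items():
            if pattern.match(name):
                classified[name] = style
                break
        else:
            classified[name] = "other"
    counts = {}
    for style in classified.values():
        counts[style] = counts.get(style, 0) + 1
    majority = max(counts, key=counts.get)
    return [n for n, s in classified.items() if s != majority]

api = ["load_config", "save_config", "parseArgs", "run_checks"]
print(naming_outliers(api))  # ['parseArgs']
```

A real review would also compare parameter ordering and return-type conventions, but the majority-vote idea generalizes to those dimensions.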
Use this agent to map the structure, dependencies, and module boundaries of a Python codebase. This is the foundational analysis agent — its output feeds into other agents as context for richer analysis. Use it when exploring an unfamiliar codebase, before running other code-review-toolkit agents, or when you need to understand how modules relate to each other. The agent builds a dependency graph from Python imports, identifies module boundaries and layering, detects circular dependencies, and produces a structural summary. The agent needs to know the scope of the analysis. By default it analyzes the entire project. You can narrow scope to a directory, file, or glob pattern. <example> Context: The user wants to understand the structure of a Python project before making changes. user: "I need to understand how this codebase is organized before I start refactoring" assistant: "I'll use the architecture-mapper agent to map the module structure and dependencies." <commentary> Use architecture-mapper as the first step in codebase exploration. Its output gives a mental model of the project. </commentary> </example> <example> Context: The user wants to run a comprehensive codebase review. user: "Run a full code review on this project" assistant: "I'll start by running the architecture-mapper to understand the project structure, then feed that into the other review agents." <commentary> When running multiple agents, architecture-mapper should run first so its output can enrich other agents' analysis. </commentary> </example> <example> Context: The user suspects there are circular dependencies causing import issues. user: "I keep hitting circular import errors — can you map out the dependency structure?" assistant: "I'll use the architecture-mapper agent to build a dependency graph and identify circular dependencies." <commentary> Architecture-mapper directly addresses structural questions about how modules depend on each other. </commentary> </example>
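The kind of import-graph analysis this agent performs can be sketched with the standard library's `ast` module: extract each module's imports, restrict the graph to in-project modules, and search for cycles. A minimal illustration assuming one file per module; `module_imports` and `find_cycle` are hypothetical helpers, not part of the plugin.

```python
import ast

def module_imports(source):
    """Extract top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

def find_cycle(graph):
    """Depth-first search for a circular dependency; returns one cycle or None."""
    def dfs(node, path, visited):
        if node in path:
            return path[path.index(node):] + [node]
        if node in visited:
            return None
        visited.add(node)
        for dep in graph.get(node, ()):
            cycle = dfs(dep, path + [node], visited)
            if cycle:
                return cycle
        return None

    visited = set()
    for node in graph:
        cycle = dfs(node, [], visited)
        if cycle:
            return cycle
    return None

# Toy project: a imports b, b imports c, c imports a (plus a stdlib import).
sources = {
    "a": "import b\n",
    "b": "from c import helper\n",
    "c": "import a\nimport os\n",
}
graph = {name: module_imports(src) & sources.keys() for name, src in sources.items()}
print(find_cycle(graph))  # ['a', 'b', 'c', 'a']
```

Filtering the import set against the project's own module names is what separates internal dependency edges from stdlib/third-party noise.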
Use this agent to find the most complex code in a Python codebase and suggest simplifications. This agent combines hotspot detection (finding where complexity concentrates) with simplification analysis (how to reduce it). It measures multiple dimensions of complexity — nesting depth, function length, parameter count, cognitive load — and produces a ranked list of hotspots with concrete simplification strategies. Use after architecture-mapper for module-aware analysis. The agent needs scope and optionally architecture-mapper output as context. <example> Context: The user wants to find and simplify the most complex parts of their codebase. user: "Where are the most complex parts of this codebase? I want to simplify them." assistant: "I'll use the complexity-simplifier to identify complexity hotspots and suggest simplifications." <commentary> This is the core use case: find what's complex, explain why, and suggest how to simplify. </commentary> </example> <example> Context: The user is planning a refactoring sprint and wants to prioritize. user: "I have a week for refactoring — where should I focus?" assistant: "I'll run the complexity-simplifier to rank the codebase by complexity so you can prioritize your refactoring effort." <commentary> The ranked hotspot output directly answers prioritization questions. </commentary> </example> <example> Context: A specific module feels hard to work with. user: "The runner module is really hard to modify — can you analyze why?" assistant: "I'll use the complexity-simplifier focused on the runner module to identify what's making it complex and how to simplify it." <commentary> Narrowing scope to a specific module for targeted analysis. </commentary> </example>
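The structural metrics involved (nesting depth, function length, parameter count) can be computed with a small `ast` walk. This is a rough sketch only; the real agent also weighs cognitive load and module context, and `complexity_metrics` is a hypothetical helper.

```python
import ast

def complexity_metrics(source):
    """Rank functions by simple structural metrics: nesting, length, params."""
    nesting_nodes = (ast.If, ast.For, ast.While, ast.With, ast.Try)

    def max_depth(node, depth=0):
        best = depth
        for child in ast.iter_child_nodes(node):
            extra = 1 if isinstance(child, nesting_nodes) else 0
            best = max(best, max_depth(child, depth + extra))
        return best

    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            results.append({
                "name": node.name,
                "length": node.end_lineno - node.lineno + 1,
                "nesting": max_depth(node),
                "params": len(node.args.args),
            })
    # Deepest-nested, longest functions first: these are the hotspot candidates.
    return sorted(results, key=lambda r: (r["nesting"], r["length"]), reverse=True)

sample = """
def flat(x):
    return x + 1

def tangled(a, b, c):
    if a:
        for item in b:
            if item > c:
                return item
    return None
"""
print(complexity_metrics(sample)[0]["name"])  # tangled
```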
Use this agent to scan a Python codebase for inconsistencies in coding patterns, style, and conventions. Unlike a PR code-reviewer that checks changes against rules, this agent compares how different parts of the codebase handle the same concerns and identifies divergence. It works both inductively (discovering implicit conventions from the majority pattern) and deductively (checking against CLAUDE.md rules). Best used after architecture-mapper has run, so it can analyze consistency within and across module boundaries. The agent needs scope and optionally architecture-mapper output as context. <example> Context: The user wants to find inconsistencies across a codebase. user: "This codebase has grown organically — can you find where our patterns diverge?" assistant: "I'll use the consistency-auditor to scan for pattern divergence across your codebase." <commentary> The consistency-auditor is designed for exactly this: finding where organic growth has led to inconsistent patterns. </commentary> </example> <example> Context: Architecture-mapper has already run and the user wants deeper analysis. user: "Now that we have the architecture map, let's look at code consistency" assistant: "I'll feed the architecture-mapper output into the consistency-auditor for module-aware consistency analysis." <commentary> Using architecture-mapper output lets the consistency-auditor distinguish intentional variation between modules from unintentional divergence. </commentary> </example> <example> Context: The user has established coding standards and wants to verify the codebase follows them. user: "Check if the codebase actually follows what CLAUDE.md says" assistant: "I'll use the consistency-auditor to compare the codebase against the documented standards in CLAUDE.md." <commentary> The auditor does both inductive (pattern discovery) and deductive (rule checking) analysis. </commentary> </example>
Use this agent to find unused code in a Python codebase — unused imports, unreferenced functions, orphan files, unreachable branches, and stale feature flags. Dead code creates noise, increases maintenance burden, and can confuse developers into thinking unused code paths are important. This agent uses static analysis (import/reference scanning) to identify candidates, with careful attention to Python's dynamic dispatch patterns that can cause false positives. <example> Context: The user wants to clean up the codebase. user: "I think there's a lot of dead code in this project — can you find it?" assistant: "I'll use the dead-code-finder to scan for unused imports, unreferenced functions, and orphan files." </example> <example> Context: Before a major refactoring effort. user: "Before we refactor, let's remove anything that's not actually used" assistant: "I'll run the dead-code-finder to identify safe removal candidates." </example>
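The unused-import part of this analysis can be illustrated with a naive `ast` scan. As the description notes, Python's dynamic dispatch makes this hard in general; the sketch below deliberately ignores `__all__`, re-exports, string annotations, and `getattr` access, and `unused_imports` is a hypothetical name.

```python
import ast

def unused_imports(source):
    """Report imported names never referenced elsewhere in the module."""
    tree = ast.parse(source)
    imported, used = {}, set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(name for name in imported if name not in used)

code = "import os\nimport sys\nprint(sys.argv)\n"
print(unused_imports(code))  # ['os']
```

Attribute access like `sys.argv` still counts as usage because the underlying `sys` appears as an `ast.Name` node.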
Use this agent to audit documentation quality across a Python codebase — docstrings, inline comments, module-level documentation, and README accuracy. It checks for stale comments, undocumented public APIs, misleading documentation, and comment rot. Unlike the PR-focused comment-analyzer, this agent surveys documentation completeness and accuracy across the entire codebase. <example> Context: The user wants to assess documentation quality. user: "How well-documented is this codebase?" assistant: "I'll use the documentation-auditor to survey docstring coverage, comment accuracy, and documentation quality." </example> <example> Context: The user is preparing to onboard a new contributor. user: "Would a new contributor be able to understand this code from the docs?" assistant: "I'll audit the documentation from a newcomer's perspective using the documentation-auditor." </example>
Use this agent to perform deep temporal analysis of a codebase. It runs LAST in the explore pipeline and uses output from all other agents alongside git history to perform fix completeness review, similar bug detection, new feature review, churn×quality risk matrix, historical context annotation of other agents' findings, and co-change coupling analysis. Its most valuable capability is finding places where the same bug pattern exists but hasn't been fixed yet. <example> Context: After a full explore run, analyzing whether recent fixes are complete. user: "We've been fixing a lot of bugs lately — are the fixes complete?" assistant: "I'll use the git-history-analyzer to review recent fix commits and check for completeness and propagation gaps." <commentary> Fix completeness review and similar bug detection are the agent's highest-value capabilities. </commentary> </example> <example> Context: A user asking about similar bugs. user: "We just fixed a null check bug — did we miss any similar bugs elsewhere?" assistant: "I'll use the git-history-analyzer to find structurally similar code that might have the same vulnerability." <commentary> Similar bug detection (fix propagation) searches the codebase for analogous patterns. </commentary> </example> <example> Context: The explore command dispatching this as the final analysis pass. user: "/code-review-toolkit:explore . all" assistant: "[As the final step in exploration, git-history-analyzer cross-references all other agents' findings with git history for temporal context.]" <commentary> This agent runs in Group E (last) to have access to all other agents' output. </commentary> </example>
Use this agent to provide temporal context for all other agents by analyzing git history. It runs the analysis script and produces churn metrics, recent change classifications, co-change clusters, and per-module stability ratings. This is the temporal counterpart to architecture-mapper — it runs first in the explore pipeline so every subsequent agent can prioritize based on change patterns. <example> Context: The explore command running this agent first to provide history context. user: "/code-review-toolkit:explore . all" assistant: "[As the first step in exploration, git-history-context analyzes recent git history to provide temporal context for all subsequent agents.]" <commentary> This agent runs in Group 0 alongside architecture-mapper, providing foundational context. </commentary> </example> <example> Context: A user wanting to understand recent change patterns before diving in. user: "Before I start working on this codebase, what's been happening recently?" assistant: "I'll use git-history-context to analyze recent commits, churn patterns, and change velocity." <commentary> The agent provides a quick temporal overview of the project's recent activity. </commentary> </example> <example> Context: A user asking what's been changing a lot lately. user: "What files have been changing the most? Where's all the churn?" assistant: "I'll use git-history-context to identify the highest-churn files and functions." <commentary> Churn hotspot identification is one of the agent's core outputs. </commentary> </example>
Use this agent to find places where a Python codebase solves the same problem in different ways. While the consistency-auditor focuses on style and convention divergence, this agent focuses on **behavioral pattern divergence** — where the same concern (configuration loading, resource cleanup, serialization, CLI argument handling, etc.) is implemented with different approaches in different modules. This is especially valuable for codebases that have grown organically over time. <example> Context: The user notices the codebase handles similar things differently in different places. user: "I feel like we handle configuration differently in every module — can you check?" assistant: "I'll use the pattern-consistency-checker to find all the ways configuration is handled and identify the divergence." <commentary> This agent excels at finding multiple implementations of the same concern scattered across a codebase. </commentary> </example> <example> Context: After architecture-mapper reveals several modules with similar responsibilities. user: "The architecture map shows three modules that all do data loading — are they consistent?" assistant: "I'll use the pattern-consistency-checker to compare how those modules approach data loading." <commentary> Architecture-mapper output helps focus the pattern analysis on modules that should be consistent. </commentary> </example>
Use this agent to audit out-of-code documentation (README, CLAUDE.md, CONTRIBUTING.md, configuration files) for accuracy against the actual codebase. Unlike the documentation-auditor (which covers in-code docstrings and comments), this agent focuses on external-facing documentation. It has three concrete, verifiable capabilities — reference validation (do things mentioned in docs exist in code?), cross-file consistency (do documentation files agree with each other and with project metadata?), and structural completeness (does the project document what it actually exposes?). Every finding is mechanically verifiable, not subjectively opinionated. <example> Context: The user is preparing a release and wants to verify documentation accuracy. user: "Before the release, can you check that our README is actually accurate?" assistant: "I'll use the project-docs-auditor to validate all references in the README against the current code and check for stale or broken documentation." <commentary> Pre-release documentation accuracy check — the core use case. The agent will find renamed functions, changed CLI flags, and outdated examples. </commentary> </example> <example> Context: The user has refactored code and wants to check if docs are stale. user: "I just renamed a bunch of functions and reorganized the modules — are the docs still correct?" assistant: "I'll use the project-docs-auditor to find any documentation references that point to the old names or structure." <commentary> Post-refactoring documentation drift detection. The agent excels at finding references to entities that no longer exist. </commentary> </example> <example> Context: The explore command dispatching this agent as part of a full review. user: "/code-review-toolkit:explore . all" assistant: "[As part of the full exploration, the project-docs-auditor checks external documentation accuracy alongside the documentation-auditor's in-code analysis.]" <commentary> When run as part of a full exploration, this agent complements the documentation-auditor by covering external-facing docs. </commentary> </example>
Use this agent to scan a Python codebase for silent failures, inadequate error handling, and inappropriate fallback behavior. Unlike the PR-focused version, this agent traverses all error handling in scope rather than just reviewing a diff. It systematically finds every try/except, bare except, swallowed exception, logging-then-continuing pattern, and missing error handling in the codebase. Best used after architecture-mapper so it can assess error handling quality relative to module importance. <example> Context: The user wants a comprehensive error handling audit. user: "Audit the error handling across the entire codebase" assistant: "I'll use the silent-failure-hunter to systematically scan all error handling patterns in the codebase." <commentary> Full codebase error handling traversal — the core use case. </commentary> </example> <example> Context: The user has been hitting mysterious failures. user: "Something is silently failing but I can't figure out where" assistant: "I'll use the silent-failure-hunter to find all places where errors might be swallowed silently." <commentary> The agent excels at finding suppressed exceptions that cause mysterious downstream behavior. </commentary> </example>
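The core patterns the agent hunts for (bare excepts and exception-swallowing handlers) can be detected with a small `ast` walk. A minimal sketch, with `find_silent_handlers` as a hypothetical helper; the real agent also weighs logging-then-continuing patterns and module importance.

```python
import ast

def find_silent_handlers(source):
    """Flag bare excepts and handlers whose body is only `pass` or `...`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            if node.type is None:
                findings.append((node.lineno, "bare except"))
            body_is_noop = all(
                isinstance(stmt, ast.Pass)
                or (isinstance(stmt, ast.Expr)
                    and isinstance(stmt.value, ast.Constant)
                    and stmt.value.value is Ellipsis)
                for stmt in node.body
            )
            if body_is_noop:
                findings.append((node.lineno, "swallowed exception"))
    return findings

code = "try:\n    risky()\nexcept:\n    pass\n"
print(find_silent_handlers(code))  # [(3, 'bare except'), (3, 'swallowed exception')]
```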
Use this agent to catalog technical debt in a Python codebase — TODOs, FIXMEs, HACKs, deprecated API usage, pinned workarounds, and deferred decisions. It collects, categorizes, and ages debt items using git blame to distinguish fresh debt from ancient debt. Produces an actionable inventory for sprint planning or cleanup campaigns. <example> Context: The user wants to understand the accumulated tech debt. user: "How much tech debt do we have? Can you catalog it?" assistant: "I'll use the tech-debt-inventory to catalog all TODOs, FIXMEs, workarounds, and deprecated usage." </example> <example> Context: Planning a cleanup sprint. user: "I have time this week for cleanup — what tech debt should I tackle?" assistant: "I'll run the tech-debt-inventory to give you a prioritized list of debt items to address." </example>
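Marker collection, the first stage of such an inventory, can be sketched with a regex scan over comments; aging via git blame would layer on top of this. `scan_debt` and the marker pattern are hypothetical, not the plugin's code.

```python
import re

# Match common debt markers inside comments, capturing the trailing note.
DEBT_MARKER = re.compile(r"#\s*(TODO|FIXME|HACK|XXX)\b[:\s]*(.*)", re.IGNORECASE)

def scan_debt(source, filename="<memory>"):
    """Collect debt markers with location and message for later categorization."""
    items = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEBT_MARKER.search(line)
        if match:
            items.append({
                "file": filename,
                "line": lineno,
                "kind": match.group(1).upper(),
                "note": match.group(2).strip(),
            })
    return items

code = (
    "x = 1  # TODO: replace magic number\n"
    "try:\n"
    "    pass\n"
    "except Exception:  # HACK swallow for now\n"
    "    pass\n"
)
for item in scan_debt(code):
    print(item["kind"], item["line"], item["note"])
```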
Use this agent to analyze test coverage quality and completeness across an existing Python codebase. Unlike a PR test reviewer that checks whether new changes are tested, this agent correlates source modules with test files, identifies undertested modules, and assesses behavioral coverage without running tests. It works by structural analysis — mapping which source code has corresponding tests and which critical paths lack coverage. Best used after architecture-mapper. <example> Context: The user wants to know what's undertested in their codebase. user: "What parts of the codebase have the weakest test coverage?" assistant: "I'll use the test-coverage-analyzer to map source modules to test files and identify undertested areas." <commentary> The core use case: structural test coverage gap analysis. </commentary> </example> <example> Context: The user is planning what tests to write next. user: "I want to improve test coverage — where should I focus?" assistant: "I'll run the test-coverage-analyzer to rank modules by test coverage quality so you can prioritize." <commentary> The ranked output helps prioritize test-writing effort. </commentary> </example>
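The source-to-test correlation can be sketched as a naming-convention match (`test_<module>.py`); a real analysis would also inspect the imports inside test files. `map_tests` is a hypothetical helper used only for illustration.

```python
from pathlib import PurePosixPath

def map_tests(source_files, test_files):
    """Pair each source module with a test file matching test_<module>.py, or None."""
    pairing = {}
    tests = {PurePosixPath(t).name for t in test_files}
    for src in source_files:
        stem = PurePosixPath(src).stem
        expected = f"test_{stem}.py"
        pairing[src] = expected if expected in tests else None
    return pairing

srcs = ["pkg/runner.py", "pkg/config.py", "pkg/cli.py"]
tests = ["tests/test_runner.py", "tests/test_config.py"]
# Modules with no matching test file are the structural coverage gaps.
gaps = [s for s, t in map_tests(srcs, tests).items() if t is None]
print(gaps)  # ['pkg/cli.py']
```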
Use this agent to find bugs by treating tests as invariant specifications. It reads existing tests to extract what developers believe should be true, maps those beliefs to the code under test AND structurally similar code, then checks whether the invariants hold on untested paths and analogous functions. Unlike test-coverage-analyzer (which checks what IS tested), this agent uses tests as a SIGNAL for what SHOULD be true everywhere. <example> Context: The user wants to find bugs by leveraging existing tests. user: "Can you use our test suite to find bugs in untested code?" assistant: "I'll use the test-investigation-agent to extract invariants from your tests and check if they hold across similar code." <commentary> The core use case: tests as bug-finding signal. </commentary> </example> <example> Context: The user suspects inconsistencies between tested and untested behavior. user: "Our tests check error handling in some modules but not others — are we missing bugs?" assistant: "I'll use the test-investigation-agent to extract error handling invariants from tested modules and verify them against untested ones." <commentary> Invariant propagation across module boundaries. </commentary> </example>
Use this agent to analyze type design quality in a Python codebase — type hint coverage, dataclass/TypedDict/NamedTuple design, Protocol usage, Any overuse, and invariant enforcement. Adapted for Python's gradual typing system where not everything needs annotations and the trade-offs between typing approaches differ from statically-typed languages. <example> Context: The user wants to assess the type system quality. user: "How good is our type annotation coverage and design?" assistant: "I'll use the type-design-analyzer to evaluate type hint coverage, type design quality, and annotation consistency." </example> <example> Context: The user is considering adding stricter typing. user: "Should we run mypy strict on this project? How far off are we?" assistant: "I'll use the type-design-analyzer to assess current annotation coverage and identify the gaps." </example>
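Annotation coverage, one input to this analysis, can be estimated by counting the parameter and return slots that carry type hints. A minimal sketch with a hypothetical `annotation_coverage` helper; real type-design analysis also judges the quality of the annotations, not just their presence.

```python
import ast

def annotation_coverage(source):
    """Estimate the fraction of parameter/return slots carrying type hints."""
    annotated = total = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for arg in node.args.args + node.args.kwonlyargs:
                if arg.arg in ("self", "cls"):
                    continue  # conventionally unannotated
                total += 1
                annotated += arg.annotation is not None
            total += 1  # the return annotation slot
            annotated += node.returns is not None
    return annotated / total if total else 1.0

code = "def f(x: int, y) -> int:\n    return x\n\ndef g(z):\n    return z\n"
print(annotation_coverage(code))  # 0.4 — 2 of 5 slots annotated
```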
A Claude Code plugin that bundles 14 specialized agents and 4 commands for exploring and analyzing existing codebases. It answers the question: where are the problems in this codebase, and what should I fix first?
Add this repository as a Claude Code marketplace, then install the plugin:
# Add the marketplace (one-time setup)
claude plugin marketplace add devdanzin/code-review-toolkit
# Install the plugin
claude plugin install code-review-toolkit@code-review-toolkit
Or use the interactive plugin manager:
# Open the plugin manager
/plugin
# Go to the Discover tab, find code-review-toolkit, and install
Install the plugin directly without adding the marketplace:
claude plugin install code-review-toolkit --source github:devdanzin/code-review-toolkit --path plugins/code-review-toolkit
Clone the repo and launch Claude Code with --plugin-dir — the plugin is loaded for that session only, nothing is installed:
# Clone the repository
git clone https://github.com/devdanzin/code-review-toolkit.git
# Run Claude Code with the plugin loaded for this session
claude --plugin-dir code-review-toolkit/plugins/code-review-toolkit
After installation, these commands are immediately available in Claude Code:
/code-review-toolkit:map # Understand project structure
/code-review-toolkit:health # Quick health assessment
/code-review-toolkit:hotspots # Find cleanup targets
/code-review-toolkit:explore # Full exploration (all agents)
The first time you use the toolkit, start with map to understand the architecture, then health for a quick overview, then drill into specific areas with explore.
The toolkit exposes four commands (explore, map, hotspots, health) for different analysis workflows. For detailed usage, agent descriptions, and recommended workflows, see the plugin README.
MIT — see LICENSE for details.
Originally created by Daisy (Anthropic). Adapted by Daniel Diniz.