Detects code duplicates using jscpd token-level analysis across files, classifies exact/near/structural clones, ranks them by impact score (lines x instances), and prepares a prioritized refactoring plan.
npx claudepluginhub pvillega/claude-templates --plugin ct

This skill uses the workspace's default tool permissions.
jscpd performs token-level comparison across every file pair, catching duplicates invisible to manual review. This skill runs that analysis and turns output into a classified, prioritized refactoring plan.
REQUIRED FOLLOW-UP SKILL: Use incremental-refactoring to implement the refactoring after detection.
If the user says "refactor technical debt" or "clean up this codebase", start here to find targets first.
digraph detection_flow {
"Check project config" -> "jscpd available?" [shape=diamond];
"jscpd available?" -> "Run jscpd" [label="yes"];
"jscpd available?" -> "Grep fallback" [label="no"];
"Run jscpd" -> "Extract & cap results (jq)";
"Grep fallback" -> "Extract & cap results (jq)";
"Extract & cap results (jq)" -> "Classify (exact/near/structural)";
"Classify (exact/near/structural)" -> "Rank by impact score";
"Rank by impact score" -> "Present findings → incremental-refactoring";
}
| Clone Type | Description | Refactoring Pattern |
|---|---|---|
| Exact | Byte-for-byte identical | Extract Function (no params) |
| Near | Differs in names/literals/minor expressions | Parameterize (extract with args for varying parts) |
| Structural | Same algorithm pattern, different implementations | Template Method / Strategy |
Ranking: impact_score = duplicated_lines x instances. Tiebreaker: exact > near > structural.
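The ranking rule can be sketched as follows (the clone groups below are hypothetical, chosen so all three tie on impact and the type tiebreaker decides the order):

```python
# Rank clone groups by impact score (duplicated lines x instances),
# breaking ties by clone type: exact > near > structural.
TYPE_PRIORITY = {"exact": 0, "near": 1, "structural": 2}

def impact_score(group):
    return group["lines"] * group["instances"]

groups = [
    {"name": "parse_config", "type": "near", "lines": 12, "instances": 3},
    {"name": "retry_loop", "type": "exact", "lines": 18, "instances": 2},
    {"name": "visitor_walk", "type": "structural", "lines": 18, "instances": 2},
]

# Sort by descending impact, then by type priority for ties.
ranked = sorted(groups, key=lambda g: (-impact_score(g), TYPE_PRIORITY[g["type"]]))
```

All three groups score 36, so the tiebreaker orders them exact, near, structural.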
ls .jscpd.json .jscpdrc .jscpdrc.json 2>/dev/null # Check project config
which jscpd || npm install -g jscpd # Check/install jscpd
If a config file exists, respect it — run jscpd with no overriding flags so the project's thresholds and ignore patterns take effect.
If jscpd/npm unavailable, use the Grep Fallback section below. Tell the user: grep finds exact duplicates only.
With project config (minimal flags, let config drive):
jscpd --reporters json --output /tmp/jscpd-report --gitignore /path/to/code
Without project config (sensible defaults):
jscpd --min-lines 10 --min-tokens 50 \
--ignore "node_modules,dist,build,vendor,.git,__pycache__,*.min.js,coverage,tmp,generated" \
--reporters json --output /tmp/jscpd-report --gitignore /path/to/code
Always use: --gitignore (respect .gitignore), --output /tmp/jscpd-report (avoid flooding stdout), --reporters json (not console for large codebases).
Tuning: --min-lines 5 (small) to 15-20 (large/verbose). --format "javascript,typescript" to scope languages.
Token budget management — use jq to avoid loading raw JSON into context:
# Summary metrics
jq '{percentage: .statistics.total.percentage, duplicatedLines: .statistics.total.duplicatedLines, clones: (.clones | length)}' /tmp/jscpd-report/jscpd-report.json
# All groups above threshold, sorted by impact, fragments capped at 500 chars
# Default: include groups with impact >= 20 (e.g., 10 lines x 2 instances)
# Adjust threshold based on codebase size — lower for small, higher for large
jq '[.clones | group_by(.fragment) | map({fragment: .[0].fragment[0:500], lines: (.[0].duplicationA.end.line - .[0].duplicationA.start.line), instances: length, impact: ((.[0].duplicationA.end.line - .[0].duplicationA.start.line) * length), files: [.[] | .duplicationA.sourceId, .duplicationB.sourceId] | unique}) | sort_by(-.impact) | map(select(.impact >= 20))]' /tmp/jscpd-report/jscpd-report.json
Filtering: Include all groups above an impact threshold (default: impact >= 20) rather than a hard top-N cap. This surfaces all meaningful duplicates in one pass. Fragments truncated to 500 chars. If the result set is still too large for context, raise the threshold or batch into pages of 10. Present summary metrics first.
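Where jq is unavailable, the same threshold filtering can be sketched in Python. Field names follow the jq filter above; the exact report layout is an assumption to verify against your jscpd version:

```python
import json
from collections import defaultdict

def top_groups(report_path, min_impact=20, frag_cap=500):
    """Group clones by fragment, keep groups above an impact threshold.

    Assumes the jscpd JSON report layout referenced by the jq filter
    (clones[].fragment, duplicationA/duplicationB with start/end/sourceId).
    """
    with open(report_path) as f:
        report = json.load(f)

    by_fragment = defaultdict(list)
    for clone in report.get("clones", []):
        by_fragment[clone["fragment"]].append(clone)

    groups = []
    for fragment, clones in by_fragment.items():
        first = clones[0]["duplicationA"]
        lines = first["end"]["line"] - first["start"]["line"]
        files = sorted({
            f for c in clones
            for f in (c["duplicationA"]["sourceId"], c["duplicationB"]["sourceId"])
        })
        groups.append({
            "fragment": fragment[:frag_cap],  # cap fragment length for context
            "lines": lines,
            "instances": len(clones),
            "impact": lines * len(clones),
            "files": files,
        })

    groups.sort(key=lambda g: -g["impact"])
    return [g for g in groups if g["impact"] >= min_impact]
```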
Inline exclusions: If expected duplicates are missing, check for jscpd:ignore-start / jscpd:ignore-end markers. Mention these to the user.
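The markers live in ordinary comments, so they work in any language jscpd scans. A hypothetical Python example:

```python
# jscpd:ignore-start
# Generated lookup table -- intentionally duplicated across services,
# so it is excluded from duplication analysis by the markers around it.
HTTP_STATUS = {200: "OK", 404: "Not Found", 500: "Internal Server Error"}
# jscpd:ignore-end
```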
Dispatch subagents in parallel for all groups above the impact threshold:
Analyze this duplicate group:
- Source A: <file_a> lines <X-Y>
- Source B: <file_b> lines <M-N>
1. Classify: Exact / Near / Structural (see Quick Reference)
2. Match refactoring pattern to classification
3. Note differences between instances
4. Estimate impact: lines saved, files touched
5. Flag risks: same-looking code with different side effects, or coincidental structural similarity (should NOT be unified)
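The exact-vs-near part of step 1 can be sketched with a crude normalization heuristic (this is not jscpd's token comparison, just an illustration of the idea: mask literals and identifiers, then compare):

```python
import re

def normalize(code: str) -> str:
    """Crude normalization: mask string literals, numbers, and identifiers.

    A heuristic sketch only -- placeholders may themselves get re-masked by
    later passes, which is harmless because both fragments are transformed
    identically.
    """
    code = re.sub(r"'[^']*'|\"[^\"]*\"", "STR", code)  # string literals
    code = re.sub(r"\b\d+(\.\d+)?\b", "NUM", code)     # numeric literals
    code = re.sub(r"\b[A-Za-z_]\w*\b", "ID", code)     # identifiers/keywords
    return re.sub(r"\s+", " ", code).strip()

def classify(frag_a: str, frag_b: str) -> str:
    if frag_a == frag_b:
        return "exact"
    if normalize(frag_a) == normalize(frag_b):
        return "near"
    return "structural-or-unrelated"  # needs human/LLM review
```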
For each priority item:
## Priority 1: [Name] (X lines, Y instances)
**Type:** Exact / Near / Structural
**Pattern:** Extract Function / Parameterize / Template Method
**Impact score:** X | **Files:** list
1. Write test capturing current behavior of one instance
2. Extract shared code with parameters for variation points
3. Replace each instance, run tests after each
4. Remove dead code
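Steps 2-3 for a Near clone can be sketched as follows (a hypothetical pair of duplicates where the varying literal becomes a parameter):

```python
# Before: two near-duplicate functions differing only in one literal.
def format_error(msg):
    return "[ERROR] " + msg.strip().upper()

def format_warning(msg):
    return "[WARN] " + msg.strip().upper()

# After: shared code extracted, with a parameter for the varying part.
# Each original call site is then replaced with the thin wrapper,
# running tests after each replacement.
def format_log(msg, level):
    return "[" + level + "] " + msg.strip().upper()

def format_error_v2(msg):
    return format_log(msg, "ERROR")

def format_warning_v2(msg):
    return format_log(msg, "WARN")
```

The wrappers keep every call site unchanged while the duplication collapses into one function; once all instances are replaced, the originals become dead code (step 4).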
## Duplicate Code Analysis
**Metrics:** X% duplication (Y lines, Z clone groups)
**Config:** [project .jscpd.json / defaults] | **Scanned:** [path] (respecting .gitignore)
**Top priorities:**
1. [Name] — Exact, X lines, Y instances (impact: Z) → Extract Function
2. [Name] — Near, X lines, Y instances (impact: Z) → Parameterize
Want to start refactoring Priority 1? (I'll hand off to incremental-refactoring)
Tell the user upfront: grep finds exact text duplicates only — no near or structural detection.
Use ripgrep (rg) if available (respects .gitignore by default, handles nested .gitignore files). Use --type to scope languages (e.g., --type py --type go).
With ripgrep:
# Find most-repeated non-trivial lines, with file locations
rg -n --no-heading --type py --type go '.' /path/to/code \
| awk -F: '{line=$3; for(i=4;i<=NF;i++) line=line":"$i; if(length(line)>60) print line}' \
| sort | uniq -c | sort -rn | head -20
# Then locate which files contain the top hit:
rg -n --no-heading 'exact duplicate line text here' /path/to/code
Without ripgrep (grep -rn fallback):
grep -rn --include='*.py' --include='*.go' \
--exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=dist --exclude-dir=coverage --exclude-dir=.git \
'.' /path/to/code \
| awk -F: '{line=$3; for(i=4;i<=NF;i++) line=line":"$i; if(length(line)>60) print line}' \
| sort | uniq -c | sort -rn | head -20
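The same frequency count can be sketched in Python for environments where the awk pipeline is awkward. It mirrors the shell version: exact-text matching only, length filter on non-trivial lines, common build directories skipped:

```python
import os
from collections import Counter

def repeated_lines(root, exts=(".py", ".go"), min_len=60, top=20):
    """Count non-trivial source lines that appear more than once.

    Exact-text matching only, like the grep fallback -- no near or
    structural detection.
    """
    skip_dirs = {"node_modules", "vendor", "dist", "coverage", ".git"}
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if not name.endswith(exts):
                continue
            with open(os.path.join(dirpath, name), errors="replace") as f:
                for line in f:
                    line = line.strip()
                    if len(line) > min_len:
                        counts[line] += 1
    # Keep only lines that actually repeat, most frequent first.
    return [(line, n) for line, n in counts.most_common(top) if n > 1]
```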
Processing results:
**Detection:** grep fallback (exact only) | **Recommendation:** Install jscpd for full detection
| Mistake | Fix |
|---|---|
| Overriding project .jscpd.json with hardcoded flags | Check for config first, use minimal flags if present |
| Loading full jscpd JSON into context | Use jq to filter by impact threshold and truncate fragments |
| Hard-capping to top N groups | Use impact threshold instead — surfaces all meaningful duplicates in one pass |
| Skipping --gitignore flag | Always pass it — without it, generated/coverage dirs get scanned |
| Treating all duplicates the same | Classify (exact/near/structural) — each needs different refactoring |
| Unifying coincidentally similar code | Structural clones with different domains should often stay separate |
- .gitignore respected via --gitignore
- jscpd:ignore markers mentioned if relevant