From vibesubin
Verifies behavior-preserving refactors (rename, split, merge, extract, inline, dead code delete) via dependency tree planning, symbol-set/AST diffs, full test suite, and call-site reference closure.
npx claudepluginhub subinium/vibesubin --plugin vibesubin
The operator asked for a change that's supposed to preserve behavior — a refactor, a rename, a split, an extract, a dead-code deletion. Your job is to prove that behavior was preserved, not just produce a diff that looks right.
Behavior-preserving changes are the single biggest source of silent regressions when an LLM touches code. The classic failure is: the AI moves a function, updates the definition, and misses one of several call sites. The tests still pass because coverage was never complete. No one notices until a user hits the broken path.
This skill exists to stop that from happening. It covers two change families:
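- Structural refactors (rename, split, merge, extract, inline). The behavior is supposed to be identical because only the shape of the code changes.
- Dead-code deletions (confirmed dead beforehand, for example by fight-repo-rot). The behavior is supposed to be identical because the code wasn't running.
Both families use the same four verification checks.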
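A change is not done until all four of these pass:
- Symbol-set diff: the exported symbols before and after differ only by the expected delta.
- AST body preservation: every moved symbol is normalized-equivalent to its source.
- Behavioral checks: typecheck, lint, tests, and smoke all pass, unchanged from baseline.
- Call-site closure: reference counts match and no orphaned references to old names remain.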
If any of the four fails, fix it and re-run all four. Do not partially claim success.
Each step produces an artifact or an assertion. Never skip.
Record the starting state. You will diff against this at the end, and you will need it to roll back cleanly if something goes wrong.
Isolation — pick one of two options:
Preferred: git worktree. Create a separate working directory for the change. Nothing in the operator's main checkout is touched while you work. Rollback = git worktree remove.
SNAP=$(git rev-parse HEAD)
git worktree add ../verify-<topic> "$SNAP"
cd ../verify-<topic>
git switch -c verify/<topic>
Acceptable: temporary work branch on the main checkout. Only if worktree isn't available (e.g., the operator's filesystem doesn't support multiple worktrees, or the repo has submodules that don't co-exist).
SNAP=$(git rev-parse HEAD)
git switch -c verify/<topic>
Do not use git stash as an isolation mechanism. Stashes are interruption-unsafe — a crash, a power loss, or another git stash from a second session can lose the stash ref, and recovery is manual. A branch is cheap, always recoverable, and always preferable. This rule is non-negotiable.
Never work directly on the operator's main branch. Even a three-line change gets a branch.
Capture the baseline — language-agnostic:
- SNAP=$(git rev-parse HEAD): the commit you'll diff against and roll back to
- Exported symbol set (scripts/symbol-diff.sh)
- Test results: /tmp/verify-baseline-tests.txt
- Build output: /tmp/verify-baseline-build.txt
- Lint output: /tmp/verify-baseline-lint.txt
Bootstrap the isolated workspace before behavioral checks. Fresh worktrees often do not contain project-local state like .venv/, node_modules/, vendor/, generated clients, or compiled assets. A "module not found" failure in a brand-new worktree is an environment problem until proven otherwise.
If the baseline is already red, stop. You cannot verify a change on top of broken code — every failure afterwards becomes ambiguous (did I cause it or was it already broken?). Ask the operator whether to fix the existing failures first. Do not proceed.
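The baseline capture can be sketched as a small wrapper that records each check's output and refuses to continue on red. This is a minimal sketch, not the skill's own script: the function name run_baseline and the VERIFY_DIR variable are assumptions made for illustration; only the /tmp/verify-baseline-*.txt file names come from the text above.

```shell
# Hypothetical helper: run one baseline check and save its output.
# Usage: run_baseline <label> <command> [args...]
# VERIFY_DIR is an assumed override; it defaults to /tmp.
run_baseline() {
  label=$1
  shift
  out="${VERIFY_DIR:-/tmp}/verify-baseline-$label.txt"
  if "$@" >"$out" 2>&1; then
    echo "baseline $label: green ($out)"
  else
    echo "baseline $label: RED, stop and ask the operator ($out)"
    return 1
  fi
}
```

For a Python project this might be invoked as `run_baseline tests pytest && run_baseline lint ruff check .`; substitute the project's own commands. Chaining with `&&` means the first red check halts the session before any edit happens.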
Rollback plan (write it down before starting):
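# If anything goes wrong, this restores the operator's world exactly.
# Worktree option: run these from the operator's main checkout, not from inside the worktree.
git worktree remove ../verify-<topic> # add --force if the worktree holds uncommitted changes you want to discard
git branch -D verify/<topic> # the work branch outlives the worktree
# Branch option:
git switch main # leave the verify branch
git branch -D verify/<topic>
# Either option, only if you somehow touched main:
git reset --hard "$SNAP"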
Keep $SNAP visible throughout the session. It is the single most important value.
Do not start editing. Write the change down as a tree where each node is a small step, and its children are the prerequisite steps that must happen first. Execute leaves before parents. The principle is old — Mikado Method is one well-known formalization — but the shape is what matters: commit the dependency graph to a scratch file before touching any code, so when something fails you can tell whether the plan was wrong or the execution was wrong.
Example tree for "split big_module into big_module_core and big_module_helpers":
Split big_module
├── Create big_module_helpers.* with helper functions
│   └── Identify which functions are pure helpers (no cross-refs into big_module)
├── Create big_module_core.* with the rest
│   └── Update imports inside the moved bodies
├── Replace `import big_module` with `import big_module_core` (plus helpers) everywhere
│   └── Inventory all current import sites
└── Delete the original big_module
    └── Confirm no remaining references
Example tree for "delete confirmed-dead legacy_reports module":
Delete legacy_reports
├── Verify zero external references (grep + LSP find-references)
├── Delete legacy_reports/ directory
├── Remove any import of legacy_reports from __init__.py / index.ts
└── Confirm symbol-set diff shows only the expected removals
Each node in the tree has three fields:
src/*)
Write the tree as a scratch file (e.g., .verify-plan.md, gitignored). Show it to the operator before proceeding if the change touches more than 3 files. Let them approve or adjust.
Detailed pattern, including when to linearize vs branch the tree and how to handle cross-cutting constraints: references/verification-procedure.md.
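The leaves-before-parents ordering can be derived mechanically once the tree is written down as prerequisite pairs. The sketch below feeds the big_module example to the standard tsort utility; the step names are hypothetical slugs invented here for the tree's nodes, and the functions plan_edges and execution_order are illustrative, not part of the skill.

```shell
# Each line is "<prerequisite> <step>": the prerequisite must run first.
# Step names are made-up slugs for the nodes of the big_module example tree.
plan_edges() {
  cat <<'EOF'
identify-helpers create-helpers
create-helpers split-big-module
update-moved-imports create-core
create-core split-big-module
inventory-import-sites replace-imports
replace-imports split-big-module
confirm-no-references delete-original
delete-original split-big-module
EOF
}

# tsort prints one valid execution order: every prerequisite before its step.
execution_order() {
  plan_edges | tsort
}
```

Several orders can satisfy the same tree, so the exact output may vary between systems; what matters is that the root (split-big-module) always comes last and each leaf precedes its parent.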
Follow Tidy First: a commit is either structural (move, rename, extract, inline) or behavioral (logic change). Never mixed. This makes every commit independently verifiable.
For each leaf:
write-for-ai skill for the format)
If a leaf fails, do not move up. Fix the leaf or revise the tree. A failed leaf that you skip becomes a load-bearing bug later.
For every function, class, or constant that got moved in this step, prove its body survived intact.
Extract the symbol from the old location (using the snapshot from step 1 — or git show HEAD~1:old_path). Extract it from the new location. Normalize both (strip whitespace, strip comments, strip trailing commas). Compare.
If the language has AST tooling:
- ast-grep --lang <lang> --pattern '<pattern>' <path>: find the symbol
- tree-sitter parse <path>: get the AST
If it doesn't, fall back to a normalized text diff:
# pseudocode
old_body=$(extract_symbol_from_git_show HEAD~1:old_path symbol_name | normalize)
new_body=$(extract_symbol_from_file new_path symbol_name | normalize)
diff <(echo "$old_body") <(echo "$new_body")
The goal is a zero-byte diff on the normalized form. If there's a diff, you didn't just move the symbol — you also mutated it. That's a behavioral change hiding in a structural commit, which violates Tidy First.
Detail: see the "Step 4 — Per-node AST (or byte) diff" section of references/verification-procedure.md.
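The normalized text diff can be made concrete for the simplest case, a file moved wholesale. This is a deliberately crude sketch under stated assumptions: the functions normalize and bodies_match are hypothetical, comments are assumed to start with '#', and all whitespace is removed, so a '#' inside a string literal or a token boundary that depends on a space would be mishandled. Extracting a single symbol from a larger file is language-specific and is not attempted here.

```shell
# Naive normalizer, assuming '#' line comments: drop comments, all whitespace,
# trailing commas, and any lines that end up empty.
normalize() {
  sed -e 's/#.*$//' -e 's/[[:space:]]//g' -e 's/,$//' -e '/^$/d' "$1"
}

# Succeeds only when the two files are identical after normalization.
bodies_match() {
  [ "$(normalize "$1")" = "$(normalize "$2")" ]
}
```

A possible use for a moved file: `git show "$SNAP:old_path" > /tmp/old_body && bodies_match /tmp/old_body new_path`, with old_path and new_path standing in for the real locations. A non-zero exit means the move also mutated the body.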
Run the full smoke-test chain for the language. Never skip because "it's just a rename."
Order matters — fail fast on the cheapest check first:
1. Import and build smoke (python -c 'import x', node -e "require('./')", cargo check, go build ./...)
2. Typecheck (mypy, tsc --noEmit, cargo check, go vet)
3. Lint (ruff, eslint, clippy, golangci-lint)
4. Tests (pytest, jest, cargo test, go test ./...)
If any step fails, return to step 4: something you thought was structural is actually behavioral.
Per-language details: references/language-smoke-tests.md
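The fail-fast ordering can be sketched as a loop that stops at the first failing command. The function name run_chain is an assumption for illustration; the skill's own scripts/smoke-test.sh may work differently.

```shell
# Run each command string in order; stop at the first failure.
# Usage: run_chain "<cmd 1>" "<cmd 2>" ...
run_chain() {
  for step in "$@"; do
    echo "== $step"
    if ! sh -c "$step"; then
      echo "FAILED at: $step"
      return 1
    fi
  done
  echo "all checks passed"
}
```

For a Python project the chain might be `run_chain "python -c 'import x'" "mypy ." "ruff check ." "pytest"`, where x stands for the package under change. Putting the cheapest command first means a broken import is reported in seconds instead of after a full test run.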
This is the step that catches the #1 LLM failure mode in refactors and deletions: a rename that updated the definition but missed half the callers, or a "dead code" deletion where one caller turns out to be live.
Before the change (from step 1 snapshot), count every reference to each affected symbol. Prefer an LSP find-references query — gopls, rust-analyzer, pyright, typescript-language-server — because grep is a lower bound, not a ground truth.
# LSP preferred (accurate for cross-file symbols, handles dynamic dispatch)
# — run your editor's find-references or the CLI equivalent
# grep fallback — works for every language but is a lower bound
git grep -n 'old_symbol_name' | wc -l # from the snapshot state
After the change, count references to the new name — plus any remaining references to the old name that shouldn't exist:
git grep -n 'new_symbol_name' | wc -l # should equal the old count (renames) or zero (deletions of confirmed-dead code)
git grep -n 'old_symbol_name' | wc -l # should equal 0 (or the number of deliberately-kept aliases)
If the counts don't match, you missed a callsite or your "dead" symbol wasn't actually dead. Go find the reference. Do not report done.
Treat renames and deletions as two-assertion checks, not one:
| Operation | Assertion 1 | Assertion 2 |
|---|---|---|
| Rename | old-count before == new-count after | old-count after == 0 |
| Delete (dead code) | old-count before == 0 (confirmed before deleting) | old-count after == 0 |
| Move | old-path-count after == 0 | new-path-count after == old-path-count before |
The helper supports this directly:
scripts/callsite-count.sh "$SNAP" HEAD old_name new_name
Grep is a lower bound. A match count of zero in grep does not prove the symbol is unreferenced — dynamic dispatch, reflection, string-built symbol names, and cross-language boundaries all evade grep. When deleting code based on a grep-zero result, label the confidence as MEDIUM and require a second signal (LSP find-references, import graph analysis, or an explicit operator confirmation).
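The two rename assertions from the table can be sketched as a pair of functions. Both names (count_refs, check_rename) are hypothetical and are not the interface of scripts/callsite-count.sh; the counting is grep-based, so the caveats above about grep apply unchanged.

```shell
# Count whole-word matches of a symbol at a git ref (grep-based, so approximate).
# Usage: count_refs <ref> <symbol>
count_refs() {
  git grep -n -w -e "$2" "$1" | wc -l | tr -d ' '
}

# Both rename assertions: every old reference became a new one, and no old
# references remain beyond deliberately kept aliases.
# Usage: check_rename <old_before> <new_after> <old_after> [kept_aliases]
check_rename() {
  old_before=$1 new_after=$2 old_after=$3 kept=${4:-0}
  if [ "$old_before" -ne "$new_after" ]; then
    echo "FAIL: $old_before old references before, $new_after new references after"
    return 1
  fi
  if [ "$old_after" -ne "$kept" ]; then
    echo "FAIL: $old_after old references remain, expected $kept"
    return 1
  fi
  echo "OK: call-site closure holds ($new_after references)"
}
```

Once the change is committed, a possible invocation is `check_rename "$(count_refs "$SNAP" old_name)" "$(count_refs HEAD new_name)" "$(count_refs HEAD old_name)"`. Keeping the two assertions separate makes the failure message say whether a call site was missed or a stale old name survived.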
Report the change complete only when all four verification checks pass.
Report template (use in the PR description or session output):
## refactor-verify report
Change type: <refactor | rename | split | move | extract | inline | delete-dead>
Plan: <1-line summary of the dependency tree>
### Symbol-set diff
- Before: <N> exported symbols across <M> files
- After: <N> exported symbols across <M'> files
- Diff: <explicit delta, or "none">
### AST body preservation
- <N> moved symbols, all normalized-equivalent to source
### Behavioral
- Typecheck: ✅
- Lint: ✅
- Tests: ✅ (X/Y passing, unchanged from baseline)
- Smoke: ✅ (imports + local run)
### Call-site closure
- Old references: <N> (from snapshot)
- New references: <N> (equal, verified)
- Orphans: 0
Verification commands (re-run anytime):
<concrete command 1>
<concrete command 2>
Documented failure patterns. If any of these trip during execution, the verification system should catch them — but it's faster to avoid them upfront.
- from foo.bar import baz but foo.bar doesn't have baz. Guard: step 5's import smoke test.
Full catalog with examples and detection notes: references/llm-failure-modes.md.
If the change includes a README or architecture doc rewrite, apply the same discipline:
See "Information preservation during doc rewrites" and the general preservation pattern in references/verification-procedure.md.
fight-repo-rot first to confirm the target is dead with explicit confidence, then come back here for the delete + verify
You cannot verify a change on top of broken tests. If step 1 discovers failing tests or typecheck errors:
A change on top of red is indistinguishable from a change that caused red. You lose the ability to prove causation.
When invoked from /vibesubin (the umbrella skill's parallel sweep), this skill runs in read-only audit mode. Do not snapshot, do not plan a dependency tree, do not execute any change, do not run the 6-step procedure.
Instead, produce a findings-only report:
/refactor-verify will plan and execute any specific change when invoked directly.
The operator reviews the sweep report and, if they want a specific refactor applied, invokes /refactor-verify directly, which then runs the full 6-step procedure.
How to tell: the task context from the umbrella will include a sweep=read-only marker or an explicit "produce findings only, do not edit" instruction. Obey it. If the operator invokes this skill by name, the full procedure applies and editing is expected.
When the task context contains the tone=harsh marker (usually set by the /vibesubin harsh umbrella invocation, but can also come from direct requests like "don't sugarcoat" / "brutal review" / "매운 맛" / "厳しめ"), switch output rules on both the sweep audit and the direct-invocation report:
"src/api/user.ts::getUser is referenced 12 times but only 11 were updated. The one missed call site is tests/api/user.test.ts:47." Not "some call sites may still reference the old name."
Harsh mode does not invent failures, skip verification steps, or become rude. Every harsh statement must cite the same symbol count, grep output, or test result the balanced version would cite. The change is framing, not substance.
- fight-repo-rot: dead-code findings (with confidence) come in here as delete-dead jobs. The handoff includes the list of symbols and the confidence level; LOW-confidence deletions require an explicit operator OK.
- write-for-ai: hand off the commit and PR writing; write-for-ai knows how to document verification results.
- manage-secrets-env (if the change touches secrets, .env, or .gitignore) or project-conventions (if the change is about branch strategy, directory layout, dep pinning, or path portability) on where new values should live before restructuring.
- audit-security on the result.
The 6-step procedure above is language-agnostic. The specific commands for each language (how to typecheck, how to run tests, how to find references) live in references:
- references/language-smoke-tests.md: per-language command chains
- references/verification-procedure.md: deep-dive into each of the six steps, including dependency-tree planning, AST diffing, and information preservation for doc rewrites
- references/llm-failure-modes.md: the catalog of mistakes to guard against, with the specific step that catches each one
Scripts (invoked, not read):
- scripts/symbol-diff.sh: print symbol-set diff between two git refs
- scripts/smoke-test.sh: auto-detect language, prefer project-local toolchains when present, and warn when an isolated worktree is not bootstrapped enough to trust failures
- scripts/callsite-count.sh: count references before/after, with explicit rename support and stale old-name detection