Test-first refactoring — audit coverage, add characterization tests, apply changes with safety net, run quality stack and review loop.
```shell
npx claudepluginhub borda/ai-rig --plugin develop
```
This skill is limited to using the following tools:
<objective>
Test-first refactoring. Audit test coverage, add characterization tests if missing, then apply changes with a safety net.
NOT for: bug fixes (use /develop:fix); new features (use /develop:feature); .claude/ config changes (use /manage).
Foundry plugin check:
```shell
ls ~/.claude/plugins/cache/ 2>/dev/null | grep -q foundry   # exit 0 = installed
```
If the check fails or is inconclusive, still proceed as if foundry is available, since that is the common case; switch to the fallback substitutions below only when an agent dispatch explicitly fails.
When foundry is not installed, substitute foundry:X references with general-purpose and prepend the role description plus model: <model> to the spawn call:
| foundry agent | Fallback | Model | Role description prefix |
|---|---|---|---|
| foundry:sw-engineer | general-purpose | opus | You are a senior Python software engineer. Write production-quality, type-safe code following SOLID principles. |
| foundry:qa-specialist | general-purpose | opus | You are a QA specialist. Write deterministic, parametrized pytest tests covering edge cases and regressions. |
Skills with --team mode: team spawning with fallback agents still works but produces lower-quality output.
Task hygiene: Before creating tasks, call TaskList. For each found task:
- completed if the work is clearly done
- deleted if orphaned / no longer relevant
- in_progress only if genuinely continuing

Task tracking: immediately after Step 1 (scope is known), create TaskCreate entries for all steps of this workflow before doing any other work. Mark each step in_progress when starting it, completed when done.
Read the target code and build a mental model before touching anything.
If <target> is a directory: use the Glob tool (pattern **/*.py, path <target>) to enumerate Python files.
```shell
# Measure current state
wc -l <target>/**/*.py 2>/dev/null || wc -l <target>
```
Structural context (codemap, if installed) — soft PATH check, silently skip if scan-query not found:
```shell
PROJ=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null) || PROJ=$(basename "$PWD")
if command -v scan-query >/dev/null 2>&1 && [ -f ".cache/scan/${PROJ}.json" ]; then
  scan-query central --top 5
fi
```
If results are returned: prepend a ## Structural Context (codemap) block to the foundry:sw-engineer spawn prompt with the hotspot JSON. Additionally, if the target maps to a module in the index, also include scan-query deps <target_module> (what the target imports — coupling) and scan-query rdeps <target_module> (what imports the target — blast radius of API changes). Derive <target_module> from the target path: strip the project root prefix, replace / with ., drop the .py extension. If scan-query is not found or index is missing: proceed silently — do not mention codemap to the user.
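The path-to-module derivation above can be sketched in Python. This is a minimal illustration; the `project_root` parameter is an assumption standing in for the repo root the skill strips:

```python
from pathlib import Path

def target_module(target: str, project_root: str) -> str:
    # Strip the project root prefix, replace / with ., drop the .py extension.
    rel = Path(target).relative_to(project_root)
    return ".".join(rel.with_suffix("").parts)

# target_module("repo/src/pkg/mod.py", "repo") -> "src.pkg.mod"
```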
Spawn a foundry:sw-engineer agent to analyze the code and identify:
Scope gate: if the target is directory-wide scope (10+ files) regardless of goal, flag the complexity smell. Use AskUserQuestion to present the scope concern before proceeding, with options: "Narrow scope (Recommended)" / "Proceed anyway".
Find existing tests for the target code:
Use the Glob tool (pattern **/test_*.py or **/*_test.py) to find candidates, then the Grep tool (pattern <module_name>, output mode files_with_matches) to narrow to those that reference the target.
```shell
# Check coverage
python -m pytest --co -q 2>/dev/null | grep -i "<module_name>" || echo "No tests found"
python -m pytest --cov=<target_module> -q --cov-report=term-missing 2>/dev/null
```
Classify each public function/method as covered, partially covered, or uncovered.
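The bucketing can be sketched as follows; the exact thresholds are an illustrative assumption, not specified by this skill:

```python
def classify(covered_lines: int, total_lines: int) -> str:
    # Bucket a function by how much of it the current suite exercises.
    if covered_lines <= 0:
        return "uncovered"
    if covered_lines >= total_lines:
        return "covered"
    return "partially covered"
```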
Before writing characterization tests, critically evaluate the audit output itself:
If the audit seems incomplete: re-examine before proceeding to Step 3. Gaps in the safety net discovered mid-refactoring (Step 4) are costly.
For every uncovered or partially covered public API, spawn a foundry:qa-specialist agent to generate characterization tests:
- pytest.mark.parametrize for multiple input/output pairs
- test names: test_<function>_characterization_*

```shell
# Run to confirm they pass against current code
python -m pytest <test_file> -v
```
Gate: all characterization tests must pass before proceeding. If any fail, fix the test, not the code.
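A characterization test in the shape described above might look like this. The function `slugify` and its behavior are hypothetical stand-ins, not taken from any codebase; in practice the target is imported from the code under refactoring:

```python
import pytest

def slugify(raw: str) -> str:
    # Stand-in target; a real test would import this from the codebase.
    return "-".join(raw.strip().lower().split())

@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  spaced  ", "spaced"),
    ],
)
def test_slugify_characterization(raw, expected):
    # Pin current behavior, right or wrong, before refactoring begins.
    assert slugify(raw) == expected
```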
For each change:
```shell
python -m pytest --tb=short <test_files> -v
```
Safety break: max 5 change-test cycles per session. After 5, stop and report which succeeded, which broke, and what remains.
Refactoring categories:
- `_`-prefixed functions with no call sites; flag public methods absent from `__init__.py` exports

Read .claude/skills/_shared/codex-prepass.md and run the Codex pre-pass before cycle 1.
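The no-call-sites heuristic can be sketched as a naive text scan. This is illustrative only; a real pass would use an AST or call graph, and the regexes here are assumptions:

```python
import re
from pathlib import Path

def unused_private_functions(root: str) -> set[str]:
    # Collect _-prefixed definitions and call sites across the tree.
    defined: set[str] = set()
    called: set[str] = set()
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        # Definitions: lines beginning `def _name(`
        defined |= set(re.findall(r"^\s*def\s+(_\w+)\s*\(", text, re.M))
        # Call sites: `_name(` not immediately preceded by `def `
        called |= set(re.findall(r"(?<!def )\b(_\w+)\s*\(", text))
    return defined - called
```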
Full review of the refactored code. This is a loop — review -> targeted refactoring (return to Step 4) -> re-review until only nits remain. Maximum 3 outer cycles. (Step 4's "max 5 change-test cycles" bound applies within each individual pass through Step 4, independently of this outer loop.)
Each cycle:
Evaluate against all criteria:
For every gap found: return to Step 4 and apply a targeted fix — one focused change per gap.
Re-run the full test suite:
```shell
python -m pytest --tb=short <test_files> -v 2>&1 | tail -20
```
If only nits remain (variable naming, comment clarity, minor formatting): document in Follow-up and exit the loop.
If substantive gaps remain: start the next cycle (max 3 total).
After 3 cycles: if substantive issues remain, stop — surface them to the user before proceeding.
Read .claude/skills/_shared/quality-stack.md and execute the Branch Safety Guard, Quality Stack, Codex Pre-pass, Progressive Review Loop, and Codex Mechanical Delegation steps.
## Refactor Report: <target>
### Goal
[stated goal or "general quality pass"]
### Test Coverage Before
- Covered: N functions | Partially: N | Uncovered: N
- Characterization tests added: N
### Changes Made
| File | Change | Lines |
|------|--------|-------|
| path/to/file.py | extracted helper function | -12/+8 |
### Test Results
- All tests passing: yes/no
- Coverage: before% -> after%
### Follow-up
- [any remaining items that need manual review]
## Confidence
**Score**: 0.N — [high ≥0.9 | moderate 0.8–0.9 | low <0.8 ⚠]
**Gaps**: [e.g., coverage tool unavailable, some tests skipped]
**Refinements**: N passes.
When to use team mode: target is a directory OR cross-module scope.
Coordination:
`{target: <path>, coverage: <summary>, goal: <stated goal>}`

Spawn prompt template:
You are a [foundry:sw-engineer|foundry:qa-specialist] teammate refactoring: [target].
Read ~/.claude/TEAM_PROTOCOL.md — use AgentSpeak v2. Apply file locking protocol for concurrent edits.
Your task: [refactoring steps 4 | characterization tests step 3].
Compact Instructions: preserve file paths, test results, coverage numbers. Discard verbose tool output.
Task tracking: do NOT call TaskCreate or TaskUpdate — the lead owns all task state. Signal your completion in your final delta message: "Status: complete | blocked — <reason>".