From claudekit
Guides incremental shipping of large features, migrations, and refactors via vertical slices behind feature flags and refactor-with-evidence using test/perf deltas.
Install: `npx claudepluginhub duthaho/claudekit --plugin claudekit`
This skill uses the workspace's default tool permissions.
A workflow for landing large changes as small, reversible increments. The skill
exists because the most common shipping failure isn't a missing test or a bad
deploy — it's a 1500-line PR that bundles a feature, a refactor, and a config
change, takes three days to review, and lands with a regression nobody isolated.
Incremental shipping splits that into thin vertical slices behind feature flags,
plus a refactor-with-evidence section for behavior-preserving changes that need
their own discipline (test deltas, perf measurements). Use it after write-plan
and test-first, and before code-review-loop.
Step 1: Define the slice
Goal: Define the smallest change that delivers user-observable value (or preserves behavior, for refactors) and can ship on its own.
Inputs: A task or set of tasks from your plan.
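Actions:
- Pick the smallest change that delivers user-observable value (or, for a refactor, preserves behavior) and can ship on its own.
- Write down what is in the slice and what is deferred before writing code.
- Keep unrelated cleanups out of the slice; they go in a separate PR.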
Output: A slice definition: Slice 1: <what's included>; out of slice: <what's deferred>.
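The slice definition is a small, structured artifact, so it can help to treat it as one. The following is a minimal sketch. `SliceDefinition` and the example feature names are hypothetical. The only rule it enforces is that a slice must contain something, and it renders the Output line in the format given above.

```python
from dataclasses import dataclass, field


@dataclass
class SliceDefinition:
    """Hypothetical record of a Step 1 output: what ships now, what waits."""

    number: int
    included: list[str]
    deferred: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # An empty slice delivers nothing; reject it up front.
        if not self.included:
            raise ValueError("a slice must include at least one change")

    def render(self) -> str:
        # Mirrors the Output line: "Slice N: <included>; out of slice: <deferred>."
        deferred = ", ".join(self.deferred) if self.deferred else "nothing"
        return f"Slice {self.number}: {', '.join(self.included)}; out of slice: {deferred}."


if __name__ == "__main__":
    slice_1 = SliceDefinition(
        number=1,
        included=["read-only order history page"],
        deferred=["order filtering", "CSV export"],
    )
    print(slice_1.render())
```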
Step 2: Add the feature flag
Goal: A kill switch that lets the slice ship dark.
Inputs: The slice definition.
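Actions:
- Add the flag, off by default. Name it `<feature>_enabled` for booleans or `<feature>_rollout` for percentage rollouts.
- Next to the flag's definition, record the deletion plan in a comment: `// Remove this flag and the off branch after rollout completes. See ticket <link>.`
Output: Flag is committed (off by default) and readable from production.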
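The following is a minimal sketch of an off-by-default flag that follows the naming convention and carries the deletion-plan comment. It assumes flags are read from a plain mapping. In production, that mapping would be replaced by whatever config or flag service the project uses. `FlagStore`, the `order_history` flag names, and the sha256-based percentage bucketing are hypothetical illustrations, not part of the skill. The comment syntax is adapted to Python, and `<link>` is left as the placeholder.

```python
import hashlib
from collections.abc import Mapping

# Remove this flag and the off branch after rollout completes. See ticket <link>.
ORDER_HISTORY_ENABLED = "order_history_enabled"  # boolean flag
# Remove this flag and the off branch after rollout completes. See ticket <link>.
ORDER_HISTORY_ROLLOUT = "order_history_rollout"  # percentage flag, 0-100


class FlagStore:
    """Reads flags from a config mapping. Anything not set is off."""

    def __init__(self, config: Mapping[str, int]) -> None:
        self._config = config

    def is_enabled(self, name: str) -> bool:
        # Off by default: a missing key means the slice ships dark.
        return bool(self._config.get(name, False))

    def in_rollout(self, name: str, user_id: str) -> bool:
        percent = int(self._config.get(name, 0))
        # sha256 (unlike Python's per-process salted hash()) keeps each user
        # in the same bucket across requests, hosts, and restarts.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99
        return bucket < percent


if __name__ == "__main__":
    flags = FlagStore({})  # nothing configured: the slice is dark
    print(flags.is_enabled(ORDER_HISTORY_ENABLED))  # False
    print(flags.in_rollout(ORDER_HISTORY_ROLLOUT, "user-42"))  # False
```

Hashing the flag name together with the user ID means each rollout flag gets an independent bucket assignment. As a result, the same 1% of users is not always first in line for every feature.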
Step 3: Implement behind the flag
Goal: Code that delivers the slice, gated by the flag.
Inputs: The slice definition + the flag.
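Actions:
- Write the tests first, following test-first. Each test runs both the flag-on and flag-off paths if behavior diverges.
- Implement the slice so the new behavior runs only when the flag is on.
Output: Slice implementation behind the flag; all tests pass.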
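The following is a minimal sketch of "both paths under test." It assumes a hypothetical handler, `order_history_view`, that receives its flags as a mapping so tests can set them directly. The handler, the flag name, and the return strings are illustrative only.

```python
from collections.abc import Mapping


def order_history_view(user_id: str, flags: Mapping[str, bool]) -> str:
    """Hypothetical handler gated by a hypothetical `order_history_enabled` flag."""
    if flags.get("order_history_enabled", False):
        return f"order history for {user_id}"  # new path: the slice
    return "orders: coming soon"  # off branch: deleted in Step 6


# Plain test functions (pytest would collect them; they also run directly).
def test_flag_off_keeps_current_behavior() -> None:
    assert order_history_view("u1", {}) == "orders: coming soon"
    assert order_history_view("u1", {"order_history_enabled": False}) == "orders: coming soon"


def test_flag_on_serves_the_slice() -> None:
    assert order_history_view("u1", {"order_history_enabled": True}) == "order history for u1"


if __name__ == "__main__":
    test_flag_off_keeps_current_behavior()
    test_flag_on_serves_the_slice()
    print("flag-on and flag-off paths both pass")
```

Passing flags in, rather than reading a global, is what makes the flag-off test cheap to write. It also keeps the off branch covered until it is deleted.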
Step 4: Refactor with evidence (refactors only)
Goal: Structural changes that preserve behavior, proved by deltas.
Inputs: A refactor opportunity revealed during Step 3 OR a separate refactor task in the plan.
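Actions:
- Before touching the code, capture the test runner output. If the code is perf-sensitive (request handler, hot loop, batch job), also run the benchmark.
- Make the structural change without mixing in feature work or unrelated cleanup.
- Re-run the same tests and benchmark, and put the before/after deltas in the PR. If perf was not measured because the code is a cold path, say so explicitly ("perf not measured; cold path").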
Output: Refactored code + before/after evidence in the PR.
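Alongside the before/after test runs, a small differential harness can make the deltas concrete. It runs the old and new implementations on the same inputs and times both. The following is a minimal sketch. `total_before`/`total_after` are a hypothetical refactor, and the best-of-5 `timeit` timing is an assumed methodology, not one the skill prescribes. This harness complements running the test suite before and after; it does not replace it.

```python
import random
import timeit


def total_before(prices: list[int], qty: list[int]) -> int:
    """Original implementation, kept only long enough to compare against."""
    total = 0
    for i in range(len(prices)):
        total += prices[i] * qty[i]
    return total


def total_after(prices: list[int], qty: list[int]) -> int:
    """Refactored for readability; must return the same result."""
    return sum(p * q for p, q in zip(prices, qty))


def behavior_delta(cases: list[tuple[list[int], list[int]]]) -> list[int]:
    """Indexes of cases where old and new disagree; empty means preserved."""
    return [i for i, (p, q) in enumerate(cases) if total_before(p, q) != total_after(p, q)]


def perf_delta(prices: list[int], qty: list[int], runs: int = 200) -> tuple[float, float]:
    """Best-of-5 wall time (seconds) for `runs` calls, before and after."""
    before = min(timeit.repeat(lambda: total_before(prices, qty), number=runs, repeat=5))
    after = min(timeit.repeat(lambda: total_after(prices, qty), number=runs, repeat=5))
    return before, after


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the evidence is reproducible
    cases = [([rng.randint(0, 100) for _ in range(n)], [rng.randint(0, 5) for _ in range(n)])
             for n in (0, 1, 10, 1000)]
    print("mismatching cases:", behavior_delta(cases))
    b, a = perf_delta(*cases[-1])
    print(f"before {b:.4f}s, after {a:.4f}s, ratio {a / b:.2f}x")
```

These cases only cover equal-length lists. When `qty` is shorter than `prices`, the original raises `IndexError`, but `zip` silently truncates. That is exactly the kind of uncovered path a delta check misses unless some case exercises it.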
Step 5: Ship dark, then ramp
Goal: Land the slice in production with the flag off, then turn it on.
Inputs: Slice implementation + tests.
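Actions:
- Deploy with the flag off.
- Turn it on gradually, starting at 1%, and record monitoring observations at each ramp step before widening. For a genuinely low-risk change you can shorten the ramp (for example, 1% for 5 minutes, then 100%), but don't skip the verification at each step.
- If a step shows a regression, turn the flag off and write down what you learned.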
Output: Slice fully rolled out OR rolled back via flag with a learning.
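The rollout loop can be sketched as follows. It assumes two hypothetical callbacks: `set_rollout`, which stands in for whatever flag service sets the `<feature>_rollout` percentage, and `error_rate`, which stands in for the project's monitoring. The ramp schedule, the absolute error-rate tolerance, and the reliance on a single metric are illustrative assumptions. A real ramp would also wait out a soak period at each step before reading the metric.

```python
from collections.abc import Callable

RAMP = (1, 5, 25, 50, 100)  # percent of traffic; an illustrative schedule


def ramp(
    set_rollout: Callable[[int], None],
    error_rate: Callable[[], float],
    baseline: float,
    tolerance: float = 0.005,
) -> list[str]:
    """Walk the ramp and turn the flag off at the first step that regresses.

    Returns the monitoring log, which doubles as the Step 5 evidence.
    """
    log: list[str] = []
    for percent in RAMP:
        set_rollout(percent)
        observed = error_rate()  # assumption: called after the soak period
        log.append(f"{percent}%: error rate {observed:.3%} (baseline {baseline:.3%})")
        if observed > baseline + tolerance:
            set_rollout(0)  # the kill switch: back to dark in one config change
            log.append(f"rolled back at {percent}%")
            return log
    return log


if __name__ == "__main__":
    state = {"rollout": 0}
    readings = iter([0.010, 0.011, 0.030])  # made-up error rates per step
    log = ramp(lambda p: state.update(rollout=p), lambda: next(readings), baseline=0.010)
    print("\n".join(log))
    print("final rollout:", state["rollout"])
```

With these made-up readings, the ramp passes 1% and 5%, trips at 25%, and leaves the rollout at 0. That is the "rolled back via flag" outcome, with the log as the learning's starting point.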
Step 6: Remove the flag or record the learning
Goal: Close the loop on this slice.
Inputs: A 100% rollout that's been stable for the project's bake time (typically one release cycle).
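Actions:
- Once the rollout has baked, delete the flag and the off branch in a flag-removal PR. If the change ever needs to be undone later, git revert handles it.
- If the slice was rolled back instead, write up the learning.
- Start the next slice from Step 1.
Output: A new slice in flight, a flag-removal PR, or a learning note.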
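Flag removal happens reliably only if stale flags are visible. The following is a minimal sketch. It assumes each flag's deletion plan is also recorded in a hypothetical registry with a `remove_by` date. The `FlagRecord` type, the ticket IDs, and the idea of running this check in CI are illustrations, not part of the skill.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    name: str
    ticket: str      # the deletion ticket named in the Step 2 comment
    remove_by: date  # deadline set when the flag was introduced


def overdue(flags: list[FlagRecord], today: date) -> list[str]:
    """Flags past their deletion deadline: each one needs a flag-removal PR."""
    return [f"{f.name} (ticket {f.ticket}, due {f.remove_by.isoformat()})"
            for f in flags if f.remove_by < today]


if __name__ == "__main__":
    registry = [
        FlagRecord("order_history_enabled", "ORD-101", date(2024, 3, 1)),
        FlagRecord("checkout_v2_rollout", "PAY-7", date(2024, 9, 1)),
    ]
    for line in overdue(registry, today=date(2024, 6, 1)):
        print("overdue:", line)
```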
| Excuse | Why it sounds reasonable | Why it's wrong | What to do instead |
|---|---|---|---|
| "Feature flags add complexity — let's just ship it." | Flags do add code paths and require maintenance. | "Just ship it" without a flag is fine for trivial changes; for the cases this skill applies to, the flag is the difference between a 30-second rollback and a 2-hour incident. The complexity of one well-placed flag is fixed and small; the complexity of fixing prod with no kill switch is unbounded. | Add the flag. The cost of one branch and one config read is the cheapest insurance you'll buy. Delete the flag after rollout (Step 6) so the complexity is temporary. |
| "I'll bundle this small cleanup with the feature — saves a PR." | Reducing PR count feels efficient. | The bundled cleanup is the change that breaks the PR review. The reviewer can't tell which lines are feature and which are cleanup; they ask questions about both, you answer for both, the review takes 2x as long. If the cleanup introduces a regression, bisect points to a commit that mixes feature and cleanup, doubling the debugging time. | Open a separate PR for the cleanup. The two PRs together review faster than one mixed PR. The reviewer can approve the cleanup with a glance and focus attention on the feature. |
| "Refactor first, then add the feature." | Clean code makes adding features easier. | Refactor-then-feature lands a refactor with no feature-driven verification. The "behavior-preserving" claim is unverified at the only test that matters — the feature exercising the refactored area. The refactor ships, looks fine, and the feature later reveals that the refactor changed behavior in a path tests didn't cover. | Make the change you need (the feature), then refactor afterward if needed, with the feature's tests as your safety net. Or: refactor and pass Step 4's evidence check (before/after deltas) explicitly. Don't refactor without evidence. |
| "I'll roll out to 100% directly; no point in 1%." | Gradual rollout has overhead and most slices are fine at 100%. | The cost of "no point in 1%" is a 100% rollout when the slice happens to have a regression. The 1% step would have surfaced the issue with 1% of the blast radius. Skipping the ramp would be harmless for the 95% of changes that are safe, but you can't tell in advance which 5% aren't, so the ramp has to be the default. | Default to a gradual ramp. If the change is small enough that 100% is genuinely safe, you can shorten the ramp (1% for 5 minutes, then 100%), but don't skip the verification step. |
| "I'll keep the off branch in code as a fallback even after rollout." | Fallback paths feel like safety. | Long-lived dual-path code becomes the ambiguity nobody understands six months later. The off branch is dead in production but alive in tests, in code review, in mental load. Every modification has to consider both paths. The "safety" you preserved is paid for forever. | Set a deletion deadline at the flag's introduction (Step 2 comment). When 100% rollout has baked, delete the flag and the off branch. If the change ever needs to be undone, git revert does the work — that's why version control exists. |
| "The refactor's behavior preservation is obvious; no need for the perf benchmark." | Many refactors really don't change perf. | "Obvious" without measurement is what people say right before discovering the refactor turned an O(n) loop into an O(n²) one through a hidden re-evaluation. Perf regressions from refactors are surprisingly common because the refactor optimized for readability, not for the hot path. | If the code is in a perf-sensitive area (request handler, hot loop, batch job), run the benchmark before and after. The delta is the receipt. If it's truly a cold path, you can skip it, but say so explicitly in the PR ("perf not measured; cold path"). |
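To make the hidden re-evaluation in the last row concrete, here is a hypothetical readability refactor. The output is unchanged, so tests pass, but the slice `items[:i]` is rebuilt and scanned on every iteration. The function names and data are illustrative. No timings are quoted here because they depend on the machine; the point is that only a benchmark would show the difference.

```python
import timeit


def dedupe_before(items: list[str]) -> list[str]:
    """Original: one pass with a set, O(n)."""
    seen: set[str] = set()
    out: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def dedupe_after(items: list[str]) -> list[str]:
    """'Readable' refactor. items[:i] is copied and scanned on every
    iteration, so this is O(n^2) even though it reads like one pass."""
    return [x for i, x in enumerate(items) if x not in items[:i]]


if __name__ == "__main__":
    data = [str(i % 500) for i in range(4000)]
    # The test delta is clean: same output, same order.
    assert dedupe_before(data) == dedupe_after(data)
    # Only the benchmark shows the regression.
    for fn in (dedupe_before, dedupe_after):
        best = min(timeit.repeat(lambda: fn(data), number=3, repeat=3))
        print(f"{fn.__name__}: {best:.4f}s")
```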
| Checkpoint | Required artifact | What "no evidence" looks like |
|---|---|---|
| End of Step 1 | A slice definition naming what's included and what's deferred | "I'll start coding and see how big it gets." |
| End of Step 2 | A feature flag committed off-by-default with a deletion-plan comment | "We can add the flag later if needed." |
| End of Step 3 | Tests pass; flag-on and flag-off paths both exercised by tests | "It works behind the flag." |
| End of Step 4 (refactor) | Before/after test runner output + (if applicable) perf benchmark numbers | "Refactor preserves behavior — trust me." |
| End of Step 5 | Rollout sequence with monitoring observations at each ramp step | "It's at 100%, looks fine." |
| End of Step 6 | Either a flag-removal PR or a written learning from a revert | "We'll get to flag cleanup eventually." |
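The checkpoint table can double as a pre-merge checklist. The following is a minimal sketch that flags any step claimed as done without an attached artifact. The `CHECKPOINTS` mapping paraphrases the table above. `missing_evidence` and the shape of the `evidence` dict (step number to artifact text or link) are hypothetical.

```python
CHECKPOINTS = {
    1: "slice definition (included + deferred)",
    2: "flag committed off-by-default with deletion-plan comment",
    3: "tests pass with flag-on and flag-off paths exercised",
    4: "before/after test output (+ perf numbers if perf-sensitive)",  # refactors only
    5: "rollout log with monitoring observations per ramp step",
    6: "flag-removal PR or written learning from a revert",
}


def missing_evidence(completed_steps: list[int], evidence: dict[int, str]) -> list[str]:
    """Checkpoints claimed as done whose artifact is absent or blank."""
    return [f"Step {s}: {CHECKPOINTS[s]}"
            for s in completed_steps if not evidence.get(s, "").strip()]


if __name__ == "__main__":
    evidence = {
        1: "Slice 1: read-only order history; out of slice: CSV export.",
        2: "flag commit with deletion comment",
        3: "",  # "It works behind the flag." is not evidence
    }
    for gap in missing_evidence([1, 2, 3], evidence):
        print("missing:", gap)
```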