From claudekit
Enforces evidence-based verification before claiming tasks, features, or PRs complete. Requires pasting test outputs, command runs, and behavioral checks; rejects vague assertions.
npx claudepluginhub duthaho/claudekit --plugin claudekit

This skill uses the workspace's default tool permissions.
A pre-completion gate that converts the assertion "this is done" into evidence: the test output, the command run, the behavioral observation. The skill exists because "done" is a self-reported state and self-reports are wrong roughly half the time on real changes. The gate is short — five minutes — but it catches the class of mistake where an engineer claims a fix works after only running tests in their IDE, only verifying happy-path, only checking a single environment. This is the load-bearing skill of v4: every other skill funnels through it before its work is called done.
Step 1 — State the claim

Goal: Make the "done" claim explicit and falsifiable.
Inputs: Whatever you were about to call done.
Actions:
- Write the sentence: "I am claiming <X> is complete because <Y>." X is the work; Y is the evidence you intend to show.
- If the work corresponds to an Acceptance: line in a plan, make the claim match that line.
- Don't do anything else until you've written it.
Output: A claim sentence written down (PR description, scratch file, or comment).
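For example — the feature and evidence named here are invented for illustration:

```
I am claiming the CSV export retry fix is complete because the export tests
pass, the malformed-file path was exercised by hand, and a curl against the
staging endpoint returned 200.
```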
Step 2 — Run the tests

Goal: Prove the tests asserting this work pass, with evidence pasted.
Inputs: The list of tests relevant to this work.
Actions:
- Run the relevant tests fresh from the command line and read the full output.
- Compare what the tests assert against the Acceptance: line if it exists.
Output: Test runner output captured.
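One way to capture the Step 2 artifact, sketched for a Python project — the test module, directory name, and runner here are stand-ins for your project's real suite:

```shell
# Stand-in test module; substitute your project's real suite.
cat > test_demo.py <<'EOF'
import unittest

class TestDemo(unittest.TestCase):
    def test_sum(self):
        self.assertEqual(sum([1, 2, 3]), 6)
EOF

# Run fresh from the command line (not the IDE) and tee the runner's
# output, so it is read now and saved for pasting into the PR later.
mkdir -p verification
python3 -m unittest -v test_demo 2>&1 | tee verification/tests.txt
```

The `tee` is the point: one command both forces you to look at the output and produces the paste for Step 6.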
Step 3 — Exercise the negative paths

Goal: Verify the work doesn't claim to handle cases it doesn't.
Inputs: The acceptance criteria.
Actions:
- Exercise each negative or edge case the work claims to handle.
- Read the error the user or caller would actually see, and record a verdict for each case.
Output: Negative-path observations: each case + what happened + verdict.
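A minimal sketch of a negative-path check; the script under test is a hypothetical CLI that parses an integer argument. The point is reading the actual error text and exit code, not just confirming that something throws:

```shell
# Hypothetical script under test: parses an integer quantity argument.
cat > parse_qty.py <<'EOF'
import sys

try:
    print(int(sys.argv[1]))
except (IndexError, ValueError):
    # A useful negative-path error names the input and the expected form.
    sys.exit("error: expected an integer quantity, got: %r" % sys.argv[1:2])
EOF

# Exercise the negative case and read what the caller would actually see.
python3 parse_qty.py not-a-number; echo "exit code: $?"
```

If the message you read here were a bare stack trace or a 500, that's a finding to record even when the happy-path tests pass.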
Step 4 — Run it outside the IDE

Goal: Confirm the work runs outside your editor.
Inputs: A way to run the change in a more production-like context.
Actions:
- Hit the change from a separate shell with curl or your project's HTTP test tool.
- Confirm response shape, status codes, headers.
Output: A non-IDE verification: the command run, the result observed.
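A self-contained sketch of the 30-second non-IDE check. The stand-in service here is Python's stdlib http.server on an arbitrary port; substitute your service's real start command and endpoint:

```shell
mkdir -p verification

# Stand-in service; substitute the way your service actually starts.
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# -s silences the progress meter; -i keeps the status line and headers,
# so one command captures response shape, status code, and headers.
curl -s -i http://127.0.0.1:8099/ | tee verification/non_ide_check.txt | head -n 1

kill $SERVER_PID
```

Because this runs in a fresh shell, it has none of your IDE's env vars, attached debugger, or hot-reloaded modules — which is exactly what it is testing for.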
Step 5 — Cross-check the original ask

Goal: Confirm the work satisfies what was actually asked, not what got implemented.
Inputs: The original ticket, plan task, or spec criterion.
Actions:
- Re-read the original ask and list what was asked for.
- Point at where each ask was addressed, or note where it was deferred.
Output: A short matrix: <asked for> → <addressed in <location> | deferred to <follow-up>>.
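A filled-in matrix can be three lines; every ask, file, and ticket below is hypothetical:

```
Reject duplicate emails on signup  → addressed in signup_service.py (uniqueness check)
Return 409 with a readable message → addressed in signup_handler.py
Rate-limit the signup endpoint     → deferred to a follow-up ticket
```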
Step 6 — Record the evidence

Goal: Record that the gate ran.
Inputs: All evidence from Steps 2-5.
Actions:
- Add a ## Verification section to the PR description (or a scratch artifact if no PR yet) containing the claim sentence, the test output, the negative-path observations, the non-IDE run, and the cross-check matrix.
Output: A ## Verification section in the PR.
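Assembled, the section might look like this — every command, path, status code, and ticket mentioned is a hypothetical stand-in:

```markdown
## Verification

Claim: I am claiming the duplicate-email fix is complete because the checks below pass.

- Tests: `python3 -m unittest -v tests.test_signup` → OK (output pasted below)
- Negative path: duplicate email → 409 with "email already registered"
- Non-IDE run: `curl -i -X POST http://localhost:8000/signup` → HTTP/1.1 201
- Cross-check: both ticket asks addressed; rate limiting deferred to a follow-up
```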
| Excuse | Why it sounds reasonable | Why it's wrong | What to do instead |
|---|---|---|---|
| "Tests pass — that's enough verification." | A green test suite is the standard signal, automated and trusted. | Tests cover what was tested. The verification gate exists because the cases that hit production are typically the cases tests didn't cover — the negative paths, the production environment quirks, the bits the implementer assumed worked. "Tests pass" is necessary evidence, not sufficient evidence. | Run the tests AND run the negative path AND verify in a non-IDE environment AND cross-check the original ask. The first one alone is what produces the "tests passed but it broke in prod" pattern. |
| "I tested it manually in my IDE — it works." | Manual verification does count as evidence for the cases that have no automated test. | "It works in my IDE" passes Step 4 only for the IDE environment. Production doesn't run in your IDE. The IDE has your env vars, your local DB, your hot-reloaded modules, your debugger attached. The non-IDE run catches the cases where production doesn't have those. | Run the change outside the IDE. A 30-second curl from a separate shell catches "I forgot to deploy the migration" and "the env var is only set in my .env" — bugs that aren't catchable from inside the IDE. |
| "The negative path is obvious — invalid input throws an error, that's it." | Many systems do throw on invalid input by default; the language/framework provides this for free. | "Throws an error" doesn't tell the user/caller what went wrong or what to do. The default error message is often Internal Server Error 500 — useful to no one. The negative path verification isn't asserting that errors happen; it's asserting that the error is useful to the consumer. | Exercise the negative case. Read the error the user/caller would actually see. If it's "Internal Server Error" or "undefined is not a function," that's a finding even if the test passes. |
| "I'll do the cross-check in code review — that's what review is for." | Reviewers do verify the work matches the ask. | The reviewer cross-checks against what they remember the ask was, not what was originally written. They don't have time to re-read the original ticket and match it line-by-line; they read the PR and approve based on what looks right. The cross-check belongs in the gate so the reviewer can verify the gate ran, not redo the work. | Step 5 takes 60 seconds. Re-read the ticket, list what was asked, point at where each ask was addressed. The matrix saves the reviewer 5 minutes of context-rebuilding and produces a better review. |
| "I don't need to paste the test output — the CI will run it." | CI is the system of record for test results. | CI runs after the PR is open. The verification gate runs before the PR is open. Skipping the local paste means the PR opens with no evidence, the reviewer waits for CI, and if CI fails the round trip is on the reviewer's calendar instead of yours. The paste is also documentation: in two months, when someone bisects, the PR has the receipt of what was tested. | Paste the output. If the CI later runs the same tests and matches your local output, that's confirmation; if it diverges, that's an environment bug worth knowing about. |
| "It's a small change — the gate is overkill." | Most small changes don't break things; the overhead per small change feels disproportionate. | The gate's overhead is ~5 minutes; the cost of skipping the gate on a "small" change that turned out not to be small is ~hours plus a Slack message of apology. The cases that feel small enough to skip the gate include the ones where the small-feeling change had a non-small consequence. | Run the gate. For genuinely tiny changes (typo, comment, single-line config) you can collapse Steps 2-5 into one paste — but don't skip the gate. The discipline is uniform; the per-change cost stays low. |
| Checkpoint | Required artifact | What "no evidence" looks like |
|---|---|---|
| End of Step 1 | A claim sentence written in <X> is complete because <Y> form | "I'm done with this." |
| End of Step 2 | Test runner output pasted | "Tests pass." |
| End of Step 3 | Negative-path observations: case + observed behavior | "Error handling looks fine." |
| End of Step 4 | Non-IDE run output (curl, browser screenshot, CLI output) | "Works on my machine." |
| End of Step 5 | Cross-check matrix linking asks to addressed locations | "I implemented what was asked." |
| End of Step 6 | A ## Verification section in the PR with all of the above | "Marked the task done in the tracker." |
Red flag: <Y> is "the code looks right." That claim has no evidence; the work isn't done.