This skill should be used when completing tasks, "marking done", "claiming success", or before reporting any work as finished. Provides evidence-based completion discipline with test output, build logs, and manual testing patterns. Do not use for planning tests (see testing-strategy) or the TDD cycle (see tdd-discipline).
Evidence-based completion discipline adapted for Forge workflows. Prevents false "done" claims by requiring concrete verification before marking tasks complete or claiming success.
"Done" means "verified", not "attempted".
Before claiming a task is complete, provide evidence that it actually works. This prevents false "done" claims, silent regressions, and review churn.
Acceptable evidence: test output, build logs, and captured manual-testing results.
Not acceptable: "should work", "looks right", or an explanation of why the change ought to work with no actual run.
| Scenario | Verification Type |
|---|---|
| Code changes | Run tests, show output |
| Build changes | Build succeeds, show logs |
| Config changes | Apply config, show result |
| Documentation | Render docs, check formatting |
| Bug fixes | Reproduce bug, show fix works |
| Refactors | Tests still pass (no behavior change) |
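The "done = verified" rule in the table above can be partially automated. As a minimal sketch (the helper name is hypothetical, and pytest-style summary lines are assumed), a gate like this refuses a completion claim unless the captured summary is green:

```python
import re

def evidence_is_green(summary_line: str) -> bool:
    """Return True only when a pytest-style summary shows passes and no failures.

    Hypothetical helper: expects lines like '====== 12 passed in 0.45s ======'.
    """
    passed = re.search(r"(\d+)\s+passed", summary_line)
    if passed is None:
        return False  # no evidence of passing tests at all
    # Any failed/error count means the claim is not verified
    return re.search(r"\d+\s+(failed|error)", summary_line) is None
```

A claim with no summary line at all ("tests should pass") fails this check the same way a failing run does.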
Implement the feature, fix the bug, or make the change.
For Python projects:
# Run tests
pytest path/to/test_file.py
# Expected output to capture:
# ====== X passed in Y.YYs ======
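For context, a test file like the following is what produces that summary line (the file, function, and manifest schema here are illustrative, not from any real project):

```python
# test_manifest.py -- illustrative; names are assumptions for the example
def validate_manifest(manifest: dict) -> bool:
    """Toy stand-in for the code under test: require a 'version' key."""
    return "version" in manifest

def test_accepts_versioned_manifest():
    assert validate_manifest({"version": "2.0"})

def test_rejects_manifest_without_version():
    assert not validate_manifest({})
```

Running `pytest test_manifest.py` against this file yields a "2 passed" summary, which is the evidence to capture.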
For JavaScript/TypeScript projects:
# Run tests
npm test
# Expected output to capture:
# Tests: N passed, N total
For any project:
# Check the project's CLAUDE.md for the test command
# Common patterns:
just test
just {project} test
npm test
pytest
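Whatever the command turns out to be, the evidence-capture step is the same. A rough sketch (the function and its error message are hypothetical, not part of any Forge API):

```python
import subprocess

def capture_test_evidence(cmd: list[str]) -> str:
    """Run the project's test command and return its output for the commit/PR.

    `cmd` is whatever the project's CLAUDE.md specifies, e.g. ["pytest"]
    or ["just", "test"]. Raises on failure, so a red run can never be
    pasted as "done" evidence.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if result.returncode != 0:
        raise RuntimeError("Tests failed; do not mark done:\n" + output)
    return output
```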
In commit messages:
feat(manifest): Add version 2.0 validation
Implementation: Added validate_manifest function
Testing: pytest shows all tests pass
Test output:
collected 12 items
............
====== 12 passed in 0.45s ======
In PR descriptions:
## Verification
### Unit Tests
$ pytest
====== 15 passed in 1.23s ======
### Manual Testing
Tested with:
- Valid input → Success
- Invalid input → Clean error message
- Missing field → Helpful error
All scenarios work as expected.
Only after capturing verification evidence should you mark the task complete, commit the change, or open a PR.
When using TDD (see the tdd-discipline skill), you already have verification evidence: just include the test output in the commit/PR.
When integrating multiple components, verify each integration point: the contract at each boundary, the data passed across it, and the end-to-end flow.
When fixing a bug, first reproduce it, then show that the fix removes the failure. Example:
## Bug Reproduction (BEFORE)
$ run-command
Error: KeyError: 'missing_field'
## After Fix
$ run-command
Success: processed 3 items
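One way to make both halves of that evidence durable is to encode the reproduction as a regression test. The function and field names below are illustrative, chosen to match the KeyError in the example:

```python
def load_record(data: dict) -> str:
    """Fixed implementation: tolerate the absent key instead of raising KeyError."""
    return data.get("missing_field", "default")

def test_missing_field_no_longer_raises():
    # Before the fix, data["missing_field"] raised KeyError: 'missing_field'
    assert load_record({}) == "default"
```

The test fails against the pre-fix code (reproducing the bug) and passes afterward, so the before/after evidence is re-run on every future change.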
When refactoring without behavior change, run the full test suite before and after:
## Refactor Verification
Before refactor: pytest → 45 passed
After refactor: pytest → 45 passed
No behavior change; tests confirm identical behavior.
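A before/after comparison like this can be checked mechanically. A minimal sketch, assuming pytest-style summary lines (the function is hypothetical):

```python
import re

def refactor_preserves_results(before_summary: str, after_summary: str) -> bool:
    """True when both runs are green and report the same passed count.

    Summaries are pytest-style lines such as '====== 45 passed in 2.1s ======'.
    """
    def passed_count(summary: str):
        if re.search(r"\d+\s+(failed|error)", summary):
            return None  # a red run can never certify a refactor
        match = re.search(r"(\d+)\s+passed", summary)
        return None if match is None else int(match.group(1))

    before, after = passed_count(before_summary), passed_count(after_summary)
    return before is not None and before == after
```

Note that an identical passed count is necessary but not sufficient; a refactor that deletes a test alongside the code it covered would need a closer look.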
Every code change should have unit test coverage.
Verification: Unit tests pass
$ pytest tests/unit/test_module.py
====== 8 passed in 0.12s ======
For changes affecting multiple components or contracts.
Verification: Integration tests pass
$ pytest tests/integration/
====== 12 passed in 3.45s ======
For user-facing features or critical paths.
Verification: Manual testing
Scenarios tested:
- Primary user flow → ✅ works
- Error path → ✅ handled gracefully
- Edge case → ✅ handled correctly
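Once the scenarios stabilize, the manual checklist can be promoted to a table-driven test. A sketch with illustrative scenario data (`process` is a stand-in, not a real API):

```python
def process(payload: dict) -> bool:
    """Hypothetical stand-in for the feature under test."""
    return bool(payload.get("field"))

# Scenario table mirroring the manual checklist above
SCENARIOS = [
    ("primary user flow", {"field": "valid"}, True),
    ("error path", {"field": ""}, False),
    ("edge case: missing field", {}, False),
]

def run_scenarios() -> list[str]:
    """Run every scenario and return one evidence line per scenario."""
    lines = []
    for name, payload, expected in SCENARIOS:
        assert process(payload) is expected, f"scenario failed: {name}"
        lines.append(f"- {name} -> works")
    return lines
```

The returned lines are exactly the kind of scenario-by-scenario evidence the PR template below asks for.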
Very rare cases where verification is impractical:
- Typo fixes in documentation
- Comment changes only
- Purely cosmetic changes
Note: If in doubt, verify. Skipping verification is the exception, not the rule.
❌ Bad:
"Implementation complete! Tests should pass."
(No evidence provided, reviewer has to test)
✅ Good:
"Implementation complete. Verification:
$ pytest
====== 15 passed in 0.45s ======
All tests pass as expected."
❌ Bad:
"Changed line 42 from X to Y. This should fix the bug
because [long explanation of why it should work]."
(No actual verification that it works)
✅ Good:
"Changed line 42 from X to Y.
Verification:
$ pytest tests/test_bug_scenario.py
====== 1 passed in 0.12s ======
Bug no longer reproduces."
❌ Bad:
"Ran one test manually, looks good."
(Only tested happy path, didn't run full suite)
✅ Good:
"Ran full test suite:
$ pytest
====== 45 passed in 2.1s ======
Also manually tested error paths:
- Invalid input → clean error message ✅
- Missing field → helpful error ✅"
implementation-worker Agent
Before marking a task complete, run the relevant tests, capture their output, and include the evidence in the task update.
Example task update:
Status: ✅ Complete
Verification:
- Unit tests: pytest → 12 passed
- Integration: Tested full flow → success
- Manual: Verified feature works end-to-end
Ready for review.
code-reviewer Agent
When reviewing PRs, check for verification evidence: test output, build logs, and manual-testing notes.
Red flags: "should work", "looks good", or no test output anywhere in the PR.
Request verification if missing:
Requesting verification evidence:
- Please run tests and include output
- Manually test the primary flow and capture results
- Ensure CI passes before merging
feat(feature-name): Brief description
Implementation: {what was implemented}
Testing: {tests run}
Output: {relevant test output or success message}
Result: All tests pass, feature verified working.
## Verification
### Automated Tests
$ pytest
====== 15 passed in 0.45s ======
### Manual Testing
- Scenario 1: {description} → ✅ Success
- Scenario 2: {description} → ✅ Success
- Edge case: {description} → ✅ Handled
All verification passed. Ready for review.
| Principle | Meaning |
|---|---|
| Done = Verified | Don't claim complete without evidence |
| Run tests, read output | Actually check that tests pass |
| Manual test critical paths | Automated + human verification |
| Capture evidence | Include test output in commits/PRs |
| No rationalization | Evidence beats explanation |
Key question before marking complete:
"If I showed this PR to a skeptical reviewer, could they see concrete evidence that it works?"
If the answer is no, you're not done yet.