Evidence-Based Verification
This skill should be used during Phase 4.5 (Verification) when the user requests "verify this", "check if it works", "run tests", "prove it works", or after implementation completion. Enforces evidence-based validation - no "should work" claims allowed, requires actual test execution with output shown and proof of correctness.
From myceliumnpx claudepluginhub jason-hchsieh/marketplace --plugin myceliumThis skill uses the workspace's default tool permissions.
Evidence-Based Verification
Core Principle
Claims of success MUST be backed by actual evidence. "Should work" is prohibited.
This skill enforces rigorous verification practices because AI-generated code often appears correct but fails on edge cases, integration points, or real-world data. Every claim must be proven with executed commands and observed results.
When to Use
Apply during Phase 4.5 (Verification) when:
- Implementation is marked complete
- Tests "should" be passing
- Feature appears to work
- Code changes are ready for review
- Merge/deployment is being considered
The Iron Law of Verification
Never claim success without running the actual verification command.
Every verification claim requires:
- Identify the exact proof command
- Run it completely (not partially)
- Read full output + exit codes
- Verify output supports the claim
- Only then state the result
Prohibited Language
These phrases are BANNED in verification:
- ❌ "should work"
- ❌ "seems fine"
- ❌ "appears correct"
- ❌ "looks good"
- ❌ "probably passes"
- ❌ "likely works"
- ❌ "ought to pass"
Replace with evidence-based language:
- ✅ "Tests pass (exit code 0, 42/42 tests green)"
- ✅ "Verified by running:
npm test- all assertions passed" - ✅ "Evidence: coverage report shows 87% (above 80% target)"
Verification Checklist
Run through this checklist for every completed task:
1. Unit Tests
Command: Execute full unit test suite
# Examples by language
npm test # JavaScript/TypeScript
pytest # Python
go test ./... # Go
cargo test # Rust
mvn test # Java
bundle exec rspec # Ruby
Evidence Required:
- Exit code is 0
- All tests passed (X/X green)
- No skipped tests (or justify why)
- Duration reasonable (not timing out)
2. Test Coverage
Command: Generate coverage report
# Examples
npm run test:coverage # JavaScript
pytest --cov=src --cov-report=term
go test -cover ./...
cargo tarpaulin
Evidence Required:
- Coverage percentage meets target (≥80% by default)
- New code is covered (not just overall percentage)
- Critical paths at 100% coverage
3. Integration Tests
Command: Run integration test suite (if applicable)
npm run test:integration
pytest tests/integration/
go test -tags=integration ./...
Evidence Required:
- All integration tests pass
- External dependencies properly mocked/stubbed
- API contracts verified
4. Linting and Style
Command: Run linter
npm run lint # ESLint
pylint src/
golangci-lint run
cargo clippy
Evidence Required:
- Zero errors
- Zero warnings (or justify allowed warnings)
- Style rules enforced
5. Type Checking
Command: Run type checker (if typed language)
npx tsc --noEmit # TypeScript
mypy src/ # Python
# Go/Rust compile-time checked
Evidence Required:
- No type errors
- No implicit any (TypeScript)
- Type safety verified
6. Build Verification
Command: Run production build
npm run build
python setup.py bdist_wheel
go build ./...
cargo build --release
Evidence Required:
- Build succeeds (exit code 0)
- No warnings (or document allowed ones)
- Artifacts generated correctly
7. Functional Verification
Command: Manual execution/smoke test
# Start application
npm start
# Execute specific feature
curl http://localhost:3000/api/endpoint
# Or manual UI testing
Evidence Required:
- Feature works as specified
- Edge cases handled
- Error messages appropriate
- Performance acceptable
Verification Levels
Match verification rigor to project maturity:
Prototype Projects
Required:
- Unit tests for new code pass
- Feature demonstrably works
- No critical errors
Optional:
- Coverage targets
- Integration tests
- Full lint compliance
Development Projects
Required:
- All unit tests pass
- Coverage ≥80%
- Feature works as specified
- Linting clean
- Type checking passes
Optional:
- Integration tests (recommended)
- Performance benchmarks
Production Projects
Required:
- All unit tests pass
- All integration tests pass
- Coverage ≥80%
- All linting rules pass
- Type checking passes
- Build succeeds
- Feature verified manually
- Performance acceptable
Critical:
- Security scan clean
- No known vulnerabilities
- Breaking changes documented
Regulated Projects
Required: All production checks PLUS
- Coverage ≥95%
- Full audit trail
- Compliance checks pass
- Documentation complete
- Human approval obtained
Evidence Documentation
Document verification results clearly:
Good Example
## Verification Results
✅ Unit Tests: PASS
- Command: `npm test`
- Result: 47/47 tests passed
- Duration: 3.2s
- Exit code: 0
✅ Coverage: PASS
- Command: `npm run test:coverage`
- Result: 84% lines, 82% branches
- Target: 80%
- Evidence: coverage/index.html generated
✅ Linting: PASS
- Command: `npm run lint`
- Result: 0 errors, 0 warnings
- Exit code: 0
✅ Build: PASS
- Command: `npm run build`
- Result: dist/ generated successfully
- Size: 127KB (within budget)
✅ Manual Verification: PASS
- Tested user login flow
- Successful authentication
- Error handling verified
- Screenshot: verification/login-success.png
Bad Example
## Verification Results
Tests should pass because I wrote them correctly.
The code looks good and follows best practices.
Linting seems fine, no obvious errors.
Partial Verification is Not Verification
Do not claim:
- "95% of tests pass" → 5% failing means NOT VERIFIED
- "Most linting rules pass" → Any error means NOT VERIFIED
- "Works on my machine" → Must work in CI environment
- "Manual testing shows it works" → Need automated proof
Full verification required:
- 100% of tests pass (or blockers documented)
- Zero linting errors (or exceptions justified)
- Works in all required environments
- Both automated and manual checks pass
Handling Verification Failures
First Failure
- Read the full error message
- Identify root cause
- Fix the issue
- Re-verify completely
Second Failure
- Question the test (is it correct?)
- Question the specification (was it clear?)
- Try alternative implementation
- Re-verify completely
Third Failure
STOP. Escalate.
- Suspect architecture problem, not implementation
- Review design decisions
- Ask for human guidance
- Document the blocker
Alternative Verification Methods
When traditional tests are impractical:
Legacy Code (No Tests)
✅ Acceptable:
- Write characterization test for changed code
- Golden file comparison (snapshot)
- Manual verification checklist with screenshots
- Document what was verified manually
❌ Not Acceptable:
- Skip verification entirely
- Trust that nothing broke
- "Looks the same" without proof
UI/Visual Changes
✅ Acceptable:
- Visual regression tests (Percy, Chromatic)
- Screenshot comparison with before/after
- Manual checklist with documented steps
- Storybook component verification
❌ Not Acceptable:
- "UI looks fine to me"
- No before/after comparison
- No evidence captured
Infrastructure Changes
✅ Acceptable:
- Dry-run/plan preview
- Canary deployment verification
- Rollback test execution
- Drift detection reports
❌ Not Acceptable:
- "Should work in production"
- Untested deploy scripts
- No rollback plan
ML/AI Models
✅ Acceptable:
- Benchmark on test dataset
- Regression tests on known inputs
- Performance metrics comparison
- A/B test results
❌ Not Acceptable:
- "Seems to work well"
- No quantitative metrics
- Cherry-picked examples
Integration with Workflow
Phase 4.5: Verification (THIS SKILL)
- Apply evidence-based verification
- Document all verification results
- Block progression on failures
- No "should work" claims allowed
Phase 4: Implementation
- TDD skill ensures tests written first
- Tests must pass before marking complete
- This skill provides verification proof
Phase 5: Review
- Verification results inform review
- Evidence of correctness demonstrated
- Review focuses on quality, not correctness
Verification Commands by Language
JavaScript/TypeScript
# Tests
npm test
npm run test:watch
# Coverage
npm run test:coverage
npx jest --coverage
# Linting
npm run lint
npx eslint .
# Type checking
npx tsc --noEmit
# Build
npm run build
Python
# Tests
pytest
python -m pytest
# Coverage
pytest --cov=src --cov-report=html
coverage run -m pytest
# Linting
pylint src/
flake8 src/
ruff check .
# Type checking
mypy src/
# Build
python -m build
Go
# Tests
go test ./...
go test -v ./...
# Coverage
go test -cover ./...
go test -coverprofile=coverage.out ./...
# Linting
golangci-lint run
go vet ./...
# Build
go build ./...
Rust
# Tests
cargo test
cargo test --all-features
# Coverage
cargo tarpaulin --out Html
# Linting
cargo clippy
cargo clippy -- -D warnings
# Build
cargo build --release
Common Verification Failures
False Positives
Problem: Tests pass but feature doesn't work Cause: Tests don't actually test the requirement Solution: Review test validity, add missing assertions
Environment Differences
Problem: Passes locally, fails in CI Cause: Different dependencies, env vars, or configs Solution: Run tests in clean environment, match CI setup
Flaky Tests
Problem: Tests pass sometimes, fail randomly Cause: Timing issues, shared state, async problems Solution: Fix root cause before proceeding, never ignore
Coverage False Sense
Problem: High coverage but bugs exist Cause: Lines covered but assertions weak Solution: Review test quality, not just quantity
Verification Anti-Patterns
Trust Without Testing
❌ Anti-pattern:
"I ran the tests earlier and they passed,
so the current code should be fine."
✅ Correct:
"Running tests now to verify..."
[execute tests]
"Tests pass: 47/47 green, exit code 0"
Partial Test Runs
❌ Anti-pattern:
"I tested the specific function I changed,
that should be enough."
✅ Correct:
"Running full test suite to catch regressions..."
[execute all tests]
"Full suite passes, including integration tests"
Unverified Agent Claims
❌ Anti-pattern:
"The subagent reported that tests pass."
✅ Correct:
"Verifying subagent's claim by running tests..."
[execute tests directly]
"Confirmed: tests pass as reported"
Skipping Due to Fatigue
❌ Anti-pattern:
"We've been working on this for a while,
let's skip verification and move on."
✅ Correct:
"Long session requires careful verification.
Taking time to verify properly..."
[complete full verification]
"Verified before proceeding"
Checklist Before Declaring Complete
Before saying "implementation is complete":
- Unit tests executed and pass (100%)
- Coverage measured and meets target
- Integration tests pass (if applicable)
- Linting passes with zero errors
- Type checking passes (if applicable)
- Build succeeds
- Manual verification performed
- Performance acceptable
- Security considerations checked
- All evidence documented
- No "should work" language used
If ANY item unchecked → implementation NOT complete.
Summary
Remember:
- Evidence over assumptions
- Run actual commands, read actual output
- Never trust without verification
- Document all verification results
- Partial success is not success
- When verification fails repeatedly, escalate
- No "should work" - only "verified to work"
Verification is not optional. It's the gate between code and confidence.