REQUIRED Phase 7 of /dev workflow (final). Requires fresh runtime evidence before claiming completion.
/plugin marketplace add edwinhu/workflows
/plugin install workflows@edwinhu-plugins

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Announce: "I'm using dev-verify (Phase 7) to confirm completion with fresh evidence."
The automated test IS your deliverable. The implementation just makes the test pass.
Reframe your task:
The test proves value. The implementation is a means to an end.
Without a REAL automated test (executes code, verifies behavior), you have delivered NOTHING. </EXTREMELY-IMPORTANT>
<EXTREMELY-IMPORTANT>

## The Iron Law of Verification

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE. This is not negotiable.
Before claiming ANYTHING is complete, you MUST run the verification process below and read its output yourself.

This applies even when you are confident it works, when an agent reported success, or when you already verified earlier in the session.
If you catch yourself about to claim completion without fresh evidence, STOP. </EXTREMELY-IMPORTANT>
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "It should work" | "Should" isn't evidence | Run the command |
| "I'm pretty sure it passes" | Confidence isn't verification | Run the command |
| "The agent reported success" | Agent reports need confirmation | Run it yourself |
| "I ran it earlier" | Earlier isn't fresh | Run it again |
| "The code exists" | Existing ≠ working | Run and check output |
| "Grep shows the function" | Pattern match ≠ runtime test | Run the function |
Before making ANY status claim:
1. IDENTIFY → Which command proves your assertion?
2. RUN → Execute the command fresh
3. READ → Review full output and exit codes
4. VERIFY → Confirm output supports your claim
5. CLAIM → Only after steps 1-4
Skipping any step is dishonest, not verification.
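A minimal sketch of that loop in shell, assuming "npm test" is the command identified in step 1 (substitute whatever command actually proves your claim):

```bash
#!/usr/bin/env bash
# Sketch of IDENTIFY -> RUN -> READ -> VERIFY before any claim.
# "npm test" is an assumption; use the command that proves your assertion.
cmd="npm test"
log=$(mktemp)

$cmd >"$log" 2>&1
status=$?

echo "---- full output (read it, do not skim) ----"
cat "$log"
echo "---- exit code: $status ----"

if [ "$status" -ne 0 ]; then
    echo "Verification failed: no completion claim allowed."
    exit 1
fi
# Exit code 0 alone is not the whole story: confirm the output itself
# supports the specific claim (e.g. "34/34 pass", not "33/34 pass").
```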
| Claim | Required Evidence |
|---|---|
| "Tests pass" | Test output showing 0 failures |
| "Build succeeds" | Exit code 0 from build command |
| "Linter clean" | Linter output showing 0 errors |
| "Bug fixed" | Test that failed now passes |
| "Feature complete" | All acceptance criteria verified |
| "User-facing feature works" | E2E test output showing PASS |
USER-FACING CLAIMS REQUIRE E2E EVIDENCE. Unit tests are insufficient.
| Claim | Unit Test Evidence | E2E Evidence Required |
|---|---|---|
| "API works" | ❌ Insufficient | ✅ Full request/response test |
| "UI renders" | ❌ Insufficient | ✅ Playwright snapshot/interaction |
| "Feature complete" | ❌ Insufficient | ✅ User flow simulation |
| "No regressions" | ❌ Insufficient | ✅ E2E suite passes |
Log checks, console output, and exit codes alone are NOT E2E tests. They are observability, not verification.
| ❌ Fake E2E | ✅ Real E2E |
|---|---|
| "Log shows function was called" | "Screenshot shows correct UI rendered" |
| "grep papirus in logs" | "grim screenshot + visual diff confirms icon changed" |
| "Console output contains 'success'" | "Playwright assertion: element.textContent === 'Success'" |
| "File was created" | "E2E test opens file and verifies contents" |
| "Process exited 0" | "Functional test verifies actual output matches spec" |
| "Mock returned expected value" | "Real integration returns expected value" |
Red Flag: If you catch yourself thinking "logs prove it works" - STOP. Logs prove code executed, not that it produced correct results. E2E means verifying the actual output users see.
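The same contrast in shell terms. The log file, theme name, and reference screenshot are hypothetical, and grim (a Wayland screenshotter) plus ImageMagick's compare are assumptions about the environment:

```bash
# ❌ Fake E2E: proves the code path ran, not that the user sees the right thing
grep -q "papirus" app.log && echo "log mentions the theme"

# ✅ Real E2E: capture what is actually rendered and diff it against an
# approved baseline (reference.png is a hypothetical, previously verified image)
grim current.png
if compare -metric AE reference.png current.png diff.png 2>/dev/null; then
    echo "PASS: rendered output matches the reference"
else
    echo "FAIL: rendered output differs, see diff.png"
fi
```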
| Thought | Reality |
|---|---|
| "Unit tests cover it" | Unit tests don't simulate users. Where's E2E? |
| "E2E would be redundant" | Redundancy catches bugs. Write E2E. |
| "No time for E2E" | No time to fix production bugs? Write E2E. |
| "Feature is internal" | Does it affect user output? Then E2E. |
| "I manually tested" | Manual = no evidence. Automate it. |
| "Log checking verifies it works" | Log checking only verifies code executed, not results. Not E2E. |
| "E2E with screenshots is too complex" | If too complex to verify, feature isn't done. Complexity = bugs hiding. |
| "Implementation is done, testing is just verification" | Testing IS implementation. Untested code is unfinished code. |
For user-facing changes, add to verification:
1. IDENTIFY → Which E2E test proves user-facing behavior?
2. RUN → Execute E2E test fresh
3. READ → Review full output (screenshots if visual)
4. VERIFY → User flow works as specified
5. CLAIM → Only after E2E evidence exists
"Unit tests pass" is not "feature complete" for user-facing changes. </EXTREMELY-IMPORTANT>
These do NOT count as verification: agent reports, earlier runs, structural checks (grep, file existence), manual testing, or log output.
When you say "Feature complete", you are asserting that tests pass, the build succeeds, every acceptance criterion is verified, and E2E evidence exists, all backed by fresh output you have personally read.
Saying "complete" based on stale data or agent reports is not "summarizing" - it is LYING about project state.
"Still verifying" is honest. "Complete" without evidence is fraud. </EXTREMELY-IMPORTANT>
These thoughts mean STOP—you're about to claim falsely:
| Thought | Reality |
|---|---|
| "I just ran it" | "Just" = stale. Run it AGAIN. |
| "The agent said it passed" | Agent reports need YOUR confirmation. Run it. |
| "It should work" | "Should" is hope. Run and see output. |
| "I'm confident" | Confidence ≠ verification. Run the command. |
| "We already verified earlier" | Earlier ≠ now. Fresh evidence only. |
| "User will verify it" | NO. Verify before claiming. User trusts your claim. |
| "Close enough" | Close ≠ complete. Verify fully. |
| "Time to move on" | Only move on after FRESH verification. |
STRUCTURAL VERIFICATION IS NOT RUNTIME VERIFICATION:
| ❌ NOT Verification | ✅ IS Verification |
|---|---|
| "Code exists in file" | "Code ran and produced output X" |
| "Function is defined" | "Function was called and returned Y" |
| "Grep found the pattern" | "Program output shows expected behavior" |
| "ast-grep found the code" | "Test executed and passed with output" |
| "Diff shows the change" | "Change tested with actual input/output" |
| "Implementation looks correct" | "Ran test, saw PASS in logs" |
The key difference: structural checks prove the code exists; runtime checks prove the code works.
If you find yourself saying "the code exists" or "I verified the implementation" without running it, STOP - that's not verification.
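The contrast in shell terms (formatDate, the file paths, and the vitest runner are illustrative assumptions):

```bash
# ❌ Structural: proves the symbol exists somewhere in the tree
grep -rn "formatDate" src/

# ✅ Runtime: executes the code and checks its behavior
npx vitest run tests/formatDate.test.ts
```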
# To claim "tests pass": run the command and read the numbers
npm test # or pytest, cargo test, etc.
# See numerical results
# "34/34 pass" → Can claim "tests pass"
# "33/34 pass" → CANNOT claim success
# To claim "bug is fixed": prove the test actually detects the bug
# 1. Write test → run (should pass)
# 2. Revert fix → run (MUST fail)
# 3. Restore fix → run (should pass)
# Only then claim "bug is fixed"
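A sketch of that loop, assuming the fix is an uncommitted change to a single source file (src/date.ts here is hypothetical) and the new test lives in a hypothetical tests/regression.test.ts:

```bash
npx vitest run tests/regression.test.ts   # 1. with the fix in place: should PASS
git stash push -- src/date.ts             # 2. temporarily remove only the fix
npx vitest run tests/regression.test.ts   #    without the fix: MUST FAIL
git stash pop                             # 3. restore the fix
npx vitest run tests/regression.test.ts   #    with the fix again: should PASS
```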
# To claim "build succeeds": check the exit code
npm run build; echo "Exit code: $?"
# Must see "Exit code: 0" to claim success
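When several claims are made at once, a single gate script keeps the evidence fresh and in one place. This sketch assumes npm scripts named build, lint, and test exist:

```bash
#!/usr/bin/env bash
set -e            # abort on the first failing gate
npm run build
npm run lint
npm test
echo "All gates exited 0; read the output above before claiming anything."
```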
After technical verification passes, confirm with user:
AskUserQuestion:
  question: "Does this implementation meet your requirements?"
  options:
    - label: "Yes, requirements met"
      description: "Feature works as designed, ready to merge"
    - label: "Partially"
      description: "Core works but missing some requirements"
    - label: "No"
      description: "Does not meet requirements, needs more work"
Reference .claude/SPEC.md when asking: remind the user of the success criteria they defined.
If the user says "Partially" or "No", run /dev-implement to address the gaps, then return here for fresh verification.

Only claim COMPLETE when both types of verification have passed:

1. Technical verification: fresh test, build, and E2E evidence whose output you have read.
2. User confirmation: the user agrees the requirements are met.

Both must pass. No shortcuts exist.
When the user confirms "Yes, requirements met":
Announce: "Dev workflow complete. All 7 phases passed."
The /dev workflow is now finished. Offer to clean up the .claude/ working files or start a new /dev cycle for the next feature.