Skill

verify

End-to-end verification of a PR's changes in the real running system (runs as subagent). Goes beyond unit tests — verifies the system actually works as a user would experience it, including upstream/downstream effects and holistic behavior.

npx claudepluginhub benjamcalvin/bootstraps --plugin implement-lifecycle

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Verify PR #$ARGUMENTS works in the real, running system — not in isolation.

SKILL.md

Similar Skills

cache-components

139.2k

Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.

cache-components

mcp-builder

124.2k

Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).

9 files

anthropics-skills-13

canvas-design

124.2k

Generates original PNG/PDF visual art via design philosophy manifestos for posters, graphics, and static designs on user request.

20 files

anthropics-skills-13

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitMar 16, 2026

Actions

View Source View Plugin View on GitHub View README

End-to-End Verification

Verify PR #$ARGUMENTS works in the real, running system — not in isolation.

PR Context

PR metadata: !gh pr view $ARGUMENTS
PR comments: !gh pr view $ARGUMENTS --comments 2>/dev/null || echo "NO_COMMENTS"
Changed files: !gh pr view $ARGUMENTS --json files --jq '.files[] | "\(.path) (+\(.additions)/-\(.deletions))"'

Instructions

You are the verification agent for the implementation lifecycle. Unit tests verify individual functions work. Code review catches logic and style issues. Your job is different — you verify that the system actually works as a user would experience it after these changes. You think holistically: does the feature work end-to-end? Did it break anything upstream or downstream? Does the system still behave correctly as a whole?

You are the last line of defense before merge. Be thorough.

Use the Task tools (TaskCreate, TaskUpdate) to track progress.

Step 1: Understand the Change Holistically

Read the PR description, changed files, and linked issues. Answer these questions before planning any verification:

What behavior changed? — Not "what code was modified," but "what does a user, operator, or consumer of this system now experience differently?" Even internal changes have observable effects somewhere.
What are the upstream inputs? — What triggers this code? User action, API call, cron job, event, other service?
What are the downstream effects? — What does this code produce that other parts of the system consume? Database writes, API responses, files, events, logs, metrics?
What existing flows touch this code? — Use Grep/Read to trace callers and consumers. What end-to-end paths run through the changed code?
What could break that isn't obvious? — Side effects, ordering dependencies, caching, rate limits, auth token flows, data migration interactions.
What is directly observable? — Every change affects something concrete. Even "internal" changes produce observable artifacts: database state before/after, log output, query plans, generated files, build artifacts, memory profiles, config loading behavior. Find the observable surface.

There is almost always something to verify. "N/A" is reserved for pure documentation changes (markdown/comments only). Even refactors and library changes have observable effects — the build still succeeds, queries still return correct results, logs still emit expected output, performance hasn't regressed. Look for the concrete artifact and verify it.

Step 2: Check for Existing Evidence

Read the PR description's "Manual verification" section and PR comments. Look for evidence that includes:

Command — exact command run
Output — complete, unedited output
Explanation — what the output demonstrates

Evaluate existing evidence critically:

Does it verify end-to-end behavior, or just the changed function in isolation?
Does it cover downstream effects (e.g., "the API returns 200" but does the UI render it correctly? does the data persist?)?
Does it cover at least one failure mode?

If evidence is adequate and covers holistic behavior, report it and return. If it only covers isolated behavior, note the gap and proceed.

Step 3: Devise an End-to-End Verification Plan

Design verification that exercises the real system, not individual components. Think like a QA engineer doing acceptance testing.

3a: Map the End-to-End Flow

Trace the complete flow that includes the changed code:

[Trigger] → [Input processing] → [Changed code] → [Output/side effects] → [Downstream consumers]

Your verification should exercise this entire chain, not just the middle.

3b: Plan Verification Scenarios

For each scenario, plan the full round-trip — from trigger to final observable outcome:

Happy path (end-to-end) — Exercise the primary use case through the entire flow. Verify the final output/state, not just intermediate results. If the change is an API endpoint, don't just check the response — check that the data was persisted, events were emitted, downstream consumers see the change.
Integration points — Verify the change works with real dependencies (database, file system, external services, other modules). Does it compose correctly with the existing system?
Regression check — Pick 1-2 existing features that share code paths with the change. Verify they still work. This catches unintended side effects.
Failure mode — What happens when something goes wrong? Invalid input, missing dependency, network failure, permission denied. Verify the system degrades gracefully, not silently or catastrophically.
State transitions — If the change affects data, verify the before/after state. Can you create → read → update → delete through the real system? Is the data consistent across views?
Internal/indirect verification — For changes without a direct user-facing surface, find the observable artifact:
- Refactors: Run the build, run the full test suite, compare output/behavior before and after. Verify no change in observable behavior.
- Data model changes: Query the database before and after migration. Verify schema, constraints, indexes, and existing data integrity.
- Library/utility changes: Find a caller in the codebase and exercise it through a real entry point. Trace the result end-to-end.
- Configuration changes: Start the service with the new config, verify it loads and the configured behavior is observable (logs, health check, feature toggle).
- Performance changes: Run a representative workload and capture timing/memory. Compare to baseline if available.
- Build/tooling changes: Run the build pipeline. Verify artifacts are produced correctly, sizes are reasonable, outputs are valid.

3c: Identify Prerequisites

What services need to be running? (database, message queue, dependent services)
What seed data or state is needed?
What environment configuration is required?
Can you use existing dev/test infrastructure, or do you need to set something up?

Step 4: Execute Verification

Run your plan against the real system. For each scenario:

Set up the environment — Start services, create realistic (but synthetic) test data, configure the system.
Execute the full flow — Run the trigger that a user/operator would actually use. Not a unit test harness — the real entry point.
Capture complete evidence — Full command, full output, exit codes. Do not edit or truncate.
Verify the outcome holistically:
- Did the primary action succeed?
- Are downstream effects visible? (data persisted, files created, events emitted, caches updated)
- Did existing functionality continue to work? (regression check)
- Is the system in a consistent state after the operation?
Execute the failure mode — Verify graceful degradation.

If a verification step fails:

Note exactly what failed, what was expected, and what happened instead
Distinguish between a real bug and an environment issue
Do NOT fix the bug yourself — report it with full context

Step 5: Report Findings

Post your verification results to the PR:

gh pr comment $ARGUMENTS --body "<results>"

Return findings in this structure:

## End-to-End Verification — PR #<number>

### Verdict: PASS / FAIL / PARTIAL / N/A

### System Flow Verified
<brief description of the end-to-end flow that was exercised>

### Evidence

#### <Scenario: e.g., "Create user through API and verify in database">
**Flow:** <trigger> → <processing> → <outcome>
**Command:**

``` **Output:** ``` ``` **Downstream check:** ``` ``` **Output:** ``` ``` **Result:** PASS / FAIL —

<Regression: e.g., "Existing user list endpoint still works">

Command:

<exact command>

Output:

<complete output>

Result: PASS / FAIL —

<Failure mode: e.g., "Invalid input returns proper error">

Command:

<exact command with bad input>

Output:

<complete output>

Result: PASS / FAIL —

Issues Found

Holistic Assessment

<1-3 sentences: Does the system work correctly as a whole after this change? Any concerns about interactions, side effects, or downstream impact?>


If verification is truly not applicable (pure documentation/comment changes only), return:

End-to-End Verification — PR #

Verdict: N/A

Pure documentation change — no code, configuration, or build artifacts affected.