Take SPEC.md + code + diff and produce a maximally ambitious, user-outcome-grounded qa-progress.json ready for /qa execution. Mock-aware coverage reality check — mocked tests don't count as real coverage. Derives scenarios directly from SPEC.md source material (user journeys, failure modes, acceptance criteria), synthesizes compositional user session scenarios, traces spec-to-code and code-to-spec, investigates code patterns, prioritizes, and writes the plan. No execution boundaries — plans everything worth testing including browser, Docker, and integration scenarios. Standalone or composable with /ship. Triggers: qa-plan, qa plan, test plan, derive qa scenarios, qa planning, plan qa, test scenarios.
From the eng plugin (npx claudepluginhub inkeep/team-skills --plugin eng). This skill uses the workspace's default tool permissions.
Produce the most ambitious, user-outcome-grounded QA plan possible. Take spec.json, the codebase, and the diff — produce a qa-progress.json that captures every scenario a thorough human QA engineer would want to verify, directly consumable by /qa.
This skill bridges spec-derived scenarios (what the spec says to test) and code-derived scenarios (what the code actually needs tested). It does not execute tests — it produces the plan that /qa executes. It has no execution boundaries. /qa-plan plans everything worth testing. /qa handles execution feasibility — anything that can't be verified locally gets flagged as pending human verification, never silently dropped.
Every scenario must verify what the user actually experiences, not what the code does.
When writing scenarios, start from the user's perspective: what do they see, what do they experience, what outcome do they expect? Then trace backward to what needs to be verified to prove that outcome is real.
Inputs (any one is sufficient):
| Input | How to use it |
|---|---|
| SPEC.md path | Read it. Locate or derive the corresponding spec.json (in tmp/ship/spec.json). |
| spec.json path | Read it directly. Use userStories[] and their acceptance criteria as source material. |
| PR number | Run gh pr diff <number> and gh pr view <number>. Derive scope from the diff. Check for tmp/ship/spec.json in the working tree. |
| No input | Use git diff main...HEAD to determine scope. Check for tmp/ship/spec.json. |
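The precedence in the table above can be sketched as a small resolver. This is illustrative only: the function and argument names are hypothetical, and only the tmp/ship/spec.json path comes from the table.

```python
from pathlib import Path

def resolve_inputs(spec_md=None, spec_json=None, pr_number=None):
    """Pick source material and scope per the input table (illustrative)."""
    plan = {}
    if spec_md:
        plan["source"] = spec_md                     # read SPEC.md directly
    elif spec_json:
        plan["source"] = spec_json                   # use userStories[] as source
    elif pr_number is not None:
        plan["source"] = f"gh pr diff {pr_number}"   # derive scope from the PR diff
    else:
        plan["source"] = "git diff main...HEAD"      # fall back to the branch diff
    # In every mode, pick up tmp/ship/spec.json when it exists in the tree.
    if Path("tmp/ship/spec.json").exists():
        plan.setdefault("spec_json", "tmp/ship/spec.json")
    return plan

print(resolve_inputs(pr_number=42)["source"])  # → gh pr diff 42
```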
Output: tmp/ship/qa-progress.json — scenarios derived from SPEC.md source material and code investigation, enriched with code-grounded fields, prioritized, and directly consumable by /qa.
This skill supports the cross-skill autonomy convention:
| Level | Behavior | How entered |
|---|---|---|
| Supervised (default) | Pause on contradictions between spec and code, ambiguous oracle types, or scenarios that may be out of scope. Present findings and ask before proceeding. | Default when standalone |
| Headless | Document contradictions and ambiguities in the plan's notes fields instead of pausing. Proceed through all gates autonomously. | --headless flag from orchestrator |
Headless mode adjustments:
- Contradictions are recorded with the contradiction gap type, not escalated
- Scenarios that may be out of scope are kept and flagged for a human with a note explaining why
- Ambiguous oracle types get a grayArea flag rather than asking

Before starting any work, create a task for each step using TaskCreate with addBlockedBy to enforce ordering. Derive descriptions and completion criteria from each step's own workflow text.
Mark each task in_progress when starting and completed when its step's exit criteria are met. On re-entry, check TaskList first and resume from the first non-completed task.
Determine what kind of project this is. The repo type drives which QA categories are relevant and what tools are expected during execution.
Detection signals:
| Signal | How to detect |
|---|---|
| Framework | Check package.json dependencies, config files (next.config.*, vite.config.*, tsconfig.json, Cargo.toml, go.mod, etc.) |
| Has UI | Presence of component files (.tsx, .jsx, .vue, .svelte), pages/ or app/ directories, CSS/styling files |
| Has API | Presence of route handlers, API directories, server files, OpenAPI specs |
| Has CLI | Presence of bin/ directory, CLI entry points, commander/yargs/clap dependencies |
| Has SDK/library | Presence of exports in package.json, public API surface, typedoc/jsdoc configs |
| Infrastructure only | No UI, API, CLI — only configs, skills, templates, CI files |
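The signal table can be approximated with a single filesystem pass. A heuristic sketch, assuming a JS-flavored repo; the glob patterns and returned type strings are illustrative, not canonical:

```python
import json
from pathlib import Path

def detect_repo_type(root="."):
    """Classify a repo from filesystem signals (heuristic sketch)."""
    root = Path(root)
    pkg = {}
    if (root / "package.json").exists():
        pkg = json.loads((root / "package.json").read_text())
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}

    has_ui = (root / "app").is_dir() or (root / "pages").is_dir() \
        or any(root.glob("**/*.tsx"))
    has_api = (root / "api").is_dir() or any(root.glob("**/route.ts"))
    has_cli = (root / "bin").is_dir() or any(d in deps for d in ("commander", "yargs"))
    has_lib = "exports" in pkg or "main" in pkg

    if has_ui:
        return "full-stack-web-app"   # UI present; an API usually rides along
    if has_api:
        return "api-only-service"
    if has_cli:
        return "cli-tool"
    if has_lib:
        return "sdk-library"
    return "infrastructure-config"
```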
Category emphasis by repo type:
| Repo type | Primary emphasis | Secondary emphasis |
|---|---|---|
| Full-stack web app | All 8 categories equally | — |
| API-only service | error-state, edge-case, failure-mode, integration, cross-system | usability (API ergonomics, error messages) |
| CLI tool | ux-flow, error-state, edge-case, cross-system | usability (help text, flag behavior) |
| SDK / library | edge-case, integration, cross-system | ux-flow (developer experience, API surface) |
| Skill / template authoring | edge-case, integration | cross-system (composition with other skills) |
| Infrastructure / config | integration, cross-system | failure-mode (misconfiguration handling) |
Record the detected repo type. This table calibrates where to look hardest for scenarios. Generate scenarios for every category where a real user outcome exists — the table does not grant permission to skip categories. A CLI tool can still have usability scenarios. An SDK can still have UX-flow scenarios for the developer experience.
Load /worldmodel skill with the feature topic + SPEC path and/or PR number as user-provided sources. This maps the feature's blast radius before you start writing scenarios — what surfaces does it touch, who is affected, and what might break silently.
Read worldmodel's output. Extract these for use in Steps 2–5:
- Surfaces the feature touches
- Personas affected and how changes propagate to them
- Facts that anchor given clauses in reality

Pay special attention to silent impacts — changes that affect users without producing obvious errors or visible behavior changes (e.g., a caching change that subtly alters data freshness, a telemetry contract change that breaks downstream dashboards). Worldmodel's Personas section classifies propagation timing; look for items marked "silent." These need explicit QA scenarios because they won't produce obvious failures.
Output of this step: A topology map of surfaces, personas, and silent impacts. Feed this into Steps 2–5 — each surface without a covering scenario is a gap to fill.
Read SPEC.md directly for QA-relevant source material. Read spec.json for story structure. These are inputs to your thinking — not a template for your output.
Ignore spec.json's qaScenarios[]. Those scenarios are implementation context for iteration agents — they help implementers understand what will be verified. They are mechanical 1:1 mappings from spec sections and do not represent the scope or ambition of your plan. Derive your scenarios from SPEC.md source material directly. Do not seed from, pin to, or limit your plan based on spec.json's qaScenarios.
Output: Understanding of what the spec intends. Scenarios come in Step 2.5.
From the spec source material (Step 2) and repo type (Step 0), synthesize scenarios at two levels:
Atomic conditions — individual testable checks derived from spec sections. One per acceptance criterion, failure mode row, shadow path, or state matrix cell. These are the building blocks.
Compositional user journeys — realistic multi-step user sessions that cross multiple stories, surfaces, and states. For each primary persona:
Compositional scenarios are P0 by default — they test the integration seams that atomic checks miss.
Do not limit scenario count to the number of spec rows. The spec sections are source material, not a cardinality constraint. A spec with 3 acceptance criteria and 2 failure modes may produce 5 atomic scenarios and 4 compositional journeys.
Set source: "spec" on scenarios derived from SPEC.md content. Set source: "journey" on compositional scenarios that combine multiple spec elements into realistic user paths.
Output: Initial scenario list ready for code-grounding in Steps 3–5.
For each spec-derived scenario, verify it is grounded in the actual codebase. This catches stale specs, renamed routes, and missing implementations.
Route verification (for scenarios with a route field): confirm the route still exists in the codebase and resolves to the expected page or handler.
Component verification (for UI scenarios): confirm the referenced components exist and render the elements the scenario interacts with.
Affordance check (for behavior scenarios):
- The action in the when clause has a matching affordance in the code
- The outcome in the then clause is plausible given the code

Gap types:
| Gap type | Meaning | Action |
|---|---|---|
| fixable_gap | The scenario references something that exists but has moved (renamed route, refactored component) | Update the scenario with the current location |
| contradiction | The spec says X but the code does Y (different behavior, different error message, etc.) | Supervised: pause and present the contradiction. Headless: document in notes, flag for human review |
| stale | The scenario references something that no longer exists and has no replacement | Mark scenario as blocked with explanation |
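The branch between the gap types (plus the fully-grounded case) can be written down directly. A sketch with a hypothetical helper; the booleans stand in for the route, component, and affordance checks above:

```python
def classify_gap(found_at_spec_location, found_elsewhere, behavior_matches_spec=True):
    """Map Step 3 verification findings to a gap type (sketch)."""
    if found_at_spec_location and behavior_matches_spec:
        return None                 # scenario is grounded; no gap
    if found_at_spec_location:
        return "contradiction"      # code exists but does Y where the spec says X
    if found_elsewhere:
        return "fixable_gap"        # moved or renamed: update the scenario
    return "stale"                  # no replacement: mark the scenario blocked
```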
Analyze the diff to find code changes that are NOT covered by any existing scenario. This is the reverse of Step 3 — catching things the spec author missed.
Diff analysis:
- git diff main...HEAD --stat for the file list, git diff main...HEAD for full changes
- For a PR: gh pr diff <number>

Map changed files to surfaces: UI components, API routes, CLI entry points, shared libraries, and configuration.
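Mapping diff paths to surfaces can be sketched as an ordered rule list. The patterns below assume a hypothetical Next.js-style layout; adjust them to the repo's actual structure:

```python
import fnmatch

# Hypothetical path-to-surface rules; first match wins, so order them
# from most specific to least specific.
SURFACE_RULES = [
    ("app/api/**", "api"),
    ("app/**", "ui"),
    ("components/**", "ui"),
    ("bin/**", "cli"),
    ("lib/**", "library"),
]

def surfaces_for(changed_files):
    """Group changed file paths by the surface they touch (sketch)."""
    hits = {}
    for path in changed_files:
        for pattern, surface in SURFACE_RULES:
            if fnmatch.fnmatch(path, pattern):
                hits.setdefault(surface, []).append(path)
                break
        else:
            hits.setdefault("other", []).append(path)
    return hits
```

Every surface in the result that lacks a covering scenario becomes a candidate for a code-derived scenario.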
Create scenarios for uncovered changes: For each changed surface without a covering scenario:
- Assign a QA-NNN id
- Set source: "code" to distinguish from spec-derived scenarios
- Write userOutcome first — what does the end user actually experience? Start from their perspective.
- Write given/when/then based on the code's behavior, grounded in the user outcome
- Synthesize verifies from the when/then clauses (what the test checks)
- Set tracesTo to the most relevant user story ID as a string (e.g., "US-003"), or omit when the mapping is unclear
- Set oracleType based on whether the expected behavior is deterministic (specified), relative (derived), or subjective (human)

Go beyond the diff to discover testing concerns from code patterns and health signals.
Framework detection:
State management patterns:
Console / runtime signals:
- console.error and console.warn patterns in changed code

Code health hotspots:
- any types in the changed code

Add new scenarios for uncovered concerns. Use the category emphasis from Step 0 for prioritization, but do not filter scenarios out.
For each scenario, determine whether it is already verified by a real (non-mocked) test — not whether it could be a formal test. The question is not "could someone write a test for this?" but "does a test already exist that proves the actual user outcome with real dependencies?"
Mock detection (critical): Search for existing tests covering each scenario's behavior. For each test found, check whether it mocks the service boundary:
- jest.mock(), vi.mock(), or manual mock files for the module under test

Coverage classification:
| Existing test status | Action |
|---|---|
| No test exists | Scenario stays. Set oracleType based on expected behavior type. |
| Test exists but mocks the service boundary | Scenario stays — mocked tests verify logic, not user outcomes. Note in enrichment.existingTestCoverage: "mocked". Optionally flag as formalizable if a real integration test would be valuable. |
| Test exists with real dependencies (real DB, real server, real API) | Scenario can be marked formalizable — real coverage exists. But keep it if the test doesn't cover the full user journey (e.g., tests the API but not the UI that calls it). |
| Test exists but only covers part of the scenario | Scenario stays for the uncovered portion. Note what's covered and what isn't. |
The default is: the scenario stays in the plan. Only mark a scenario formalizable and exclude it from QA execution when a real, non-mocked test already verifies the complete user outcome end-to-end.
Supervised mode: If many scenarios have only mocked test coverage, note this — it's a signal the test suite may need real integration tests. But this does not reduce the QA plan.
Headless mode: Document coverage findings in planMetadata.formalizableCount and proceed.
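The mock check itself can start as a grep-level heuristic over each candidate test file. A sketch; real detection should also trace __mocks__ directories and dependency-injection seams, and the pattern list is illustrative:

```python
import re

# Signals that a test stubs the service boundary rather than hitting it.
MOCK_SIGNALS = re.compile(r"jest\.mock\(|vi\.mock\(|__mocks__")

def coverage_class(test_source):
    """Classify an existing test per the coverage table (heuristic sketch)."""
    if test_source is None:
        return None        # no test exists: the scenario stays in the plan
    if MOCK_SIGNALS.search(test_source):
        return "mocked"    # verifies logic, not the real user outcome
    return "real"          # exercised against real dependencies
```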
Assign priority tiers to every scenario. Priority drives execution order in /qa.
P0 — Must test (blocking):
- Must requirements in the spec

P1 — Should test (important):
P2 — Could test (lower immediate impact):
Update each scenario's priority field.
Write the final enriched plan to tmp/ship/qa-progress.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
qa-progress.json schema:
{
"specPath": "specs/feature-name/SPEC.md",
"prNumber": null,
"planMetadata": {
"repoType": "full-stack-web-app",
"generatedAt": "2026-03-24T12:00:00Z",
"totalScenarios": 12,
"byPriority": { "P0": 4, "P1": 5, "P2": 3 },
"bySource": { "spec": 5, "journey": 3, "code": 4 },
"formalizableCount": 2,
"mockedOnlyCoverageCount": 3
},
"scenarios": [
{
"id": "QA-001",
"priority": "P0",
"category": "ux-flow",
"name": "User can toggle task status from list",
"userOutcome": "User sees the task status change to 'in_progress' instantly in the UI without needing to refresh the page",
"verifies": "changing status to 'in_progress' via dropdown updates immediately without page refresh",
"given": "A task exists with status 'pending' on the task list",
"when": "User changes status to 'in_progress' via the dropdown",
"then": "Status updates immediately, UI reflects the change without page refresh",
"tracesTo": "US-003",
"oracleType": "specified",
"route": "/tasks",
"source": "spec",
"status": "planned",
"verifiedVia": null,
"notes": "",
"evidence": [], // populated by /qa: [{type: "assertion", check: "...", pass: true}, {type: "video", url: "..."}]
"enrichment": {
"codeLocation": "app/tasks/page.tsx:42",
"gapType": null,
"formalizable": false
}
}
]
}
Field definitions — top-level (compatible with /qa):
| Field | Description |
|---|---|
| specPath | Path to SPEC.md. null if none. Required by /qa. |
| prNumber | PR number. null if none. Required by /qa. |
Field definitions — planMetadata (enrichment by /qa-plan):
| Field | Description |
|---|---|
| repoType | Detected repo type from Step 0 |
| generatedAt | ISO 8601 timestamp |
| totalScenarios | Count of all scenarios |
| byPriority | Scenario count per priority tier |
| bySource | Scenario count by origin (spec, journey, code) |
| formalizableCount | Scenarios flagged as having real (non-mocked) test coverage |
| mockedOnlyCoverageCount | Scenarios where existing tests only use mocked dependencies |
Field definitions — scenario (compatible with /qa's required fields):
| Field | Description |
|---|---|
| userOutcome | What the end user actually experiences when this works correctly. Written from the user's perspective, not the code's. This is the north star for verification — /qa must prove this outcome is real. |
| verifies | What the test checks — synthesized from given/when/then. Required by /qa. |
| tracesTo | User story ID (e.g., "US-003"). Omit when relationship is fuzzy. Compatible with /qa's string type. |
Field definitions — scenario enrichment (new fields beyond /qa's schema):
| Field | Description |
|---|---|
| given | Precondition (Gherkin-style, from spec or code investigation) |
| when | Action (Gherkin-style) |
| then | Expected outcome (Gherkin-style) |
| oracleType | "specified" (deterministic), "derived" (compare to baseline), or "human" (requires judgment) |
| source | "spec" (derived from SPEC.md content), "journey" (compositional user path combining multiple spec elements), or "code" (discovered during code tracing) |
| status | "planned" (always, at this stage — /qa changes it during execution) |
| verifiedVia | null (set by /qa during execution) |
| notes | Empty string or gap/contradiction documentation |
| evidence | Empty array. Populated by /qa with polymorphic proof items: {type: "video", url: "..."} for browser recordings, {type: "screenshot", url: "..."} for visual captures, {type: "assertion", check: "...", expected: "...", actual: "...", pass: true/false} for structured checks, {type: "command", cmd: "...", stdout: "...", pass: true/false} for shell evidence. Every validated/failed scenario should have at least one evidence item. |
| enrichment.codeLocation | File and line where the scenario's behavior is implemented |
| enrichment.gapType | null, "fixable_gap", "contradiction", or "stale" |
| enrichment.formalizable | true if a real (non-mocked) test already covers this scenario's full user outcome |
| enrichment.existingTestCoverage | null, "real" (non-mocked), or "mocked" (uses stubs/mocks at the service boundary) |
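Before handing the file to /qa, the plan can be sanity-checked by recomputing the planMetadata counts from the scenarios themselves. A sketch; the required-field list mirrors the schema fields /qa depends on:

```python
from collections import Counter

REQUIRED = ("id", "priority", "category", "name", "userOutcome", "verifies",
            "given", "when", "then", "oracleType", "source", "status")

def check_plan(plan):
    """Assert required scenario fields and consistent planMetadata (sketch)."""
    scenarios = plan["scenarios"]
    for s in scenarios:
        missing = [f for f in REQUIRED if f not in s]
        assert not missing, f"{s.get('id', '?')} missing {missing}"
    meta = plan["planMetadata"]
    assert meta["totalScenarios"] == len(scenarios)
    assert meta["byPriority"] == dict(Counter(s["priority"] for s in scenarios))
    assert meta["bySource"] == dict(Counter(s["source"] for s in scenarios))
```

Load tmp/ship/qa-progress.json with json.loads and pass the parsed dict.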
/qa-plan is the planner. /qa is the executor. The planner has no execution boundaries — it plans everything worth testing.
This skill does: derive scenarios directly from SPEC.md source material, synthesize compositional user journeys, trace spec-to-code and code-to-spec, investigate code patterns, prioritize, and write the plan.
This skill does NOT: execute tests — that's /qa's job.

Execution feasibility is /qa's problem, not /qa-plan's. If a scenario requires a real Stripe webhook, plan it. If a scenario requires browser automation, plan it. If a scenario requires Docker containers, plan it. /qa will attempt to execute everything locally (including spinning up Docker, using browser automation, writing scripts). Anything /qa genuinely cannot verify locally gets flagged as "blocked" with notes describing what was attempted and what a human needs to check — this flows to /pr as pending human verification. No scenario is silently dropped from the plan because it might be hard to execute.
Edge case rules:
- A spec-derived scenario whose referenced behavior no longer exists in the code is marked stale but not deleted — the spec author may need to update the spec