Take SPEC.md + code + diff and produce a maximally ambitious, user-outcome-grounded qa-progress.json ready for /qa execution. Mock-aware coverage reality check — mocked tests don't count as real coverage. Derives scenarios directly from SPEC.md source material (user journeys, failure modes, acceptance criteria), synthesizes compositional user session scenarios, traces spec-to-code and code-to-spec, investigates code patterns, prioritizes, and writes the plan. No execution boundaries — plans everything worth testing including browser, Docker, and integration scenarios. Standalone or composable with /ship. Triggers: qa-plan, qa plan, test plan, derive qa scenarios, qa planning, plan qa, test scenarios.
From the eng plugin (npx claudepluginhub inkeep/team-skills --plugin eng). This skill uses the workspace's default tool permissions.
Produce the most ambitious, user-outcome-grounded QA plan possible. Take spec.json, the codebase, and the diff — produce a qa-progress.json that captures every scenario a thorough human QA engineer would want to verify, directly consumable by /qa.
This skill bridges spec-derived scenarios (what the spec says to test) and code-derived scenarios (what the code actually needs tested). It does not execute tests — it produces the plan that /qa executes. It has no execution boundaries. /qa-plan plans everything worth testing. /qa handles execution feasibility — anything that can't be verified locally gets flagged as pending human verification, never silently dropped.
Every scenario must verify what the user actually experiences, not what the code does.
When writing scenarios, start from the user's perspective: what do they see, what do they experience, what outcome do they expect? Then trace backward to what needs to be verified to prove that outcome is real.
Inputs (any one is sufficient):
| Input | How to use it |
|---|---|
| SPEC.md path | Read it. Locate or derive the corresponding spec.json (in tmp/ship/spec.json). |
| spec.json path | Read it directly. Use userStories[] and their acceptance criteria as source material. |
| PR number | Run gh pr diff <number> and gh pr view <number>. Derive scope from the diff. Check for tmp/ship/spec.json in the working tree. |
| No input | Use git diff main...HEAD to determine scope. Check for tmp/ship/spec.json. |
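The precedence in the table above can be sketched as a small resolver. This is illustrative only: the function and argument names are hypothetical, and only the tmp/ship/spec.json path comes from the table.

```python
from pathlib import Path

def resolve_inputs(spec_md=None, spec_json=None, pr_number=None):
    """Pick source material and scope per the input table (illustrative)."""
    plan = {}
    if spec_md:
        plan["source"] = spec_md                     # read SPEC.md directly
    elif spec_json:
        plan["source"] = spec_json                   # use userStories[] as source
    elif pr_number is not None:
        plan["source"] = f"gh pr diff {pr_number}"   # derive scope from the PR diff
    else:
        plan["source"] = "git diff main...HEAD"      # fall back to the branch diff
    # In every mode, pick up tmp/ship/spec.json when it exists in the tree.
    if Path("tmp/ship/spec.json").exists():
        plan.setdefault("spec_json", "tmp/ship/spec.json")
    return plan

print(resolve_inputs(pr_number=42)["source"])  # → gh pr diff 42
```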
Output: tmp/ship/qa-progress.json — scenarios derived from SPEC.md source material and code investigation, enriched with code-grounded fields, prioritized, and directly consumable by /qa.
This skill supports the cross-skill autonomy convention:
| Level | Behavior | How entered |
|---|---|---|
| Supervised (default) | Pause on contradictions between spec and code, ambiguous oracle types, or scenarios that may be out of scope. Present findings and ask before proceeding. | Default when standalone |
| Headless | Document contradictions and ambiguities in the plan's notes fields instead of pausing. Proceed through all gates autonomously. | --headless flag from orchestrator |
Headless mode adjustments:
- Contradictions are recorded with the contradiction gap type, not escalated
- Scenarios that may be out of scope are kept and flagged for a human with a note explaining why
- Ambiguous oracle types get a grayArea flag rather than asking

Before starting any work, create a task for each step using TaskCreate with addBlockedBy to enforce ordering. Derive descriptions and completion criteria from each step's own workflow text.
Mark each task in_progress when starting and completed when its step's exit criteria are met. On re-entry, check TaskList first and resume from the first non-completed task.
Determine what kind of project this is. The repo type drives which QA categories are relevant and what tools are expected during execution.
Detection signals:
| Signal | How to detect |
|---|---|
| Framework | Check package.json dependencies, config files (next.config.*, vite.config.*, tsconfig.json, Cargo.toml, go.mod, etc.) |
| Has UI | Presence of component files (.tsx, .jsx, .vue, .svelte), pages/ or app/ directories, CSS/styling files |
| Has API | Presence of route handlers, API directories, server files, OpenAPI specs |
| Has CLI | Presence of bin/ directory, CLI entry points, commander/yargs/clap dependencies |
| Has SDK/library | Presence of exports in package.json, public API surface, typedoc/jsdoc configs |
| Infrastructure only | No UI, API, CLI — only configs, skills, templates, CI files |
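The signal table can be approximated with a single filesystem pass. A heuristic sketch, assuming a JS-flavored repo; the glob patterns and returned type strings are illustrative, not canonical:

```python
import json
from pathlib import Path

def detect_repo_type(root="."):
    """Classify a repo from filesystem signals (heuristic sketch)."""
    root = Path(root)
    pkg = {}
    if (root / "package.json").exists():
        pkg = json.loads((root / "package.json").read_text())
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}

    has_ui = (root / "app").is_dir() or (root / "pages").is_dir() \
        or any(root.glob("**/*.tsx"))
    has_api = (root / "api").is_dir() or any(root.glob("**/route.ts"))
    has_cli = (root / "bin").is_dir() or any(d in deps for d in ("commander", "yargs"))
    has_lib = "exports" in pkg or "main" in pkg

    if has_ui:
        return "full-stack-web-app"   # UI present; an API usually rides along
    if has_api:
        return "api-only-service"
    if has_cli:
        return "cli-tool"
    if has_lib:
        return "sdk-library"
    return "infrastructure-config"
```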
Category emphasis by repo type:
| Repo type | Primary emphasis | Secondary emphasis |
|---|---|---|
| Full-stack web app | All 8 categories equally | — |
| API-only service | error-state, edge-case, failure-mode, integration, cross-system | usability (API ergonomics, error messages) |
| CLI tool | ux-flow, error-state, edge-case, cross-system | usability (help text, flag behavior) |
| SDK / library | edge-case, integration, cross-system | ux-flow (developer experience, API surface) |
| Skill / template authoring | edge-case, integration | cross-system (composition with other skills) |
| Infrastructure / config | integration, cross-system | failure-mode (misconfiguration handling) |
Record the detected repo type. This table calibrates where to look hardest for scenarios. Generate scenarios for every category where a real user outcome exists — the table does not grant permission to skip categories. A CLI tool can still have usability scenarios. An SDK can still have UX-flow scenarios for the developer experience.
Load /worldmodel skill with the feature topic + SPEC path and/or PR number as user-provided sources. This maps the feature's blast radius before you start writing scenarios — what surfaces does it touch, who is affected, and what might break silently.
Read worldmodel's output. Extract these for use in Steps 2–5:
- Surfaces the feature touches
- Personas affected and how changes propagate to them
- Facts that anchor given clauses in reality

Pay special attention to silent impacts — changes that affect users without producing obvious errors or visible behavior changes (e.g., a caching change that subtly alters data freshness, a telemetry contract change that breaks downstream dashboards). Worldmodel's Personas section classifies propagation timing; look for items marked "silent." These need explicit QA scenarios because they won't produce obvious failures.
Output of this step: A topology map of surfaces, personas, and silent impacts. Feed this into Steps 2–5 — each surface without a covering scenario is a gap to fill.
Read SPEC.md directly for QA-relevant source material. Read spec.json for story structure. These are inputs to your thinking — not a template for your output.
Ignore spec.json's qaScenarios[]. Those scenarios are implementation context for iteration agents — they help implementers understand what will be verified. They are mechanical 1:1 mappings from spec sections and do not represent the scope or ambition of your plan. Derive your scenarios from SPEC.md source material directly. Do not seed from, pin to, or limit your plan based on spec.json's qaScenarios.
Output: Understanding of what the spec intends. Scenarios come in Step 2.5.
From the spec source material (Step 2) and repo type (Step 0), synthesize scenarios at two levels:
Atomic conditions — individual testable checks derived from spec sections. One per acceptance criterion, failure mode row, shadow path, or state matrix cell. These are the building blocks.
Compositional user journeys — realistic multi-step user sessions that cross multiple stories, surfaces, and states. For each primary persona:
Compositional scenarios are P0 by default — they test the integration seams that atomic checks miss.
Do not limit scenario count to the number of spec rows. The spec sections are source material, not a cardinality constraint. A spec with 3 acceptance criteria and 2 failure modes may produce 5 atomic scenarios and 4 compositional journeys.
Set source: "spec" on scenarios derived from SPEC.md content. Set source: "journey" on compositional scenarios that combine multiple spec elements into realistic user paths.
Output: Initial scenario list ready for code-grounding in Steps 3–5.
For each spec-derived scenario, verify it is grounded in the actual codebase. This catches stale specs, renamed routes, and missing implementations.
Route verification (for scenarios with a route field): confirm the route still exists in the codebase and resolves to the expected page or handler.
Component verification (for UI scenarios): confirm the referenced components exist and render the elements the scenario interacts with.
Affordance check (for behavior scenarios):
- The action in the when clause has a matching affordance in the code
- The outcome in the then clause is plausible given the code

Gap types:
| Gap type | Meaning | Action |
|---|---|---|
| fixable_gap | The scenario references something that exists but has moved (renamed route, refactored component) | Update the scenario with the current location |
| contradiction | The spec says X but the code does Y (different behavior, different error message, etc.) | Supervised: pause and present the contradiction. Headless: document in notes, flag for human review |
| stale | The scenario references something that no longer exists and has no replacement | Mark scenario as blocked with explanation |
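The branch between the gap types (plus the fully-grounded case) can be written down directly. A sketch with a hypothetical helper; the booleans stand in for the route, component, and affordance checks above:

```python
def classify_gap(found_at_spec_location, found_elsewhere, behavior_matches_spec=True):
    """Map Step 3 verification findings to a gap type (sketch)."""
    if found_at_spec_location and behavior_matches_spec:
        return None                 # scenario is grounded; no gap
    if found_at_spec_location:
        return "contradiction"      # code exists but does Y where the spec says X
    if found_elsewhere:
        return "fixable_gap"        # moved or renamed: update the scenario
    return "stale"                  # no replacement: mark the scenario blocked
```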
Analyze the diff to find code changes that are NOT covered by any existing scenario. This is the reverse of Step 3 — catching things the spec author missed.
Diff analysis:
- git diff main...HEAD --stat for the file list, git diff main...HEAD for full changes
- For a PR: gh pr diff <number>

Map changed files to surfaces: UI components, API routes, CLI entry points, shared libraries, and configuration.
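Mapping diff paths to surfaces can be sketched as an ordered rule list. The patterns below assume a hypothetical Next.js-style layout; adjust them to the repo's actual structure:

```python
import fnmatch

# Hypothetical path-to-surface rules; first match wins, so order them
# from most specific to least specific.
SURFACE_RULES = [
    ("app/api/**", "api"),
    ("app/**", "ui"),
    ("components/**", "ui"),
    ("bin/**", "cli"),
    ("lib/**", "library"),
]

def surfaces_for(changed_files):
    """Group changed file paths by the surface they touch (sketch)."""
    hits = {}
    for path in changed_files:
        for pattern, surface in SURFACE_RULES:
            if fnmatch.fnmatch(path, pattern):
                hits.setdefault(surface, []).append(path)
                break
        else:
            hits.setdefault("other", []).append(path)
    return hits
```

Every surface in the result that lacks a covering scenario becomes a candidate for a code-derived scenario.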
Create scenarios for uncovered changes: For each changed surface without a covering scenario:
- Assign a QA-NNN id
- Set source: "code" to distinguish from spec-derived scenarios
- Write userOutcome first — what does the end user actually experience? Start from their perspective.
- Write given/when/then based on the code's behavior, grounded in the user outcome
- Synthesize verifies from the when/then clauses (what the test checks)
- Set tracesTo to the most relevant user story ID as a string (e.g., "US-003"), or omit when the mapping is unclear
- Set oracleType based on whether the expected behavior is deterministic (specified), relative (derived), or subjective (human)

Go beyond the diff to discover testing concerns from code patterns and health signals.
Framework detection:
State management patterns:
Console / runtime signals:
- console.error and console.warn patterns in changed code

Code health hotspots:
- any types in the changed code

Add new scenarios for uncovered concerns. Use the category emphasis from Step 0 for prioritization, but do not filter scenarios out.
For each scenario, determine whether it is already verified by a real (non-mocked) test — not whether it could be a formal test. The question is not "could someone write a test for this?" but "does a test already exist that proves the actual user outcome with real dependencies?"
Mock detection (critical): Search for existing tests covering each scenario's behavior. For each test found, check whether it mocks the service boundary:
- jest.mock(), vi.mock(), or manual mock files for the module under test

Coverage classification:
| Existing test status | Action |
|---|---|
| No test exists | Scenario stays. Set oracleType based on expected behavior type. |
| Test exists but mocks the service boundary | Scenario stays — mocked tests verify logic, not user outcomes. Note in enrichment.existingTestCoverage: "mocked". Optionally flag as formalizable if a real integration test would be valuable. |
| Test exists with real dependencies (real DB, real server, real API) | Scenario can be marked formalizable — real coverage exists. But keep it if the test doesn't cover the full user journey (e.g., tests the API but not the UI that calls it). |
| Test exists but only covers part of the scenario | Scenario stays for the uncovered portion. Note what's covered and what isn't. |
The default is: the scenario stays in the plan. Only mark a scenario formalizable and exclude it from QA execution when a real, non-mocked test already verifies the complete user outcome end-to-end.
Supervised mode: If many scenarios have only mocked test coverage, note this — it's a signal the test suite may need real integration tests. But this does not reduce the QA plan.
Headless mode: Document coverage findings in planMetadata.formalizableCount and proceed.
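The mock check itself can start as a grep-level heuristic over each candidate test file. A sketch; real detection should also trace __mocks__ directories and dependency-injection seams, and the pattern list is illustrative:

```python
import re

# Signals that a test stubs the service boundary rather than hitting it.
MOCK_SIGNALS = re.compile(r"jest\.mock\(|vi\.mock\(|__mocks__")

def coverage_class(test_source):
    """Classify an existing test per the coverage table (heuristic sketch)."""
    if test_source is None:
        return None        # no test exists: the scenario stays in the plan
    if MOCK_SIGNALS.search(test_source):
        return "mocked"    # verifies logic, not the real user outcome
    return "real"          # exercised against real dependencies
```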
Assign priority tiers to every scenario. Priority drives execution order in /qa.
P0 — Must test (blocking):
- Must requirements in the spec

P1 — Should test (important):
P2 — Could test (lower immediate impact):
Update each scenario's priority field.
Write the final enriched plan to tmp/ship/qa-progress.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
qa-progress.json schema:
{
"specPath": "specs/feature-name/SPEC.md",
"prNumber": null,
"planMetadata": {
"repoType": "full-stack-web-app",
"generatedAt": "2026-03-24T12:00:00Z",
"totalScenarios": 12,
"byPriority": { "P0": 4, "P1": 5, "P2": 3 },
"bySource": { "spec": 5, "journey": 3, "code": 4 },
"formalizableCount": 2,
"mockedOnlyCoverageCount": 3
},
"scenarios": [
{
"id": "QA-001",
"priority": "P0",
"category": "ux-flow",
"name": "User can toggle task status from list",
"userOutcome": "User sees the task status change to 'in_progress' instantly in the UI without needing to refresh the page",
"verifies": "changing status to 'in_progress' via dropdown updates immediately without page refresh",
"given": "A task exists with status 'pending' on the task list",
"when": "User changes status to 'in_progress' via the dropdown",
"then": "Status updates immediately, UI reflects the change without page refresh",
"tracesTo": "US-003",
"oracleType": "specified",
"route": "/tasks",
"source": "spec",
"status": "planned",
"verifiedVia": null,
"notes": "",
"evidence": [], // populated by /qa: [{type: "assertion", check: "...", pass: true}, {type: "video", url: "..."}]
"enrichment": {
"codeLocation": "app/tasks/page.tsx:42",
"gapType": null,
"formalizable": false
}
}
]
}
Field definitions — top-level (compatible with /qa):
| Field | Description |
|---|---|
| specPath | Path to SPEC.md. null if none. Required by /qa. |
| prNumber | PR number. null if none. Required by /qa. |
Field definitions — planMetadata (enrichment by /qa-plan):
| Field | Description |
|---|---|
| repoType | Detected repo type from Step 0 |
| generatedAt | ISO 8601 timestamp |
| totalScenarios | Count of all scenarios |
| byPriority | Scenario count per priority tier |
| bySource | Scenario count by origin (spec, journey, code) |
| formalizableCount | Scenarios flagged as having real (non-mocked) test coverage |
| mockedOnlyCoverageCount | Scenarios where existing tests only use mocked dependencies |
Field definitions — scenario (compatible with /qa's required fields):
| Field | Description |
|---|---|
| userOutcome | What the end user actually experiences when this works correctly. Written from the user's perspective, not the code's. This is the north star for verification — /qa must prove this outcome is real. |
| verifies | What the test checks — synthesized from given/when/then. Required by /qa. |
| tracesTo | User story ID (e.g., "US-003"). Omit when relationship is fuzzy. Compatible with /qa's string type. |
Field definitions — scenario enrichment (new fields beyond /qa's schema):
| Field | Description |
|---|---|
| given | Precondition (Gherkin-style, from spec or code investigation) |
| when | Action (Gherkin-style) |
| then | Expected outcome (Gherkin-style) |
| oracleType | "specified" (deterministic), "derived" (compare to baseline), or "human" (requires judgment) |
| source | "spec" (derived from SPEC.md content), "journey" (compositional user path combining multiple spec elements), or "code" (discovered during code tracing) |
| status | "planned" (always, at this stage — /qa changes it during execution) |
| verifiedVia | null (set by /qa during execution) |
| notes | Empty string or gap/contradiction documentation |
| evidence | Empty array. Populated by /qa with polymorphic proof items: {type: "video", url: "..."} for browser recordings, {type: "screenshot", url: "..."} for visual captures, {type: "assertion", check: "...", expected: "...", actual: "...", pass: true/false} for structured checks, {type: "command", cmd: "...", stdout: "...", pass: true/false} for shell evidence. Every validated/failed scenario should have at least one evidence item. |
| enrichment.codeLocation | File and line where the scenario's behavior is implemented |
| enrichment.gapType | null, "fixable_gap", "contradiction", or "stale" |
| enrichment.formalizable | true if a real (non-mocked) test already covers this scenario's full user outcome |
| enrichment.existingTestCoverage | null, "real" (non-mocked), or "mocked" (uses stubs/mocks at the service boundary) |
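Before handing the file to /qa, the plan can be sanity-checked by recomputing the planMetadata counts from the scenarios themselves. A sketch; the required-field list mirrors the schema fields /qa depends on:

```python
from collections import Counter

REQUIRED = ("id", "priority", "category", "name", "userOutcome", "verifies",
            "given", "when", "then", "oracleType", "source", "status")

def check_plan(plan):
    """Assert required scenario fields and consistent planMetadata (sketch)."""
    scenarios = plan["scenarios"]
    for s in scenarios:
        missing = [f for f in REQUIRED if f not in s]
        assert not missing, f"{s.get('id', '?')} missing {missing}"
    meta = plan["planMetadata"]
    assert meta["totalScenarios"] == len(scenarios)
    assert meta["byPriority"] == dict(Counter(s["priority"] for s in scenarios))
    assert meta["bySource"] == dict(Counter(s["source"] for s in scenarios))
```

Load tmp/ship/qa-progress.json with json.loads and pass the parsed dict.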
/qa-plan is the planner. /qa is the executor. The planner has no execution boundaries — it plans everything worth testing.
This skill does: derive scenarios directly from SPEC.md source material, synthesize compositional user journeys, trace spec-to-code and code-to-spec, investigate code patterns, prioritize, and write the plan.
This skill does NOT: execute tests — that's /qa's job.

Execution feasibility is /qa's problem, not /qa-plan's. If a scenario requires a real Stripe webhook, plan it. If a scenario requires browser automation, plan it. If a scenario requires Docker containers, plan it. /qa will attempt to execute everything locally (including spinning up Docker, using browser automation, writing scripts). Anything /qa genuinely cannot verify locally gets flagged as "blocked" with notes describing what was attempted and what a human needs to check — this flows to /pr as pending human verification. No scenario is silently dropped from the plan because it might be hard to execute.
Edge case rules:
- A spec-derived scenario whose referenced behavior no longer exists in the code is marked stale but not deleted — the spec author may need to update the spec