From prd2impl
Milestone gate verification — run automated checks and produce a structured pass/fail report for a milestone. Use when the user says 'smoke test', 'verify milestone', 'M1 gate check', or runs /smoke-test.
npx claudepluginhub ezagent42/prd2impl --plugin prd2impl
Slash command: /prd2impl:skill-10-smoke-test
Run milestone gate verification: check task completion, run automated tests, verify artifacts, and produce a structured go/no-go report.
Usage: /smoke-test {milestone} (e.g., /smoke-test M1)

Inputs:
- Milestone ID (M0, M1, M2)
- {plans_dir}/*-execution-plan.yaml (milestone definitions, gate checks)
- {plans_dir}/tasks.yaml or {plans_dir}/task-status.md (task statuses)
- .artifacts/registry.json (artifact completeness)

Path resolution: Before constructing any read path, resolve {plans_dir} per lib/plans-dir-resolver.md. All docs/plans/ references (except docs/plans/project.yaml, which stays at repo root) are relative to that resolved directory. .artifacts/ paths are NOT scoped; they remain shared across plans_dir (see design spec §8 Limitation 1).
dev-loop-skills:skill-0-project-builder maintains a
baseline_commit frontmatter on the project's "Skill 1" knowledge
file plus a self-update.sh --check script returning drift count
(new top-level modules, renamed dirs, etc. since the last bootstrap).
Before running any milestone test verification, check it:
/project-builder self-update --check
Gate rule:
- drift_count > 50 (configurable via {plans_dir}/project.yaml::drift_threshold)
  → emit a STAGED warning prompting the user to run /bootstrap re-baseline
  before gate close. This is NOT an automatic NO-GO; drift can be intentional.
- drift_count <= 50 → proceed silently.

Why: a stale module map silently passes milestone gates against
imagined code structure. PV2 shipped pipeline_v2/kb_mcp/ because
the planning step didn't know cc_pool.py:691 already auto-injects
an MCP server. A re-baseline before PV2 task-gen would have surfaced
the duplication.
Graceful degradation: when dev-loop-skills is missing, skip this check with a
logged warning; the gate proceeds.
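The gate rule reduces to a threshold comparison. A minimal Python sketch, assuming the drift count has already been read from the `self-update --check` output as an integer (or `None` when dev-loop-skills is absent); `drift_gate` and its return labels are hypothetical names, not part of either plugin:

```python
from typing import Optional

DRIFT_THRESHOLD_DEFAULT = 50  # default; overridable via drift_threshold in project.yaml


def drift_gate(drift_count: Optional[int], threshold: int = DRIFT_THRESHOLD_DEFAULT) -> str:
    """Map a baseline drift count to a gate outcome (hypothetical helper).

    drift_count is None when dev-loop-skills is not installed.
    """
    if drift_count is None:
        # Graceful degradation: log the skip and let the gate proceed.
        print("WARN: dev-loop-skills not detected; skipping baseline drift check")
        return "SKIP"
    if drift_count > threshold:
        # Advisory: prompt for /bootstrap re-baseline; never an automatic NO-GO.
        return "STAGED"
    return "PROCEED"
```

A count exactly at the threshold proceeds; only a count strictly above it produces the STAGED warning.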
Verify all tasks in this milestone's phase are completed:
## Task Completion — M1
| Task | Name | Status | Result |
|------|------|--------|--------|
| T1A.1 | Mode/Gate | 🟩 | ✅ Pass |
| T1A.2 | Timer | 🟩 | ✅ Pass |
| T1A.3 | EventBus | 🟩 | ✅ Pass |
| T1B.2 | Message UI | 🟦 | ❌ Still in progress |
Result: 16/17 complete — FAIL (1 task remaining)
If any tasks are not complete, report and ask whether to proceed with partial verification.
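The completion check is a count over task statuses. A Python sketch, assuming statuses from tasks.yaml (or task-status.md) have already been normalized to strings such as `"completed"`; the `TaskStatus` type and `completion_summary` helper are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    status: str  # assumed normalized value, e.g. "completed" or "in_progress"


def completion_summary(tasks: list[TaskStatus]) -> tuple[int, int, str, list[str]]:
    """Return (completed, total, verdict, remaining_ids) for one milestone."""
    remaining = [t.task_id for t in tasks if t.status != "completed"]
    completed = len(tasks) - len(remaining)
    verdict = "PASS" if not remaining else "FAIL"
    return completed, len(tasks), verdict, remaining
```

A `FAIL` here is the point at which the skill asks whether to continue with partial verification, rather than aborting outright.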
Triggers when: at least one task in the milestone has source_plan_path in its tasks.yaml entry.
For each such task, read the matching task-hints.yaml entries (rich per-plan-task data) to extract per-plan-task files.create + files.modify lists. If task-hints.yaml is missing or out of sync, re-parse the plan with skill-0-ingest/lib/plan-parser.md (Rule 3) as a fallback. Cross-check against actual git history.
The report breaks down into per-PLAN-TASK rows (e.g. T1 / plan-task-1, T1 / plan-task-2) for granularity, even though the prd2impl task is plan-FILE level. This is the "richness preserved" half of the plan-passthrough deal: task-hints.yaml is the source of truth for that richness.
For each prd2impl task T with source_plan_path = P:
1. Collect the task_hints.tasks[] entries whose source_plan_path == P (these are the plan-tasks within this prd2impl task).
2. For each such plan-task pt (1-based index i), record:
   - declared_create[T/pt_i] = pt.files.create
   - declared_modify[T/pt_i] = pt.files.modify

If task-hints.yaml cannot be located (e.g. ingest-docs was never run or task-hints was deleted), parse P directly with plan-parser and use the parsed tasks[]. Surface a WARN: "task-hints.yaml not found for {P}; re-parsed plan at smoke-test time (slower; please regenerate via /ingest-docs)."
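Building the declared sets is a grouping pass over the hint entries. A Python sketch, assuming tasks.yaml and task-hints.yaml are already loaded into plain dicts and lists (YAML loading is omitted) and that matching hint entries appear in plan order; the field names follow the text above, while `build_declared` itself is hypothetical:

```python
def build_declared(prd_tasks: list[dict], hint_tasks: list[dict]) -> tuple[dict, dict]:
    """Return (declared_create, declared_modify) keyed by "T / plan-task-N"."""
    declared_create: dict[str, list[str]] = {}
    declared_modify: dict[str, list[str]] = {}
    for task in prd_tasks:
        plan_path = task.get("source_plan_path")
        if not plan_path:
            continue  # not a plan-passthrough task; nothing to declare
        # Plan-tasks belonging to this prd2impl task, assumed to be in plan order.
        plan_tasks = [h for h in hint_tasks if h.get("source_plan_path") == plan_path]
        for i, pt in enumerate(plan_tasks, start=1):
            row = f"{task['id']} / plan-task-{i}"
            files = pt.get("files") or {}
            declared_create[row] = list(files.get("create") or [])
            declared_modify[row] = list(files.get("modify") or [])
    return declared_create, declared_modify
```

The row key mirrors the "Plan-Task" column of the report, so the same keys can be reused when rendering the Plan vs Actual table.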
git diff --name-status {base_branch}...HEAD | awk '$1 == "A" { print $2 }' # actually created
git diff --name-status {base_branch}...HEAD | awk '$1 == "M" { print $2 }' # actually modified
Define:
- actual_create = the "A" set
- actual_modify = the "M" set

The actual sets are computed ONCE for the whole milestone (not per plan-task): they are the cumulative diff against the milestone's base branch.
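The two awk filters can also be expressed as a single parse of the same `git diff --name-status` output. A Python sketch; `parse_name_status` and `milestone_diff` are hypothetical helpers, and `base_branch` must be supplied by the caller:

```python
import subprocess


def parse_name_status(output: str) -> tuple[set[str], set[str]]:
    """Split `git diff --name-status` output into (actual_create, actual_modify)."""
    actual_create: set[str] = set()
    actual_modify: set[str] = set()
    for line in output.splitlines():
        parts = line.split("\t")  # git separates status and path(s) with tabs
        if len(parts) < 2:
            continue
        status, path = parts[0], parts[1]
        if status == "A":
            actual_create.add(path)
        elif status == "M":
            actual_modify.add(path)
        # Renames/copies (R<score>, C<score>) and deletions are ignored, as with awk.
    return actual_create, actual_modify


def milestone_diff(base_branch: str) -> tuple[set[str], set[str]]:
    # Computed once per milestone: cumulative three-dot diff against the base branch.
    out = subprocess.run(
        ["git", "diff", "--name-status", f"{base_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_status(out)
```

Splitting on tabs keeps paths that contain spaces intact, which awk's default whitespace splitting would truncate at `$2`.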
For each T/pt_i row built in Step 2.5.1:
| Delta | Definition | Severity |
|---|---|---|
| missing_create | declared_create[T/pt_i] ∩ NOT(actual_create) | NO-GO (declared file does not exist) |
| unexpected_create | actual_create - ⋃(declared_create across all T/pt_i) | WARN (file created outside any plan; reported ONCE at milestone level, not per plan-task) |
| declared_modify_not_modified | declared_modify[T/pt_i] ∩ NOT(actual_modify ∪ actual_create) | NO-GO (plan said to modify but no diff) |
| unexpected_modify | actual_modify - ⋃(declared_modify across all T/pt_i) | WARN (modification outside any plan; reported ONCE at milestone level) |
declared_modify_not_modified subtracts actual_create because a file declared as "modify" but actually created from scratch in this milestone is a NAMING mismatch, not a missing change. Surface it as a WARN with the hint "plan said modify; actual was create — was this file new this milestone?"
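The delta table maps directly onto set operations. A Python sketch, assuming the declared maps are keyed by row (as in the table) and the actual sets come from the milestone-wide diff; `compute_deltas` and the extra `modify_was_create` key (the naming-mismatch WARN described above) are hypothetical:

```python
def compute_deltas(
    declared_create: dict[str, list[str]],
    declared_modify: dict[str, list[str]],
    actual_create: set[str],
    actual_modify: set[str],
) -> tuple[dict[str, dict[str, list[str]]], dict[str, list[str]]]:
    """Return (per-row deltas, milestone-level deltas)."""
    per_row: dict[str, dict[str, list[str]]] = {}
    for row in sorted(declared_create.keys() | declared_modify.keys()):
        dc = set(declared_create.get(row, []))
        dm = set(declared_modify.get(row, []))
        per_row[row] = {
            # NO-GO: declared as created but absent from the "A" set.
            "missing_create": sorted(dc - actual_create),
            # NO-GO: declared as modified but neither modified nor created.
            "declared_modify_not_modified": sorted(dm - (actual_modify | actual_create)),
            # WARN: declared "modify" but the file was created (naming mismatch).
            "modify_was_create": sorted(dm & actual_create),
        }
    # Milestone-level WARNs: computed once against the union of all declarations.
    all_declared_create = set().union(*declared_create.values())
    all_declared_modify = set().union(*declared_modify.values())
    milestone = {
        "unexpected_create": sorted(actual_create - all_declared_create),
        "unexpected_modify": sorted(actual_modify - all_declared_modify),
    }
    return per_row, milestone
```

Following the table's definitions literally, a file declared as "modify" but actually created also shows up in `unexpected_create`, since it is not in any declared-create list.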
Add a new section to the gate report (Step 6):
## Plan vs Actual File Structure
Each prd2impl task with `source_plan_path` is broken down to its plan-tasks
(read from task-hints.yaml). The "Plan-Task" column shows `{prd2impl-task} /
plan-task-{N}` where N is the 1-based ordinal within the plan.
| Plan-Task | Status | Declared (C/M) | Actual (C/M) | Delta |
|-----------|--------|----------------|--------------|-------|
| T1 / plan-task-1 | ✅ | 5/0 | 5/0 | none |
| T1 / plan-task-2 | ❌ | 2/3 | 2/1 | declared_modify_not_modified: api_routes.py, auth.py |
| T1 / plan-task-4 | ⚠️ | 1/2 | 3/2 | unexpected_create: helpers/utils.py, helpers/__init__.py |
### Blocking deltas (NO-GO contributors)
- T1 / plan-task-2: declared but missing modify on `autoservice/api_routes.py`
- T1 / plan-task-2: declared but missing modify on `autoservice/auth.py`
### Warning deltas (CONDITIONAL GO contributors)
- T1 / plan-task-4: unexpected create `helpers/utils.py` — scope creep or incidental?
- missing_create row → contributes a NO-GO to the milestone gate.
- declared_modify_not_modified row → contributes a NO-GO.
- unexpected_create and unexpected_modify rows are WARNINGs only; they contribute a CONDITIONAL GO if no NO-GO is otherwise present.

If NO tasks in the milestone have source_plan_path, skip this step entirely (silently, with no warning). The milestone may simply not be a plan-passthrough milestone.
If a task has source_plan_path but the file is missing, surface a CONDITIONAL GO with the diagnostic "plan file missing — cannot verify file structure" and proceed with Step 3.
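The severity rules above combine into a single verdict for this step. A Python sketch operating on per-row delta dicts keyed by the delta names in the table, plus milestone-level lists; `plan_vs_actual_verdict`, the `modify_was_create` row key for the naming-mismatch WARN, and the returned labels are hypothetical:

```python
from typing import Optional

BLOCKING = ("missing_create", "declared_modify_not_modified")
WARNING_ROW = ("modify_was_create",)
WARNING_MILESTONE = ("unexpected_create", "unexpected_modify")


def plan_vs_actual_verdict(
    rows: dict[str, dict[str, list[str]]],
    milestone: dict[str, list[str]],
    missing_plan_files: Optional[list[str]] = None,
) -> tuple[str, list[tuple[str, str, str]]]:
    """Return (verdict, blocking_deltas) for the Plan vs Actual step."""
    missing_plan_files = missing_plan_files or []
    if not rows and not missing_plan_files:
        return "SKIP", []  # no plan-passthrough tasks: skip silently
    blocking = [
        (row, kind, path)
        for row, deltas in sorted(rows.items())
        for kind in BLOCKING
        for path in deltas.get(kind, [])
    ]
    if blocking:
        return "NO-GO", blocking
    has_warning = any(milestone.get(k) for k in WARNING_MILESTONE) or any(
        deltas.get(k) for deltas in rows.values() for k in WARNING_ROW
    )
    if has_warning or missing_plan_files:
        # A missing plan file means "cannot verify file structure": CONDITIONAL GO.
        return "CONDITIONAL GO", []
    return "PASS", []
```

The blocking tuples carry enough detail to render the "Blocking deltas (NO-GO contributors)" list one line per file.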
Invoke dev-loop-skills:skill-4-test-runner and consume its e2e-report
artifact. Unlike raw pytest, the runner mechanically distinguishes new
failures from regression failures and emits an evidence manifest the
gate can read.
Run the test runner scoped to this milestone's phase keyword:
/test-runner --phase {phase_keyword} --emit-report
Read the resulting artifact at .artifacts/e2e-report-{milestone}-*.yaml.
Parse three signal classes from the report:
- new_failure: count of failures in tests added during this milestone
- regression_failure: count of failures in tests that previously passed
  (auto-escalates to NO-GO regardless of other counts)
- pass_count, skip_count

Gate rule:
- regression_failure > 0 → NO-GO (do NOT downgrade to "1 env-blocked")
- new_failure > 0 → STAGED (review with the user before declaring GO)

If dev-loop-skills is unavailable, fall back to raw pytest with a logged warning. Without dev-loop, the gate cannot mechanically distinguish new vs regression failures; this is a structural weakness, not a stylistic preference.
Unit/Integration tests:
echo "WARN: dev-loop-skills not detected; smoke-test cannot distinguish"
echo " new vs regression failures. Install dev-loop-skills for"
echo " milestone-grade reporting."
pytest tests/ -k "{phase_keyword}" --tb=short
Contract tests (if applicable):
pytest tests/contract/ --tb=short
Type checks (if configured):
# Python
mypy autoservice/ --ignore-missing-imports
# TypeScript
npx tsc --noEmit
Build check:
make check # or equivalent
Treat any failure as ambiguous in the fallback path. Prompt the user to triage manually before declaring GO.
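Both paths of this step reduce to a small decision function. A Python sketch, assuming the e2e-report YAML has already been parsed into a dict with integer `new_failure` and `regression_failure` counts (the exact report layout is an assumption), and that the fallback path passes only a raw failure count; `evaluate_test_gate` and the `"NEEDS MANUAL TRIAGE"` label are hypothetical:

```python
from typing import Optional


def evaluate_test_gate(report: Optional[dict], fallback_failures: int = 0) -> str:
    """Map test results to a gate outcome.

    report is the parsed e2e-report, or None on the raw-pytest fallback path.
    """
    if report is None:
        # Raw pytest cannot separate new from regression failures,
        # so any failure is ambiguous and goes to manual triage.
        return "NEEDS MANUAL TRIAGE" if fallback_failures > 0 else "PASS"
    if report.get("regression_failure", 0) > 0:
        return "NO-GO"  # never downgraded, whatever the new tests did
    if report.get("new_failure", 0) > 0:
        return "STAGED"  # review with the user before declaring GO
    return "PASS"
```

Checking `regression_failure` first encodes the rule that a regression is a NO-GO regardless of the other counts.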
Check that all tasks in this phase have the expected artifacts:
## Artifact Completeness — M1
| Task | eval-doc | test-plan | test-diff | e2e-report |
|------|----------|-----------|-----------|------------|
| T1A.1 | ✅ eval-003 | ✅ plan-003 | ✅ diff-003 | ✅ e2e-003 |
| T1A.2 | ✅ eval-T1A.2 | ✅ plan-T1A.2 | ⚠️ missing | ⚠️ missing |
| T1B.1 | ✅ eval-004 | ✅ plan-003 | ✅ diff-003 | ✅ e2e-003 |
Yellow/Red tasks (no dev-loop): Check deliverable files exist
| T1A.4 | N/A | N/A | N/A | ✅ agents/*/soul.md |
Result: 15/17 complete artifacts — WARN (2 missing)
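The completeness check is a per-task set difference against the four expected artifact types. A Python sketch; the registry layout used here (an `artifacts` list whose entries carry `task` and `type` fields) is a hypothetical shape, since the actual schema of `.artifacts/registry.json` is not shown in this document:

```python
import json
from collections import defaultdict

# Column order from the Artifact Completeness table.
REQUIRED_TYPES = ("eval-doc", "test-plan", "test-diff", "e2e-report")


def missing_artifacts(registry_json: str, task_ids: list[str]) -> list[tuple[str, str]]:
    """Return (task_id, artifact_type) pairs absent from the registry."""
    registry = json.loads(registry_json)
    present = defaultdict(set)  # task_id -> set of artifact types seen
    for entry in registry.get("artifacts", []):
        present[entry.get("task")].add(entry.get("type"))
    return [
        (task_id, kind)
        for task_id in task_ids
        for kind in REQUIRED_TYPES
        if kind not in present[task_id]
    ]
```

A non-empty result maps to the non-critical WARN shown above. Yellow/Red tasks without dev-loop artifacts would be left out of `task_ids` and checked for their deliverable files instead.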
Execute or guide E2E scenarios from the kickoff doc:
For each scenario:
### Scenario: Customer sends first message
Steps:
1. Start web server: make run-web
2. Open browser to localhost:8000
3. Send a test message in the chat widget
4. Verify: Message appears, AI response within 5s
5. Verify: Conversation ID assigned
Result: [ ] Auto-testable [x] Manual verification needed
Categorize each scenario:
If Step 3's e2e-report listed any regression_failure rows, copy
each row into the gate report's ## Blocking failures section
verbatim. Do NOT downgrade these to "1 env-blocked, structurally
identical to verified counterpart" — that footnote pattern is what
the design spec §1 explicitly forbids. A regression failure in the
e2e-report means a previously-passing test is now red; that is a
NO-GO regardless of how the new tests perform.
# Milestone M1 Gate Report — {date}
## Summary
| Check | Result | Details |
|-------|--------|---------|
| Task completion | ✅ PASS | 17/17 tasks completed |
| Plan vs Actual files | ✅ PASS | (0.4.1+) 0 missing_create, 0 declared_not_modified, 1 unexpected_create (WARN) |
| Automated tests | ✅ PASS | 42 tests, 0 failures |
| Contract tests | ✅ PASS | 109 cases, all green |
| Artifact completeness | ⚠️ WARN | 2 artifacts missing (non-critical) |
| Build check | ✅ PASS | make check successful |
| Smoke scenarios | 🔵 PARTIAL | 3/5 auto-verified, 2 need manual |
## Overall: ✅ GO (with 2 manual verifications pending)
## Manual Verification Checklist
- [ ] Scenario 3: Browser chat widget renders correctly
- [ ] Scenario 5: Reconnection after network drop
## Recommended Actions
1. Complete manual verifications
2. If all pass → merge to integration branch
3. Run: /retro M1 for retrospective
4. Proceed to M2: /next-task
Based on the report:
GO (all critical checks pass):
NO-GO (critical failures):
CONDITIONAL GO (warnings only):
Before declaring GO, apply the following additional layers when the respective skills are available (non-blocking: skip any layer whose skill is unavailable):
- Invoke superpowers:requesting-code-review. This
  dispatches the code-reviewer subagent to audit the milestone's merged
  changes against the plan and coding standards. Process its feedback via
  superpowers:receiving-code-review (rigorous verification, not blind
  agreement). Rationale: Green-task closures inside /continue-task already
  run a per-task review; the milestone-level review catches cross-task
  integration issues that per-task review cannot see.
- Invoke superpowers:verification-before-completion as a final check before
  declaring GO. It enforces "evidence before assertions": every ✅ in the
  gate report must be backed by an observed command output.

All three layers are advisory: if none are available, the gate decision falls back to the automated-test / artifact / scenario checks above. If they flag critical issues, downgrade the gate decision from GO to CONDITIONAL GO (or NO-GO) regardless of automated-test results.
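The decision rules (NO-GO on critical failures, CONDITIONAL GO on warnings only, otherwise GO) plus the advisory downgrade can be sketched in Python. Which outcome labels count as "clean" is an assumption of this sketch, and a flagged review is downgraded only to CONDITIONAL GO, leaving escalation to NO-GO to human judgment; `overall_gate` is hypothetical:

```python
from typing import Iterable

# Outcomes assumed to need no follow-up; anything else except NO-GO is a warning.
CLEAN_OUTCOMES = {"PASS", "PROCEED", "SKIP"}


def overall_gate(check_outcomes: Iterable[str], review_flagged_critical: bool = False) -> str:
    """Combine per-check outcomes into GO / CONDITIONAL GO / NO-GO."""
    outcomes = set(check_outcomes)
    if "NO-GO" in outcomes:
        return "NO-GO"  # any critical failure blocks the milestone
    # Anything not known to be clean (WARN, STAGED, FAIL, ...) needs follow-up.
    if outcomes - CLEAN_OUTCOMES or review_flagged_critical:
        return "CONDITIONAL GO"
    return "GO"
```

Applied strictly, this treats the non-critical artifact WARN in the sample report as CONDITIONAL GO, whereas the sample declares GO; matching the sample would require a separate notion of non-critical warnings.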
───────────────────────────────────────────────────── ⬆ /smoke-test complete ─────────────────────────────────────────────────────
📋 Next:
  /retro {M}: milestone retrospective analysis
  /next-task: continue with next milestone tasks
─────────────────────────────────────────────────────