From prd2impl
Milestone retrospective analysis — analyze cycle times, blockers, failure patterns, and generate improvement suggestions. Use when the user says 'retrospective', 'retro for M1', 'what went wrong', or runs /retro.
npx claudepluginhub ezagent42/prd2impl --plugin prd2impl
How this skill is triggered: by the user, by Claude, or both
Slash command
/prd2impl:skill-11-retro
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
<SUBAGENT-STOP>
Analyze a completed milestone's execution data to identify patterns, bottlenecks, and improvement opportunities.
/retro {milestone} (e.g., /retro M1)
- {plans_dir}/tasks.yaml or {plans_dir}/task-status.md
- .artifacts/registry.json (artifact creation times)
- {plans_dir}/*-execution-plan.yaml (planned timeline)
Path resolution: Before constructing any output path, resolve {plans_dir} per lib/plans-dir-resolver.md. All docs/plans/ references below (except docs/plans/project.yaml, which stays at repo root) are relative to that resolved directory. .artifacts/ paths are NOT scoped; they remain shared across plans_dir (see design spec §8 Limitation 1).
Task data: For each task in this milestone's phase, collect:
- the task's → in_progress commit
- the task's → completed commit
Git data:
git log --oneline --since="{milestone_start}" --until="{milestone_end}" --grep="task:"
Artifact data: Creation timestamps from registry.json
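The task-data step above can be sketched as a small parser that pairs each task's → in_progress and → completed commits and derives a cycle time. This is a minimal sketch under stated assumptions: the commit subject is assumed to look like "task: T1A.1 → in_progress", and timestamps are assumed to come from git log --format="%cI %s" (the --oneline form shown above carries no dates). The function name and sample log are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical sample, as if captured from:
#   git log --format="%cI %s" --grep="task:"
# Each line is an ISO 8601 committer date followed by the commit subject.
SAMPLE_LOG = """\
2026-04-15T13:50:00+00:00 task: T1A.1 → completed
2026-04-15T13:05:00+00:00 task: T1A.1 → in_progress
"""

# Assumed subject shape: "task: <ID> → <status>"
PATTERN = re.compile(r"^(\S+) task: (\S+) → (in_progress|completed)\b")


def cycle_times(log_text):
    """Return {task_id: minutes between in_progress and completed commits}."""
    stamps = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if not match:
            continue  # ignore commits that do not follow the assumed shape
        when, task_id, status = match.groups()
        # Simplification: if a status appears twice, the last line seen wins.
        stamps.setdefault(task_id, {})[status] = datetime.fromisoformat(when)

    result = {}
    for task_id, seen in stamps.items():
        if "in_progress" in seen and "completed" in seen:
            delta = seen["completed"] - seen["in_progress"]
            result[task_id] = delta.total_seconds() / 60
    return result
```

Tasks that have only one of the two commits are left out of the result, so they can be reported separately rather than skewing the cycle-time statistics.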
metrics:
  milestone: M1
  planned_duration: "4 hours"
  actual_duration: "5.5 hours"
  slippage: "+37%"
  tasks:
    total: 17
    completed_on_time: 14
    completed_late: 2
    blocked_during: 1
    failed_and_redone: 0
  cycle_times:
    median: "45 min"
    mean: "52 min"
    p90: "2h 10min"
    fastest: "T1A.7 - 15 min"
    slowest: "T1A.4 - 3h 20min (Yellow, review wait)"
  by_type:
    green:
      count: 13
      median_cycle: "35 min"
      total_time: "7h 30min"
    yellow:
      count: 3
      median_cycle: "1h 45min"
      total_time: "5h 15min"
      review_wait_avg: "40 min"
    red:
      count: 1
      median_cycle: "2h"
      decision_wait: "30 min"
  blockers:
    count: 1
    total_blocked_time: "45 min"
    reasons:
      - task: T1A.5
        reason: "Waiting for T1A.4 review"
        duration: "45 min"
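The summary fields in the metrics block above (median, mean, p90, slippage) could be derived roughly as follows. This is a sketch, not the skill's actual computation: the function name is hypothetical, and the nearest-rank definition of p90 is an assumption, since the source does not say which percentile method is used.

```python
import math
import statistics


def summarize(cycle_minutes, planned_hours, actual_hours):
    """Summarize per-task cycle times (minutes) and milestone slippage."""
    ordered = sorted(cycle_minutes)
    # Nearest-rank percentile (assumed): the smallest value with at least
    # 90% of the samples at or below it.
    rank = max(1, math.ceil(0.9 * len(ordered)))
    slippage_pct = (actual_hours - planned_hours) / planned_hours * 100
    return {
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "p90": ordered[rank - 1],
        "slippage_pct": slippage_pct,
    }
```

With the sample milestone's 4 planned and 5.5 actual hours, slippage_pct comes out at 37.5, which the sample reports as "+37%".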
Identify recurring patterns:
Bottlenecks:
Velocity trends:
Quality signals:
How many tasks had to be redone?
How many had test failures?
How many contracts needed amendment?
Tasks shipped without executed test-plan (0.4.0+, dev-loop required):
/artifact-registry query --status-not executed --linked-task-status done
Each row is a coverage gap — a task that shipped without observed
test execution. Surface in retro report as
## Coverage gaps (tasks done without executed test-plan): N (list).
This is the metric the AutoService team has been re-discovering
manually after every milestone (PV2 had 11+ post-gate fixes for
bugs that never had a real test).
Graceful degradation: when dev-loop is missing, skip this signal and add a note to the retro report.
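The coverage-gap signal and its graceful degradation can be illustrated with an in-memory filter. This is only a sketch: the record fields (task, task_status, test_plan_status) and the function name are hypothetical, and the real registry layout queried by /artifact-registry may differ.

```python
def coverage_gaps(records, dev_loop_available=True):
    """Return (gap_task_ids, report_line) for the retro report.

    records: list of dicts with hypothetical keys
      "task", "task_status", "test_plan_status".
    """
    if not dev_loop_available:
        # Graceful degradation: skip the signal but leave a note.
        return [], "Coverage gaps: skipped (dev-loop is not installed)"

    gaps = [
        r["task"]
        for r in records
        if r["task_status"] == "done" and r["test_plan_status"] != "executed"
    ]
    line = f"Coverage gaps (tasks done without executed test-plan): {len(gaps)}"
    return gaps, line
```

Returning the note alongside the list keeps the skipped case visible in the report instead of silently showing zero gaps.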
# Retrospective: Milestone M1 — {date}
## Timeline
Planned: Wed 04-15 PM (4 hours)
Actual: Wed 04-15 13:00 - 18:30 (5.5 hours, +37%)
## What Went Well
- All 13 Green tasks completed efficiently (median 35 min)
- No tasks failed and needed redo
- All lines worked in parallel without conflicts
- Contract tests caught 2 issues early
## What Didn't Go Well
- Yellow tasks took 3x longer than Green (review wait bottleneck)
- T1A.4 (soul.md) took 3h 20min — largest single task
- Milestone slipped by 1.5 hours from plan
## Metrics Summary
| Metric | Value |
|--------|-------|
| Tasks completed | 17/17 |
| Median cycle time | 45 min |
| Slowest task | T1A.4 (3h 20min) |
| Blocked time | 45 min total |
| Review wait (avg) | 40 min |
## Root Causes
1. **Yellow review bottleneck**: Reviews took 40 min avg because reviewer
   was busy with own tasks. Consider: dedicated review windows.
2. **T1A.4 scope**: soul.md for 4 roles was too large for one task.
   Should have been split into 4 tasks.
## Improvement Suggestions
1. **Split large Yellow tasks**: Any task >2h should be subdivided
2. **Review windows**: Schedule 15-min review blocks between batches
3. **Parallel reviews**: Both devs review each other's Yellow tasks
   simultaneously instead of sequentially
4. **Red task preemption**: Start Red tasks earlier (this worked well
   for T3A.4-6, apply same pattern)
## Carry-Forward Items
- [ ] Apply task splitting rule to M2 Yellow tasks
- [ ] Schedule review windows in M2 kickoff
{plans_dir}/retro-{milestone}-{date}.md
retro: M1 retrospective
Why this exists: M3 retro produced 13 numbered improvement recommendations (R1–R13) in docs/plans/m3/prd2impl-retro-notes.md. Most never propagated into prd2impl skill templates. PV2 reproduced nearly identical failure modes a sprint later. Step 6 closes the dead-end-report problem by turning each suggestion into a concrete skill patch.
- improvement_suggestions: block from Step 4 / Step 5 output
- skills/*/SKILL.md in this plugin
Classify each suggestion by target skill. Use these heuristics:
| Suggestion shape | Target skill |
|---|---|
| "yellow review missed contract X" | skill-13-autorun/SKILL.md yellow checklist |
| "task generated for tombstoned story" | skills/using-prd2impl/SKILL.md tombstone gate |
| "test passed but missed prod bug" | references/mock-policy.md or skill-3-task-gen/SKILL.md connector_seam |
| "dead code shipped per spec" | skill-13-autorun/SKILL.md two-stage yellow review |
| "subagent invented an API method name" | skill-12-contract-check/SKILL.md --preflight wiring |
| "estimate was N× off" | skill-3-task-gen/SKILL.md similarity_hint guidance |
| "operational default differs from code default" | skill-3-task-gen/SKILL.md env_var.class declaration rule |
For suggestions that don't match any heuristic, surface them in a
## Unclassified section of the patch directory's index. These
need maintainer judgment before they can become skill rules.
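The classification step above can be approximated with keyword matching. This is a sketch only: the keyword tuples are assumptions chosen to match the example shapes in the table, not rules taken from the skill, and in practice the matching is a judgment call rather than a string test.

```python
# (keywords that must all appear, target skill) pairs.
# Targets are copied from the heuristics table; keywords are assumed.
RULES = [
    (("yellow review", "contract"),
     "skill-13-autorun/SKILL.md yellow checklist"),
    (("tombstoned",),
     "skills/using-prd2impl/SKILL.md tombstone gate"),
    (("test passed", "prod bug"),
     "references/mock-policy.md or skill-3-task-gen/SKILL.md connector_seam"),
    (("dead code",),
     "skill-13-autorun/SKILL.md two-stage yellow review"),
    (("invented", "api"),
     "skill-12-contract-check/SKILL.md --preflight wiring"),
    (("estimate",),
     "skill-3-task-gen/SKILL.md similarity_hint guidance"),
    (("operational default",),
     "skill-3-task-gen/SKILL.md env_var.class declaration rule"),
]


def classify(suggestions):
    """Split suggestions into (text, target) pairs and an unclassified list."""
    classified, unclassified = [], []
    for text in suggestions:
        lowered = text.lower()
        for keywords, target in RULES:
            if all(keyword in lowered for keyword in keywords):
                classified.append((text, target))
                break  # first matching rule wins
        else:
            # No heuristic matched: route to the Unclassified section.
            unclassified.append(text)
    return classified, unclassified
```

Anything that falls through to the second list corresponds to the Unclassified section that needs maintainer judgment.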
For each classified suggestion, derive:
pool.acquire_for_session
which does not exist on real CCPool; current skill-13 review
approves it; expected after patch: review fails")
Invoke superpowers:writing-skills with the baseline scenario,
proposed rule, and target file. The writing-skills skill
pressure-tests the rule against the baseline:
Emit one patch per suggestion under
{plans_dir}/framework-patches/{slug}.md using
templates/framework-patch.md format.
{plans_dir}/framework-patches/ directory containing N patches, each
ready for human review or auto-apply by a maintainer. Auto-apply is
out of scope for 0.4.0 — patches are committed artifacts the
maintainer copies into the prd2impl repo as a separate PR.
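The patch location {plans_dir}/framework-patches/{slug}.md can be built with a small helper. The source does not say how {slug} is derived, so the lowercase-and-hyphenate scheme below is an assumption, and both function names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def patch_path(plans_dir, title):
    """Return {plans_dir}/framework-patches/{slug}.md for one suggestion."""
    return PurePosixPath(plans_dir) / "framework-patches" / f"{slugify(title)}.md"
```

For example, patch_path("docs/plans/m3", "Split large Yellow tasks") yields docs/plans/m3/framework-patches/split-large-yellow-tasks.md.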
If superpowers:writing-skills is not installed, retro emits the
markdown patch without the pressure-test step. The patch file
documents that pressure testing was skipped — maintainer must
manually verify the rule catches the baseline before merging into
the skill.
task: T1A.1 → in_progress/completed)
─────────────────────────────────────────────────────
⬆ /retro complete
─────────────────────────────────────────────────────
📋 Next: /plan-schedule - plan next milestone
         /start-task {ID} - address retrospective findings
─────────────────────────────────────────────────────