From backlogd
backlogd's Sprint Retrospective mechanism — over a completed milestone (or a cycle / date / count fallback) the retro reads the execution graph as objective evidence, detects cross-issue patterns no single review can see, classifies each learning, and files the load-bearing ones as candidate `kind:improvement` issues for the PO to prioritize. The retro proposes; the PO prioritizes. Use when implementing or modifying `/backlogd:retro`, or any caller that runs the read → detect → classify → file pipeline.
npx claudepluginhub nicolai-bernsen/backlogd --plugin backlogd
Slash command: `/backlogd:retro`
backlogd runs on Scrum's three empirical pillars — transparency, inspection,
adaptation. The first two are wired (Linear is transparent; the execution graph and the
independent verdict review are inspection). Adaptation is the thinnest pillar: without
a retrospective that acts on the inspection data, the execution graph is just numbers
nobody reads and the loop never closes. This skill is the operating contract behind
/backlogd:retro — the Sprint Retrospective, which converts inspection into adaptation by
reading what happened over a scope, identifying the load-bearing improvements, and filing
them.
Read this file before modifying `commands/retro.md` or any caller that runs the retro pipeline. The four properties below are load-bearing: break any one and the retro stops being an empirical loop and becomes vibes.
This is the Sprint Retrospective verbatim from the Scrum Guide: the Scrum Team inspects how the last Sprint went … identifies the most helpful changes to improve its effectiveness … the most impactful improvements are addressed as soon as possible; they may even be added to the Sprint Backlog. backlogd's reading: the scope is a milestone (not a fixed time-box), the evidence is the execution graph (not memory), and the improvements are filed as candidate issues the PO prioritizes.
backlogd has no time-box-for-sustainability need (an agent team does not burn out), so the natural retro boundary is scope, not the calendar. The trigger is therefore milestone-primary: completing a milestone — the PO's scope/direction marker, set at project/problem creation — is the conceptual trigger for a retrospective over that scope.
A long-running milestone could still go un-retro'd for a long time, so cycle-end is an optional cadence safety-net: a periodic look-back regardless of milestone state. And because milestones aren't routine in every workspace yet, the command is invocable on demand with explicit scope selectors so it can be dogfooded today:
| Invocation | Scope |
|---|---|
| `/backlogd:retro` | The most-recent completed milestone (fallback: `--last 10` if none). |
| `/backlogd:retro <milestone name>` | That specific milestone. |
| `/backlogd:retro --cycle N` | A time-boxed cycle window (the cadence safety-net). |
| `/backlogd:retro --since <ISO date>` | All problems completed on/after that date. |
| `/backlogd:retro --last N` | The last N completed problems. |
The milestone/cycle is the conceptual trigger; the command is the entry point. Both are real — a milestone closing is the natural prompt, and the on-demand selectors mean a retro is never blocked on milestones being set up.
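As a rough illustration of the selector table above, the sketch below maps an argument list to a scope and to the `retro:<scope>` marker used for the summary. `Scope` and `parse_scope` are hypothetical names introduced here; how the real command parses its arguments is not shown in this document, and the `--last 10` fallback for a workspace with no completed milestone is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scope:
    kind: str             # "milestone" | "cycle" | "since" | "last"
    value: Optional[str]  # None means "most-recent completed milestone"

    def marker(self) -> str:
        """Dedupe key for the retro summary, e.g. 'retro:cycle-3'."""
        if self.value is None:
            raise ValueError("resolve the milestone name before building a marker")
        if self.kind == "milestone":
            return f"retro:{self.value}"
        return f"retro:{self.kind}-{self.value}"


def parse_scope(args: Sequence[str]) -> Scope:
    """Map /backlogd:retro arguments to a scope (illustrative only)."""
    if not args:
        return Scope("milestone", None)
    flag = args[0]
    if flag in ("--cycle", "--since", "--last"):
        if len(args) != 2:
            raise ValueError(f"{flag} takes exactly one value")
        if flag != "--since" and not args[1].isdigit():
            raise ValueError(f"{flag} expects an integer")
        return Scope(flag[2:], args[1])
    # Anything else is treated as a milestone name, which may contain spaces.
    return Scope("milestone", " ".join(args))
```

Keeping the marker on the scope object means the selector and the dedupe key cannot drift apart between a first run and a re-run.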
The retro's primary evidence is the execution graph — the agent-execution metadata the loop records (rework, latency, blockers, partials). This is the property that makes the retro objective. A retrospective that asks "how do we feel it went" reintroduces the self-marking failure the independent reviewer exists to prevent — at the batch level. The graph is what actually happened, recorded as the loop ran, not a memory or a vibe.
Consume the existing reducer surface; never re-implement it. The evidence interface is:
python scripts/graph.py report --json
It exits 0 and emits these documented top-level keys (metrics() in scripts/graph.py):
| Key | What the retro reads from it |
|---|---|
| `dispatches` | per-unit outcomes: total / solved / partial / blocked + partial_rate / blocked_rate. The coarse health of the work. |
| `rework` | problem-level rework: events, problems_with_rework, rate. The single strongest "this was hard" signal: how often work came back from review. |
| `dispatch_to_pr_ms` | dispatch→PR latency p50 / p90: where the loop is slow. |
| `run_wall_time_ms` | end-to-end wall time p50 / p90. |
| `by_area` | per-`area:*`-label aggregates (dispatches / blocked / partial / rework) + by_area_note. The cross-issue lens: which area of work blocks or reworks most. |
The reducer degrades cleanly on an empty/sparse store — zero counts and None
percentiles, an empty by_area with an explanatory by_area_note — rather than raising.
So report --json is safe to read unconditionally. The graph-navigation skill
(skills/graph-navigation/) documents the full surface and the inline load_edges()
recipes for any slice the rolled-up report doesn't expose (the per-problem rework set,
the slowest dispatches by latency).
Why consume, not rebuild. The reducer is the single source of these metrics, the same surface `/backlogd:status` reads for its forecast. A second implementation in the retro would drift from it. The retro is a reader of the graph, exactly like the `graph-navigation` skill; it never writes the graph and never re-derives the math.
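One way a caller might consume the reducer without re-deriving anything is to shell out to it and only check that the documented top-level keys are present. This is a minimal sketch: `parse_report` and `load_report` are hypothetical helper names, and the assumption that the command is run from the repository root is mine.

```python
import json
import subprocess
import sys
from typing import Any, Dict

# Top-level keys documented for `report --json`.
EXPECTED_KEYS = ("dispatches", "rework", "dispatch_to_pr_ms",
                 "run_wall_time_ms", "by_area")


def parse_report(raw: str) -> Dict[str, Any]:
    """Parse the reducer's JSON and check the documented keys are present."""
    report = json.loads(raw)
    missing = [key for key in EXPECTED_KEYS if key not in report]
    if missing:
        # A missing key is a reducer gap to report, not something to rebuild here.
        raise KeyError(f"report --json is missing keys: {missing}")
    return report


def load_report(repo_root: str = ".") -> Dict[str, Any]:
    """Run the reducer as a subprocess and return its parsed report."""
    proc = subprocess.run(
        [sys.executable, "scripts/graph.py", "report", "--json"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return parse_report(proc.stdout)
```

The split keeps the subprocess call thin and leaves all metric math inside `scripts/graph.py`.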
A single /backlogd:review sees one problem. It catches the gap in that problem. What it
cannot see is repetition: that the same gap showed up in three problems this
milestone, which makes it a systemic gap worth a standard, not three one-off notes. The
retro is the batch-level complement to the reviewer's in-the-moment gap-detection — it
reads across the whole scope.
Patterns to look for, by reading the graph slice and the closed problems' comments together:
- **[backlogd reviewer]** verdicts flagged the same absent rule, or the same NEEDS-PO/UNMET theme. → a systemic gap → a high-priority ADR / standard candidate (the batch signal NB-378's reviewer can't raise alone).
- Area: `by_area` shows one `area:*` label with a conspicuous blocked / rework count relative to its dispatches. → either a missing standard governing that area or a framework friction there.
- Latency: `dispatch_to_pr_ms` / `run_wall_time_ms`. → a process candidate, if it repeats.

A pattern is, by definition, repetition: "N problems hit X". A single problem's quirk is not a pattern; it is a one-off (property 4).
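The area check can be sketched as a small filter over `by_area`. The shape assumed here (a mapping from `area:*` label to a dict of `dispatches` / `blocked` / `partial` / `rework` counts) is inferred from the key table, and both thresholds are illustrative assumptions rather than values taken from backlogd; `hot_areas` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def hot_areas(by_area: Dict[str, Dict[str, int]],
              min_dispatches: int = 3,
              min_rate: float = 0.5) -> List[Tuple[str, float]]:
    """Return (area, friction_rate) pairs worth a closer look, worst first.

    friction_rate = (blocked + rework) / dispatches.
    """
    flagged = []
    for area, counts in by_area.items():
        dispatches = counts.get("dispatches", 0)
        if dispatches < min_dispatches:
            continue  # too little data to call it a pattern
        friction = counts.get("blocked", 0) + counts.get("rework", 0)
        rate = friction / dispatches
        if rate >= min_rate:
            flagged.append((area, rate))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

An empty `by_area` from a sparse store simply yields an empty list, so the check is safe to run unconditionally.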
Each learning is classified into exactly one of three buckets — the same calibration discipline the reviewer applies to its verdict:
| Bucket | Trigger | Action |
|---|---|---|
| recurring failure | a systemic gap, ≥2 problems | file a candidate ADR / standard (kind:improvement) |
| process problem | the framework itself made the work harder | file a candidate framework problem / bug (kind:improvement) |
| one-off | a single problem's quirk, no repetition | note in the summary, do not file |
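The three buckets can be expressed as a small decision function. This is a sketch under stated assumptions: `Learning`, `Bucket`, `classify`, and `should_file` are hypothetical names, and checking repetition before framework friction is my reading of the rule that anything without repeated evidence is a one-off.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    RECURRING_FAILURE = "recurring failure"
    PROCESS_PROBLEM = "process problem"
    ONE_OFF = "one-off"


@dataclass(frozen=True)
class Learning:
    summary: str
    problems_affected: int    # how many problems in scope showed it
    framework_friction: bool  # did the framework itself make the work harder?


def classify(learning: Learning) -> Bucket:
    """Place a learning in exactly one bucket."""
    if learning.problems_affected < 2:
        return Bucket.ONE_OFF  # no repetition: note it, do not file
    if learning.framework_friction:
        return Bucket.PROCESS_PROBLEM
    return Bucket.RECURRING_FAILURE


def should_file(bucket: Bucket) -> bool:
    """Only the two repeated-evidence buckets become kind:improvement issues."""
    return bucket is not Bucket.ONE_OFF
```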
Two discipline rules hold the output honest:
- Load-bearing only, so the `kind:improvement` queue does not bury the signal. If a proposed improvement can't be tied to repeated evidence (graph or cross-issue), it is a one-off: note it, don't file it. Same instinct as "don't over-extend the reviewer until the verdict is noise".

Each filed candidate is a normal Linear issue, created via the linear skill's key-free official-MCP filing path (`save_issue` with no `id` → create; see `skills/linear/references/linear-mcp.md`). Shape:
- Labels: `problem` and `kind:improvement`. The `problem` label makes it pickup-able by the normal loop (`/backlogd:scope` → `/backlogd:solve`); `kind:improvement` marks it as retro-sourced self-improvement so the PO can filter the improvement backlog. Create the `kind:improvement` label on first use: `create_issue_label({ team, name: "kind:improvement" })` if `list_issue_labels` shows it missing (it does not exist in the workspace yet). This ensure-first step is required, not cosmetic: `save_issue` does not auto-create labels. An unknown name passed in `save_issue.labels` is silently dropped (no error, no label), so the label must exist before it can be applied.
- `## Acceptance Criteria`: typed per `skills/ac/` (prefer `[review]` for "is this standard sound", `[test]` where a check is obvious; `[manual]` only for a fact no fresh-context agent can observe). The retro is proposing the work, so the AC can be thin; `/backlogd:scope` sharpens it when the PO prioritizes it.
- `/backlogd:scope`'s job once prioritized.

The retro posts one summary comment so the inspection→adaptation step is durable and visible in Linear (not just the terminal). Where it can dedupe, it is an idempotent upsert keyed by a scope marker, like the project-health and Shipped-summary helpers in `skills/linear/references/documents-and-updates.md`:
- Milestone scope: `save_comment({ milestoneId, body })`, deduped by the scope marker (see below). `list_comments({ milestoneId })` lists the thread, so a re-run updates in place.
- No-milestone scope: `save_comment({ projectId, body })`, deduped by the scope marker (see below). `list_comments({ projectId })` lists the thread, so a re-run updates in place. Verified live 2026-06-03 (ADR-008): a probe `list_comments({ projectId })` returned the prior retro summary by its `<!-- marker: retro:<scope> -->`, so project-thread marker-dedupe works (the earlier "issues-only" reading off the stale 2026-05-28 snapshot was itself an unverified assumption).

Body shape (visible **[backlogd retro]** badge; Linear renders the HTML comment as literal text):
**[backlogd retro]** Retrospective — <scope: milestone "X" | cycle N | since <date> | last N>
Problems in scope: <n> closed.
Graph signal: rework <r>% (<rw>/<p>), partial <pa>%, blocked <bl>%, dispatch→PR p50 <ms>.
(or: "Sparse graph — leaned on Linear evidence: <what>.")
Patterns detected
- <pattern> → <recurring failure → ADR | process problem → bug> → filed <NB-N>
- …
Noted (one-offs, not filed)
- <observation> (or "—")
Filed for prioritization: <NB-N>, <NB-M>, … (or "none — nothing load-bearing this scope")
<!-- marker: retro:<milestone-name | cycle-N | since-<date> | last-N> -->
The trailing <!-- marker: retro:<scope> --> is the dedupe key on both paths: on a
re-run over the same scope, list_comments({ milestoneId }) (milestone scope) or
list_comments({ projectId }) (no-milestone scope) → filter to bodies starting
**[backlogd retro]** → match the marker → capture the comment id →
save_comment({ id, body }) to update in place. Never post a second summary for the same
scope. The projectId listing is verified live 2026-06-03 (see the no-milestone bullet
above).
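The re-run path described above (list the thread, filter by badge, match the marker, update in place) can be sketched as a pure function that decides which arguments to hand to `save_comment`. The comment shape (dicts with `id` and `body` keys) and the helper names are assumptions for illustration; the actual MCP calls are made elsewhere.

```python
from typing import Dict, List, Optional

BADGE = "**[backlogd retro]**"


def scope_marker(scope: str) -> str:
    """Trailing HTML-comment marker, e.g. '<!-- marker: retro:cycle-3 -->'."""
    return f"<!-- marker: retro:{scope} -->"


def find_existing_summary(comments: List[Dict[str, str]],
                          scope: str) -> Optional[str]:
    """Return the id of the prior retro summary for this scope, if any."""
    marker = scope_marker(scope)
    for comment in comments:
        body = comment.get("body", "")
        if body.startswith(BADGE) and marker in body:
            return comment["id"]
    return None


def upsert_args(comments: List[Dict[str, str]], scope: str, body: str,
                thread: Dict[str, str]) -> Dict[str, str]:
    """Build save_comment arguments: update in place, or create on the thread.

    `thread` is {"milestoneId": ...} or {"projectId": ...}.
    """
    existing = find_existing_summary(comments, scope)
    if existing is not None:
        return {"id": existing, "body": body}
    return {**thread, "body": body}
```

Because the marker includes its closing ` -->`, a scope such as `last-1` does not match a summary written for `last-10`.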
On a fresh checkout the graph store is gitignored and may be absent or thin. `report --json` still exits 0 with zero counts; that is not an error. When the graph is sparse, the retro:
- leans on Linear evidence: `blocked`-labelled issues in scope, and the **[backlogd developer]** / **[backlogd reviewer]** comments on the closed problems;
- says so in the summary; a `None` percentile is reported as "—" / "insufficient data", never invented.

A retro over a scope with real graph data is a stronger signal than one over a sparse store, exactly as a verdict backed by `[test]` checks beats one backed by `[review]` alone. The retro degrades gracefully; it does not pretend.
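A sketch of how the summary's graph-signal line might degrade on a sparse store. The nested field names (`total`, `partial_rate`, `blocked_rate`, `rate`, `p50`) follow the key table, but their exact nesting and the assumption that rates are fractions between 0 and 1 are mine; the helper names are hypothetical.

```python
from typing import Any, Dict, Optional

MISSING = "insufficient data"


def fmt_pct(rate: Optional[float]) -> str:
    """Render a 0..1 rate as a percentage, never inventing a missing value."""
    return MISSING if rate is None else f"{rate * 100:.0f}%"


def fmt_ms(value: Optional[float]) -> str:
    """Render a millisecond percentile, or the placeholder when it is None."""
    return MISSING if value is None else f"{value:.0f} ms"


def graph_signal_line(report: Dict[str, Any], linear_evidence: str) -> str:
    """Build the summary line, saying so plainly when the graph is sparse."""
    dispatches = report["dispatches"]
    if dispatches.get("total", 0) == 0:
        return f"Sparse graph, leaned on Linear evidence: {linear_evidence}."
    return (
        f"Graph signal: rework {fmt_pct(report['rework'].get('rate'))}, "
        f"partial {fmt_pct(dispatches.get('partial_rate'))}, "
        f"blocked {fmt_pct(dispatches.get('blocked_rate'))}, "
        f"dispatch→PR p50 {fmt_ms(report['dispatch_to_pr_ms'].get('p50'))}."
    )
```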
milestone closes ──┐ ┌── /backlogd:retro (on-demand, any scope)
(conceptual trigger)│ │
↓ ↓
read graph (report --json) + closed problems in scope
↓
detect cross-issue patterns → classify (recurring | process | one-off)
↓
file load-bearing candidates (problem + kind:improvement) + post retro summary
↓
PO prioritizes → /backlogd:scope shapes → /backlogd:solve executes
The retro is the only place backlogd reads across a scope of problems to adapt the
framework itself. It is the batch complement to /backlogd:review (one problem, in the
moment) and a reader of the same graph /backlogd:status reads for its forecast.
- The retro proposes and the PO prioritizes; the normal loop (scope → solve) works the filed candidates. The retro never auto-fixes: that blows the propose/prioritize split, the batch-level twin of the reviewer's judge/act split.
- Read metrics through `scripts/graph.py report --json`. If a metric the retro wants isn't in the reducer, that is a gap to fix in `scripts/graph.py` (a `kind:improvement` candidate in its own right), not a calculation to duplicate here.
- Read `report --json` first; cite the metric.
- Re-deriving `metrics()` in the retro → drifts from the source `/backlogd:status` reads. ✅ Consume `scripts/graph.py report --json`; never re-derive.
- Filing one-offs into the `kind:improvement` queue buries the signal. ✅ Load-bearing only; tie each to repeated evidence.
- On a sparse graph, `report --json` degrades to zeros; lean on Linear evidence and say so.
- Dedupe the summary by the `retro:<scope>` marker: `list_comments({ milestoneId })` on the milestone path, `list_comments({ projectId })` on the no-milestone path (the `projectId` listing is verified live 2026-06-03, ADR-008).