Runs retrospective on Printing Press CLI generations to identify systemic improvements in templates, Go binary, skills, and catalog. Creates GitHub issues for actionable fixes. Use after /printing-press runs.
Install: `npx claudepluginhub mvanhorn/cli-printing-press --plugin cli-printing-press`
Analyze a Printing Press session to find ways to improve the system that produces CLIs — the Go binary, templates, skills, and catalog. Not fixes to the specific CLI that was just printed, but improvements so the next CLI comes out stronger.
It is a non-goal for the Printing Press to produce flawless CLIs without manual tweaks. That's the nature of the system. We expect agents to reason over the generated CLI, customize for the specific API, build novel features, and iterate. Some hand-built work in every run is normal.
The retro's job is to find the subset of manual work where the machine could have realistically raised the floor — given the agent a better starting point, prevented the issue entirely, or eliminated friction that would recur on the next CLI. Two clear cases qualify:

1. The machine emitted something wrong — generated code that didn't compile, a broken command, or a placeholder default the agent had to correct.
2. The same manual work would recur on every future CLI — friction rooted in the templates, binary, or skill rather than in this particular API.

Otherwise, the manual work is normal iteration and should not generate a finding. Some items make it back as machine fixes; not all. The retro is the filter that distinguishes the two.
The retro creates a GitHub issue on the printing-press repo with the findings that survive triage and the adversarial check, plus artifacts, so maintainers (or an AI agent) can fix the Printing Press.
The Printing Press is the machine: generator templates (internal/generator/), the Go binary (cmd/printing-press/), skills, and catalog. A printed CLI is one output (e.g., notion-pp-cli). Printed-CLI fixes only help that one CLI. Use "the Printing Press" when talking about the system. Use the subsystem name when pointing a developer at what to fix — "fix the scorer" and "fix the generator" are different PRs.
# Path-only setup — no binary detection required.
# The retro skill reads manuscripts and runs gh/curl. It does not invoke the
# printing-press binary. This avoids aborting for users who installed the
# plugin but not the Go binary.
_scope_dir="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
_scope_dir="$(cd "$_scope_dir" && pwd -P)"
PRESS_HOME="$HOME/printing-press"
PRESS_MANUSCRIPTS="$PRESS_HOME/manuscripts"
PRESS_LIBRARY="$PRESS_HOME/library"
RETRO_SCRATCH_DIR="/tmp/printing-press/retro"
mkdir -p "$PRESS_MANUSCRIPTS" "$PRESS_LIBRARY" "$RETRO_SCRATCH_DIR"
# Detect whether we're inside the printing-press repo
IN_REPO=false
if [ -f "$_scope_dir/cmd/printing-press/main.go" ]; then
IN_REPO=true
REPO_ROOT="$_scope_dir"
echo "Running from printing-press repo: $REPO_ROOT"
fi
if [ ! -d "$PRESS_MANUSCRIPTS" ] || [ -z "$(ls -A "$PRESS_MANUSCRIPTS" 2>/dev/null)" ]; then
echo "No manuscripts found. Run /printing-press first to generate a CLI."
exit 1
fi
If the user passed an API name as an argument, use that. Validate for path traversal:
# Reject names with /, \, or ..
if echo "$USER_API_NAME" | grep -qE '[/\\]|\.\.'; then
echo "Invalid API name: '$USER_API_NAME'. Names cannot contain path separators or '..'."
exit 1
fi
# Verify resolved path stays under PRESS_MANUSCRIPTS
RESOLVED="$(cd "$PRESS_MANUSCRIPTS/$USER_API_NAME" 2>/dev/null && pwd -P)"
case "$RESOLVED" in
"$PRESS_MANUSCRIPTS"/*) ;; # OK
*) echo "Invalid API name: path resolves outside manuscripts directory."; exit 1 ;;
esac
If no API name was provided and multiple APIs exist, list them with their most recent run dates and ask the user to choose:
echo "Multiple APIs found in manuscripts:"
for api_dir in "$PRESS_MANUSCRIPTS"/*/; do
api_name=$(basename "$api_dir")
latest=$(ls -t "$api_dir" 2>/dev/null | head -1)
echo " - $api_name (latest run: $latest)"
done
Use AskUserQuestion to let the user pick.
If the API has multiple runs, default to the most recent. If the user specified a run ID, use that. Otherwise:
API_DIR="$PRESS_MANUSCRIPTS/$API_NAME"
RUN_ID=$(ls -t "$API_DIR" 2>/dev/null | head -1)
RUN_DIR="$API_DIR/$RUN_ID"
echo "Retro for: $API_NAME (run $RUN_ID)"
echo "Manuscripts: $RUN_DIR"
API_SLUG="$API_NAME"
CLI_NAME="${API_SLUG}-pp-cli"
CLI_DIR="$PRESS_LIBRARY/$CLI_NAME"
if [ ! -d "$CLI_DIR" ]; then
# Try without -pp-cli suffix (legacy naming)
CLI_DIR="$PRESS_LIBRARY/$API_NAME"
fi
if [ ! -d "$CLI_DIR" ]; then
echo "WARNING: CLI directory not found at $PRESS_LIBRARY/$CLI_NAME"
echo "Proceeding with manuscripts only — CLI source will not be included in artifacts."
CLI_DIR=""
fi
Best results come from running in the same conversation where the CLI was generated (post-shipcheck) — the retro can mine the full conversation history for errors, retries, manual edits, and discoveries.
If running in a fresh conversation, the retro proceeds with manuscript evidence only. Phase 2 marks session-dependent findings as "evidence: manuscripts only."
Read all artifacts from the run:
- $RUN_DIR/research/*brief*
- $RUN_DIR/research/*absorb*
- $RUN_DIR/proofs/*shipcheck*
- $RUN_DIR/proofs/*build-log* (if exists)
- $RUN_DIR/proofs/*live-smoke* (if exists)
- $CLI_DIR/ (if available)

Also gather the scorecard, verify pass rate, and dogfood report (from the shipcheck proof or by re-running the tools if IN_REPO is true and the binary is available).
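A quick inventory pass confirms which artifacts exist before deep reading — a minimal sketch using the globs above:

```bash
# Sketch only — quick inventory of run artifacts before reading them in detail.
ls "$RUN_DIR"/research/*brief* "$RUN_DIR"/research/*absorb* \
   "$RUN_DIR"/proofs/*shipcheck* 2>/dev/null
ls "$RUN_DIR"/proofs/*build-log* "$RUN_DIR"/proofs/*live-smoke* 2>/dev/null || true
[ -n "$CLI_DIR" ] && ls "$CLI_DIR" | head -20
```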
Scan the conversation history for six categories of signal and produce a candidate list. The candidate list is not the finding list — Phase 2.5 triage will cull it and Phase 3 will further drop weak survivors. Most candidates will not survive.
While collecting, distinguish work the machine caused from normal per-CLI iteration.
If running in a fresh conversation without generation history: Note this and proceed with manuscript evidence only. Focus on what the manuscripts reveal — scorecard gaps, verify failures, dogfood issues, and obvious template patterns in the CLI source. Mark session-dependent findings as "evidence: manuscripts only."
Capture any time a command failed and was re-run, a build broke, or the Printing Press produced code that didn't compile. What broke, and what fixed it?
Manual edits during iteration are normal — agents reason over the generated CLI and tweak. A single edit to handle this CLI's quirk is the workflow.
For each manual edit, ask: could the machine have raised the floor here?
The triage question is whether the machine raising the floor would compound across future CLIs — not whether this one CLI would have shipped a few lines lighter.
Hand-built features (transcendence commands, novel commands, helper packages for secondary APIs) are part of the workflow — agents build the domain-specific value layer on top of the API surface the machine emits. Building features by hand is not by itself a finding.
For each hand-built feature, ask: could the machine have raised the floor for this kind of feature?
(e.g., a summary aggregation that the machine could scaffold from the spec) — a candidate, if generalizable across multiple named APIs.

The "raises the floor" test separates "machine fix" from "SKILL recipe": if the machine's contribution would still leave significant per-CLI work, the recipe belongs in the SKILL so the next agent knows the pattern; if the machine could absorb the boilerplate cleanly, it's a generator template.
Work that happens on every generation, not just this one. For each: is this inherent to the approach, or can the Printing Press eliminate it?
Propose at least two possible fixes at different levels (generator templates, binary post-processing, skill instruction) and assess which is most durable.
Improvements noticed during the session — UX ideas, performance improvements, new command patterns, output format improvements. Could this optimization be detected automatically and applied by the Printing Press?
Before proposing Printing Press fixes to improve scores, check whether the scoring itself is correct. Changing the Printing Press to satisfy a broken scorer is worse than doing nothing.
For each score penalty from dogfood, verify, and scorecard, first establish whether the penalty reflects a real defect or a scorer bug.
Common scorer bugs: name derivation mismatches, grep-based detection missing patterns, file exclusions too broad, section-counting heuristics.
The scorer audit is not optional. Every finding from a score penalty must have a "Scorer correct?" assessment before proposing a fix direction.
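One way to pull the raw penalty lines for the audit — a hedged sketch; the grep pattern is illustrative and should match whatever the shipcheck proof actually records:

```bash
# Illustrative only — surface score-related lines from the shipcheck proof.
grep -inE 'scorecard|verify|dogfood|penalt' "$RUN_DIR"/proofs/*shipcheck* 2>/dev/null | head -40
```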
Only runs when the briefing named 2+ sources. Check $RUN_DIR/source-priority.json
(from the Multi-Source Priority Gate in the main skill). If it doesn't exist but the
briefing or user command clearly listed multiple services, that's itself a finding:
the priority gate didn't fire when it should have.
For runs with a source-priority.json, cross-reference it against the absorb manifest
and the shipped CLI:
check, at minimum, that the primary source leads the README and the root --help. If a secondary is the first thing the user sees, flag it (priority inversion).

Any mismatch here is a skill instruction gap category finding. The durable fix lives in skills/printing-press/SKILL.md (the Multi-Source Priority Gate, the Priority inversion check before Phase Gate 1.5, and the brief's ## Source Priority section) or in the generator if README ordering is template-driven.
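A small sketch of the existence check (file name from the main skill; the message is illustrative):

```bash
# Did the Multi-Source Priority Gate fire?
if [ -f "$RUN_DIR/source-priority.json" ]; then
  cat "$RUN_DIR/source-priority.json"
else
  echo "No source-priority.json — if the briefing named 2+ sources, the gate itself failed (finding)."
fi
```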
Before Phase 3 spends deep analysis on each candidate, run a fast triage to drop candidates that don't justify the deeper look. Most candidates should die here. The retro is a filter, not a funnel — if everything from Phase 2 makes it to Phase 3 unchanged, triage isn't doing its job.
For each candidate, ask in order:

1. Iteration noise? A retry, a one-off tweak, or normal agent reasoning over the generated CLI is the workflow, not a finding. Drop.
2. Printed-CLI fix? The change lives in ~/printing-press/library/<api>/ and helps only this one CLI. If the proposed change is "edit this command in this CLI" or "regenerate after fixing the spec," it's not a retro finding — it's a polish pass on that CLI. Drop.
3. API quirk? The behavior is specific to this API, and no other catalog API plausibly shares it. Drop.
4. Unproven one-off? Seen once, with no evidence it would recur on another CLI. Drop.
5. Already raised? Search prior retros: grep -l "<finding keywords>" ~/printing-press/manuscripts/*/proofs/*-retro-*.md. If it has been raised repeatedly without being implemented, don't re-raise it at the same priority — drop or demote.

Survivors of these five questions go to Phase 3. Dropped candidates are recorded as one-line entries in the retro's "Dropped at triage" section — they exist for your own discipline check and for the maintainer to see triage actually ran.
Anti-pattern to avoid. A recent Pagliacci retro produced "Skip: None. Every
finding warrants action." That sentence is the failure mode this triage exists
to prevent. Two of those findings (snake_case in Use:, root.go Short: rewrite
that the SKILL already documents as a manual step) were classic per-CLI / instructional
candidates that should have been dropped here. If you find yourself writing
"every finding warrants action," stop and re-run triage.
For each candidate that survived Phase 2.5 triage, answer these seven questions. Question 5 has seven sub-steps (A through G); Step G is the adversarial check. Findings that fail Step G drop out — they don't get a priority, they don't go in the Do/Skip tables, they go on the dropped-candidates list with the reason.
1. What happened? One sentence — the symptom, not the fix.
2. Is the scorer correct? (mandatory for score-penalty findings)
3. What category?
| Category | Description |
|---|---|
| Bug | Generated code is wrong |
| Scorer bug | Scoring tool reports a false positive |
| Template gap | No template for a common pattern |
| Assumption mismatch | Printing Press assumes X but API uses Y |
| Recurring friction | Happens every generation, might be inherent |
| Missing scaffolding | Feature class the Printing Press could emit but doesn't |
| Default gap | Printing Press emits a wrong or placeholder default |
| Discovered optimization | Improvement found during use |
| Skill instruction gap | Skill told Claude wrong thing or missed a step |
4. Where in the Printing Press does this originate?
Pick exactly one component. The slug column drives the comp:<slug> label
applied to the issue when filed (Phase 6), which is how agents filter related
work across retros (gh issue list --label comp:<slug>).
| Component | Slug | Path |
|---|---|---|
| Generator templates | generator | internal/generator/ |
| Spec parser | spec-parser | internal/spec/ |
| OpenAPI parser | openapi-parser | internal/openapi/ |
| Catalog | catalog | catalog/ |
| Main skill | skill | skills/printing-press/SKILL.md |
| Verify/dogfood/scorecard | scorer | CLI commands |
If a finding genuinely spans two components, pick the one where the durable fix lands. Don't multi-label.
5. Blast radius and fallback cost — should the Printing Press handle this?
Step A: Cross-API stress test. Test across API shapes (standard REST, proxy-envelope, RPC-style) and input methods (OpenAPI, crowd-sniffed, HAR-sniffed, no spec).
Step B: Name three concrete APIs from the catalog with direct evidence. Not "every
API with multi-word resources" or "any browser-sniffed CLI." Name three specific APIs
already in ~/printing-press/library/ (or the embedded catalog/ directory) where you
can point to evidence the pattern exists: a path in their spec, a known endpoint shape,
a header the vendor documents, an output you can reproduce. "Stripe, Notion, GitHub
probably have this" is hand-waving; "Stripe (Stripe-Version header in spec line N),
GitHub (X-GitHub-Api-Version on the issues endpoints), Linear (api-version on /v2/*)"
is evidence. If you can name only two with evidence — or three with hand-waving — the
finding drops to P3 max with a subclass:<name> annotation, or moves to Drop.
Step C: Counter-check question. Ask explicitly: "If I implemented this fix, would it
actively hurt any API that doesn't have this pattern?" If yes, the fix needs a guard or
condition before being P1/P2 — not a default change. Example: turning on client-side
?limit=N truncation by default would hurt APIs that need server-side pagination for
correctness; it stays P2 only because it's gated on profiler-detected absence of a
paginator. Without that guard the same finding is unsafe to land.
Step D: Recurrence-cost check. Search prior retros under
~/printing-press/manuscripts/*/proofs/*-retro-*.md for the same finding. If the same
finding has been raised in 2+ prior retros without being implemented, the prior cost-
benefit math has been "no" twice. Don't re-raise it at the same priority — either move
to P3 with a "raised N times, still not justified" annotation, or reframe the finding
into a smaller incremental fix that addresses part of the friction. Recurrence at the
same priority is a triage failure, not stronger evidence.
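A sketch of the Step D search (the keyword is illustrative — substitute terms from the finding):

```bash
# Case-insensitive search across every prior retro proof; -l prints matching files.
grep -il "pagination" ~/printing-press/manuscripts/*/proofs/*-retro-*.md 2>/dev/null
```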
Capture matched prior retros. When the search returns hits, record each as a structured tuple — retro CLI name, retro file path (or GitHub issue number if the retro file's frontmatter contains one), and a one-word classification:
- aligned — the prior retro proposed the same fix direction. Strengthens the case; reference it in Step F.
- contradicts — the prior retro proposed an opposing fix or chose a different default. Surface this explicitly: a maintainer reading the new finding must see the disagreement. State in one sentence why this retro reaches a different conclusion (e.g., "prior retro saw single-paginator APIs; this one saw an always-paginated API where the prior default would break").
- extends — the prior retro raised an adjacent finding in the same component area but a different specific fix. Useful context, doesn't change the case.

These tuples flow forward into the per-finding template ("Related prior retros") in the retro doc and merge into the issue body's "Related issues" block alongside the Step 2.5 dedup scan's related-area outputs. GitHub auto-cross-links any #N issue number you write, so contradictions and alignments show up in both retro timelines without further action.
Step E: Assess fallback cost. How reliably will Claude catch and fix this across every future API? A "simple" edit Claude forgets 30% of the time means 30% ship with the defect.
Step F: Make the tradeoff. Default is don't change the machine. The burden of proof is on the finding to justify a machine change. Continue to Step G only when all three of these are true:
(a) Step B named three concrete APIs with evidence (not speculation). (b) Step D's recurrence-cost check didn't disqualify the finding. (c) Step C's counter-check didn't surface a hurts-other-APIs concern that lacks a guard.
If a finding can't clear all three, it doesn't get a priority — it goes to Drop with the specific reason ("only named 2 APIs with evidence" / "raised 3 times, still not justified" / "fix would hurt single-paginator APIs without a guard").
When the finding applies to an API subclass, include: Condition (when to activate), Guard (when to skip), Frequency estimate.
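For example, a subclass finding's annotation might look like this (values are hypothetical, echoing the versioned-header example from Step B):

```
Condition: spec declares a version header on 2+ endpoints
Guard: skip when no version header appears anywhere in the spec
Frequency: subclass:versioned-header — a minority of catalog APIs
```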
Step G: Construct the case against filing. Before recording the finding, write 1-2 sentences arguing the opposite — what makes this look like a printed-CLI fix, an iteration artifact, or a wishlist item. Why might a maintainer close this as "works as designed" or "too narrow for a machine fix"? What's the strongest version of "this shouldn't be filed"?
If the case-against is stronger than the case-for, drop the finding. If they're roughly even, drop the finding (default direction is don't-file). Only when the case-for is clearly stronger does the finding survive to Phase 4.
This step is not a formality. It is the explicit place where weak findings die. A finding that survives Step G should be able to state, in one sentence, why the case-against fails — and that sentence is worth quoting in the retro entry.
6. Is this inherent or fixable? Push hard on whether smarter templates, a post-processing step, or better spec analysis could eliminate the friction. If inherent, propose the cheapest mitigation.
7. What is the durable fix? Prefer: template fix > binary post-processing > skill instruction.
Mark uncertainty explicitly. If you can't confidently isolate one root cause or one fix, say so — list the candidate causes (or candidate fixes) and how an implementer could disambiguate before committing. The issue body surfaces this uncertainty so the agent picking up the work doesn't lock in a wrong-but-plausible diagnosis. Confidence isn't a virtue when it's manufactured; an honest "either A or B; verify by X" is more useful than a wrong prescription.
Strip API-specific details from the proposed fix. The durable fix must work across
APIs, not just the one that surfaced the finding. If the fix includes hardcoded param
names (e.g., --sport, --league), date formats (e.g., YYYYMMDD), chunking
strategies (e.g., monthly), or domain-specific logic, those are printed-CLI details
leaking into the machine recommendation. The machine fix should be parameterized —
driven by what the profiler detects in the spec, not by what one API happens to need.
Example of the anti-pattern:
- Finding: the agent hand-added --dates for historical data.
- Bad fix: "add --dates with YYYYMMDD-YYYYMMDD format, --sport/--league flags, and monthly chunking to the sync template."
- Good fix: "emit a --dates flag that passes the value through to the API."

The bad fix bakes ESPN's date format, scope params, and chunking strategy into the machine. The good fix lets the profiler drive behavior from the spec.
Sort survivors of Phase 3 into three buckets: Do (prioritized P1/P2/P3), Skip (documented but not filed), and Drop (rejected with the specific reason).
No numerical scoring formulas. State the priority reasoning in words.
Sanity check before moving to Phase 5. Look at the bucket distribution. Almost every retro should have some drops and some skips. A retro with "all Do, no Skip, no Drop" is the failure mode — re-run triage and Step G on the weakest findings. Likewise, if every Do is P1, you're not prioritizing, you're inflating; force yourself to identify the weakest "Do" and ask whether it really beats the Skip bar.
The retro document is the durable audit trail — keep all fields below. The
GitHub issue body in Phase 6 will use a slim subset (action-shaped fields
only); the full triage rationale lives here, in the doc that gets uploaded as
an artifact and linked from the issue. See
references/issue-template.md for the issue-body shape.
Write the full retro document using this template:
# Printing Press Retro: <API name>
## Session Stats
- API: <name>
- Spec source: <catalog/browser-sniffed/docs/HAR>
- Scorecard: <score>/100 (<grade>)
- Verify pass rate: <X>%
- Fix loops: <N>
- Manual code edits: <N>
- Features built from scratch: <N>
## Findings
### 1. <Title> (<category>)
- **What happened:** ...
- **Scorer correct?** Yes / No / Partially. [details]
- **Root cause:** Component + what's specifically wrong
- **Cross-API check:** Would this recur?
- **Frequency:** every API / most / subclass:<name> / this API only
- **Fallback if the Printing Press doesn't fix it:** ...
- **Worth a Printing Press fix?** ...
- **Inherent or fixable:** ...
- **Durable fix:** ...
- **Test:** How to verify (positive + negative)
- **Evidence:** Session moment that surfaced this
- **Related prior retros:** *(from Phase 3 Step D; "None" if no matches)*
- `<api-slug>` retro #<issue-num-if-known> — `aligned` / `contradicts` / `extends`. <one-sentence note on what changed or what's shared>
- ...
## Prioritized Improvements
### P1 — High priority
| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |
|---------|-------|-----------|-----------|---------------------|------------|--------|
### P2 — Medium priority
| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |
|---------|-------|-----------|-----------|---------------------|------------|--------|
### P3 — Low priority
| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |
|---------|-------|-----------|-----------|---------------------|------------|--------|
*Omit empty priority sections.*
### Skip
| Finding | Title | Why it didn't make it (Step B / Step D / Step G) |
|---------|-------|--------------------------------------------------|
*Findings that survived Phase 2.5 triage but failed Phase 3 — name the specific
step that failed (e.g., "Step B: only 2 APIs with evidence" / "Step G: case-against
stronger; mostly per-CLI"). Empty if every Phase 3 candidate filed.*
### Dropped at triage
| Candidate | One-liner | Drop reason |
|-----------|-----------|-------------|
*Candidates rejected at Phase 2.5. One line each. Reasons: `iteration-noise` /
`printed-CLI` / `API-quirk` / `unproven-one-off` / `raised-N-times`. If this
section is empty, re-check Phase 2.5 — almost every retro has some.*
## Work Units
(see Phase 5.5)
## Anti-patterns
- ...
## What the Printing Press Got Right
- ...
Save the retro to manuscript proofs (always) and to the temp retro scratch
directory (always). Do not save retro documents under the source repo's
docs/retros/ directory; the skill must work the same way for users who do not
have the repo checked out, and retro documents are issue artifacts rather than
durable repo docs.
RETRO_STAMP="$(date +%Y%m%d-%H%M%S)"
RETRO_PROOF_PATH="$PRESS_MANUSCRIPTS/$API_NAME/$RUN_ID/proofs/$RETRO_STAMP-retro-$CLI_NAME.md"
RETRO_SCRATCH_DIR="/tmp/printing-press/retro"
RETRO_SCRATCH_PATH="$RETRO_SCRATCH_DIR/$RETRO_STAMP-$API_NAME-retro.md"
mkdir -p "$(dirname "$RETRO_PROOF_PATH")" "$RETRO_SCRATCH_DIR"
Write the full retro document to $RETRO_PROOF_PATH, then copy that file to
$RETRO_SCRATCH_PATH. This must complete before Phase 6 Step 1 copies the
manuscripts directory to staging.
Group related findings into coherent work units a planner could pick up directly.
For each "Do" finding or group of related findings:
### WU-1: <Title> (from F1, F3, ...)
- **Priority:** P1 / P2 / P3 *(max priority among absorbed findings — P1 if any
absorbed finding is P1, else P2 if any is P2, else P3)*
- **Component:** generator / openapi-parser / spec-parser / scorer / skill / catalog
*(must match one of the six fixed component slugs; drives the `comp:*` label
applied to the issue when filed)*
- **Goal:** One sentence describing the outcome
- **Target:** <component and area, e.g., "Generator templates in internal/generator/">
- **Acceptance criteria:**
- positive test: ...
- negative test: ...
- **Scope boundary:** What this does NOT include
- **Dependencies:** Other work units that must complete first
- **Complexity:** small / medium / large
The six fixed component slugs are: generator (internal/generator/),
openapi-parser (internal/openapi/), spec-parser (internal/spec/),
scorer (verify / dogfood / scorecard), skill (skills/printing-press/SKILL.md),
catalog (catalog/). If a WU genuinely spans two, pick the primary one — the
component where the durable fix will land. Pick exactly one; don't multi-label.
If running from inside the printing-press repo (IN_REPO=true):
Resolve target file paths using Glob and Grep tool invocations on $REPO_ROOT to
make work units more precise. E.g., use Glob to find internal/generator/*.go files,
Grep to find where sync code is generated.
If running externally (IN_REPO=false):
Describe target components by name (e.g., "Generator templates in internal/generator/")
and acceptance criteria without resolved file paths. The fixer will resolve paths when
they pick up the work.
After prioritization and work units are written, decide whether GitHub issues are warranted. Each WU becomes one flat top-level issue (no parent, no sub-issue hierarchy). The purpose of filing is to give someone (human or agent) something to fix in the Printing Press. If every finding is specific to this one printed CLI with nothing to change in the Printing Press, filing is noise — there's nothing to act on.
Skip filing if every surviving finding is specific to this one printed CLI — nothing in the Do bucket proposes a change to a Printing Press component.

File issues (one per WU) if at least one work unit names a durable fix in the Printing Press — a template, parser, scorer, skill, or catalog change that helps future CLIs.
Use judgment. A retro that found three things but all three are "this API has a weird auth scheme no other API uses" is not worth filing. A retro that found one small template gap that would help every future CLI is worth filing.
If filing is skipped, still save the retro locally (manuscript proofs +
/tmp/printing-press/retro/), present the findings to the user, then jump
directly to Phase 6 Step 6 (present results — adjusted to show local-only paths).
Read and apply references/artifact-packaging.md through Step 4 only (create staging dir, copy, scrub, zip). Do not upload or clean up yet — the staging folder stays alive until the end of Phase 6.
The staging folder ($STAGING_DIR) now contains the scrubbed copies and the zips.
This is both the review target and the upload source.
This step only runs if the Phase 5.6 issue gate passed (there are Printing Press findings to act on).
Before showing the confirm prompt, run references/issue-template.md
Steps 1, 2, and 2.5 to ensure labels exist, sort the work units, and
compute the per-WU filing plan via the dedup scan against open
retro-tagged issues. Each WU ends up classified as either:
- comment:#N — an open issue is the same match; the new evidence will be added as a comment instead of filing a duplicate.
- file-new — no same match; any related-area matches will be referenced via #N in the new issue's Related issues block.

The dedup scan does not need to be bulletproof. Bias toward "file new" when uncertain — duplicates are recoverable; miscomments on the wrong issue are uglier.
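The scan itself can be as simple as listing open retro-tagged issues and matching titles — a sketch, assuming the retro label already exists on the repo:

```bash
# Fetch open retro issues once; match WU titles against this list in memory.
gh issue list --repo mvanhorn/cli-printing-press --label retro --state open \
  --json number,title,labels --limit 200
```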
Then show the user a summary including the filing plan and ask for
confirmation via AskUserQuestion.
Ready to submit your retro.
Here's what will happen on mvanhorn/cli-printing-press:
Filing plan:
| # | Title | Plan | Notes |
|---|-------|------|-------|
| 1 | <title> | File new (P1, comp:<slug>) | No match |
| 2 | <title> | Comment on #234 | Matches "<existing issue title>" |
| 3 | <title> | File new + reference #189 | Adjacent open issue |

Each new issue carries retro, priority:P<n>, comp:<slug> labels — agents filter related work across retros with gh issue list --label comp:<slug> or gh issue list --label priority:P1.

Scrubbed artifact zips uploaded to catbox.moe and linked from each new issue:
- Retro document — full triage rationale, drops, skips, what went right
- Manuscripts () — research brief, shipcheck proof, build logs
- CLI source () — the generated Go code (no binary, no vendor/) (omit if not available)
Everything is staged at <$STAGING_DIR> if you'd like to inspect the files first.
Options: "Submit" / "Let me review the files first" / "Save locally only".
If the user picks "Let me review the files first," acknowledge and wait. When they come back, re-ask with Submit / Save locally only.
If the user picks "Save locally only," skip Steps 3 and 4 — the retro is already
saved to manuscript proofs and /tmp/printing-press/retro/. Clean up the staging
folder, then jump to Step 6.
If the user wants to override a dedup decision before submitting (e.g.,
"file new for WU-2 instead of commenting"), accept the override: clear
WU_DEDUP[i] for that WU and proceed.
Run artifact-packaging.md Step 5 (the catbox upload) using the zips already in
$STAGING_DIR. This produces $MANUSCRIPTS_URL and $CLI_SOURCE_URL.
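For reference, catbox.moe's upload API is a single multipart POST; a sketch of what the packaging reference's Step 5 amounts to (the zip filename is an assumption):

```bash
# Upload one zip to catbox; the response body is the direct URL.
MANUSCRIPTS_URL=$(curl -fsS -F "reqtype=fileupload" \
  -F "fileToUpload=@$STAGING_DIR/manuscripts.zip" \
  https://catbox.moe/user/api.php)
echo "Manuscripts: $MANUSCRIPTS_URL"
```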
Steps 1, 2, and 2.5 of references/issue-template.md
already ran during Step 2 (filing plan + confirm), so labels exist, WUs are
sorted, and $WU_DEDUP and $WU_RELATED are populated. This step runs
Step 3 of the reference: build bodies and execute the plan in parallel.
The "Execution principles" block at the top of issue-template.md is
mandatory: build issue bodies inline (heredocs into shell variables, not
the Write tool), run the whole step in one Bash invocation, and parallelize
the per-WU gh issue create / gh issue comment calls. Skipping these
costs real wall-clock latency — an N WU retro should finish in a single
round trip's worth of network time, not a serialized stack of them.
Each WU is independent: WUs marked comment:#N get a comment on the
existing issue; WUs marked file-new create a new flat top-level issue. No
parent, no sub-issue REST linking — every new issue stands alone in
GitHub's issue list with its own open/close lifecycle.
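A sketch of the parallel pattern (array names are illustrative; the real variables come from the issue-template reference):

```bash
# One background job per WU; bodies were built into shell variables beforehand.
for i in "${!WU_TITLE[@]}"; do
  case "${WU_DEDUP[$i]:-}" in
    comment:*)
      # WU matched an existing issue: append the new evidence as a comment.
      gh issue comment "${WU_DEDUP[$i]#comment:#}" \
        --repo mvanhorn/cli-printing-press --body "${WU_BODY[$i]}" &
      ;;
    *)
      # No match: file a new flat top-level issue with its labels.
      gh issue create --repo mvanhorn/cli-printing-press \
        --title "${WU_TITLE[$i]}" --body "${WU_BODY[$i]}" \
        --label retro --label "priority:${WU_PRI[$i]}" --label "comp:${WU_COMP[$i]}" &
      ;;
  esac
done
wait  # all per-WU gh calls run concurrently
```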
Each new issue carries its own priority:P<n> and comp:<slug> labels.
This is what enables gh issue list --label comp:openapi-parser to surface
every retro WU in that area across every retro — labels are the cross-retro
discovery surface, not auto-cross-links inside issue bodies.
Each new issue body's Related issues block combines:
- prior-retro references captured in Phase 3 Step D (the aligned / contradicts / extends tuples)
- related-area issue references from Step 2.5 (open issues in adjacent territory)

Both reach across separate filed work where the #N auto-cross-link is real signal. The body does not auto-cross-link to sibling WUs in the same retro; that linkage is noise unless one is genuinely a prerequisite (captured as free-text Dependencies: instead).
If gh is not authenticated or every per-WU action fails, follow the
graceful degradation path in the issue-template reference: save locally and
print manual filing instructions. Per-WU partial failures (some succeed,
some don't) are surfaced through $FAILED_ISSUES in Step 6.
Ensure the temp scratch copy exists. This is the human-friendly local path for reviewing or manually filing the retro when upload or issue creation fails.
if [ -f "$RETRO_PROOF_PATH" ]; then
mkdir -p "$RETRO_SCRATCH_DIR"
cp "$RETRO_PROOF_PATH" "$RETRO_SCRATCH_PATH"
fi
After issues are created and comments posted, show the user a summary in
priority order. Group created and commented outcomes — both are real
filed work, but the shape differs.
Retro submitted!
Filed <N> new issue(s), added <M> comment(s) on existing issues (P1 → P3 order):

New issues:
- [P1] — <full $OUTCOME_URL[i]>
- [P2] <title> — <full $OUTCOME_URL[i]>
- ...
Comments on existing issues:
- [P1] → comment on #234 — <comment URL>
- ...
<N> findings across <M> work units. New issues are tagged with comp:<slug> and priority:P<n> labels — agents can filter related work across retros with gh issue list --label comp:<slug> or gh issue list --label priority:P1.

(if artifacts uploaded) Artifacts: retro doc · manuscripts · CLI source
Local copy: <$RETRO_SCRATCH_PATH>
The [P<n>] annotation here is presentation-only — the issue titles
themselves do not carry a priority prefix (priority lives on the label).
Showing it in the user-facing summary helps the user scan filed work in
priority order without opening each issue.
Omit either subsection (New issues: or Comments on existing issues:)
when empty. A retro that produced only comments (every WU matched an
existing open issue) is a good outcome — it means the issue tracker
already covered the findings and the new evidence reinforces them.
If $FAILED_ISSUES is non-empty (set by references/issue-template.md
Step 3), append a warning block before the closing line:
⚠️ Some actions need attention:
- <title> — issue creation failed
- <title> — comment on #234 failed
- ...
File the missing issue(s) or comment(s) manually using the retro doc at <$RETRO_SCRATCH_PATH>.
If filing wasn't completed (user chose local-only, or gh failed entirely), show the local save paths and the manual filing instructions printed by the issue-template fallback path.
Run artifact-packaging.md Step 7 to delete $STAGING_DIR.