Performs post-pipeline retrospectives: parses logs, counts productive vs wasted iterations, identifies failure patterns, scores runs, suggests fixes to skills/scripts.
Install:

```shell
npx claudepluginhub fortunto2/solo-factory --plugin solo
```
This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.
```shell
git branch --show-current 2>/dev/null
git log --oneline -10 2>/dev/null
git diff --name-only HEAD~5..HEAD 2>/dev/null | head -20
```

Use after a pipeline completes (or gets cancelled). This is the process quality check — /review checks code quality, /retro checks pipeline process quality.
Can also be used standalone on any project — with or without pipeline logs.
- `session_search(query)` — find past pipeline runs and known issues
- `codegraph_explain(project)` — understand project architecture context
- `codegraph_query(query)` — query the code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.
Detect project from $ARGUMENTS or CWD (e.g. `~/projects/my-app` -> `my-app`).

Find the pipeline state file: `.solo/pipelines/solo-pipeline-{project}.local.md` (project-local) or `~/.solo/pipelines/solo-pipeline-{project}.local.md` (global fallback). Set `project_root` accordingly.

Verify artifacts exist (parallel reads):

- `{project_root}/.solo/pipelines/pipeline.log`
- `{project_root}/.solo/pipelines/iter-*.log`
- `{project_root}/.solo/pipelines/progress.md`
- `{project_root}/docs/plan-done/`
- `{project_root}/docs/plan/`

Determine analysis mode:
Count iter logs (if they exist): `ls {project_root}/.solo/pipelines/iter-*.log | wc -l`
If no pipeline logs exist, the retro can still provide value by analyzing:
- `git log --oneline --since="1 week ago"` — commit frequency, patterns, conventional format
- `git log --oneline -- CLAUDE.md` — how docs evolved

Skip Phases 2-4 and proceed directly to Phase 5 (Plan Fidelity) and Phase 6 (Git & Code Quality). Adjust Phase 7 scoring to weight available data more heavily.
Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags:
Log format: `[HH:MM:SS] TAG | message`
Extract by tag:
| Tag | What to extract |
|---|---|
| `START` | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| `STAGE` | `iter N/M \| stage S/T: {stage_id}` — iteration count per stage |
| `SIGNAL` | `<solo:done/>` or `<solo:redo/>` — which stages got completion signals |
| `INVOKE` | Skill invoked — extract skill name, check for wrong names |
| `ITER` | `commit: {sha} \| result: {stage complete\|continuing}` — per-iteration outcome |
| `CHECK` | `{stage} \| {path} -> FOUND\|NOT FOUND` — marker file checks |
| `FINISH` | `Duration: {N}m` — total duration per run |
| `MAXITER` | `Reached max iterations ({N})` — hit iteration ceiling |
| `QUEUE` | Plan cycling events (activating, archiving) |
| `CIRCUIT` | Circuit breaker triggered (if present) |
| `CWD` | Working directory changes |
| `CTRL` | Control signals (pause/stop/skip) |
Compute metrics:
```
total_runs       = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters     = total_iterations - productive_iters
waste_pct        = wasted_iters / total_iterations * 100
maxiter_hits     = count of MAXITER lines
plan_cycles      = count of QUEUE lines with "Cycling"

per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
```
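The core counts above can be sketched in shell. This is a minimal illustration, assuming the `[HH:MM:SS] TAG | message` format described earlier; the sample log lines are invented, not real pipeline output.

```shell
# Sketch: derive the core metrics from a pipeline.log.
# The sample lines below are made up for illustration.
log=pipeline.log
cat > "$log" <<'EOF'
[10:00:01] START | run 1
[10:01:00] ITER | commit: abc1234 | result: continuing
[10:02:00] ITER | commit: def5678 | result: stage complete
[10:03:00] START | run 2
[10:04:00] ITER | commit: 9abcdef | result: stage complete
EOF

total_runs=$(grep -c '] START |' "$log")           # restarts = runs beyond 1
total_iterations=$(grep -c '] ITER |' "$log")
productive_iters=$(grep '] ITER |' "$log" | grep -c 'stage complete')
wasted_iters=$((total_iterations - productive_iters))
waste_pct=$((wasted_iters * 100 / total_iterations))

echo "runs=$total_runs iters=$total_iterations waste=${waste_pct}%"
```

With the sample log this reports 2 runs, 3 iterations, and 33% waste (one `continuing` iteration).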
Read progress.md and scan for error patterns:
- `Unknown skill:` — extract which skill name was wrong
- `<solo:done/><solo:done/>` in the same iteration -> minor noise (note but don't penalize)

For each error pattern found, record:
Do NOT read all iter logs — could be 60+. Use smart sampling:
- **First failed iter per pattern:** For each failure pattern found in Phase 3, read the first iter log that shows it. Strip ANSI codes first: `sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100`
- **First successful iter per stage:** For each stage that eventually succeeded, read the first successful iter log (the one with `<solo:done/>` in the output)
- **Final review iter:** Read the last `iter-*-review.log` (the verdict)
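The ANSI-strip step can be verified in isolation. A small sketch (GNU sed, which supports `\x1b`; the log file name is illustrative):

```shell
# Sketch: confirm ANSI color codes are stripped before sampling a log
printf '\033[32mPASS\033[0m build ok\n' > iter-001-build.log
clean=$(sed 's/\x1b\[[0-9;]*m//g' < iter-001-build.log | head -100)
echo "$clean"
```

The colored line comes back as plain `PASS build ok`, safe to grep for error indicators.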
Extract from each sampled log:
- Error indicators (`Error`, `error`, `Unknown`, `failed`)
- Completion signal (`<solo:done/>` or `<solo:redo/>` present?)

For each track directory in docs/plan-done/ and docs/plan/:
Read spec.md (if exists):

- Count `- [ ]` and `- [x]` checkboxes
- `criteria_met = checked / total * 100`

Read plan.md (if exists):

- Count `- [ ]` and `- [x]` checkboxes
- Check for commit SHA annotations (`<!-- sha:... -->`)
- `tasks_done = checked / total * 100`

Compile per-track summary:
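Checkbox counting reduces to two greps. A sketch with an invented plan.md:

```shell
# Sketch: compute tasks_done from plan.md checkbox state
cat > plan.md <<'EOF'
- [x] scaffold routes
- [x] add auth <!-- sha:abc1234 -->
- [ ] write e2e tests
EOF
checked=$(grep -c '^- \[x\]' plan.md)     # completed tasks
total=$(grep -c '^- \[[x ]\]' plan.md)    # all checkbox lines
tasks_done=$((checked * 100 / total))
echo "tasks_done=${tasks_done}%"
```

Here 2 of 3 boxes are checked, so `tasks_done` comes out at 66% (integer division).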
Quick checks only — NOT a full /review:
Commit count and format:
```shell
git -C {project_root} log --oneline | wc -l
git -C {project_root} log --oneline | head -30
```
- Check for conventional prefixes (`feat:`, `fix:`, `chore:`, `test:`, `docs:`, `refactor:`, `build:`, `ci:`, `perf:`)
- `conventional_pct = conventional / total * 100`

Committer breakdown:
```shell
git -C {project_root} shortlog -sn --no-merges | head -10
```
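The conventional-format percentage can be sketched over a list of commit subjects. In the real check the subjects come from `git log --oneline`; here they are hard-coded so the sketch is self-contained:

```shell
# Sketch: percentage of commit subjects in conventional format
cat > subjects.txt <<'EOF'
feat: add login
fix: handle null token
update readme
chore: bump deps
EOF
total=$(grep -c '' subjects.txt)   # count all lines
conventional=$(grep -Ec '^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?:' subjects.txt)
conventional_pct=$((conventional * 100 / total))
echo "conventional_pct=${conventional_pct}%"
```

Three of the four sample subjects match, giving 75%. The optional `(\([^)]*\))?` also accepts scoped prefixes like `feat(auth):`.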
Test status (if test command exists in CLAUDE.md or package.json):
Build status (if build command exists):
Check for signs of context window problems during the pipeline run:
Iteration quality curve: Compare early iterations vs late iterations.
Observation masking usage: Check if scratch/ directory exists in project root.
Plan recitation evidence: In sampled iter logs, check if the agent re-read plan.md at task boundaries.
CLAUDE.md bloat: `wc -c {project_root}/CLAUDE.md`

- over 40,000 chars: WARN — attention dilution likely
- over 60,000 chars: RED — severe context budget pressure
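The thresholds map to a simple check. A sketch (the dummy file only makes it self-contained):

```shell
# Sketch: classify CLAUDE.md size against the WARN/RED thresholds
head -c 45000 /dev/zero | tr '\0' x > CLAUDE.md   # 45,000-char dummy file
size=$(wc -c < CLAUDE.md)
if [ "$size" -gt 60000 ]; then rating=RED
elif [ "$size" -gt 40000 ]; then rating=WARN
else rating=OK
fi
echo "CLAUDE.md: ${size} chars - ${rating}"
```

A 45,000-char file lands in WARN territory: over the 40k dilution line, under the 60k RED line.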
Add findings to the report under ## Context Health:
## Context Health
- Iteration quality trend: {STABLE / DEGRADING / N/A}
- Observation masking: {USED / NOT USED / N/A}
- Plan recitation: {OBSERVED / ABSENT / N/A}
- CLAUDE.md size: {N} chars — {OK / WARN / BLOATED}
Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md.
If plugin root not available, use the embedded weights:
Scoring weights:
Note: In fallback mode (no pipeline logs), redistribute Efficiency and Stability weights to Fidelity, Quality, and Commits.
Generate report at {project_root}/docs/retro/{date}-retro.md:
# Pipeline Retro: {project} ({date})
## Overall Score: {N}/10
## Pipeline Efficiency
| Metric | Value | Rating |
|--------|-------|--------|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |
| Tracks completed | {N} | |
| Duration per track | {time/tracks} | {emoji} |
## Per-Stage Breakdown
| Stage | Attempts | Successes | Waste % | Notes |
|-------|----------|-----------|---------|-------|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |
## Failure Patterns
### Pattern 1: {name}
- **Occurrences:** {N} iterations
- **Root cause:** {analysis}
- **Wasted:** {N} iterations
- **Fix:** {concrete suggestion with file reference}
### Pattern 2: ...
## Plan Fidelity
| Track | Criteria Met | Tasks Done | SHAs | Rating |
|-------|-------------|------------|------|--------|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |
## Code Quality (Quick)
- **Tests:** {N} pass, {N} fail (or "not configured")
- **Build:** PASS / FAIL (or "not configured")
- **Commits:** {N} total, {pct}% conventional format
## Three-Axis Growth
| Axis | Score | Evidence |
|------|-------|----------|
| **Technical** (code, tools, architecture) | {0-10} | {what changed} |
| **Cognitive** (understanding, strategy, decisions) | {0-10} | {what improved} |
| **Process** (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served — note what's missing.
## Recommendations
1. **[CRITICAL]** {patch suggestion with file:line reference}
2. **[HIGH]** {improvement}
3. **[MEDIUM]** {optimization}
4. **[LOW]** {nice-to-have}
## Suggested Patches
### Patch 1: {file} — {description}
**What:** {one-line description}
**Why:** {root cause reference from Failure Patterns}
```diff
- old line
+ new line
```
Rating guide (use these emojis):
After generating the report:
Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
For each suggested patch (if any), use AskUserQuestion:
If "Show diff first": display the full diff, then ask again (Apply / Skip)
If "Apply": use Edit tool to apply the change directly
After all patches are processed, commit with message `fix(retro): {description}`.

After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.
```shell
wc -c CLAUDE.md
git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"
```

Run the next phase only if ${CLAUDE_PLUGIN_ROOT} is available (i.e., solo-factory is installed). Skip if running as a standalone skill without the factory context.
After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.
Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
`${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md`

Read pipeline script signal handling and stage logic:
`${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh`

Cross-reference with failure patterns from Phase 3:
Factory Score: {N}/10
Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}
Pipeline reliability: {N}/10 — {why}
Missing capabilities:
- {what the factory couldn't do that it should have}
Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}
After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
If knowledge was missing, note where it should live (`docs/` or `CLAUDE.md`).

Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
Skill gaps: Which skills need better instructions? Which new skills should exist?
Append findings to {project_root}/docs/evolution.md (create if not exists). If ~/.solo/evolution.md exists, append there as well for cross-project tracking.
## {YYYY-MM-DD} | {project} | Factory Score: {N}/10
Pipeline: {stages run} | Iters: {total} | Waste: {pct}%
### Defects
- **{severity}** | {skill/script}: {description}
- Fix: {concrete file:change}
### Harness Gaps
- **Context:** {what knowledge was missing or stale for the agent}
- **Constraints:** {what boundary violations or inconsistencies occurred}
- **Precedents:** {patterns worth capturing for future agents — good or bad}
### Missing
- {capability the factory lacked}
### What worked well
- {skill/pattern that performed efficiently}
Rules:
Output signal: <solo:done/>
Important: /retro always outputs <solo:done/> — it never needs redo. Even if pipeline was terrible, the retro itself always completes.
- `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md` — scoring rubric (8 axes, weights)
- `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md` — known failure patterns and fixes