Performs post-pipeline retrospectives: parses logs, counts productive vs wasted iterations, identifies failure patterns, scores runs, suggests fixes to skills/scripts.
Install:

```shell
npx claudepluginhub fortunto2/solo-factory --plugin solo
```
This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.
```shell
git branch --show-current 2>/dev/null
git log --oneline -10 2>/dev/null
git diff --name-only HEAD~5..HEAD 2>/dev/null | head -20
```

Use after a pipeline completes (or gets cancelled). This is the process quality check — /review checks code quality, /retro checks pipeline process quality.
Can also be used standalone on any project — with or without pipeline logs.
- `session_search(query)` — find past pipeline runs and known issues
- `codegraph_explain(project)` — understand project architecture context
- `codegraph_query(query)` — query the code graph for project metadata

If MCP tools are not available, fall back to Glob + Grep + Read.
Detect project from $ARGUMENTS or CWD (e.g. `~/projects/my-app` -> `my-app`).

Find the pipeline state file: `.solo/pipelines/solo-pipeline-{project}.local.md` (project-local) or `~/.solo/pipelines/solo-pipeline-{project}.local.md` (global fallback). Set `project_root` accordingly.

Verify artifacts exist (parallel reads):

- `{project_root}/.solo/pipelines/pipeline.log`
- `{project_root}/.solo/pipelines/iter-*.log`
- `{project_root}/.solo/pipelines/progress.md`
- `{project_root}/docs/plan-done/`
- `{project_root}/docs/plan/`

Determine analysis mode:
Count iter logs (if they exist): `ls {project_root}/.solo/pipelines/iter-*.log | wc -l`
If no pipeline logs exist, the retro can still provide value by analyzing:
- `git log --oneline --since="1 week ago"` — commit frequency, patterns, conventional format
- `git log --oneline -- CLAUDE.md` — how docs evolved

Skip Phases 2-4 and proceed directly to Phase 5 (Plan Fidelity) and Phase 6 (Git & Code Quality). Adjust Phase 7 scoring to weight available data more heavily.
Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags:
Log format: `[HH:MM:SS] TAG | message`
Extract by tag:
| Tag | What to extract |
|---|---|
| `START` | Pipeline run boundary — count restarts (multiple START lines = restarts) |
| `STAGE` | `iter N/M \| stage S/T: {stage_id}` — iteration count per stage |
| `SIGNAL` | `<solo:done/>` or `<solo:redo/>` — which stages got completion signals |
| `INVOKE` | Skill invoked — extract skill name, check for wrong names |
| `ITER` | `commit: {sha} \| result: {stage complete\|continuing}` — per-iteration outcome |
| `CHECK` | `{stage} \| {path} -> FOUND\|NOT FOUND` — marker file checks |
| `FINISH` | `Duration: {N}m` — total duration per run |
| `MAXITER` | `Reached max iterations ({N})` — hit iteration ceiling |
| `QUEUE` | Plan cycling events (activating, archiving) |
| `CIRCUIT` | Circuit breaker triggered (if present) |
| `CWD` | Working directory changes |
| `CTRL` | Control signals (pause/stop/skip) |
Compute metrics:
```
total_runs       = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters     = total_iterations - productive_iters
waste_pct        = wasted_iters / total_iterations * 100
maxiter_hits     = count of MAXITER lines
plan_cycles      = count of QUEUE lines with "Cycling"

per_stage = {
  stage_id: {
    attempts: count of STAGE lines for this stage,
    successes: count of ITER lines with "stage complete" for this stage,
    waste_ratio: (attempts - successes) / attempts * 100,
  }
}
```
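The core counts above can be sketched in shell. This is a minimal illustration, assuming the `[HH:MM:SS] TAG | message` format described earlier; the sample log lines are invented, not real pipeline output.

```shell
# Sketch: derive the core metrics from a pipeline.log.
# The sample lines below are made up for illustration.
log=pipeline.log
cat > "$log" <<'EOF'
[10:00:01] START | run 1
[10:01:00] ITER | commit: abc1234 | result: continuing
[10:02:00] ITER | commit: def5678 | result: stage complete
[10:03:00] START | run 2
[10:04:00] ITER | commit: 9abcdef | result: stage complete
EOF

total_runs=$(grep -c '] START |' "$log")           # restarts = runs beyond 1
total_iterations=$(grep -c '] ITER |' "$log")
productive_iters=$(grep '] ITER |' "$log" | grep -c 'stage complete')
wasted_iters=$((total_iterations - productive_iters))
waste_pct=$((wasted_iters * 100 / total_iterations))

echo "runs=$total_runs iters=$total_iterations waste=${waste_pct}%"
```

With the sample log this reports 2 runs, 3 iterations, and 33% waste (one `continuing` iteration).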
Read progress.md and scan for error patterns:
- `Unknown skill:` — extract which skill name was wrong
- `<solo:done/><solo:done/>` in the same iteration -> minor noise (note but don't penalize)

For each error pattern found, record:
Do NOT read all iter logs — could be 60+. Use smart sampling:
- **First failed iter per pattern:** For each failure pattern found in Phase 3, read the first iter log that shows it. Strip ANSI codes first: `sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100`
- **First successful iter per stage:** For each stage that eventually succeeded, read the first successful iter log (the one with `<solo:done/>` in the output)
- **Final review iter:** Read the last `iter-*-review.log` (the verdict)
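The ANSI-strip step can be verified in isolation. A small sketch (GNU sed, which supports `\x1b`; the log file name is illustrative):

```shell
# Sketch: confirm ANSI color codes are stripped before sampling a log
printf '\033[32mPASS\033[0m build ok\n' > iter-001-build.log
clean=$(sed 's/\x1b\[[0-9;]*m//g' < iter-001-build.log | head -100)
echo "$clean"
```

The colored line comes back as plain `PASS build ok`, safe to grep for error indicators.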
Extract from each sampled log:
- Error indicators (`Error`, `error`, `Unknown`, `failed`)
- Completion signal (`<solo:done/>` or `<solo:redo/>` present?)

For each track directory in docs/plan-done/ and docs/plan/:
Read spec.md (if exists):

- Count `- [ ]` and `- [x]` checkboxes
- `criteria_met = checked / total * 100`

Read plan.md (if exists):

- Count `- [ ]` and `- [x]` checkboxes
- Check for commit SHA annotations (`<!-- sha:... -->`)
- `tasks_done = checked / total * 100`

Compile per-track summary:
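Checkbox counting reduces to two greps. A sketch with an invented plan.md:

```shell
# Sketch: compute tasks_done from plan.md checkbox state
cat > plan.md <<'EOF'
- [x] scaffold routes
- [x] add auth <!-- sha:abc1234 -->
- [ ] write e2e tests
EOF
checked=$(grep -c '^- \[x\]' plan.md)     # completed tasks
total=$(grep -c '^- \[[x ]\]' plan.md)    # all checkbox lines
tasks_done=$((checked * 100 / total))
echo "tasks_done=${tasks_done}%"
```

Here 2 of 3 boxes are checked, so `tasks_done` comes out at 66% (integer division).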
Quick checks only — NOT a full /review:
Commit count and format:
```shell
git -C {project_root} log --oneline | wc -l
git -C {project_root} log --oneline | head -30
```
- Check for conventional prefixes (`feat:`, `fix:`, `chore:`, `test:`, `docs:`, `refactor:`, `build:`, `ci:`, `perf:`)
- `conventional_pct = conventional / total * 100`

Committer breakdown:
```shell
git -C {project_root} shortlog -sn --no-merges | head -10
```
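The conventional-format percentage can be sketched over a list of commit subjects. In the real check the subjects come from `git log --oneline`; here they are hard-coded so the sketch is self-contained:

```shell
# Sketch: percentage of commit subjects in conventional format
cat > subjects.txt <<'EOF'
feat: add login
fix: handle null token
update readme
chore: bump deps
EOF
total=$(grep -c '' subjects.txt)   # count all lines
conventional=$(grep -Ec '^(feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?:' subjects.txt)
conventional_pct=$((conventional * 100 / total))
echo "conventional_pct=${conventional_pct}%"
```

Three of the four sample subjects match, giving 75%. The optional `(\([^)]*\))?` also accepts scoped prefixes like `feat(auth):`.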
Test status (if test command exists in CLAUDE.md or package.json):
Build status (if build command exists):
Check for signs of context window problems during the pipeline run:
Iteration quality curve: Compare early iterations vs late iterations.
Observation masking usage: Check if scratch/ directory exists in project root.
Plan recitation evidence: In sampled iter logs, check if the agent re-read plan.md at task boundaries.
CLAUDE.md bloat: `wc -c {project_root}/CLAUDE.md`

- over 40,000 chars: WARN — attention dilution likely
- over 60,000 chars: RED — severe context budget pressure
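The thresholds map to a simple check. A sketch (the dummy file only makes it self-contained):

```shell
# Sketch: classify CLAUDE.md size against the WARN/RED thresholds
head -c 45000 /dev/zero | tr '\0' x > CLAUDE.md   # 45,000-char dummy file
size=$(wc -c < CLAUDE.md)
if [ "$size" -gt 60000 ]; then rating=RED
elif [ "$size" -gt 40000 ]; then rating=WARN
else rating=OK
fi
echo "CLAUDE.md: ${size} chars - ${rating}"
```

A 45,000-char file lands in WARN territory: over the 40k dilution line, under the 60k RED line.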
Add findings to the report under ## Context Health:
## Context Health
- Iteration quality trend: {STABLE / DEGRADING / N/A}
- Observation masking: {USED / NOT USED / N/A}
- Plan recitation: {OBSERVED / ABSENT / N/A}
- CLAUDE.md size: {N} chars — {OK / WARN / BLOATED}
Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md.
If plugin root not available, use the embedded weights:
Scoring weights:
Note: In fallback mode (no pipeline logs), redistribute Efficiency and Stability weights to Fidelity, Quality, and Commits.
Generate report at {project_root}/docs/retro/{date}-retro.md:
# Pipeline Retro: {project} ({date})
## Overall Score: {N}/10
## Pipeline Efficiency
| Metric | Value | Rating |
|--------|-------|--------|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |
| Tracks completed | {N} | |
| Duration per track | {time/tracks} | {emoji} |
## Per-Stage Breakdown
| Stage | Attempts | Successes | Waste % | Notes |
|-------|----------|-----------|---------|-------|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |
## Failure Patterns
### Pattern 1: {name}
- **Occurrences:** {N} iterations
- **Root cause:** {analysis}
- **Wasted:** {N} iterations
- **Fix:** {concrete suggestion with file reference}
### Pattern 2: ...
## Plan Fidelity
| Track | Criteria Met | Tasks Done | SHAs | Rating |
|-------|-------------|------------|------|--------|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |
## Code Quality (Quick)
- **Tests:** {N} pass, {N} fail (or "not configured")
- **Build:** PASS / FAIL (or "not configured")
- **Commits:** {N} total, {pct}% conventional format
## Three-Axis Growth
| Axis | Score | Evidence |
|------|-------|----------|
| **Technical** (code, tools, architecture) | {0-10} | {what changed} |
| **Cognitive** (understanding, strategy, decisions) | {0-10} | {what improved} |
| **Process** (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served — note what's missing.
## Recommendations
1. **[CRITICAL]** {patch suggestion with file:line reference}
2. **[HIGH]** {improvement}
3. **[MEDIUM]** {optimization}
4. **[LOW]** {nice-to-have}
## Suggested Patches
### Patch 1: {file} — {description}
**What:** {one-line description}
**Why:** {root cause reference from Failure Patterns}
```diff
- old line
+ new line
```
Rating guide (use these emojis):
After generating the report:
Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
For each suggested patch (if any), use AskUserQuestion:
If "Show diff first": display the full diff, then ask again (Apply / Skip)
If "Apply": use Edit tool to apply the change directly
After all patches are processed, commit with message `fix(retro): {description}`.

After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.
```shell
wc -c CLAUDE.md
git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"
```

Run the next phase only if ${CLAUDE_PLUGIN_ROOT} is available (i.e., solo-factory is installed). Skip if running as a standalone skill without the factory context.
After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.
Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
`${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md`

Read pipeline script signal handling and stage logic:
`${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh`

Cross-reference with failure patterns from Phase 3:
Factory Score: {N}/10
Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}
Pipeline reliability: {N}/10 — {why}
Missing capabilities:
- {what the factory couldn't do that it should have}
Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}
After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
If knowledge was missing, note where it should live (`docs/` or `CLAUDE.md`).

Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
Skill gaps: Which skills need better instructions? Which new skills should exist?
Append findings to {project_root}/docs/evolution.md (create if not exists). If ~/.solo/evolution.md exists, append there as well for cross-project tracking.
## {YYYY-MM-DD} | {project} | Factory Score: {N}/10
Pipeline: {stages run} | Iters: {total} | Waste: {pct}%
### Defects
- **{severity}** | {skill/script}: {description}
- Fix: {concrete file:change}
### Harness Gaps
- **Context:** {what knowledge was missing or stale for the agent}
- **Constraints:** {what boundary violations or inconsistencies occurred}
- **Precedents:** {patterns worth capturing for future agents — good or bad}
### Missing
- {capability the factory lacked}
### What worked well
- {skill/pattern that performed efficiently}
Rules:
Output signal: <solo:done/>
Important: /retro always outputs <solo:done/> — it never needs redo. Even if pipeline was terrible, the retro itself always completes.
- `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md` — scoring rubric (8 axes, weights)
- `${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md` — known failure patterns and fixes