# octo:loop

Executes tasks in structured loops until goals are met, with max iterations, success evaluation, progress tracking, and safety checks for refinement or convergence.

Install: `npx claudepluginhub nyldn/claude-octopus --plugin octo`

This skill uses the workspace's default tool permissions.
Systematic iterative execution with clear goals, exit conditions, and progress tracking.
Core principle: Define goal → Set max iterations → Execute → Evaluate → Loop or complete.
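As a rough sketch (not part of the skill itself), the core cycle can be expressed as a generic driver. `execute` and `success` are hypothetical callables standing in for the per-iteration work and the success check:

```python
from typing import Callable

def run_loop(execute: Callable[[int], object],
             success: Callable[[object], bool],
             max_iterations: int) -> tuple[bool, int]:
    """Define goal -> set max iterations -> execute -> evaluate -> loop or complete."""
    if max_iterations < 1:
        raise ValueError("max iterations must be defined (no infinite loops)")
    for i in range(1, max_iterations + 1):
        result = execute(i)       # do the per-iteration work
        if success(result):       # evaluate against the success criteria
            return True, i        # stop early: goal achieved
    return False, max_iterations  # limit reached without success

# Example: "keep going until the result reaches 3", capped at 5 iterations
done, used = run_loop(execute=lambda i: i, success=lambda r: r >= 3, max_iterations=5)
```

Note the early return: the loop never runs past the iteration where the goal is met.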
Use this skill when the user wants to:
- Repeat a task a bounded number of times ("loop 5 times auditing and fixing")
- Retry an operation until it succeeds, within a limit ("retry up to 3 times")
- Iterate toward a measurable target ("keep optimizing until < 100ms")

Do NOT use for:
- Single-pass tasks with no repetition
- Open-ended work with no measurable success criteria or iteration limit
**Loop Intent:**
Goal: [what should be achieved]
Success criteria: [how do we know we're done]
Max iterations: [safety limit]
Per-iteration tasks: [what to do each loop]
Use AskUserQuestion if any of these are unclear.
**Safety Validation:**
- [ ] Max iterations defined (no infinite loops)
- [ ] Success condition is measurable
- [ ] Each iteration makes progress
- [ ] Failure exit strategy exists
- [ ] User aware of potential duration
Never proceed without max iterations defined.
**Starting Iterative Loop**
Goal: [description]
Max iterations: [N]
Success criteria: [condition]
---
### Iteration 1 / [N]
For each iteration:
**Iteration [current] / [max]**
**Actions:**
1. [Action 1]
→ [result/output]
2. [Action 2]
→ [result/output]
3. [Action 3]
→ [result/output]
**Evaluation:**
- Success criteria met? [Yes/No]
- Progress made? [Yes/No]
- Issues found: [list any issues]
**Status:** [Continue/Success/Need intervention]
---
Use TodoWrite to track iterations:
Iteration Progress:
✓ Iteration 1 - [what was done]
✓ Iteration 2 - [what was done]
⚙️ Iteration 3 - [in progress]
- Iteration 4 - [pending]
- Iteration 5 - [pending]
🎉 **Success! Loop complete.**
**Goal achieved:** [description]
**Iterations used:** [N] / [max]
**Final state:**
[description of what was achieved]
**Summary of iterations:**
1. Iteration 1: [what happened]
2. Iteration 2: [what happened]
...
N. Iteration N: [what happened] ✓ Success
⚠️ **Max iterations reached without full success**
**Iterations completed:** [max]
**Goal:** [description]
**Current state:** [how close we got]
**Progress made:**
- [Improvement 1]
- [Improvement 2]
- [Improvement 3]
**Remaining issues:**
- [Issue 1]
- [Issue 2]
**Options:**
1. Accept current state (substantial progress made)
2. Continue with [N] more iterations
3. Change approach (current method may not work)
What would you like to do?
🛑 **Stopping early: No progress detected**
**Iteration:** [N] / [max]
**Reason:** Last [M] iterations showed no improvement
**Analysis:**
This suggests the current approach may be fundamentally flawed.
**Recommendation:**
Rather than continue looping, let's:
1. Analyze why no progress is being made
2. Consider alternative approaches
3. Re-evaluate the goal or success criteria
Shall we pause and reassess?
### Example: Quality gate loop

User: "Loop around 5 times auditing, enhancing, testing, until it's done"
Implementation:
**Loop Goal:** Code passes all quality gates
**Max Iterations:** 5
**Per-iteration:**
1. Audit code for issues
2. Enhance/fix identified issues
3. Run tests
4. Check if all pass
**Success:** All tests pass + no issues found
Execute:
Iteration 1:
- Audit → Found 8 issues
- Fix → Fixed 8 issues
- Test → 2 tests still failing
- Continue
Iteration 2:
- Audit → Found 2 new issues from fixes
- Fix → Fixed 2 issues
- Test → All tests pass ✓
- Success! Stopping early (2/5 iterations used)
### Example: Performance target

User: "Keep trying optimizations until we hit < 100ms response time"
Implementation:
**Loop Goal:** Response time < 100ms
**Max Iterations:** 10
**Per-iteration:**
1. Measure current performance
2. Identify bottleneck
3. Apply optimization
4. Re-measure
**Success:** Response time < 100ms
Execute:
Iteration 1: 450ms → Cache database queries → 280ms (Continue)
Iteration 2: 280ms → Add index to frequent query → 150ms (Continue)
Iteration 3: 150ms → Implement response compression → 85ms (Success!)
### Example: Deployment retry

User: "Try deploying, retry up to 3 times if it fails"
Implementation:
**Loop Goal:** Successful deployment
**Max Iterations:** 3
**Per-iteration:**
1. Attempt deployment
2. Check status
3. If failed, wait before retry
**Success:** Deployment succeeds
Execute:
Iteration 1: Deploy → Failed (API timeout) → Wait 10s
Iteration 2: Deploy → Failed (API timeout) → Wait 20s
Iteration 3: Deploy → Success ✓
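The retry pattern above can be sketched in Python. `operation` is a hypothetical callable, and the doubling delays (10s, 20s, ...) mirror the waits in the example:

```python
import time

def retry_with_backoff(operation, max_attempts: int = 3, base_delay: float = 10.0):
    """Attempt an operation up to max_attempts times, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()                         # success: stop immediately
        except Exception as err:
            if attempt == max_attempts:
                raise                                  # out of attempts: surface the failure
            delay = base_delay * (2 ** (attempt - 1))  # 10s, 20s, 40s, ...
            print(f"Iteration {attempt}: failed ({err}) -> wait {delay:.0f}s")
            time.sleep(delay)
```

The max-attempts cap is what makes this a bounded loop rather than "keep trying until it works".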
### Example: Incremental refinement

User: "Iterate 4 times improving the error messages based on user feedback"
Implementation:
**Loop Goal:** Error messages meet clarity standard
**Max Iterations:** 4
**Per-iteration:**
1. Review current error messages
2. Identify confusing ones
3. Rewrite for clarity
4. Evaluate against criteria
**Success:** All messages rated 8+/10 for clarity
Execute each iteration with progressive improvement
Loop for debugging:
"Keep debugging until all tests pass, max 5 tries"
Each iteration:
- Use skill-debug to investigate failure
- Apply fix
- Re-run tests
- Evaluate
Loop for comprehensive checking:
"Loop 3 times auditing different aspects"
Iteration 1: Audit security
Iteration 2: Audit performance
Iteration 3: Audit accessibility
Loop for TDD cycles:
"Do 5 red-green-refactor cycles"
Each iteration:
- Write failing test (red)
- Make it pass (green)
- Clean up while keeping tests green (refactor)
- Evaluate and continue
Good:
Loop max 5 times trying to fix the issue
Dangerous:
Keep trying until it works
(What if it never works? Infinite loop!)
Good:
Success: All 15 tests pass AND code coverage > 80%
Poor:
Success: Code looks better
(Too subjective)
**Progress Tracking:**
Iteration 1: 5/15 tests passing
Iteration 2: 10/15 tests passing
Iteration 3: 13/15 tests passing
Iteration 4: 15/15 tests passing ✓
Don't continue looping if goal is achieved:
**Iteration 2/5:** All tests pass!
Stopping early - goal achieved.
No need to continue to iteration 3.
Iteration 4: 10/15 tests passing
Iteration 5: 10/15 tests passing
Iteration 6: 10/15 tests passing
⚠️ No progress in 3 iterations - stopping to reassess approach
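A minimal stall check, assuming iteration results can be compared for equality (the function name is illustrative):

```python
def is_stalled(history: list, window: int = 3) -> bool:
    """True if the last `window` iteration results are identical (no progress)."""
    if len(history) < window:
        return False                          # not enough data to judge
    tail = history[-window:]
    return all(r == tail[0] for r in tail)    # every recent result is the same
```

Run this after each iteration; a `True` result is the signal to stop and reassess rather than keep looping.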
| Action | Why It's Dangerous |
|---|---|
| No max iterations | Could loop forever |
| Vague success criteria | Don't know when to stop |
| No progress tracking | Can't tell if making progress |
| Ignoring stalls | Waste time on ineffective approach |
| Same action each loop | If not working, need different approach |
If the strategy-rotation hook fires, immediately change approach. Do not retry the same approach. Explain what you'll do differently before the next attempt. The hook fires after consecutive failures of the same tool — this is a strong signal that the current approach is fundamentally wrong, not just slightly off.
Every iterative loop MUST track a Self-Regulation Score (referred to below as the "WTF score") that accumulates danger signals. This prevents runaway loops where the agent keeps "fixing" things without real progress.
Maintain a mental window of the last 10 iterations (or fewer if less than 10 have run). After each iteration, check for repeated patterns:
Single-state repetition: Did the same outcome/error occur 3+ times consecutively?
Multi-step cycle detection: Is there an A→B→A→B oscillation?
On first detection: Announce the pattern to the user. Attempt ONE diagnostic retry with explicit acknowledgment: "This pattern has repeated — here's what I'll do differently: [specific change]."
On second detection: HALT immediately. Display the detected cycle and ask the user whether to continue with a completely different approach or stop.
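One way to sketch the window check (names like `detect_cycle` are illustrative, not part of the skill):

```python
def detect_cycle(outcomes: list, window: int = 10, stuck_threshold: int = 3):
    """Scan the recent window for single-state repetition or an A->B->A->B oscillation.

    Returns a description of the detected pattern, or None if no pattern is found.
    """
    recent = outcomes[-window:]
    # Single-state repetition: same outcome stuck_threshold+ times consecutively
    if len(recent) >= stuck_threshold and len(set(recent[-stuck_threshold:])) == 1:
        return f"repeated outcome: {recent[-1]!r} x{stuck_threshold}"
    # Two-step oscillation: A, B, A, B over the last four iterations
    if len(recent) >= 4:
        a, b, c, d = recent[-4:]
        if a == c and b == d and a != b:
            return f"oscillation: {a!r} <-> {b!r}"
    return None
```

The first non-None result triggers the announce-and-retry step; a second one triggers the halt.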
Track a cumulative score starting at 0%. Each event adds to the score.
Default weights (override via ~/.claude-octopus/loop-config.conf):
| Event | Score Impact |
|---|---|
| Revert (git revert, undo, roll back) | +15% |
| Touching files unrelated to the stated goal | +20% |
| A fix that requires changing >3 files | +5% |
| After the 15th fix attempt | +1% per additional fix |
| All remaining issues are Low severity | +10% |
If the WTF score exceeds 20%: STOP immediately. Show the events that contributed to the score and the current total, then ask the user how to proceed.
Hard cap: 50 iterations regardless of score or progress. No exceptions.
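Under the default weights above, the accumulation might look like this sketch (the event names are illustrative shorthand for the table rows):

```python
DEFAULT_WEIGHTS = {            # mirrors the default weights table above
    "revert": 15,              # git revert, undo, roll back
    "unrelated_files": 20,     # touching files unrelated to the stated goal
    "large_fix": 5,            # a fix that changes more than 3 files
    "extra_fix": 1,            # each fix attempt beyond the 15th
    "all_low_severity": 10,    # all remaining issues are Low severity
}

def wtf_score(events: list[str], weights: dict = DEFAULT_WEIGHTS) -> int:
    """Cumulative score starting at 0; each event adds its weight."""
    return sum(weights[e] for e in events)

def should_stop(events: list[str], threshold: int = 20,
                iteration: int = 0, hard_cap: int = 50) -> bool:
    """Stop once the score exceeds the threshold or the hard cap is hit."""
    return wtf_score(events) > threshold or iteration >= hard_cap
```

Note that the threshold check is strict: a single revert (15%) does not stop the loop, but a revert plus a large fix plus one extra fix attempt (21%) does.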
At loop start, check for ~/.claude-octopus/loop-config.conf. If it exists, read the key=value pairs and use them instead of defaults. Format:
# Loop Self-Regulation Configuration
WINDOW_SIZE=10
REVERT_PENALTY=15
UNRELATED_FILES_PENALTY=20
LARGE_FIX_PENALTY=5
AFTER_FIX_15_PENALTY=1
ALL_LOW_SEVERITY_PENALTY=10
WTF_THRESHOLD=20
HARD_CAP=50
STUCK_THRESHOLD=3
If the file does not exist, use the defaults shown above. Users can create this file to tune sensitivity for their workflow.
You do NOT need external tools for this. Track mentally during the loop:
Iteration 5/20 | Self-regulation: 10% (1 revert, 0 unrelated files)
The strategy-rotation hook and self-regulation are complementary: the hook reacts to consecutive failures of the same tool within the current work, while the self-regulation score accumulates danger signals across iterations of the whole loop.
Safety rules, in summary:

- Iteration limit: MAX_ITERATIONS = user_specified or 10 (always have a limit); HARD_CAP = 50 (absolute maximum regardless of user setting)
- WTF score: track across iterations; if score > 20%, STOP and ask the user
- Cycle detection: track the last 10 iterations; if a repeated pattern is detected twice, STOP and ask the user
- Stall detection: if the last 3 iterations show the same result, stop and ask the user
- Time checkpoint: if total time > 30 minutes, checkpoint progress and ask the user whether to continue
- Periodic check-in: every N iterations, show progress and ask whether to continue or adjust the approach
| Pattern | Max Iterations | Success Criteria | Early Exit |
|---|---|---|---|
| Test until pass | 5-10 | All tests pass | Yes |
| Performance optimization | 10-20 | Metric < target | Yes |
| Retry with backoff | 3-5 | Operation succeeds | Yes |
| Incremental refinement | 3-7 | Quality threshold met | Maybe |
| Comprehensive audit | 3-5 | All areas covered | No |
When the user specifies a Metric command, switch to mechanical metric verification mode. This replaces subjective evaluation with automated measurement, git-backed experiments, and automatic rollback on regression.
Falls back to standard loop behavior (above) when no metric is specified.
Key mechanics: every change is committed with an experiment: prefix before verification, and git revert HEAD --no-edit rolls it back automatically if the metric worsens.

| Parameter | Format | Required | Description |
|---|---|---|---|
| Metric | Metric: <shell command> | Yes (for this mode) | Command whose stdout is a number (the metric value) |
| Direction | Direction: higher\|lower | Yes | Whether higher or lower metric values are better |
| Guard | Guard: <shell command> | No | Must exit 0 for a change to be kept; run after metric |
| Iterations | Iterations: N | No | Max iterations (default: unbounded, runs until interrupted) |
All results are logged as JSONL to .claude-octopus/experiments/<YYYY-MM-DD>.jsonl.
Each line is a JSON object:
{"iteration": 1, "timestamp": "2026-03-21T14:30:00Z", "metric": 72.5, "best": 72.5, "status": "kept", "description": "Add index to users table", "commit": "abc1234"}
Fields:
- iteration — iteration number (starting from 1; iteration 0 is baseline)
- timestamp — ISO 8601 timestamp
- metric — measured value from the metric command
- best — best metric value seen so far
- status — "kept" (improvement), "reverted" (regression), or "error" (metric/guard crashed)
- description — one-line summary of what was changed
- commit — short git SHA of the experiment commit (before potential revert)

You MUST follow this exact sequence for each iteration. No steps may be skipped or reordered.
Setup (before iteration 1):
1. Create the log directory: mkdir -p .claude-octopus/experiments
2. If .claude-octopus/experiments/<today>.jsonl exists, read it to determine the current best metric value and iteration count, and resume from the next iteration number.
3. Otherwise, measure the baseline and log it:

{"iteration": 0, "timestamp": "...", "metric": <baseline>, "best": <baseline>, "status": "baseline", "description": "Baseline measurement", "commit": "<current HEAD short SHA>"}
Step 1: Review state. Read the experiment log (.claude-octopus/experiments/<today>.jsonl), review git history (git log --oneline -10), and identify what has been tried, what worked, and what failed.
Step 2: Pick the next change. Based on what worked/failed/is untried, decide on ONE focused change. Do NOT combine multiple unrelated changes.
Step 3: Make the change. Implement exactly one atomic change.
Step 4: Git commit BEFORE verification. Commit with the experiment: prefix:
git add -A && git commit -m "experiment: <one-line description of the change>"
This ensures every experiment is recorded in git history regardless of outcome.
Step 5: Run mechanical verification. Execute the metric command and capture the numeric result.
Step 6: Evaluate and act.
If metric improved (higher when Direction=higher, lower when Direction=lower): run the Guard command if one was specified. If the Guard exits 0 (or there is no Guard), keep the commit, update the best value, and log status as "kept". If the Guard fails, revert with git revert HEAD --no-edit and log status as "reverted".

If metric stayed the same: revert with git revert HEAD --no-edit. Log status as "reverted".

If metric worsened: revert with git revert HEAD --no-edit. Log status as "reverted".

If the metric command crashed (non-zero exit, no numeric output): revert with git revert HEAD --no-edit. Log status as "error".

Step 7: Log the result. Append a JSONL entry to .claude-octopus/experiments/<today>.jsonl.
Step 8: Report iteration summary. Display:
Iteration N: <description>
Metric: <value> (best: <best>) — <kept|reverted|error>
Step 9: Repeat. Go to Step 1 of the next iteration, unless the iteration limit is reached, the user interrupts, or a safety check (self-regulation score, cycle detection, hard cap) halts the loop.
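The Step 6 decision can be sketched as a pure function (a sketch, not the skill's actual implementation; `None` stands for a crashed metric command):

```python
def evaluate_experiment(metric, best, direction: str, guard_ok: bool = True) -> str:
    """Decide kept/reverted/error for one experiment, per the evaluation rules."""
    if metric is None:                 # metric command crashed / no numeric output
        return "error"
    improved = metric > best if direction == "higher" else metric < best
    if improved and guard_ok:
        return "kept"
    return "reverted"                  # worse, unchanged, or guard failed
```

Only a strict improvement that also passes the guard is kept; everything else, including a tie, is rolled back.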
If an experiment log already exists for today, resume from it: continue the iteration numbering and carry forward the best metric value recorded in the log.
When the loop completes (iterations exhausted or user stops), report:
Experiment Complete
Iterations: N
Baseline: <initial metric>
Final best: <best metric>
Improvement: <delta> (<percentage>%)
Kept: K changes, Reverted: R changes, Errors: E
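A sketch of deriving that report from the JSONL log (assuming the fields described earlier; the function name is illustrative):

```python
import json

def summarize(log_lines: list[str]) -> str:
    """Render the final experiment report from JSONL log lines."""
    entries = [json.loads(line) for line in log_lines if line.strip()]
    baseline = entries[0]["metric"]              # iteration 0 is the baseline
    best = entries[-1]["best"]                   # last line carries the running best
    kept = sum(e["status"] == "kept" for e in entries)
    reverted = sum(e["status"] == "reverted" for e in entries)
    errors = sum(e["status"] == "error" for e in entries)
    delta = best - baseline
    pct = (delta / baseline * 100) if baseline else 0.0
    return (f"Iterations: {entries[-1]['iteration']}\n"
            f"Baseline: {baseline}\nFinal best: {best}\n"
            f"Improvement: {delta:+.1f} ({pct:+.1f}%)\n"
            f"Kept: {kept} changes, Reverted: {reverted} changes, Errors: {errors}")
```

Everything in the report is recomputed from the log, so the summary stays correct even after a resumed session.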
/octo:loop Metric: npm test -- --coverage | grep 'All files' | awk '{print $10}' Direction: higher Guard: npm test Iterations: 20
This will:
- Commit each change as experiment: ..., then measure coverage
- Keep the change if coverage improves and npm test passes
- Otherwise revert with git revert HEAD --no-edit

Iterative loop → Clear goal + Max iterations + Progress tracking + Exit strategy
Otherwise → Infinite loops + Wasted effort + Unclear when done
Define the goal. Set the limit. Track progress. Know when to stop.