From cruise
Autonomous work loop — picks up issues, implements, QAs, PRs. Stops for merge approval.
npx claudepluginhub arthtech-ai/arthai-marketplace# Autopilot Skill — Autonomous Junior Engineer Runs a continuous work loop: assess → plan → implement → QA → PR → wait for merge → repeat. Acts like a junior engineer — takes initiative on scoped work, escalates when unsure, never merges without human approval. ## Arguments - `issue-number` (optional): Start with a specific issue. If omitted, auto-picks highest priority. - `--dry-run` (optional): Show what would be done without doing it. Useful for trust-building. ## Workflow State All state lives in `.claude/.workflow-state.json`. This file: - Persists between turns (triage router rea...
/autopilotRuns autonomous autopilot mode: iterates team-exec -> team-verify -> team-fix loops until acceptance criteria pass or blocked, producing loop tables, status, and checkpoints.
/autopilotAutonomously transforms a product idea into working, tested, documented code via 5-phase workflow: expansion, planning, execution with subagents, QA, and validation.
/autopilotStart autonomous coding loop. Modes: mission (default), improve, execute, research, evolve, build-saas, marketing.
/autopilotAutonomous execution mode — break a goal into tasks and execute them sequentially with safety checkpoints.
Runs a continuous work loop: assess → plan → implement → QA → PR → wait for merge → repeat. Acts like a junior engineer — takes initiative on scoped work, escalates when unsure, never merges without human approval.
issue-number (optional): Start with a specific issue. If omitted, auto-picks highest priority.--dry-run (optional): Show what would be done without doing it. Useful for trust-building.All state lives in .claude/.workflow-state.json. This file:
On first run, create the file. On subsequent runs (after merge, or user says "continue"), read the file and resume from the current phase.
{
"mode": "autopilot",
"started_at": "ISO timestamp",
"current_item": {
"type": "issue|pr|bug",
"number": 141,
"title": "description",
"source": "gh-issue|gh-pr|qa-finding|manual"
},
"risk": {
"score": 5,
"blast_radius": 1,
"reversibility": 1,
"confidence": 85,
"domain_sensitivity": 2,
"reasoning": "why this score"
},
"phase": "assess|classify|plan|implement|qa|self-review|pr|awaiting_merge|done",
"planned_scope": {
"files": ["list of expected files"],
"layers": ["backend", "frontend"]
},
"actual_scope": {
"files_changed": [],
"layers_touched": []
},
"attempts": {
"qa_fixes": 0,
"stuck_turns": 0
},
"session_summary": {
"completed": [{"item": "#141", "pr": 275}],
"prs_open": [275],
"blocked": [],
"issues_filed": []
}
}
If current_item is null or phase is "assess":
# Run in parallel
git status --short
git log --oneline -5
gh pr list --author @me --json number,title,state,headRefName --limit 10
gh issue list --assignee @me --json number,title,labels --limit 10
gh issue list --json number,title,labels,assignees --limit 15
Priority order (highest first):
bug, critical, urgent, P0, security → fix ASAPP1 or bugIf a specific issue-number was provided as argument, skip priority and use that.
Present the choice:
AUTOPILOT: Assessed project state.
Top priority: #141 "Short session credit refund" (bug, assigned to you, 7 days old)
Next in queue: #162 "Feedback nudge reappears" (bug), #71 "Sentry env vars" (infra)
Starting work on #141.
phase: "classify", current_item: {...}.Before touching any code, evaluate the work item:
Read the issue/PR carefully. Then score each dimension 0-3:
| Dimension | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Blast radius | 1 isolated file | 1 feature area | Shared component (auth, API client, billing service) | Infrastructure, deployment, database schema |
| Reversibility | Simple git revert | Revert + minor cleanup | Needs migration or manual data fix to undo | Data loss possible, can't fully undo |
| Confidence | Clear requirements, existing test patterns | Some ambiguity, mostly follows codebase patterns | Novel approach needed, unclear requirements | Don't understand the existing code well enough |
| Domain sensitivity | Docs, tests, UI copy, styling | Feature logic, new endpoints, new components | Auth flows, payment logic, data models, API contracts | Secrets, production DB, CI/CD pipelines, infra config |
Sum the scores (0-12) and decide:
| Score | Level | Action |
|---|---|---|
| 0-5 | Proceed | Work autonomously. No notification needed. |
| 6-8 | Proceed with notice | Tell the user what you're doing and why. Continue unless they object. |
| 9-10 | Stop and ask | Present the risk assessment. Wait for explicit approval. |
| 11-12 | Refuse | Tell the user this requires manual intervention. |
Automatic escalation triggers (override score):
Write to workflow state:
"risk": {
"score": 5,
"blast_radius": 1,
"reversibility": 1,
"confidence": 85,
"domain_sensitivity": 2,
"reasoning": "Billing-adjacent but no Stripe integration. 3 backend files, existing test patterns."
}
If proceeding, move to Phase 3. If stopping, present assessment and wait.
For bug fixes (risk 0-5): Skip formal planning. Read the issue, identify root cause, describe the fix approach in 2-3 sentences. Write to workflow state and proceed to Phase 4.
For bug fixes (risk 6-8): Brief plan: root cause analysis, proposed fix, files to change, potential side effects. Share with user as an FYI (don't wait for approval unless risk >= 9).
For features (any risk):
Check if a plan already exists at .claude/plans/{feature-name}.md.
/planning session and wait for the plan.Scope definition (critical for Phase 6 scope guard): After planning, write the expected scope:
"planned_scope": {
"files": ["backend/app/services/session_service.py", "backend/tests/test_session.py"],
"layers": ["backend"]
}
Update phase to "implement".
Spawn the right agents based on plan layers:
| Layers in plan | Agents to spawn |
|---|---|
| backend only | auto-detect backend agent (Sonnet) — see /implement for detection logic |
| frontend only | frontend (Sonnet) |
| both | auto-detect backend + frontend (Sonnet each) |
| infra/deploy | sre (Sonnet) |
Backend agent detection: Check CLAUDE.md Tech Stack, then project files (requirements.txt → python-backend, package.json with backend framework → frontend agent as Node backend, go.mod/Cargo.toml → general-purpose). Default: python-backend.
Use agent teams for multi-agent work:
autopilot-{item-number}For simple fixes (1-3 files, risk 0-3): Skip teams entirely. Use tools directly (Read, Edit, Write, Bash). This is faster and cheaper than spawning agents for trivial changes.
Track actual scope:
As files are changed, update actual_scope in workflow state.
Update phase to "qa" when implementation is complete.
Run validation appropriate to the change:
Minimum (always):
For backend changes:
cd backend && source .venv/bin/activate && python -m pytest
cd backend && ruff check .
For frontend changes:
cd frontend && npm run lint
cd frontend && npx tsc --noEmit
cd frontend && npm test
If QA fails:
attempts.qa_fixes).phase: "blocked"If QA passes: Update phase to "self-review".
Before creating a PR, review your own work. This catches issues QA misses.
Read the full diff:
git diff --stat
git diff
Check against this list:
actual_scope.files_changed vs planned_scope.files.
Risk re-assessment: Now that code is written, has the risk changed? If risk increased (e.g., you discovered the fix needs a migration), re-evaluate. If new risk >= 9: STOP and present updated assessment.
Discovered issues: If you find unrelated bugs or tech debt during review:
gh issue create --title "..." --body "Found during #141 work"session_summary.issues_filedUpdate phase to "pr" if review passes.
Always stop here. Never merge without human approval.
Create the PR with a structured template:
gh pr create --title "fix: short description (#ISSUE)" --body "$(cat <<'EOF'
## Summary
[2-3 bullet points: what changed and why]
## Risk Assessment
- **Score: X/12 (Level Y)**
- Blast radius: [description] (N/3)
- Reversibility: [description] (N/3)
- Confidence: N% — [reasoning]
- Domain: [description] (N/3)
## Scope Check
- Planned: N files | Actual: M files
- Layers: [backend/frontend/both]
- Scope status: [clean / grew — explanation]
## QA Results
- Tests: X/Y passed
- Lint: clean/N warnings
- Types: clean/N errors
- QA fix attempts: N
## What I Didn't Do
- [Things considered but intentionally left out]
- [Related issues filed: #276, #277]
## Test Plan
- [ ] Verify [specific behavior]
- [ ] Check [edge case]
Closes #ISSUE
Co-Authored-By: Claude Autopilot <noreply@anthropic.com>
EOF
)"
Update workflow state:
{
"phase": "awaiting_merge",
"pr_number": 275
}
Present to user:
AUTOPILOT: PR #275 ready for review.
"fix: short sessions no longer consume credits (#141)"
Risk: 5/12 (Level 1 — proceeded autonomously)
Scope: 3 files planned / 3 actual (clean)
QA: all tests passing
Merge when ready. I'll pick up the next item automatically.
Wait for user input. Do not proceed until the user indicates the PR was merged (or tells you to move on to the next item).
When user confirms merge (or says "continue", "next", "merged"):
completedIf there are no more items (all issues addressed, no PRs pending):
AUTOPILOT: Session complete.
Completed: 3 items
✓ #141 — short session credit refund (PR #275, merged)
✓ #162 — feedback nudge fix (PR #276, merged)
✓ #71 — sentry env vars (PR #277, merged)
Filed during work: #278 (stale test cleanup), #279 (billing edge case)
Nothing left in queue. /autopilot again when new issues come in.
Update workflow state: mode: "idle".
If --dry-run is passed:
This lets users build trust before enabling full autonomy.
The user can exit autopilot at any time by:
On exit:
mode: "paused", preserve all stateTo resume: /autopilot (reads existing state and continues).
The agent communicates at natural checkpoints, not continuously:
| Event | What to say |
|---|---|
| Starting work on an item | "Starting #141 (risk 5/12). Implementing backend fix for short sessions." |
| QA passed | Nothing (silence = good) |
| QA failed, fixing | "QA: 2 test failures, fixing..." |
| QA failed, stuck | "BLOCKED: Can't resolve test failure after 2 attempts. Need help." |
| Scope grew | "SCOPE: Started at 3 files, now at 6. The fix requires [X]. Continue?" |
| Risk escalation | "RISK: This needs a migration (was risk 5, now risk 9). Approve?" |
| PR ready | Full PR summary (see Phase 7) |
| Moving to next item | "Merged #275. Next: #162 (feedback nudge bug, risk 3/12)." |
| Session done | Full session summary |
Do NOT:
Minimize API spend by choosing the cheapest capable model:
| Task | Agent/Model | Cost |
|---|---|---|
| Project assessment | explore-light (Haiku) or inline | 1x |
| Simple bug fix (1-3 files) | Inline tools (no agent) | 0x (just Opus) |
| Backend implementation | auto-detect backend (Sonnet) | 10x |
| Frontend implementation | frontend (Sonnet) | 10x |
| QA validation | ops (Haiku) for tests, inline for review | 1x |
| Complex planning | architect (Opus) | 60x — only for risk 6+ |
Rule: Never spawn an Opus agent for a risk 0-3 task. Handle it inline or with Sonnet/Haiku.