Start autonomous execution session with stop hook integration. Works until all tasks complete or max iterations reached. Uses Ralph Wiggum pattern with SpecWeave workflow integration. Activates for: auto, autonomous, auto mode, ship while sleeping.
Executes autonomous development sessions with stop hooks, quality gates, and intelligent increment management.
Installation:
/plugin marketplace add anton-abyzov/specweave
/plugin install sw@specweave

Usage: /sw:auto [INCREMENT_IDS...] [OPTIONS]

Start an autonomous execution session using Claude Code's Stop Hook.
When user says "auto" or "autonomous" or "keep working" or provides a task description, you should:
specweave auto [INCREMENT_IDS] [OPTIONS]
Now work on the increment tasks. When you try to exit, the stop hook will check completion conditions and feed the next task back to you. Continue until all tasks are complete and quality gates pass.
/sw:auto [INCREMENT_IDS...] [OPTIONS]
:::tip Claude Code's Game-Changing Features for Auto Mode
Compact Command (VSCode): Use compact mode to keep Claude Code inside your VSCode window. Work continuously for hours in the same session without context switching between terminal and editor. Perfect for long auto mode sessions!
STOP Hooks with Subagents: Stop hooks now work with spawned subagents! This means /sw:auto can validate quality gates at EVERY level of execution. When auto mode spawns specialized agents (QA, Security, Performance), the stop hook validates their results before allowing the session to continue.
Real-world proof: Boris Cherny (Claude Code creator) shipped 259 PRs, 497 commits, and 40,000 lines in one month without opening an IDE, using autonomous execution with stop hooks. See demo. :::
INCREMENT_IDS: One or more increment IDs to process (e.g., 0001, 0001-feature)
| Option | Description | Default |
|---|---|---|
| --max-iterations N | Maximum iterations (safety net, not primary stop) | 2500 (v2.3) |
| --max-hours N | Maximum hours to run | 600 hours (25 days, v2.3) |
| --simple | Pure Ralph mode (minimal context) | false |
| --dry-run | Preview without starting | false |
| --all-backlog | Process all backlog items | false |
| --skip-gates G1,G2 | Pre-approve specific gates | None |
| --no-increment, --no-inc | Skip auto-creation (require existing increments) | false |
| --prompt "text" | Analyze prompt and create increments (intelligent chunking) | None |
| --yes, -y | Auto-approve increment plan (skip user approval) | false |
| --tdd, --strict | NEW v2.2: Enable TDD strict mode - ALL tests must pass | false |
| --build | NEW v0.4.0: Build must pass before completion (auto-heal: 3 retries) | false |
| --tests | NEW v0.4.0: Tests must pass before completion (unit + integration) | false |
| --e2e | NEW v0.4.0: E2E tests must pass before completion | false |
| --lint | NEW v0.4.0: Linting must pass before completion (auto-heal: 3 retries) | false |
| --types | NEW v0.4.0: Type-checking must pass before completion (auto-heal: 3 retries) | false |
| --cov <n> | NEW v0.4.0: Code coverage must meet threshold (%) | 80 |
| --e2e-cov <n> | NEW v0.4.0: E2E coverage must meet threshold (%) | 70 |
| --cmd "<command>" | NEW v0.4.0: Custom command must pass before completion | None |
:::warning v2.3 - Iteration limits are SAFETY NETS
The primary completion criterion is tests passing + tasks complete. Iteration limits (2500 iterations, 600 hours) are backup safety nets. Per the Ralph Wiggum pattern, completion should be detected through external verification (test results), not self-assessment.
IMPORTANT: Stop hook runs PER AGENT - Each spawned subagent gets its own hook invocation. Iteration count is shared via session file, reflecting main agent loops. :::
Auto mode will NOT stop until ALL specified conditions pass.
Completion conditions are quality gates that prevent auto mode from completing until specific checks pass:
- --build: Build must succeed (auto-heal enabled, max 3 retries)
- --tests: All tests must pass (unit + integration tests)
- --e2e: E2E tests must pass (Playwright, Cypress, etc.)
- --lint: Linting must pass (ESLint, Black, Clippy, etc.)
- --types: Type-checking must pass (TypeScript, mypy, etc.)
- --cov N: Code coverage must meet threshold (e.g., --cov 80 = 80% minimum)
- --e2e-cov N: E2E coverage must meet threshold
- --cmd "...": Custom command must pass (e.g., --cmd "make verify")

| Condition | Auto-Heal? | Behavior |
|---|---|---|
| --build | Yes (3 retries) | Build failures auto-fixed by LLM |
| --lint | Yes (3 retries) | Lint errors auto-fixed by LLM |
| --types | Yes (3 retries) | Type errors auto-fixed by LLM |
| --tests | No | Tests must be fixed manually by LLM |
| --e2e | No | E2E tests must be fixed manually |
| --cov | No | Must write more tests to meet threshold |
| --cmd | No | Custom commands run as-is |
Auto-heal means the hook re-runs the failing command, feeds the errors back to the LLM for a fix, and retries up to 3 times before giving up.
Manual fix means the hook only reports the failure; the LLM (or you) must fix the underlying code or tests before the next completion check.
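A minimal sketch of the auto-heal idea, assuming an npm build (the real retry logic lives inside the stop hook and may differ):

```bash
# Illustrative auto-heal loop (assumed command and retry flow; not the hook's actual code)
MAX_RETRIES=3
attempt=0
until npm run build; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge "$MAX_RETRIES" ]; then
    echo "Build still failing after $MAX_RETRIES auto-heal attempts" >&2
    exit 1
  fi
  echo "Build failed (attempt $attempt/$MAX_RETRIES); feeding errors back for an auto-fix..."
  # At this point the LLM analyzes the build output and patches the code, then the loop retries
done
```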
Commands are auto-detected based on your project structure:
TypeScript/Node:
# Detected from package.json, jest.config.js, vitest.config.ts
build: npm run build
tests: npm test OR npx vitest run
e2e: npx playwright test OR npx cypress run
lint: npm run lint OR npx eslint .
types: npx tsc --noEmit
Python:
# Detected from requirements.txt, pyproject.toml, pytest.ini
build: python -m build
tests: pytest
e2e: (none)
lint: black --check . OR flake8
types: mypy .
Go:
# Detected from go.mod
build: go build ./...
tests: go test ./...
lint: golangci-lint run
Rust:
# Detected from Cargo.toml
build: cargo build
tests: cargo test
lint: cargo clippy
Basic - Build + Tests:
/sw:auto --build --tests
# → Auto mode will NOT stop until build passes AND all tests pass
Strict Quality:
/sw:auto --build --tests --e2e --lint --types --cov 80
# → ALL conditions must pass:
#   ✅ Build succeeds
#   ✅ Tests pass
#   ✅ E2E tests pass
#   ✅ Lint passes
#   ✅ Type-check passes
#   ✅ Coverage ≥ 80%
Custom Command:
/sw:auto --cmd "make verify"
# → Auto mode will run `make verify` before completion
Combined with Other Flags:
/sw:auto --prompt "Build auth system" --yes --build --tests --cov 85
# → Intelligent chunking + auto-approve + quality gates
When you start auto mode with completion conditions, you'll see:
Auto Session Started
Session ID: auto-2026-01-04-abc123
Max Iterations: 2500
Max Hours: 600
Simple Mode: false
────────────────────────────────────────────────────────────
COMPLETION CONDITIONS
Auto mode will NOT stop until ALL conditions pass:
  • Build must pass (auto-heal enabled, max 3 retries)
  • Tests must pass (unit + integration)
  • E2E tests must pass
  • Code coverage must be ≥ 80%
────────────────────────────────────────────────────────────
Increment Queue (1):
β’ 0001-auth-system
Current: 0001-auth-system
The session will continue until:
  • All tasks complete AND tests pass
  • ALL 4 completion conditions pass
  • Max iterations (2500) reached
  • Max hours (600) exceeded
  • You run specweave cancel-auto
  • A human gate requires approval
The stop hook (stop-auto.sh) validates completion conditions:
Before allowing completion, the hook runs:
plugins/specweave/hooks/validate-completion-conditions.sh
For each condition, the hook runs the associated command and records whether it passed.
Only when ALL conditions pass does the hook allow the session to complete; otherwise it blocks the exit and reports which gates failed.
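As a rough sketch (assuming the TypeScript/Node commands from the detection section above; the actual script is likely organized differently), the per-condition check could look like:

```bash
# Hypothetical condition loop - command names are assumptions, not the hook's real code
CONDITIONS="build tests lint types"
FAILED=""
for condition in $CONDITIONS; do
  case "$condition" in
    build) npm run build     || FAILED="$FAILED build" ;;
    tests) npm test          || FAILED="$FAILED tests" ;;
    lint)  npm run lint      || FAILED="$FAILED lint" ;;
    types) npx tsc --noEmit  || FAILED="$FAILED types" ;;
  esac
done
if [ -n "$FAILED" ]; then
  echo "Blocking completion; failed gates:$FAILED" >&2
  exit 2  # assumed convention: a non-zero exit keeps the session running
fi
echo "All completion conditions passed"
```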
You can override completion conditions per increment in metadata.json:
{
"increment": "0001-auth-system",
"autoCompletion": {
"conditions": [
{ "type": "build" },
{ "type": "tests" },
{ "type": "coverage", "threshold": 90 }
],
"override": true
}
}
When override: true, the increment-specific conditions replace the session-level conditions.
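A sketch of how the override could be read with jq (paths and field names taken from the example above; the hook's actual parsing may differ):

```bash
# Illustrative only - reads increment-level completion conditions when override is set
META=".specweave/increments/0001-auth-system/metadata.json"
if [ "$(jq -r '.autoCompletion.override // false' "$META")" = "true" ]; then
  # These conditions replace the session-level ones
  jq -c '.autoCompletion.conditions[]' "$META"
fi
```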
Issue: "Build command not detected"
scripts.build to package.json OR use --cmd "your-build-cmd"Issue: "Tests pass but coverage below threshold"
Issue: "Auto-heal keeps retrying but failing"
Issue: "E2E tests not detected"
playwright.config.ts or cypress.config.js exists--build --tests for basic quality gates--cov 70, increase to 80-90 over time--e2e for user-facing features--cmd for project-specific checks (e.g., security scans)Auto mode now creates increments automatically when none exist!
/sw:auto invoked
        │
        ▼
Are INCREMENT_IDS specified? ──YES──> Use specified increments
        │
        NO
        ▼
Active increment exists? ──YES──> Use active increment
        │
        NO
        ▼
--no-increment/--no-inc flag? ──YES──> ERROR: No increments found
        │
        NO (DEFAULT)
        ▼
INTELLIGENT INCREMENT CREATION
        │
        ├─> Analyze user context/prompt
        ├─> Check for matching planned/backlog increments
        └─> Match existing OR create new increment(s)
        │
        ▼
Auto mode starts with new/matched increment(s)
The LLM will analyze the context and decide:
- Match an existing planned/backlog increment, or
- Create new increment(s) (e.g., 0002-user-authentication, 0003-payment-integration)

# User says: "Let's ship the dashboard feature"
/sw:auto
# → LLM finds 0004-dashboard in backlog, activates it

# User says: "Build a user profile page with avatar upload"
/sw:auto
# → LLM creates 0005-user-profile-page with spec + tasks

# User says: "I want to work on auth and notifications"
/sw:auto
# → LLM creates queue: [0001-authentication, 0002-notifications]

# User says: "Just work on what's already planned"
/sw:auto --no-increment # or --no-inc
# → ERROR if no active increment (strict mode)
Use --prompt to provide a feature description for intelligent chunking:
# Analyze prompt and show increment plan for approval
/sw:auto --prompt "Build e-commerce with auth, products, cart, checkout"
# Auto-approve plan and start execution
/sw:auto --prompt "Build e-commerce with auth, products, cart, checkout" --yes
The prompt is analyzed, an increment plan is shown for approval (unless the --yes flag is used), and increments are created via /sw:increment.

Increment Plan
──────────────────────────────────────────────────
Total Features: 4
Total Tasks: ~34
Estimated Duration: 1-2 days
Increments: 3
Increments:
--------------------------------------------------
1. User Authentication
ID: 0001-user-authentication
Tasks: ~12
Features: auth
Depends on: (none)
2. Product Catalog
ID: 0002-product-catalog
Tasks: ~10
Features: products
3. Shopping Cart & Checkout
ID: 0003-shopping-cart-checkout
Tasks: ~12
Features: cart, checkout
Depends on: 0001-user-authentication, 0002-product-catalog
Review the plan above.
Options:
1. Approve - Start execution with this plan
2. Modify - Adjust increment structure
3. Cancel - Abort and return to prompt
To skip this prompt in future: use --yes flag
/sw:auto --prompt "..."
        │
        ▼
Analyze & Show Plan
        │
        ├─ --yes flag? ──YES──> Auto-approve
        │                          │
        │                          ▼
        │                  Create Increments → Start Session
        │
        └─ No --yes flag
               │
               ▼
          Wait for User
               │
               ├─ Approve → Create Increments → Start Session
               ├─ Modify → LLM adjusts plan → Re-show
               └─ Cancel → Exit
1. User runs /sw:auto (with or without IDs)
        │
        ▼
2. specweave auto command creates session state
   └─ .specweave/state/auto-session.json
        │
        ▼
3. Claude starts working on tasks
   └─ /sw:do executes tasks
        │
        ▼
4. Claude tries to exit (naturally)
        │
        ▼
5. Stop Hook intercepts (stop-auto.sh)
   ├─ Checks: All tasks complete?
   ├─ Checks: Max iterations reached?
   ├─ Checks: Completion promise?
   └─ Checks: Human gate pending?
        │
    ┌───┴───┐
    ▼       ▼
INCOMPLETE  COMPLETE
    │         │
    ▼         ▼
Block exit  Approve exit
Re-feed     Session ends
prompt
# Start auto on current increment
/sw:auto
# Start on specific increment
/sw:auto 0001-user-auth
# Multiple increments
/sw:auto 0001 0002 0003
# Limit iterations
/sw:auto --max-iterations 50
# Time limit
/sw:auto --max-hours 8
# Simple/Ralph mode
/sw:auto --simple
# Preview only
/sw:auto --dry-run
# All backlog items
/sw:auto --all-backlog
# Skip deploy gate (pre-approved)
/sw:auto --skip-gates deploy
# Multiple gates
/sw:auto --skip-gates "deploy,migrate"
/sw:auto-status
/sw:cancel-auto
Just run /sw:do - it will detect incomplete tasks and continue.
Or use Claude Code's built-in:
/resume # Pick session to resume
claude --continue # Continue last session
In .specweave/config.json:
{
"auto": {
"enabled": true,
"maxIterations": 500,
"maxHours": 120,
"testCommand": "npm test",
"coverageThreshold": 80,
"enforceTestFirst": false,
"humanGated": {
"patterns": ["deploy", "migrate", "publish"],
"timeout": 1800
}
}
}
Note: The stop hook will NOT allow completion until tests are actually executed. If test files exist (.test.ts, .spec.ts, playwright.config.ts, etc.), auto mode will block exit and require test runs.
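For illustration, the test-file check could look like the sketch below (globs are assumptions; the real detection in stop-auto.sh may cover more patterns):

```bash
# Sketch: require test execution when test files are present
if find . -path ./node_modules -prune -o \
     \( -name "*.test.ts" -o -name "*.spec.ts" -o -name "playwright.config.*" \) -print \
   | grep -q .; then
  echo "Test files detected - tests must be executed before the session can complete"
fi
```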
The session ends when ANY of these occur:
- All tasks are marked [x] AND tests were executed
- The completion promise <auto-complete>DONE</auto-complete> is detected
- Max iterations reached or max hours exceeded
- You run /sw:cancel-auto

⚠️ IMPORTANT: Auto mode will NOT complete just because tasks are marked done. If test files exist in the project, the stop hook ENFORCES test execution and blocks exit with a message explaining why.
Pure Ralph Wiggum behavior:
/sw:auto --simple
Auto mode plays a satisfying sound when work completes successfully!
| Event | Sound | Platforms | Meaning |
|---|---|---|---|
| Session Complete (Success) | Glass.aiff (macOS)<br>complete.oga (Linux)<br>Windows Notify (Windows) | All | All tasks done, tests passing - work finished! |
Sound plays ONLY on complete success - when all tasks are done AND all tests pass. This way you know when to check back without being interrupted during ongoing work.
The sound notification works automatically on:
Sounds fail gracefully on systems without audio support.
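A cross-platform sketch of the completion sound (file names follow the table above; the hook may locate players differently):

```bash
# Illustrative completion chime - does nothing if no audio player is available
if command -v afplay >/dev/null 2>&1; then
  afplay /System/Library/Sounds/Glass.aiff                                   # macOS
elif command -v paplay >/dev/null 2>&1; then
  paplay /usr/share/sounds/freedesktop/stereo/complete.oga                   # Linux
elif command -v powershell.exe >/dev/null 2>&1; then
  powershell.exe -c "(New-Object Media.SoundPlayer 'C:\Windows\Media\Windows Notify.wav').PlaySync()"  # Windows/WSL
fi
```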
CRITICAL: The stop hook runs PER AGENT, not globally!
Main Agent (Claude Code)
  │
  ├── Stop hook invoked when main agent tries to exit
  │
  ├── Spawns Subagent A (Task tool)
  │     └── Subagent A completes → returns to main agent
  │         (NO stop hook for subagent exit by default)
  │
  ├── Spawns Subagent B (Task tool with stop_hooks enabled)
  │     └── Stop hook CAN be invoked if configured
  │
  └── Main agent tries to exit → Stop hook invoked
Iteration count = main agent loops: When you see "Iteration 42/2500", that's 42 times the MAIN agent tried to exit, not subagent work.
Subagent work is "free": Spawning specialized agents (QA, Security, etc.) doesn't consume iterations from the main loop.
Shared session state: All agents (main + sub) share the same auto-session.json, so task completion is tracked globally.
Test validation at main level: The stop hook validates test results when the MAIN agent tries to complete, ensuring all subagent work is verified.
To enable stop hooks for subagents (advanced):
// In Task tool call
{
"stop_hooks": true, // Enable stop hook for this subagent
"inherit_session": true // Share session state with parent
}
Treat --max-iterations as a safety net, not a target.

Auto mode v2.1 includes critical improvements for reliable long-running sessions:
Auto mode now monitors context size and triggers compaction when needed:
- Logs context_near_limit events to auto-iterations.log

Configuration:
{
"auto": {
"contextThreshold": 150000 // tokens before compaction warning
}
}
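As a sketch, the threshold comparison could look like this (reading contextTokens from the task checkpoint is an assumption; the real monitor may source the count elsewhere):

```bash
# Hypothetical context-size check against the configured threshold
THRESHOLD=$(jq -r '.auto.contextThreshold // 150000' .specweave/config.json)
TOKENS=$(jq -r '.contextTokens // 0' .specweave/state/task-checkpoint.json)
if [ "$TOKENS" -ge "$THRESHOLD" ]; then
  echo "{\"event\":\"context_near_limit\",\"tokens\":$TOKENS}" >> .specweave/logs/auto-iterations.log
fi
```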
Detects and logs stale sessions (zombie detection):
- Logs a stale_heartbeat_detected event when the heartbeat goes stale
- Heartbeat written to .specweave/state/heartbeat.json

Heartbeat format:
{
"timestamp": "2026-01-02T08:00:00Z",
"sessionId": "auto-2026-01-02-abc123",
"pid": 12345,
"iteration": 42
}
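A staleness check might look like the sketch below (the 300-second threshold and GNU date usage are assumptions):

```bash
# Illustrative zombie detection based on heartbeat age
HEARTBEAT=.specweave/state/heartbeat.json
LAST=$(jq -r '.timestamp' "$HEARTBEAT")
AGE=$(( $(date +%s) - $(date -d "$LAST" +%s) ))   # GNU date; use gdate on macOS
if [ "$AGE" -gt 300 ]; then
  echo "{\"event\":\"stale_heartbeat_detected\",\"age\":\"${AGE}s\"}" >> .specweave/logs/auto-iterations.log
fi
```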
Full support for Apple platform testing:
| Framework | Detection Pattern |
|---|---|
| xcodebuild test | Executed X tests, with Y failures |
| Swift PM (swift test) | Test Suite passed/failed |
| Xcode build | BUILD FAILED, xcodebuild: error: |
Features:
Works with ANY test framework via exit codes and patterns:
| Pattern Type | Examples |
|---|---|
| Exit code | Non-zero = failure |
| Universal failure | FAIL, ERROR, FAILED, failed |
| Universal success | All tests passed, SUCCESS, OK |
Fallback chain: framework-specific patterns first, then the universal failure/success patterns, then the raw exit code.
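A minimal sketch of that chain, assuming an npm test run (patterns mirror the table above; real detection is more thorough):

```bash
# Illustrative universal detection: output patterns first, exit code as the last resort
npm test > test-output.log 2>&1
EXIT=$?
if grep -Eq 'FAIL|ERROR|FAILED|failed' test-output.log; then
  RESULT=fail
elif grep -Eq 'All tests passed|SUCCESS|OK' test-output.log; then
  RESULT=pass
else
  [ "$EXIT" -eq 0 ] && RESULT=pass || RESULT=fail
fi
echo "Detected test result: $RESULT"
```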
Failures are classified into categories with different handling:
| Category | Patterns | Handling |
|---|---|---|
| Transient | Network errors, timeouts, flaky tests | Immediate retry |
| Fixable | Assertion errors, type errors | AI analysis + fix |
| Structural | Import errors, syntax errors | Deeper analysis |
| External | Missing files, env config | Pause + alert |
| Unfixable | Permission denied, external service | Log + skip |
Example classifications:
- ECONNREFUSED → transient
- expect(received).toEqual(expected) → fixable
- Module not found → structural
- ENOENT: no such file → external
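A rough classifier matching those examples might look like this (the pattern list is illustrative, not the hook's exhaustive set):

```bash
# Hypothetical failure classifier - maps error output to a category
classify_failure() {
  case "$1" in
    *ECONNREFUSED*|*ETIMEDOUT*)          echo transient ;;
    *toEqual*|*AssertionError*)          echo fixable ;;
    *"Module not found"*|*SyntaxError*)  echo structural ;;
    *"ENOENT: no such file"*)            echo external ;;
    *"Permission denied"*)               echo unfixable ;;
    *)                                   echo fixable ;;
  esac
}
classify_failure "$(cat test-output.log)"
```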
Progress preserved at task boundaries for crash recovery:
- Checkpoint written to .specweave/state/task-checkpoint.json

Checkpoint format:
{
"taskId": "T-003",
"incrementId": "0001-feature",
"timestamp": "2026-01-02T08:00:00Z",
"status": "in_progress",
"contextTokens": 145000
}
Graceful handling of hung commands:
Configuration:
{
"auto": {
"timeouts": {
"test": 600, // 10 minutes
"build": 300, // 5 minutes
"deploy": 600 // 10 minutes
}
}
}
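For example, the configured test timeout could be enforced with coreutils timeout (a sketch under that assumption; the actual implementation may handle hangs differently):

```bash
# Illustrative timeout enforcement - exit code 124 means the command was killed
TEST_TIMEOUT=$(jq -r '.auto.timeouts.test // 600' .specweave/config.json)
timeout "${TEST_TIMEOUT}s" npm test
STATUS=$?
if [ "$STATUS" -eq 124 ]; then
  echo "Tests hung and were killed after ${TEST_TIMEOUT}s" >&2
fi
```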
All reliability events logged to .specweave/logs/auto-iterations.log:
{"timestamp":"2026-01-02T08:00:00Z","event":"iteration","iteration":42,...}
{"timestamp":"2026-01-02T08:01:00Z","event":"context_near_limit","tokens":152000}
{"timestamp":"2026-01-02T08:02:00Z","event":"stale_heartbeat_detected","age":"320s"}
{"timestamp":"2026-01-02T08:03:00Z","event":"failure_classified","category":"transient"}
Enable TDD strict mode to enforce ALL tests passing before completion:
/sw:auto --tdd 0001-feature
# or
/sw:auto --strict 0001-feature
TDD Mode Requirements:
TDD mode can be configured at multiple levels with priority:
1. CLI flag (--tdd flag) - highest priority
2. Increment metadata.json or spec.md frontmatter
3. Global config (.specweave/config.json)

Example: Enable TDD for a specific increment:
// .specweave/increments/0001-feature/metadata.json
{
"tddMode": true,
"testMode": "tdd"
}
Or via spec.md frontmatter:
---
increment: 0001-feature
title: "Critical Payment Feature"
tdd: true
---
Console output shows TDD source:
────────────────────────────────────────────────────────────
AUTO MODE CONTINUING
────────────────────────────────────────────────────────────
STOP CRITERIA: TDD MODE: ALL tests MUST pass
TDD Source: increment metadata.json
────────────────────────────────────────────────────────────
Global Configuration (.specweave/config.json):
{
"testing": {
"defaultTestMode": "tdd", // "tdd", "test-first", or "test-after"
"coverageTargets": {
"unit": 85,
"integration": 80,
"e2e": 90
}
}
}
Auto mode now discovers and displays available test commands for your project:
The stop hook scans for test frameworks and shows you exactly what commands to run:
AVAILABLE TEST COMMANDS FOR THIS PROJECT:
Unit/Integration Tests:
β’ npm test (npm)
β’ npx vitest run (vitest)
E2E Tests:
β’ npx playwright test (playwright)
PRIORITY: Run ALL tests BEFORE marking tasks complete!
Supported frameworks detected automatically:
| Framework | Detection Method |
|---|---|
| npm scripts | package.json scripts.test |
| Vitest | vitest.config.ts/js or dependency |
| Jest | jest.config.ts/js or dependency |
| Playwright | playwright.config.ts/js |
| Cypress | cypress.config.ts/js or /cypress dir |
| Detox | .detoxrc.js/json or dependency |
| Pytest | pytest.ini or pyproject.toml |
| Go test | go.mod |
| Cargo test | Cargo.toml |
| Xcode | *.xcodeproj or *.xcworkspace |
| Swift test | Package.swift |
| Gradle | build.gradle(.kts) |
| Maestro | maestro.yaml or .maestro/ |
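The detection boils down to simple file and config checks; a sketch covering a few rows of the table (checks are illustrative, not the stop hook's actual scan):

```bash
# Hypothetical framework scan - prints the commands the hook would suggest
[ -f package.json ] && jq -e '.scripts.test' package.json >/dev/null 2>&1 && echo "npm test"
ls vitest.config.* >/dev/null 2>&1 && echo "npx vitest run"
ls playwright.config.* >/dev/null 2>&1 && echo "npx playwright test"
[ -f go.mod ] && echo "go test ./..."
[ -f Cargo.toml ] && echo "cargo test"
if [ -f pytest.ini ] || grep -q '^\[tool.pytest' pyproject.toml 2>/dev/null; then echo "pytest"; fi
```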
v2.2 now logs EXACTLY why auto mode stops:
All stop reasons logged to .specweave/logs/auto-stop-reasons.log:
{
"timestamp": "2026-01-02T08:00:00Z",
"sessionId": "auto-2026-01-02-abc123",
"reason": "All tasks completed, all tests passed (42 passed, 0 failed)",
"success": true,
"iteration": 15,
"increment": "0001-feature",
"testsRun": true,
"testsPassed": 42,
"testsFailed": 0
}
Stop reasons categorized:
| Category | Success | Example |
|---|---|---|
| all_tasks_complete | Yes | All tests pass, all tasks done |
| completion_promise | Yes | <auto-complete>DONE</auto-complete> detected |
| max_iterations_reached | No | Safety limit hit (not ideal) |
| max_hours_exceeded | No | Time limit hit |
| test_failures_exhausted | No | 3 retry attempts failed |
| external_failure | No | Environment/config issue |
| human_gate_pending | Paused | Waiting for user approval |
For iOS/Android projects, auto mode detects:
| Framework | Detection | Command |
|---|---|---|
| Xcode (iOS) | xcodebuild test output | xcodebuild -scheme X test |
| Swift PM | swift test output | swift test |
| Detox (RN) | detox test output | detox test -c ios.sim.debug |
| Maestro | maestro test output | maestro test flow.yaml |
| Appium | Test framework output | Framework-specific |
Best Practice for Mobile Apps:
- Use --tdd for strictest enforcement

Example mobile test detection:
Executed 15 tests, with 0 failures (0 unexpected) in 12.345 seconds
** TEST SUCCEEDED **
Auto mode now includes comprehensive UI/UX quality gates that run automatically when E2E tests are detected.
When @axe-core/playwright or similar accessibility testing tools are detected, auto mode:
Violation Severity Handling:
| Severity | Action | Example |
|---|---|---|
| Critical | BLOCKS completion | Missing alt text, form without labels |
| Serious | BLOCKS completion | Color contrast, missing document lang |
| Moderate | Warning only | Landmark regions |
| Minor | Warning only | Empty headings |
Enable in your tests:
import { injectAxe, checkA11y } from 'axe-playwright';
test('page is accessible', async ({ page }) => {
await page.goto('/');
await injectAxe(page);
await checkA11y(page);
});
Auto mode parses E2E test output for console errors:
- Fails when console.error is emitted from application code
- Known noise patterns are excluded automatically
Add custom exclusions in config:
{
"auto": {
"consoleErrors": {
"excludePatterns": ["Expected test error"]
}
}
}
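A sketch of the scan with exclusions applied (the output file name and pattern handling are assumptions):

```bash
# Illustrative console-error scan over captured E2E output
EXCLUDE=$(jq -r '.auto.consoleErrors.excludePatterns[]?' .specweave/config.json | paste -sd'|' -)
if grep "console.error" e2e-output.log | grep -Evq "${EXCLUDE:-__none__}"; then
  echo "Console errors detected in E2E output" >&2
fi
```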
Auto mode detects and reports on UI state test coverage:
| State | Detection | Recommendation |
|---|---|---|
| Loading | Spinners, skeletons, aria-busy | Test loading/skeleton states |
| Error | Error boundaries, 404/500 pages | Test error handling |
| Empty | No data, no results | Test empty state displays |
Shows a warning if states are detected but not explicitly tested.
Auto mode now handles multi-increment queues with smooth transitions.
When an increment completes, auto mode shows:
✅ INCREMENT COMPLETE: 0001-user-auth
───────────────────────────────────────────────────
SUMMARY:
───────────────────────────────────────────────────
Tasks: 15/15 | Duration: 45m
Tests: 42 passed, 0 failed
Status: All acceptance criteria met
───────────────────────────────────────────────────
NEXT INCREMENT: 0002-notifications
───────────────────────────────────────────────────
Queue: 2 increment(s) remaining
If an increment fails after 3 retry attempts, you can skip it:
/sw:skip-increment
This will mark the current increment as skipped and continue with the next increment in the queue.
Use when an increment keeps failing for reasons that cannot be resolved in the current session (e.g., an external blocker) and you want the rest of the queue to proceed.
In auto mode, ALL agents MUST follow the auto-execute skill rules:
❌ FORBIDDEN: "Next Steps: Run wrangler deploy"
❌ FORBIDDEN: "Execute the schema in Supabase SQL Editor"
❌ FORBIDDEN: "Set secret via: wrangler secret put..."
✅ REQUIRED: Execute commands DIRECTLY using available credentials
Before ANY deployment task, check for credentials:
1. .env file - Primary credential storage
2. CLI authentication - wrangler whoami, gh auth status, etc.
3. Config files - wrangler.toml, .specweave/config.json

# Example: Supabase migration
if grep -q "DATABASE_URL" .env; then
source .env
psql "$DATABASE_URL" -f schema.sql
fi
# Example: Wrangler deployment
if wrangler whoami 2>/dev/null; then
wrangler deploy
fi
**Credential Required for Auto-Execution**
I need your Supabase database URL to execute the migration.
**Please paste your DATABASE_URL:**
[I will save to .env and continue automatically]
After user provides credential:
- Save the credential to .env and continue execution automatically

See: plugins/specweave/skills/auto-execute/SKILL.md for full details.
Auto mode uses self-assessment scoring to guide continuation decisions:
After each task/iteration, Claude self-assesses execution quality:
{
"iteration": 5,
"task": "T-003",
"confidence": {
"execution_quality": 0.92, // How well was the task executed?
"test_coverage": 0.85, // Are tests adequate?
"spec_alignment": 0.95, // Does implementation match spec?
"credential_success": 1.0, // Were all deployments successful?
"overall": 0.93 // Weighted average
},
"concerns": [],
"blockers": []
}
| Overall Score | Action |
|---|---|
| ≥ 0.90 | Continue confidently |
| 0.70-0.89 | Continue with caution, log concerns |
| 0.50-0.69 | Pause for self-review before continuing |
| < 0.50 | Stop and request human review |
After completing each task, evaluate:
<self-assessment>
Task: T-003 - Implement user authentication
Status: completed
Execution Quality (0.0-1.0): 0.92
- ✅ All acceptance criteria met
- ✅ Tests pass
- ⚠️ Minor edge case not covered (low impact)
Test Coverage (0.0-1.0): 0.85
- ✅ Unit tests: 12/12 pass
- ✅ Integration tests: 5/5 pass
- ⚠️ E2E test coverage: 75% (target: 80%)
Spec Alignment (0.0-1.0): 0.95
- ✅ All ACs addressed
- ✅ Architecture matches plan.md
Credential Success (0.0-1.0): 1.0
- ✅ Database migration executed successfully
- ✅ Secrets deployed to Cloudflare
Overall: 0.93 → CONTINUE
</self-assessment>
The stop hook (plugins/specweave/hooks/stop-auto.sh) reads this scoring:
# Check self-assessment in transcript
SCORE=$(grep -oP 'Overall:\s*\K[0-9.]+' "$TRANSCRIPT_PATH" 2>/dev/null | tail -1)
if [ -n "$SCORE" ] && [ "$(echo "$SCORE < 0.50" | bc)" -eq 1 ]; then
# Score too low, stop for human review
approve "Low confidence score ($SCORE), requesting human review"
fi
Auto mode MUST run tests after completing testable tasks in a self-healing loop:
# Test execution loop (Ralph Loop pattern)
MAX_ATTEMPTS=3
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  ATTEMPT=$((ATTEMPT + 1))

  # 1. Run unit/integration tests
  npm test 2>&1 | tee test-output.log
  UNIT_RESULT=${PIPESTATUS[0]}   # exit code of npm test, not tee

  # 2. Run E2E tests if UI exists
  if [ -f "playwright.config.ts" ] || [ -f "playwright.config.js" ]; then
    npx playwright test --reporter=list 2>&1 | tee e2e-output.log
    E2E_RESULT=${PIPESTATUS[0]}
  else
    E2E_RESULT=0
  fi

  # 3. Check results
  if [ $UNIT_RESULT -eq 0 ] && [ $E2E_RESULT -eq 0 ]; then
    echo "✅ All tests passed!"
    break
  fi

  if [ $ATTEMPT -lt $MAX_ATTEMPTS ]; then
    echo "Tests failed (attempt $ATTEMPT/$MAX_ATTEMPTS), analyzing and fixing..."
    # AI analyzes failure, fixes code, continues loop
  else
    echo "❌ Tests failed after $MAX_ATTEMPTS attempts, stopping for review"
    exit 1
  fi
done
ALWAYS execute E2E tests for user-facing features:
# Install browsers if needed (first run)
npx playwright install --with-deps chromium
# Run E2E tests
npx playwright test
# On failure, run with trace for debugging
npx playwright test --trace on
# Run specific test file
npx playwright test tests/auth.spec.ts
# Run in headed mode for debugging
npx playwright test --headed
MVP Critical Path Tests (MUST implement):
Every 3-5 tasks, proactively refactor:
REFACTORING TRIGGERS (check after every 3-5 tasks):
• Test file > 200 lines → Split by feature
• Source file > 300 lines → Extract module
• Duplicate code 3+ times → Extract utility/helper
• Same test setup repeated → Extract to fixtures
• Imports > 15 lines → Consolidate, barrel exports
Refactoring actions in auto mode:
# After completing task batch, review and refactor:
1. Check test organization → Group by feature
2. Extract shared fixtures → tests/fixtures/
3. Extract utilities → src/utils/ or src/lib/
4. Update imports → Use barrel exports (index.ts)
5. Run tests again → Ensure refactoring didn't break anything
Before moving to next task, verify:
After EVERY task in auto mode, output test status report:
## Test Status Report (after T-003)

| Type | Status | Pass/Total | Coverage |
|------|--------|------------|----------|
| Unit | ✅ | 42/42 | 87% |
| Integration | ✅ | 12/12 | - |
| E2E | ⚠️ | 8/10 | - |
**Failing tests:**
- `auth.spec.ts:45` - Login redirect not working (fixing now)
**Overall:** 62/64 tests passing (97%)
This report MUST be shown to user after every task completion in auto mode!
If no deployment instructions provided:
Don't assume deployment target! Present options:
**Ready for Deployment**
All tests pass locally. Where should I deploy?
- Vercel Cron (serverless)
- Railway (always-on)
- GitHub Actions (CI-based)
- Local cron
For scrapers, cron jobs, integrations - ULTRATHINK first:
| Component | Options (by frequency/scale) |
|---|---|
| Cron < 1/hr | Vercel Cron, GitHub Actions, Cloudflare Workers |
| Cron ≥ 1/hr | Railway, Render, dedicated server |
| Heavy compute | Dedicated VM, Docker, Kubernetes |
| Real-time | Always-on server, WebSocket |
| Simple KV | Upstash Redis, Vercel KV |
| Relational DB | Supabase, PlanetScale, Neon |
| File storage | Cloudflare R2, S3, Backblaze B2 |
When implementing scrapers/cron jobs:
CRITICAL: You MUST execute the setup script FIRST before any other action!
When this command is invoked:
Execute this IMMEDIATELY when /sw:auto is invoked:
specweave auto [INCREMENT_IDS...] [OPTIONS]
IMPORTANT: The command is executed via the globally-installed specweave CLI, NOT bash scripts. This ensures cross-platform compatibility (Windows, macOS, Linux).
Pass any arguments from the user (increment IDs, completion conditions, --max-iterations, --simple, etc.)
Handle exit codes:
- 0: Success, session created → proceed to Step 3
- 1: Error (no increments found with --no-increment/--no-inc) → STOP
- 2: Increment creation needed → proceed to Step 2

When specweave auto signals increment creation needed:
Check marker file:
cat .specweave/state/auto-needs-increment.json
Analyze context (ULTRATHINK):
- Check .specweave/increments/ for planned/backlog items

Make intelligent decision:
A. Match existing increment:
# User said: "work on the login feature"
# Found: .specweave/increments/0002-user-login-system (status: planned)
# Action: Activate it and run specweave auto with 0002
/sw:resume 0002
specweave auto 0002 [other-args]
B. Extend existing increment:
# User said: "add password reset to auth"
# Found: .specweave/increments/0001-authentication (status: active, incomplete)
# Action: Add tasks to existing increment, use it for auto mode
# Edit tasks.md to add new tasks
specweave auto 0001 [other-args]
C. Create new increment(s):
# User said: "build a payment integration with Stripe"
# No matching increments found
# Action: Create new increment via /sw:increment
/sw:increment "Payment integration with Stripe - support card payments, webhooks, and subscription management"
# Then run specweave auto with the new increment ID
specweave auto 0003-payment-integration [other-args]
D. Multiple increments:
# User said: "finish all pending features"
# Found: multiple backlog/planned increments
# Action: Create queue
specweave auto 0002-dashboard 0003-reports 0004-export [other-args]
E. Ask user (if ambiguous):
π€ I found several potential matches for your request:
1. **0002-user-authentication** (planned) - Add auth system
2. **0005-oauth-integration** (backlog) - Third-party auth
Which would you like to work on?
- Both (in sequence)
- Just authentication
- Just OAuth
- Something else (please describe)
Clean up marker:
rm -f .specweave/state/auto-needs-increment.json
Proceed to Step 3 with increment(s) resolved
Verify session was created:
cat .specweave/state/auto-session.json | jq -r '.sessionId'
If file doesn't exist, the setup failed - investigate and fix before continuing.
**Start execution:**
Now starting autonomous execution...
Session: auto-2025-12-29-abc123
Increment: 0001-user-auth
Tasks: 12 pending
The stop hook will keep me working until all tasks are complete
or you run /sw:cancel-auto.
Beginning with T-001...
Execute /sw:do in a loop (stop hook handles continuation):
On completion:
<auto-complete>DONE</auto-complete>
✅ Auto Session Complete!
Session: auto-2025-12-29-abc123
Duration: 2h 34m
Iterations: 47
Tasks Completed: 42/42
Tests Passed: 156/156
Coverage: 87%
Summary saved to: .specweave/logs/auto-2025-12-29-abc123-summary.md
| Command | Purpose |
|---|---|
| /sw:auto-status | Check session status |
| /sw:cancel-auto | Cancel session |
| /sw:skip-increment | Skip failed increment and continue queue |
| /sw:do | Execute tasks (also works standalone) |
| /sw:progress | Show increment progress |