User request: $ARGUMENTS

Build implementation plan through structured discovery. Takes spec (from /spec or inline), iteratively researches codebase + asks high-priority technical questions that shape implementation direction → detailed plan.

Focus: HOW not WHAT. Spec=what; plan=architecture, files, functions, chunks, dependencies, tests.

Loop: Research → Expand todos → Ask questions → Write findings → Repeat until complete

Output files:

Plan: /tmp/plan-{YYYYMMDD-HHMMSS}-{name-kebab-case}.md
Research log: /tmp/plan-research-{YYYYMMDD-HHMMSS}-{name-kebab-case}.md (external memory)
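
The two path patterns above can be sketched as follows. This is a minimal illustration, not part of the skill: the helper names `kebab_case` and `output_paths` are hypothetical, and sharing one timestamp between both files is an assumption.

```python
import re
from datetime import datetime


def kebab_case(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "-".join(words)


def output_paths(name: str, now: datetime) -> tuple[str, str]:
    """Return (plan_path, research_log_path) built from one timestamp (assumption)."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    slug = kebab_case(name)
    return (
        f"/tmp/plan-{stamp}-{slug}.md",
        f"/tmp/plan-research-{stamp}-{slug}.md",
    )
```

Usage: `output_paths("Add Real-Time Notifications", datetime.now())` yields a pair of /tmp/ paths ending in `add-real-time-notifications.md`.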

Boundaries

Spec=requirements; this skill=architecture, files, chunks, tests
Don't modify spec; flag gaps for user
Surface infeasibility before proceeding
No implementation until approved

Phase 1: Initial Setup

1.1 Create todos (TodoWrite immediately)

Todos = areas to research/decide, not steps. Expand when research reveals: (a) files/modules to modify beyond those already in todos, (b) 2+ valid implementation patterns with different trade-offs, (c) dependencies on code/systems not yet analyzed, or (d) questions that must be answered before completing an existing todo.

Starter seeds:

- [ ] Read/infer spec requirements
- [ ] Codebase research (patterns, files to modify)
- [ ] Architecture decisions
- [ ] (expand as research reveals new areas)
- [ ] Finalize chunks

Evolution example - "Add real-time notifications":

Initial → After codebase research (found WebSocket) → After "needs offline too":

- [x] Read spec → 3 types, mobile+web
- [x] Codebase research → ws.ts, notification-service.ts
- [x] WebSocket approach → extend existing
- [ ] Architecture decisions
- [ ] Offline storage (IndexedDB vs localStorage)
- [ ] Sync conflict resolution
- [ ] Service worker integration
- [ ] Finalize chunks

Key: Never prune todos prematurely.

1.2 Create research log

Path: /tmp/plan-research-{YYYYMMDD-HHMMSS}-{name-kebab-case}.md

# Research Log: {feature}
Started: {timestamp} | Spec: {path or "inline"}

## Codebase Research
## Architecture Decisions
## Questions & Answers
## Unresolved Items

Phase 2: Context Gathering

Prerequisites: Requires vibe-workflow:codebase-explorer agent. Fall back to supplementary codebase research using Read, Glob, and Grep tools when either (a) Task tool fails for any reason (agent not found, timeout after 120 seconds, permission error, incomplete results), or (b) it returns fewer than 3 relevant files when exploring an area expected to touch multiple modules (cross-cutting concerns, features spanning >2 directories). Note [SUPPLEMENTED RESEARCH: codebase-explorer insufficient - {reason}] in research log. Do not retry on timeout; proceed directly to supplementary research.
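
The fallback rule can be read as a small decision function. The sketch below is illustrative only: `supplement_reason` and `log_note` are hypothetical names, and the caller is assumed to know the error string, the count of relevant files, and whether the area is expected to span multiple modules.

```python
from typing import Optional


def supplement_reason(
    task_error: Optional[str],
    relevant_files: int,
    expects_multiple_modules: bool,
) -> Optional[str]:
    """Return why manual research is needed, or None if the explorer result suffices."""
    if task_error is not None:
        # Any failure counts: agent not found, timeout, permission error, incomplete results.
        return task_error
    if expects_multiple_modules and relevant_files < 3:
        return f"only {relevant_files} relevant files for a multi-module area"
    return None


def log_note(reason: str) -> str:
    """Format the marker that goes into the research log."""
    return f"[SUPPLEMENTED RESEARCH: codebase-explorer insufficient - {reason}]"
```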

2.1 Read/infer spec

Extract: requirements, user stories, acceptance criteria, constraints, out-of-scope.

No formal spec? Infer from conversation, tool outputs, user request. If spec and conversation together provide fewer than 2 concrete requirements, ask user via AskUserQuestion: "I need at least 2 concrete requirements to plan. Please provide: [list what's missing]" before proceeding.

2.2 Launch codebase-explorer

Task tool with subagent_type: "vibe-workflow:codebase-explorer". Launch multiple in parallel for cross-cutting work.

Explore: existing implementations, files to modify, patterns, integration points, test patterns.

2.3 Read ALL recommended files

No skipping. Gives firsthand knowledge of patterns, architecture, integration, tests.

2.4 Update research log

After EACH step:

### {timestamp} - {what researched}
- Explored: {areas}
- Key findings: {files, patterns, integration points}
- New areas: {list}
- Architectural questions: {list}

2.5 Write initial draft

First draft with [TBD] markers. Same file path for all updates.

Phase 3: Iterative Discovery Interview

CRITICAL: Use AskUserQuestion tool for ALL questions—never plain text. If AskUserQuestion is unavailable, present questions in structured markdown with numbered options and wait for user response.

Example (the questions array supports 1-4 questions per call—that's batching):

questions: [
  {
    question: "Should we build the full implementation or a minimal stub?",
    header: "Phasing",
    options: [
      { label: "Full implementation (Recommended)", description: "Complete feature per spec, production-ready" },
      { label: "Minimal stub", description: "Interface only, implementation deferred" },
      { label: "Incremental", description: "Core first, enhance in follow-up PRs" }
    ],
    multiSelect: false
  },
  {
    question: "Which state management approach?",
    header: "State",
    options: [
      { label: "Extend existing store (Recommended)", description: "Matches codebase pattern in src/store/" },
      { label: "Local component state", description: "Simpler but less shareable" },
      { label: "New dedicated store", description: "Isolated but adds complexity" }
    ],
    multiSelect: false
  }
]
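
A payload like the one above can be checked against the limits this document states (1-4 questions per call, max 4 options per question, one "(Recommended)" option placed first). The validator below is a hypothetical helper, not part of the AskUserQuestion tool; it assumes the payload is a list of dicts shaped like the example.

```python
RECOMMENDED = "(Recommended)"


def validate_questions(questions: list[dict]) -> list[str]:
    """Return rule violations; an empty list means the batch looks acceptable."""
    problems = []
    if not 1 <= len(questions) <= 4:
        problems.append(f"expected 1-4 questions, got {len(questions)}")
    for i, q in enumerate(questions, start=1):
        labels = [opt["label"] for opt in q.get("options", [])]
        if len(labels) > 4:
            problems.append(f"question {i}: more than 4 options")
        recommended = [label for label in labels if RECOMMENDED in label]
        if len(recommended) != 1:
            problems.append(f"question {i}: expected exactly one {RECOMMENDED} option")
        elif RECOMMENDED not in labels[0]:
            problems.append(f"question {i}: {RECOMMENDED} option should come first")
    return problems
```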

Memento Loop

Mark todo in_progress (via TodoWrite with status "in_progress")
Research (codebase-explorer) OR ask (AskUserQuestion)
Write findings immediately to research log
Expand todos for new questions/integration points/dependencies
Update plan (replace [TBD])
Mark todo completed (via TodoWrite with status "completed")
Repeat until no pending todos

NEVER proceed without writing findings — research log = external memory.
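
The loop's control flow can be sketched in a few lines. This is a simplified model under stated assumptions: `investigate` is a hypothetical stand-in for either codebase research or a user question, the research log is an in-memory list, and the plan-update step is omitted.

```python
from typing import Callable


def memento_loop(
    todos: list[str],
    investigate: Callable[[str], tuple[str, list[str]]],
) -> list[str]:
    """Process todos until none are pending; return the research log lines.

    `investigate` returns (finding, new_todos) for one todo.
    """
    pending = list(todos)
    status: dict[str, str] = {t: "pending" for t in pending}
    log: list[str] = []
    while pending:
        todo = pending.pop(0)
        status[todo] = "in_progress"
        finding, new_todos = investigate(todo)
        log.append(f"{todo}: {finding}")  # write findings before anything else
        for new in new_todos:  # expand todos for newly revealed areas
            if new not in status:
                status[new] = "pending"
                pending.append(new)
        status[todo] = "completed"
    return log
```

The point of the sketch is the ordering: the finding is logged before todos are expanded or marked completed, so nothing lives only in working memory.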

If user answer contradicts prior decisions: (1) Inform user: "This contradicts earlier decision X. Proceeding with new answer." (2) Log both decisions in research log under ## Conflicts. (3) Re-evaluate affected todos. (4) Update plan accordingly. If contradiction cannot be resolved, ask user to clarify priority.

Research Log Update Format

### {timestamp} - {what}
**Todo**: {which}
**Finding/Answer**: {result}
**Impact**: {what revealed/decided}
**New areas**: {list or "none"}

Architecture decisions:

- {Area}: {choice} — {rationale}
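
To keep entries uniform, the update format can be produced by a small formatter. A minimal sketch, assuming a hypothetical helper `format_log_entry`; only the field layout comes from the format above.

```python
def format_log_entry(
    timestamp: str,
    what: str,
    todo: str,
    finding: str,
    impact: str,
    new_areas: list[str],
) -> str:
    """Render one research log entry in the update format."""
    areas = ", ".join(new_areas) if new_areas else "none"
    return "\n".join([
        f"### {timestamp} - {what}",
        f"**Todo**: {todo}",
        f"**Finding/Answer**: {finding}",
        f"**Impact**: {impact}",
        f"**New areas**: {areas}",
    ])
```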

Todo Expansion Triggers

Research Reveals	Add Todos For
Existing similar code	Integration approach
Multiple valid patterns	Pattern selection
External dependency	Dependency strategy
Complex state	State architecture
Cross-cutting concern	Concern isolation
Performance-sensitive	Performance strategy
Migration needed	Migration path

Interview Rules

Unbounded loop: Iterate until ALL completion criteria met. No fixed round limit. If user says "just decide", "you pick", "I don't care", "skip this", or otherwise explicitly delegates the decision, document remaining decisions with [INFERRED: {choice} - {rationale}] markers and finalize.

Spec-first: Business scope and requirements belong in spec. Questions here are TECHNICAL only—architecture, patterns, implementation approach. If spec has gaps affecting implementation: (1) flag in research log under ## Spec Gaps, (2) ask user via AskUserQuestion whether to pause for spec update OR proceed with stated assumption, (3) document choice and continue.

Prioritize questions that eliminate other questions - Ask questions where the answer changes what other questions you need to ask, or eliminates entire branches of implementation. If knowing X makes Y irrelevant, ask X first.
Interleave discovery and questions:
- User answer reveals new area → launch codebase-explorer
- Need external context → launch web-researcher (if unavailable, ask user to provide external context directly via AskUserQuestion)
- Update plan after each iteration, replacing [TBD] markers

Question priority order:

Priority	Type	Purpose	Examples
1	Implementation Phasing	How much to build now vs later	Full impl vs stub? Include migration? Optimize or simple first?
2	Branching	Open/close implementation paths	Sync vs async? Polling vs push? In-memory vs persistent?
3	Technical Constraints	Non-negotiable technical limits	Must integrate with X? Performance requirements? Backward compatibility?
4	Architectural	Choose between patterns	Error strategy? State management? Concurrency model?
5	Detail Refinement	Fine-grained technical details	Test coverage scope? Retry policy? Logging verbosity?

Always mark one option "(Recommended)" - put first with reasoning in description. When options are equivalent AND easily reversible (changes affect only 1-2 files, where each changed file is imported by 5 or fewer other files, and there are no data migrations, schema changes, or public API changes), decide yourself (lean toward existing codebase patterns).
Be thorough via technique:
- Cover technical decisions from each applicable priority category (1-5 in the priority table)—don't skip categories to save time
- Reduce cognitive load through HOW you ask: concrete options, good defaults
- Batching: Up to 4 questions in questions array per call (batch questions that share a common decision—e.g., multiple state management questions, or multiple error handling questions—where answers to one inform the others); max 4 options per question (tool limit)
- Make decisions yourself when codebase research suffices
- Complete plan with easy questions > incomplete plan with fewer questions
Ask non-obvious questions - Error handling strategies, edge cases affecting correctness, performance implications, testing approach for complex logic, rollback/migration needs, failure modes
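
The "easily reversible" condition in the rules above (1-2 files, each imported by 5 or fewer other files, no data migrations, schema changes, or public API changes) can be written as a predicate. The `ChangeImpact` type and its field names are hypothetical; how the import counts are gathered is left open.

```python
from dataclasses import dataclass


@dataclass
class ChangeImpact:
    importers_per_file: list[int]  # for each changed file, how many other files import it
    data_migration: bool = False
    schema_change: bool = False
    public_api_change: bool = False


def easily_reversible(impact: ChangeImpact) -> bool:
    """True when the change meets every reversibility condition."""
    few_files = 1 <= len(impact.importers_per_file) <= 2
    low_fan_in = all(n <= 5 for n in impact.importers_per_file)
    no_contracts = not (
        impact.data_migration or impact.schema_change or impact.public_api_change
    )
    return few_files and low_fan_in and no_contracts
```

Deciding without asking still requires the options to be equivalent; this predicate covers only the reversibility half of that rule.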

Ask vs Decide - Codebase patterns and technical standards are authority; user decides significant trade-offs.

Ask user when:

Category	Examples
Trade-offs affecting measurable outcomes	Estimated >20% change to latency/throughput vs current implementation, adds abstraction layers, locks approach for >6 months, changes user-facing behavior
No clear codebase precedent	New pattern not yet established
Multiple valid approaches	Architecture choice with different implications
Phasing decisions	Full impl vs stub, migration included or deferred
Breaking changes	API changes, schema migrations
Resource allocation	Cache size, connection pools, batch sizes with cost implications

Decide yourself when:

Category	Examples
Existing codebase pattern	Error format, naming conventions, file structure
Industry standard	HTTP status codes, retry with exponential backoff
Sensible defaults	Timeout 30s, pagination 50 items, debounce 300ms
Easily changed later	Internal function names, log messages, test structure
Implementation detail	Which hook to use, internal state shape, helper organization
Clear best practice	Dependency injection, separation of concerns

Test: "If I picked wrong, would user say 'that's not what I meant' (ASK) or 'that works, I would have done similar' (DECIDE)?"

Phase 4: Finalize & Present

4.1 Final research log update

## Planning Complete
Finished: {timestamp} | Research log entries: {count} | Architecture decisions: {count}
## Summary
{Key decisions}

4.2 Finalize plan

Remove [TBD], ensure chunk consistency, verify dependency ordering, add line ranges for files >500 lines.
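
A quick way to confirm no markers remain before presenting the plan is sketched below. `remaining_tbd` is a hypothetical helper that works on the plan text; reading the file is left to the caller.

```python
def remaining_tbd(plan_text: str) -> list[int]:
    """Return 1-based line numbers that still contain a [TBD] marker."""
    return [
        number
        for number, line in enumerate(plan_text.splitlines(), start=1)
        if "[TBD]" in line
    ]
```

An empty result means the plan can be finalized; anything else points at the lines still needing a decision.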

4.3 Mark all todos complete

4.4 Present summary

## Plan Summary

**Plan file**: /tmp/plan-{...}.md

### What We're Building
{1-2 sentences}

### Chunks ({count})
1. {Name} - {description}

### Key Decisions
- {Decision}: {choice}

### Execution Order
{Dependencies, parallel opportunities}

---
Review full plan. Adjust or approve to start.

4.5 Wait for approval

Do NOT implement until user explicitly approves. After approval: create todos from chunks, execute.

Planning Methodology

1. Principles

Principle	Description
Safety	Never skip gates (type checks, tests, lint); every chunk tests+demos independently
Clarity	Full paths, numbered chunks, rationale for context files, line ranges
Minimalism	Ship today's requirements; parallelize where possible
Forward focus	Don't prioritize backward compatibility unless requested or public API/schema contracts would be broken
Cognitive load	Deep modules with simple interfaces > many shallow; reduce choices
Conflicts	Safety > Clarity > Minimalism > Forward focus

Definitions:

Gates: Quality checks every chunk must pass—type checks (0 errors), tests (pass), lint (clean)
Mini-PR: A chunk sized to be its own small pull request—complete, mergeable, reviewable independently
Deep modules: Modules that hide complexity behind simple interfaces (few public methods, rich internal logic)

Code Quality (P1-P10)

User's explicit intent takes precedence for implementation choices (P2-P10). P1 (Correctness) and Safety gates (type checks 0 errors, tests pass, lint clean) are non-negotiable—if user requests skipping these, flag as risk but do not skip.

#	Principle	Planning Implication
P1	Correctness	Every chunk must demonstrably work
P2	Observability	Plan logging, error visibility
P3	Illegal States Unrepresentable	Design types that rule out invalid states at compile time
P4	Single Responsibility	Each chunk ONE thing
P5	Explicit Over Implicit	Clear APIs, no hidden behaviors
P6	Minimal Surface Area	YAGNI—don't add features beyond spec
P7	Tests	Specific cases, not "add tests"
P8	Safe Evolution	Public API/schema changes need migration
P9	Fault Containment	Plan failure isolation, retry/fallback
P10	Comments Why	Document complex logic why, not what

P1-P10 apply to code quality within chunks. Principle conflicts (Safety > Clarity > Minimalism > Forward focus) govern planning-level decisions. When both apply, Safety (gates) takes precedence over all P2-P10.

Values: Mini-PR > monolithic; parallel > sequential; function-level > code details; dependency clarity > implicit coupling; ship-ready > half-built

2. Mini-PR Chunks

Each chunk must:

Ship complete value (demo independently)
Pass all gates (type checks, tests, lint)
Be mergeable alone (1-3 functions, <200 lines of code)
Include its tests (name specific inputs/scenarios, e.g., "valid email accepts user@domain.com", "invalid rejects missing @")

3. Chunk Sizing

Complexity	Chunks	Guidance
Simple	1-2	1-3 functions each
Medium	3-5	<200 lines of code per chunk
Complex	5-8	Each demo-able
Integration	+1 final	Connect prior work

Decision guide: New model/schema → types chunk first | >3 files or >5 functions → split by concern | Complex integration → foundation then integration | One module <200 lines of code → single chunk OK
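
The decision guide can be expressed as a small rule function. This is a sketch under assumptions: `sizing_advice` is a hypothetical name, "one module" is read as a single file, and the "complex integration" rule is left out because it needs a judgment call rather than a count.

```python
def sizing_advice(files: int, functions: int, lines: int, new_schema: bool) -> list[str]:
    """Return the decision-guide hints that apply to a proposed piece of work."""
    advice = []
    if new_schema:
        advice.append("types chunk first")
    if files > 3 or functions > 5:
        advice.append("split by concern")
    elif files == 1 and lines < 200:
        advice.append("single chunk OK")
    return advice
```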

4. Dependency Ordering

True dependencies: uses types, calls functions, extends
False dependencies: same feature, no interaction (parallelize these)
Minimize chains: A→B and A→C, then B,C→D (not A→B→C→D)
Circular dependencies: If chunks form a cycle (A needs B, B needs C, C needs A), extract shared interfaces/types into a new foundation chunk that breaks the cycle
Number chunks; mark parallel opportunities
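
The ordering rules can be checked mechanically with a topological sort that groups chunks into parallel levels and rejects cycles. The sketch below uses Python's standard `graphlib`; the function name `execution_levels` and the dict-of-sets input shape are illustrative choices, not something this skill prescribes.

```python
from graphlib import CycleError, TopologicalSorter


def execution_levels(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group chunks into levels; chunks in the same level can run in parallel.

    `depends_on` maps each chunk to the chunks it truly depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    try:
        sorter.prepare()
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)
    return levels
```

For the A→B, A→C, then B,C→D shape, this gives three levels with B and C together in the middle one. A ValueError is the cue to extract a foundation chunk.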

5. What Belongs

Belongs	Does Not Belong
Numbered chunks, gates, todo descriptions	Code snippets
File manifests with reasons	Extra features, future-proofing
Function names only	Performance tuning, assumed knowledge

6. Cognitive Load

Deep modules first: fewer with simple interfaces, hide complexity
Minimize indirection: layers only for concrete extension
Composition root: one wiring point
Decide late: abstraction only when PR needs extension
Framework at edges: core logic agnostic, thin adapters
Reduce choices: one idiomatic approach per concern
Measure: if understanding the chunk's purpose requires reading more than 3 files or tracing more than 5 function calls, simplify it

7. Common Patterns

Pattern	Flow
Sequential	Model → Logic → API → Error handling
Parallel after foundation	Model → CRUD ops (parallel) → Integration
Pipeline	Types → Parse/Transform (parallel) → Format → Errors
Authentication	User model → Login → Auth middleware → Logout
Search	Data structure → Algorithm → API → Ranking

8. Plan Template

# IMPLEMENTATION PLAN: [Feature]

[1-2 sentences]

Gates: Type checks (0 errors), Tests (pass), Lint (clean)

---

## Requirement Coverage
- [Spec requirement] → Chunk N
- [Spec requirement] → Chunk M, Chunk N

---

## 1. [Name]

Depends on: - | Parallel: -

[What this delivers]

Files to modify:
- path.ts - [changes]

Files to create:
- new.ts - [purpose]

Context files:
- reference.ts - [why relevant]

Notes: [Assumptions, risks, alternatives]

Tasks:
- Implement fn() - [purpose]
- Tests - [cases]
- Run gates

Acceptance criteria:
- Gates pass
- [Specific verifiable criterion]

Key functions: fn(), helper()
Types: TypeName

Good Example

## 2. Add User Validation Service

Depends on: 1 (User types) | Parallel: 3

Implements email/password validation with rate limiting.

Files to modify:
- src/services/user.ts - Add validateUserInput()

Files to create:
- src/services/validation.ts - Validation + rate limiter

Context files:
- src/services/auth.ts:45-80 - Existing validation patterns
- src/types/user.ts - User types from chunk 1

Tasks:
- validateEmail() - RFC 5322
- validatePassword() - Min 8, 1 number, 1 special
- rateLimit() - 5 attempts/min/IP
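- Tests: valid email accepts user@domain.com; invalid rejects missing @; password rejects 7 chars, no number, no special; 6th attempt within 1 min from same IP is blocked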
- Run gates

Acceptance criteria:
- Gates pass
- validateEmail() rejects invalid formats, accepts valid RFC 5322
- validatePassword() enforces min 8, 1 number, 1 special
- Rate limiter blocks after 5 attempts/min/IP

Key functions: validateUserInput(), validateEmail(), validatePassword(), rateLimit()
Types: ValidationResult, RateLimitConfig

Bad Example

## 2. User Stuff
Add validation for users.
Files: user.ts
Tasks: Add validation, Add tests

Why bad: No dependencies, vague description, missing full paths, no context files, generic tasks, no functions listed, no acceptance criteria.

9. File Manifest & Context

Every file to modify/create; specify changes and purpose
Full paths; zero prior knowledge assumed
Context files: explain WHY; line ranges for files >500 lines

10. Quality Criteria

Level	Criteria
Good	Each chunk ships value; dependencies ordered; parallel identified; files explicit; context has reasons; tests in todos; gates listed
Excellent	+ optimal parallelization, line numbers, clear integration, risks, alternatives, reduces cognitive load

Quality Checklist

MUST verify:

Correctness: boundaries, null/empty, error paths
Type Safety: types prevent invalid states; validation at boundaries
Tests: critical + error + boundary paths

SHOULD verify:

Observability: errors logged with context
Resilience: timeouts, retries with backoff, cleanup
Clarity: descriptive names, no magic values
Modularity: single responsibility, <200 lines of code, minimal coupling
Evolution: public API/schema changes have migration

Test Priority

Priority	What	Requirement
9-10	Data mutations, money, auth, state machines	MUST
7-8	Business logic, API contracts, errors	SHOULD
5-6	Edge cases, boundaries, integration	GOOD
1-4	Trivial getters, pass-through	OPTIONAL
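
The table maps a 1-10 priority score to a requirement level. A minimal lookup is sketched here; `requirement_level` is a hypothetical name, and how a score gets assigned to a piece of code is outside this sketch.

```python
def requirement_level(priority: int) -> str:
    """Map a 1-10 test priority to its requirement level from the table."""
    if not 1 <= priority <= 10:
        raise ValueError(f"priority must be 1-10, got {priority}")
    if priority >= 9:
        return "MUST"
    if priority >= 7:
        return "SHOULD"
    if priority >= 5:
        return "GOOD"
    return "OPTIONAL"
```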

Error Handling

For external systems/user input, specify:

What can fail
How failures surface
Recovery strategy

Avoid: empty catch, catch-return-null, silent fallbacks, broad catching.
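
As an illustration of the alternative to those anti-patterns, the sketch below catches only the failure it expects and re-raises it with context instead of returning null. `ConfigError` and `parse_config` are hypothetical names, and JSON config parsing is just a convenient stand-in for "user input".

```python
import json


class ConfigError(Exception):
    """Raised when configuration cannot be parsed; the message carries the path."""


def parse_config(path: str, raw: str) -> dict:
    """Parse raw JSON config text, surfacing failures with context."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:  # narrow catch: only the expected failure
        raise ConfigError(f"{path}: invalid JSON at line {exc.lineno}") from exc
    if not isinstance(data, dict):
        raise ConfigError(f"{path}: expected a JSON object, got {type(data).__name__}")
    return data
```

What can fail (malformed or wrongly shaped input), how it surfaces (a typed exception naming the file), and recovery (left to the caller) are each explicit.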

11. Problem Scenarios

Scenario	Action
No detailed requirements	Research first → if core requirements/constraints remain unclear: ask via AskUserQuestion OR stop → if only non-critical details are unclear: assume+document
Extensive requirements	MUSTs first → research scope → ask priority trade-offs → defer SHOULD/MAY
Multiple approaches	Research first → ask only when significantly different implications
Everything dependent	Start from types → question each dependency → find false dependencies → foundation → parallel → integration

Planning Mantras

Memento (always):

1. Write findings BEFORE next step (research log = external memory)
2. Every discovery needing follow-up → todo
3. Update research log after EACH step

Primary:

4. Smallest shippable increment?
5. Passes all gates?
6. Explicitly required?
7. Passes review first submission?

Secondary:

8. Ship with less?
9. Dependencies determine order?
10. Researched first, asked strategically?
11. Reduces cognitive load?
12. Satisfies P1-P10?
13. Error paths planned?

Never Do

Proceed without writing findings
Keep discoveries as mental notes
Skip todos
Write to project directories (always /tmp/)
Ask scope/requirements questions (that's spec phase)
Finalize with [TBD]
Implement without approval
Forget expanding todos on new areas

Recognize & Adjust

Symptom	Action
Chunk >200 lines of code	Split by concern
No clear value	Merge or refocus
Dependencies unclear	Make explicit, number
Context missing	Add files + line numbers
