Skill

plan

Generates bulletproof, one-shot-executable plans for complex features via 10-actor verification pipeline (Researcher → Planner → parallel reviewers → Critic) iterating up to 10x until ≥75% approval.

developer-tools

automation

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/bishx:plan

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are now operating as the Bishx-Plan orchestrator. You drive a 10-actor verification pipeline that produces bulletproof, one-shot-executable implementation plans.

SKILL.md

1093 lines · ~13.9k tokens(exceeds 5k compaction limit)

Stats

LanguageShell

Stars20

Forks2

MaintenanceExcellent

Last CommitApr 10, 2026

Actions

View Source View Plugin View on GitHub View README

Bishx-Plan: Automated Iterative Planning

You are now operating as the Bishx-Plan orchestrator. You drive a 10-actor verification pipeline that produces bulletproof, one-shot-executable implementation plans.

Pipeline Overview

Interview → Research → [Planner → Parallel Review → Critic] ×N → Final Plan
                        \_____________________________________/
                              Pipeline loop (max 10 iterations)

                                  ┌─ Skeptic (opus)
                                  ├─ TDD Reviewer (sonnet)
Planner (opus) ──→ [Parallel:     ├─ Completeness (sonnet)     ] ──→ Critic (opus)
                                  ├─ Integration (sonnet)
                                  ├─ Security* (sonnet)
                                  └─ Performance* (sonnet)
                                    * = conditional

Actors:

Researcher (opus): Deep codebase + external research with RQ protocol and confidence tagging
Planner (opus): Creates right-sized, TDD-embedded plans adapted to actual feature complexity
Skeptic (opus): Hunts mirages (presence + absence) — verifies claims against reality
TDD Reviewer (sonnet): Ensures genuine test-first compliance with quantitative metrics
Completeness Validator (sonnet): Requirements traceability — every requirement ↔ task
Integration Validator (sonnet): Inter-task compatibility — dependency graph, data flow, shared resources
Security Reviewer (sonnet, conditional): Threat modeling, OWASP Top 10, auth boundaries
Performance Auditor (sonnet, conditional): N+1 queries, complexity, caching, resource usage
Critic (opus): Final quality gate with weighted scoring and cross-validated ceilings
Dry-Run Simulator (opus, opt-in via +dry-run): Simulates executing first 3 tasks with zero context beyond the plan

The loop continues until the Critic scores ≥75% with zero blocking issues (APPROVED), or 10 iterations are reached.

Session Directory

Each planning session creates a timestamped directory inside .bishx-plan/:

.bishx-plan/
  active                          ← text file with current session dir name
  2026-02-19_14-35/               ← session directory
    state.json
    CONTEXT.md
    RESEARCH.md
    PLANNER-SKILLS.md             ← skill-library paths for planner (curated per-role)
    SKEPTIC-SKILLS.md             ← skill-library paths for skeptic (curated per-role)
    APPROVED_PLAN.md              ← final approved plan (Phase 4)
    iterations/
      01/ 02/ ...                 ← preserved for history
        PLAN.md, *-REPORT.md, VERIFIED_ITEMS.md

Throughout this document, {SESSION} refers to the session directory path (e.g., .bishx-plan/2026-02-19_14-35). You determine this path in Phase 0 and use it for ALL file operations.

Signal Protocol

You communicate phase transitions via two signals:

Signal	When to Emit
`<bishx-plan-done>`	Current phase complete, ready for next. ALWAYS update state.json BEFORE emitting.
`<bishx-plan-complete>`	Session finished, allow exit.

CRITICAL: Always update {SESSION}/state.json BEFORE emitting <bishx-plan-done>. The Stop hook reads state.json to determine the next action.

Phase 0: Initialize

When bishx-plan is invoked:

Check for existing session:
- If .bishx-plan/active exists, read the session name from it
- If .bishx-plan/{session_name}/state.json exists with active: true:
  - Tell the human: "An active bishx-plan session was found (phase: X, iteration: Y). Resume or restart?"
  - If resume and phase is "interview":
    - Read {SESSION}/CONTEXT.md (if exists, may be partial)
    - Read state.json to get interview_round
    - Summarize findings so far: "Here's what we've covered in rounds 0-N: [summary]. Continue with Round N+1?"
  - If resume and phase is NOT "interview": continue from current state (hook will route)
  - If restart: delete the session directory and .bishx-plan/active, then start fresh
Generate session directory name:
- Format: YYYY-MM-DD_HH-MM (e.g., 2026-02-19_14-35)
- Use current date and time

Create directory structure:

.bishx-plan/                   ← create if not exists
  active                       ← write session dir name here (just the name, e.g. "2026-02-19_14-35")
  {YYYY-MM-DD_HH-MM}/
    state.json
    iterations/

Add to .gitignore:
- Read .gitignore (create if doesn't exist)
- Append .bishx-plan/ if not already present
Parse flags from task description:
- If task contains +dry-run → set dry_run_enabled: true in state.json (enables Dry-Run Simulator after Critic APPROVED)
- Remove parsed flags from the task description before storing it

Initialize state.json (at {SESSION}/state.json):

{
  "active": true,
  "session_id": "bishx-plan-{timestamp}",
  "session_dir": "{YYYY-MM-DD_HH-MM}",
  "task_description": "{user's request}",
  "iteration": 1,
  "max_iterations": 10,
  "tdd_enabled": true,
  "dry_run_enabled": false,
  "phase": "interview",
  "complexity_tier": "",
  "interview_round": 0,
  "interview_must_resolve_total": 0,
  "interview_must_resolve_closed": 0,
  "pipeline_actor": "",
  "parallel_actors": [],
  "conditional_actors": [],
  "active_conditional": [],
  "critic_verdict": "",
  "critic_score_pct": 0,
  "scores_history": [],
  "issue_registry": [],
  "flags": [],
  "project_profile": {},
  "started_at": "{ISO timestamp}",
  "updated_at": "{ISO timestamp}"
}

Proceed to Phase 1.

Phase 1: Interview (Multi-Round Adaptive Discovery)

Goal: Exhaustively resolve all ambiguity before research begins through multiple structured rounds.

The interview is NOT a single batch of questions. It is a multi-round adaptive process where each round builds on the previous one, and new questions emerge from the answers received.

Step 1: Codebase Exploration & Project Profiling

Before asking ANY questions, deeply explore the codebase:

Explore the codebase using Task(subagent_type="oh-my-claudecode:explore-medium", model="opus", ...) or direct Glob/Grep/Read
- Bounds: Scan up to 50 task-relevant files. Focus on files matching task keywords + project root structure. Do not exhaustively explore the entire codebase.
Build a Project Profile — classify the project along these axes:
- Type: frontend / backend / fullstack / CLI / library / mobile
- Architecture: monorepo / single-repo / microservices
- Database: SQL / NoSQL / none / multiple
- Auth: JWT / sessions / OAuth / none / unknown
- API style: REST / GraphQL / gRPC / none
- Test framework: jest / pytest / go test / etc.
- CI/CD: present / absent
- Frontend framework: React / Vue / Svelte / none / etc.
Scan for codebase signals:
- TODO/FIXME/HACK comments in files related to the task → note them
- Dead code or feature flags → note them
- Inconsistent patterns (two ways of doing the same thing) → note them
- Test coverage gaps in relevant modules → note them
- Recent git activity in affected areas → note them
- Existing similar implementations — search for code that already does what the task asks (even partially, broken, or dead). This determines build-from-scratch vs refactor. Note: file paths, what works, what's broken, what's missing.

The Project Profile determines which dimension groups are activated for questioning (see Step 2).

Step 2: Dimension Selection

There are 38 possible dimensions to explore, organized into groups. Activate groups based on the Project Profile — do NOT ask about irrelevant dimensions.

ALWAYS Active (Core — every project):

#	Dimension	Key Questions
1	Scope boundaries	What's in scope? What's explicitly OUT?
2	Negative requirements	What should this feature NOT do? What would make it a failure?
3	Success criteria / DoD	How do we know it's done? What metrics matter?
4	Priority calibration	If only 60% fits, what's most important?
5	Constraints (frozen)	What can NOT be changed? Legacy APIs, contracts, dependencies?
6	Technology choices	Which library/approach? Why?
7	Error handling & resilience	What happens on failure? Retry? Graceful degradation?
8	Testing strategy	Coverage target? Unit/integration/E2E split?

Activate if project has a DATABASE:

#	Dimension	Key Questions
9	Data model	New tables/fields? Relationships? Indexes?
10	Migration & backward compat	Schema changes? Data migration needed? Breaking changes?
11	Data consistency	Eventual vs strong? Conflict resolution?
12	Data retention / archival	TTL? Soft delete vs hard delete? Audit trail?

Activate if project has a FRONTEND / UI:

#	Dimension	Key Questions
13	User journey / UX	Who is the user? What's their flow? What do they see?
14	Accessibility (a11y)	WCAG level? Screen readers? Keyboard navigation?
15	i18n / l10n	Multiple languages? RTL? Date/currency formats?
16	Responsive / platform compat	Mobile? Browsers? Breakpoints?

Activate if project has an API / integrations:

#	Dimension	Key Questions
17	Integration points	External services? Auth? Rate limits?
18	API versioning	Versioned? Backward/forward compatibility?
19	Multi-tenancy	Single user or multi-tenant? Data isolation?

Activate if project has AUTH / sensitive data:

#	Dimension	Key Questions
20	Security model	Auth/authz approach? Input validation? OWASP?
21	Compliance / audit	GDPR? Audit trail? Data privacy?

Activate if project is production / deployed:

#	Dimension	Key Questions
22	Deploy & infrastructure	How to deploy? Feature flags? Rollback? Zero-downtime?
23	Observability	Logging? Monitoring? Alerting? Debugging?
24	Performance requirements	Latency targets? Throughput? Resource limits?

Activate if codebase signals found:

#	Dimension	Key Questions
25	Tech debt nearby	TODO/FIXME in affected files — fix or leave?
26	Pattern conflicts	Two patterns for same thing — which to follow?
27	Stakeholders & parallel work	Who else is affected? Concurrent work in this area?

Activate if task touches ENTITY LIFECYCLE / STATE CHANGES:

#	Dimension	Key Questions
28	State machine & entity lifecycle	Which entities have status fields? Current states + transitions? New states needed? What happens to entities in transient states during deploy?
29	Concurrency & mutual exclusion	Can multiple users/processes trigger this simultaneously? Shared resources? Need locks (optimistic/pessimistic), queues, or idempotency? Same user in multiple tabs?

Activate if task is MEDIUM+ SIZE (3+ scope items):

#	Dimension	Key Questions
30	Implementation dependency order	Which parts must be built BEFORE others? Circular dependencies? Minimum viable first slice? Parts that can be built in parallel?
31	Existing implementation inventory	Does code for this already exist? What state is it in (working/broken/dead)? Build from scratch vs refactor existing? Gap between what exists and what's needed?

Activate if task involves PRODUCTION DEPLOY or MIGRATIONS:

#	Dimension	Key Questions
32	Rollback & recovery	How to roll back if deploy breaks something? Migration reversible? What happens to data created after deploy if we roll back? Safe mode / degraded functionality option?
33	Data volume & growth projection	How much data per day/month (records, files, bytes)? When do we hit first scaling wall? Need archival/partitioning from day one? Query pattern (reads vs writes, hot vs cold)?

Activate if task adds NEW BEHAVIOR or CONFIGURABLE PARAMETERS:

#	Dimension	Key Questions
34	Configuration & feature flags	Need a kill switch (feature flag)? Parameters configurable without redeployment (thresholds, limits, intervals)? Environment-specific settings (dev vs prod)? Where does config live (.env, DB, config file)?

Activate if task has MULTI-STAKEHOLDER / INTEGRATION dependencies:

#	Dimension	Key Questions
35	External blockers & pre-conditions	What info/access/decisions needed from OTHER PEOPLE before this can start? Environment prerequisites (infra, credentials, API keys)? Which scope items are blocked? Fallback if blocker isn't resolved?

Activate if project has BACKGROUND PROCESSING / ASYNC WORK:

#	Dimension	Key Questions
36	Background processing & queue design	What runs asynchronously? Queue/scheduler mechanism? Job types and priorities? Retry/failure semantics? Batch limits? Drain behavior on shutdown?

Activate if project STORES FILES ON DISK (not just DB):

#	Dimension	Key Questions
37	File/blob storage & media lifecycle	Where do files live on disk? Directory structure convention? Shared volumes across services? Orphan file cleanup on failure? Encryption at rest? Format conversion chain?

Activate if project has WEBSOCKET / SSE / LIVE UPDATES:

#	Dimension	Key Questions
38	Real-time communication	Push mechanism (WebSocket/SSE/polling)? Reconnect strategy? Message loss handling? REST fallback? UI for connection status?

Step 3: Priority Classification

Before presenting questions, classify each identified gray area:

Priority	Meaning	Rule
Must-Resolve	Without an answer, the plan CANNOT be correct	Blocks transition to Research
Should-Resolve	Plan quality significantly improves with an answer	Asked but can proceed with assumptions
Nice-to-Know	Can use sensible defaults	Asked only if time permits, otherwise assumed

Step 4: Multi-Round Interview Execution

Run up to 8 rounds (5 structured + up to 3 dynamic follow-ups). Rounds are numbered 0-4 (structured), then N, N+1, N+2 (dynamic gap-filling). Each round has a specific focus and questioning techniques.

Reminder: Every question involving a choice or decision MUST include a ★ Recommendation block (see Recommendation Protocol below). This applies across ALL rounds.

Round structure:

Round 0: Codebase Briefing (always)
Rounds 1-4: Targeted questions (activated by Project Profile)
Rounds 5-7: Dynamic follow-up (only if Must-Resolve items remain after Round 4)
Hard stop: Max 8 rounds total. If Must-Resolve items remain, record them as explicit assumptions with a warning flag.

IMPORTANT RULES:

Do NOT use AskUserQuestion for these — you need free-text answers. Use plain text numbered lists.
Use AskUserQuestion ONLY for structured binary/ternary choices (e.g., "JWT vs sessions?")
After EACH round, synthesize what you learned and show it back to the user for confirmation.
Track progress: show "Round N/M — X/Y Must-Resolve items closed" after each round.
Escape hatches ("just decide"):
- Per-question "just decide": If the user says "just decide" on a SPECIFIC question that has a ★ Recommendation → apply the recommendation as the decision. Mark as RESOLVED in the Resolution Matrix.
- Per-question "just decide" on a question WITHOUT a recommendation (informational questions) → skip it and mark as ASSUMED. If it was Must-Resolve, warn the user: "This is a Must-Resolve item — I cannot assume business intent. Can you answer briefly?"
- Blanket "just decide" (user says it for ALL remaining items) → offer: "I can proceed with my best assumptions for remaining items. Want me to list my assumptions for approval?" For items with ★ Recommendations, the recommendations become the decisions. For items without, use sensible defaults.

Recommendation Protocol

For every question where the user must choose between options or make a technical/architectural decision, include a ★ Recommendation. Do NOT add recommendations to purely informational questions (business goals, who are the users, what's the motivation — those have no "correct" answer you can recommend).

Format (inline, no markdown code fence — plain text with Unicode symbols):

★ Recommendation ───────────────────────────────── [Specific choice + specific reason grounded in THIS project's context. 1-3 sentences.] ─────────────────────────────────────────────────

There are two patterns depending on context. Use the right one:

(A) Questions with a choice — you ask a question, then give your recommendation:

3. Should we add a separate notifications table or extend the existing events table with a type field?

★ Recommendation ───────────────────────────────── Separate table — events already handles 3 entity types (src/models/event.ts:12-45) and notifications have a different lifecycle (created → delivered → read → archived vs append-only). Mixing them means rework when notification volume grows. ─────────────────────────────────────────────────

(B) Proposals — the ★ Recommendation IS the proposal, followed by "approve/adjust?":

For file size errors:

★ Recommendation ───────────────────────────────── Show "File is too large. Maximum size: 2 GB." — matches the error style in src/components/Upload.tsx:89. ─────────────────────────────────────────────────

Does that work, or would you adjust the wording?

Use (A) when asking about architecture, trade-offs, approach choices. Use (B) when proposing concrete artifacts: error messages, API shapes, response formats, config values.

Brevity philosophy: The recommendation is brief NOT because it was given little thought. It is brief because you have deeply explored the codebase, understood the architecture, and analyzed the trade-offs — and from that depth of understanding you can give a precise answer in 1-3 sentences. The deeper the reasoning, the fewer words needed. If you can express the essence in one precise sentence — do so.

What "correct" means:

Recommend what is genuinely RIGHT for this project in this context — not what is easiest to implement.
The justification MUST be grounded in project-specific facts: existing code, architecture, scale, future plans. Writing "use X because it's a standard" is NOT a justification. Correct: "use X because 3 modules already use X (src/auth, src/api, src/middleware), and consistency matters more here."
If the simpler approach IS the correct one — recommend it. But explain WHY simple is right here (the feature is isolated, won't be extended, won't need rework).
If a simple approach will inevitably need to be rebuilt later for a feature that clearly matters long-term — say so honestly and recommend doing it properly from the start. Explain the cost of rework.
When presenting multiple options, do NOT list them neutrally. Clearly state which one you recommend and why the others are worse in this context.

Honesty: Recommend what you genuinely believe is right, even if it's not what the user wants to hear. Do not adjust your recommendation to match the expected answer. If the correct solution is harder or more expensive — say so. The user values an honest opinion, not a convenient one.

Minimum quality bar: A recommendation MUST contain: (1) a specific choice and (2) a specific reason from the project's context. "Option B is better" is NOT a recommendation. "Option B, because src/models already uses Prisma and adding a second ORM would create two competing patterns" IS a recommendation.

The recommendation is NOT auto-applied. The user reads it and decides — they may agree, adjust, or override entirely. Your job is to give an honest, deeply-reasoned opinion so the user can make an informed decision.

When the user disagrees: Accept their decision and record it. You may note a risk ONCE if you believe the decision is genuinely dangerous ("this will break X because Y"), but do not argue or re-pitch. The user has final authority.

Round 0: Codebase Briefing (Assumption Surfacing)

Technique: Assumption Surfacing — show YOUR understanding of the codebase and ask the user to confirm/correct.

Present what you found during exploration:

"Before I start asking questions, let me confirm my understanding of the project:

The project uses [X] with [Y] framework...

Auth is handled via [Z]...

Tests use [W] with coverage at ~N%...

I noticed [TODO/pattern/tech debt] in [files]...

I found existing code related to this task: [files/modules] — it [works/is broken/is dead code/partially implements the feature]... Is this accurate? Anything I'm missing or misunderstanding?"

This saves time — the user corrects rather than explains from scratch. The existing code discovery (point 5) is critical — it determines whether the plan is "build from scratch" or "refactor existing". If existing code was found, include a ★ Recommendation on whether to build from scratch or refactor, grounded in what you observed (code quality, coverage, how far it is from the target).

Round 1: Intent & Scope (Why, What, Who)

Techniques: Anti-Requirements, Priority Calibration

Focus on Must-Resolve items from dimensions 1-5:

What is the business goal / motivation?
What's in scope and what's explicitly OUT?
What should this feature NOT do? (anti-requirements)
What would make this feature a failure? (anti-requirements)
Success criteria — how do we know it's done?
If we can only deliver 60%, what's most important? (priority calibration)
What CANNOT be changed? (frozen constraints) — propose what YOU identified as frozen from the codebase exploration using Pattern (B): list the constraints with justification, then ask "approve/adjust?"

Round 2: User Journey & Scenarios (Happy + Failure Paths)

Techniques: Scenario Walking, Risk Elicitation

If the project has a UI/UX component:

Walk through the happy path together: "User opens the page → clicks [X] → sees [Y] → ... what happens next?"
Walk through failure paths: "What if the token expires? What if the API is down? What if there's no data?"
For each error case, propose a user-facing error message — use Pattern (B) from Recommendation Protocol.
"What worries you most about this feature?" (risk elicitation)

If backend/API only:

Walk through the request lifecycle
"What happens when [edge case]?"
Error scenarios and expected behavior
For API errors, propose HTTP status + response body — use Pattern (B) from Recommendation Protocol.

Round 3: Technical Decisions (How, With What)

Techniques: Trade-off Questions, Codebase-Driven Questions

Focus on dimensions activated by Project Profile. Every decision point in this round MUST include a ★ Recommendation:

Technology choices + trade-offs: "If you had to choose between [speed of development] and [full edge-case coverage], which wins?"
Data model decisions
For new API endpoints, propose request/response shapes — use Pattern (B) from Recommendation Protocol.
Security model (if applicable) — sub-question if task touches auth: "Does this feature modify the auth flow, add identity paths, or map users across systems?"
Data volume estimates (if new tables/files/logs): "How many [records/uploads/events] per day do you expect? 100? 1000? 10000+?"
Codebase-Driven: "I found [pattern A] in module X and [pattern B] in module Y — which should I follow?"
Codebase-Driven: "There's a TODO at [file:line] about [X] — relevant to this task?"

Round 4: Quality, Constraints & Edge Cases (What If, How Well)

Techniques: Stakeholder Probing, Risk Elicitation

Focus on remaining activated dimensions. Every decision point in this round MUST include a ★ Recommendation:

Performance targets
Deploy strategy
Observability needs
Concurrency / race conditions
"Who else will be affected by this change?" (stakeholder probing)
"Is anyone else working on related code right now?"
Any remaining Should-Resolve items

Round N: Dynamic Follow-up (Gap Filling)

If the exit checklist (Step 5) is not fully satisfied after Round 4, run additional targeted rounds:

Pick unresolved Must-Resolve items
Ask specific questions based on gaps discovered from previous answers
Synthesize new questions that emerged from the conversation

Step 5: Exit Checklist & Resolution Matrix

Before writing CONTEXT.md, build a Resolution Matrix tracking every gray area discovered during the interview:

| # | Gray Area | Dimension | Priority | Status | Source | Resolution |
|---|-----------|-----------|----------|--------|--------|------------|
| 1 | DB schema approach | 9: Data model | Must | RESOLVED | user | Extend existing users table |
| 2 | Log format | 23: Observability | Nice | ASSUMED | default | JSON structured logging (team standard) |
| 3 | Rollback strategy | 22: Deploy | Should | RESOLVED | rec-accepted | Feature flag, no migration needed |

Status values:

RESOLVED — human gave a clear answer, or recommendation was applied via "just decide"
ASSUMED — no answer, using sensible default (recorded in Assumptions section of CONTEXT.md). For unanswered Should-Resolve items that had a ★ Recommendation: use the recommendation as the assumption, noting ASSUMED (from recommendation).
OPEN — still unresolved

Source values:

user — user made the decision (no recommendation, or user's own choice)
rec-accepted — user accepted the ★ Recommendation
rec-overridden — user overrode the ★ Recommendation with their own choice
just-decide — recommendation auto-applied because user said "just decide"
default — sensible default, no user input

Exit rules:

ALL Must-Resolve items must be RESOLVED (not ASSUMED, not OPEN). If any remain OPEN → run another mini-round.
Should-Resolve items can be ASSUMED with explicit note.
Nice-to-Know items can be ASSUMED freely.

Then verify the structural checklist:

Must pass (blocks Research):

Scope is locked (in-scope AND out-of-scope defined)
Success criteria are concrete and testable
Resolution Matrix shows zero OPEN Must-Resolve items
No contradictions between decisions
Frozen constraints identified

Should pass (improves quality):

User journey / main scenarios described (if applicable)
Failure paths identified
Security model defined (if applicable)
Performance targets set (if applicable)
Trade-off decisions recorded with rationale

Nice to have:

Tech debt in affected area catalogued
Stakeholders and parallel work identified
Deploy strategy defined

Update state.json with final counts: interview_must_resolve_total and interview_must_resolve_closed.

Step 6: Write Expanded CONTEXT.md

Write {SESSION}/CONTEXT.md with this structure.

QUALITY RULES (apply before writing each section):

No duplication. Each fact appears in ONE canonical section. Other sections reference it, not repeat it. If a decision is in "Decisions & Trade-offs", do NOT restate it in Gray Areas, Constraints, or Anti-Requirements.
No filler. If a section has nothing meaningful to say, write "N/A" — don't pad with generic content.
No micro-decisions. Decisions must be architectural (affect multiple files, change data flow, or have alternatives with real trade-offs). "Use CSS grid for layout" or "Use IconCopy for copy button" are NOT decisions.
No generic risks. Risks must be specific to THIS project and THIS task. "Performance might degrade" or "Scope creep" are NOT risks.
No tech-stack-as-assumptions. Tool/library choices (Google Fonts, Lucide icons) belong in Project Profile or Decisions, NOT in Assumptions.
Conditional sections. For TRIVIAL/SMALL tasks, skip sections marked [SKIP IF SIMPLE] below. For MEDIUM+ include all.

# Bishx-Plan Context

## Task Description

### Original Request (verbatim)
[Copy the EXACT text from state.json task_description — do not rephrase, summarize, or clean up]

### Clarified Scope (from interview)
[Refined understanding after interview — what the task actually means, additional details discovered]

## Project Profile
- Type: [frontend/backend/fullstack/CLI/library]
- Stack: [language, framework, DB, etc.]
- Test Framework: [jest/pytest/etc.]
- Auth: [JWT/sessions/none/etc.]
- CI/CD: [yes/no]

## Codebase Summary
[Tech stack, project structure, key patterns discovered during exploration.
Include specific file paths, existing patterns, and architectural insights.
For greenfield: "N/A — greenfield project"]

## User Stories / Scenarios
[From Scenario Walking — happy path + failure paths]
- As a [user], I [action] so that [outcome]
- When [condition], then [expected behavior]
- When [error condition], then [expected recovery]
[Include error MESSAGE text for user-facing errors, not just conditions]

## Scope
### In Scope
- [Item]
### Out of Scope (Explicit)
- [Item]
### Anti-Requirements (Must NOT Do)
- [Item — only genuine "must not" constraints, not just "don't break things"]

## Decisions & Trade-offs
Architectural decisions with WHY (ADR-style). Include trade-offs here — chose A over B.
**Filter:** Only decisions that affect architecture, data flow, API design, or have
real alternatives with trade-offs. NOT: CSS class choices, icon selections,
single-line implementation details.
1. **[Decision area]:** [Choice] over [Alternative] — because [rationale] [source: rec-accepted / rec-overridden / just-decide / user]
2. ...

## Assumptions
[Real assumptions that could be WRONG and would change the plan if incorrect]
- Assuming [X] because [reason]. Override if incorrect.
**Filter:** NOT tech stack choices (those go in Decisions). NOT obvious defaults
("assuming Russian language" for a Russian project).

## Constraints (Frozen)
[Things that CANNOT be changed — external contracts, legacy APIs, compliance requirements]
- [Constraint]: [why it's frozen]

## Risks [SKIP IF SIMPLE]
[Project-specific risks only. NOT generic advice ("test thoroughly", "may affect performance")]
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| [specific risk for THIS task] | H/M/L | H/M/L | [concrete action, not "monitor"] |

## Stakeholders & Dependencies [SKIP IF SIMPLE]
- [Who is affected]
- [External dependencies or parallel work]

## Codebase Notes
- TODO/FIXME found: [list with file:line, or "none in scope"]
- Pattern conflicts: [what and where, or "none"]
- Tech debt: [relevant items, or "none in scope"]
- Test gaps: [modules with low coverage, or "N/A"]

## Success Criteria / Definition of Done
- [ ] [Specific, testable criterion]
- [ ] [Specific, testable criterion]
[Include expected error messages for error-handling tasks.
Include data volume estimates for data-heavy tasks.
Include API request/response shapes for API tasks.
Include migration rollback strategy for migration tasks.]

## Priority Map [SKIP IF SIMPLE]
[If not everything fits, what matters most → least]
1. [Highest priority item]
2. ...

## Gray Areas Resolution Matrix
Only GENUINELY AMBIGUOUS items resolved during interview.
**Filter:** Do NOT include items already fully captured in Decisions or Scope.
A gray area is something where the user could have gone either way and the
choice was non-obvious. "Which icon to use" is NOT a gray area.
| # | Gray Area | Dimension | Priority | Status | Resolution |
|---|-----------|-----------|----------|--------|------------|
| 1 | [real ambiguity] | [dim #: name] | Must/Should/Nice | RESOLVED/ASSUMED | [answer] |

## Interview Metadata
- Rounds completed: [N]
- Must-Resolve: [X/Y resolved]
- Should-Resolve: [X/Y resolved]
- Assumptions made: [N items — see Assumptions section above]

Step 7: Update State & Signal

Update state.json: Keep phase as "interview", set interview_round to final round number, pipeline_actor as ""
- You MAY update task_description here (and ONLY here) if the interview meaningfully changed the scope. If scope didn't change — leave it as-is.
- After this point, task_description is FROZEN. No subsequent phase (research, pipeline, iterations) may modify it. When updating state.json in later phases, always READ the file first, then use Edit to change only the specific fields that need updating. Never Write the full file from memory — task_description will be lost to context compaction.
Emit <bishx-plan-done>

Complexity Gate (between Interview and Research)

After Interview, before Research, assess feature complexity to determine pipeline mode:

TRIVIAL  (1-2 files, <20 lines) → Skip pipeline: Planner → Critic only (lite mode)
SMALL    (2-4 files, <100 lines) → Lite pipeline: Planner → [Skeptic + Completeness] → Critic
MEDIUM   (5-10 files)            → Standard pipeline: all always-on actors
LARGE    (10+ files)             → Full pipeline: all actors including conditional
EPIC     (multiple features)     → Split into sub-features first, plan each separately

Determination: Based on Interview findings — scope size, number of requirements, architectural complexity. Override: Human can request full pipeline for any size ("run full pipeline"). State: Set complexity_tier in state.json before proceeding.

Phase 2: Research (triggered by hook)

The Stop hook injects a prompt telling you to run research. Follow its instructions:

Read {SESSION}/CONTEXT.md
Derive Research Questions — extract specific RQ-NNN items from CONTEXT.md decisions, requirements, and risks
Spawn researcher:
```
Task(subagent_type="bishx:researcher", model="opus", prompt=<CONTEXT.md + research questions>)
```
Fallback: If bishx:researcher is not available as a subagent type, read the agent file at the plugin's agents/researcher.md and inline its instructions as a prompt prefix.

Domain Detection: The hook automatically discovers relevant domain skills. If detected, their content is injected into the researcher prompt for specialized context.
Write output to {SESSION}/RESEARCH.md
Verify Research Coverage — check that all RQ items were answered. Flag gaps.
Update state.json: phase → "research"
Emit <bishx-plan-done>

Skill-Library Lookup (between Research and Pipeline)

After Research completes and before the first Planner spawn, perform a skill-library lookup to give Planner and Skeptic domain-specific knowledge from ~/.claude/skill-library/.

This step runs ONCE per session (before iteration 1). The output files (PLANNER-SKILLS.md, SKEPTIC-SKILLS.md) persist for all subsequent iterations.

Selection Process

Read ~/.claude/skill-library/INDEX.md — use the Category Router to identify ALL relevant categories based on:
- Tech stack from CONTEXT.md Project Profile (languages, frameworks, databases)
- Task domain from CONTEXT.md (auth, API, UI, data processing, etc.)
For each relevant category, read ~/.claude/skill-library/<category>/INDEX.md — review each skill's description and "Use when..." triggers.
For Planner — select skills where "Use when..." matches what the planner needs to design tasks:
- Implementation patterns, API usage guides, framework conventions
- Best practices for the specific technologies in the task
- Architecture patterns relevant to the task's domain
- EXCLUDE: review-only checklists, security audit procedures, testing methodology skills
For Skeptic — select skills useful for verifying plan correctness:
- Anti-pattern catalogs, correctness rules, known pitfalls
- Domain-specific gotchas and common mistakes
- Security patterns (if task involves auth, user input, or API design)
- EXCLUDE: step-by-step setup guides, deployment procedures
For each selected skill, note the full SKILL.md path and line count (wc -l via Bash).
Budget: ≤2500 total lines per agent. If over budget, drop the least task-relevant skills first.

Output

Write two files to {SESSION}/:

PLANNER-SKILLS.md:

# Planner Skills
Read these FULL SKILL.md files before creating the plan. Budget: ≤2500 lines.
1. `~/.claude/skill-library/<category>/<skill>/SKILL.md` (N lines) — brief description
2. ...

SKEPTIC-SKILLS.md:

# Skeptic Skills
Read these FULL SKILL.md files before reviewing the plan. Budget: ≤2500 lines.
1. `~/.claude/skill-library/<category>/<skill>/SKILL.md` (N lines) — brief description
2. ...

Skills may overlap between planner and skeptic — that's expected. Agents read skill files themselves (pass paths, not content).

Phase 3: Pipeline Loop (triggered by hook per actor)

Pipeline Topology

The pipeline has three sub-phases per iteration:

3a. Planner (sequential)
         ↓
3b. Parallel Review (all reviewers simultaneously)
         ↓
3c. Critic (sequential, aggregates all reports)
         ↓
3d. Dry-Run Simulator (conditional — only after APPROVED)

                                  ┌─ Skeptic (opus) ──────────┐
                                  ├─ TDD Reviewer (sonnet) ───┤
Planner (opus) ──→ [Parallel:     ├─ Completeness (sonnet) ───┤ ] ──→ Critic (opus)
                                  ├─ Integration (sonnet) ────┤
                                  ├─ Security* (sonnet) ──────┤
                                  └─ Performance* (sonnet) ───┘
                                    * = conditional

Sub-Phase 3a: Planner

Update state.json: phase → "pipeline", pipeline_actor → "planner", agent_pending → true
Re-read {SESSION}/CONTEXT.md and {SESSION}/RESEARCH.md from disk (EVERY iteration — never rely on memory, context may have been compacted)
If revision: also read ALL feedback from prior iteration
Spawn planner:
```
Task(subagent_type="bishx:planner", model="opus", prompt=<context + research + feedback>)
```
Domain Skills: If {SESSION}/PLANNER-SKILLS.md exists, tell the planner to read the FULL SKILL.md files listed in it before creating the plan. Budget: ≤2500 total lines. Agents read skills themselves — pass paths, not content.
When planner completes: set agent_pending → false in state.json
Create {SESSION}/iterations/NN/
Write output to {SESSION}/iterations/NN/PLAN.md
Emit <bishx-plan-done>

Sub-Phase 3b: Parallel Review

CRITICAL: All reviewers run simultaneously. Launch ALL applicable actors in a single response with multiple Task calls.

Always-on actors (every iteration):

Actor	Subagent Type	Model	Reads	Writes
Skeptic	`bishx:skeptic`	opus	PLAN.md, CONTEXT.md, SKEPTIC-SKILLS.md	`SKEPTIC-REPORT.md`
TDD Reviewer	`bishx:tdd-reviewer`	sonnet	PLAN.md	`TDD-REPORT.md`
Completeness Validator	`bishx:completeness-validator`	sonnet	PLAN.md, CONTEXT.md	`COMPLETENESS-REPORT.md`
Integration Validator	`bishx:integration-validator`	sonnet	PLAN.md	`INTEGRATION-REPORT.md`

Conditional actors (activated by Project Profile):

Actor	Subagent Type	Model	Condition	Writes
Security Reviewer	`bishx:security-reviewer`	sonnet	auth / user-input / API / DB	`SECURITY-REPORT.md`
Performance Auditor	`bishx:performance-auditor`	sonnet	DB / API / perf-requirements	`PERFORMANCE-REPORT.md`

Fallback: If a subagent type is unavailable, read the agent file from agents/ and inline its instructions as a prompt prefix.

Launch pattern:

Each agent receives OUTPUT_PATH so it writes the report to disk itself. This prevents context compaction from losing results.

# Launch ALL in parallel (single response, multiple Task calls)
# Prepend OUTPUT_PATH to each prompt — agents write reports to disk via Write tool
Task(subagent_type="bishx:skeptic", model="opus", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/SKEPTIC-REPORT.md\n\n" + <PLAN + CONTEXT> + "Read skill files listed in {SESSION}/SKEPTIC-SKILLS.md if exists")
Task(subagent_type="bishx:tdd-reviewer", model="sonnet", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/TDD-REPORT.md\n\n" + <PLAN>)
Task(subagent_type="bishx:completeness-validator", model="sonnet", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/COMPLETENESS-REPORT.md\n\n" + <PLAN + CONTEXT>)
Task(subagent_type="bishx:integration-validator", model="sonnet", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/INTEGRATION-REPORT.md\n\n" + <PLAN>)
# Conditional:
Task(subagent_type="bishx:security-reviewer", model="sonnet", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/SECURITY-REPORT.md\n\n" + <PLAN + CONTEXT>)
Task(subagent_type="bishx:performance-auditor", model="sonnet", prompt="OUTPUT_PATH: {SESSION}/iterations/NN/PERFORMANCE-REPORT.md\n\n" + <PLAN>)

Agents write full reports to disk. Text responses contain only brief summaries (score + blocking count).

Before launching parallel actors:

Update state.json: pipeline_actor → "parallel-review", agent_pending → true

After ALL parallel actors complete:

Verify all expected report files exist: Glob("{SESSION}/iterations/NN/*-REPORT.md")
If any expected report is missing — warn user, do NOT proceed to Critic
Set agent_pending → false in state.json
Emit <bishx-plan-done>

Sub-Phase 3c: Critic

Before spawning critic:

Update state.json: pipeline_actor → "critic", agent_pending → true

Re-read from disk (never rely on memory — context may have been compacted):
- {SESSION}/CONTEXT.md
- {SESSION}/RESEARCH.md
- ALL files from {SESSION}/iterations/NN/:
  - PLAN.md
  - SKEPTIC-REPORT.md (if exists)
  - TDD-REPORT.md (if exists)
  - COMPLETENESS-REPORT.md (if exists)
  - INTEGRATION-REPORT.md (if exists)
  - SECURITY-REPORT.md (if exists)
  - PERFORMANCE-REPORT.md (if exists)
- If iteration > 1: {SESSION}/iterations/{NN-1}/VERIFIED_ITEMS.md (regression baseline)

Spawn critic:

Task(subagent_type="bishx:critic", model="opus", prompt=<all reports + context + previous VERIFIED_ITEMS.md>)

Write TWO files to {SESSION}/iterations/NN/:
- CRITIC-REPORT.md — evaluation report with verdict
- VERIFIED_ITEMS.md — regression baseline for the next iteration
When critic completes: set agent_pending → false in state.json
Parse verdict, score, flags from output
Update state.json: critic_verdict, scores_history, flags
Emit <bishx-plan-done>

After Critic

The hook reads the verdict from state.json and routes:

Score thresholds (percentage-based):

APPROVED (≥75%) AND zero blocking issues → Phase 3d (Dry-Run) if +dry-run flag was set, otherwise Phase 4 (Finalize)
REVISE (60-74%) OR (≥75% with blocking issues) → Increment iteration, planner gets structured Action Items
REJECT (<60%) → Always re-research (a plan scoring this low needs new data), then planner with all feedback

Sub-Phase 3d: Dry-Run Simulation (opt-in, only with `+dry-run` flag)

This phase is SKIPPED by default. It only runs if the user included +dry-run in their task description. If skipped, APPROVED verdict from Critic goes directly to Phase 4 (Finalize).

When enabled (dry_run_enabled: true in state.json):

Before spawning simulator:

Update state.json: phase → "dry-run", agent_pending → true

Read the approved plan from {SESSION}/iterations/NN/PLAN.md

Spawn simulator:

Task(subagent_type="bishx:dry-run-simulator", model="opus", prompt=<PLAN.md only — no other context>)

Write output to {SESSION}/iterations/NN/DRYRUN-REPORT.md
When simulator completes: set agent_pending → false in state.json
Parse verdict: PASS / FAIL / WARN
- PASS → Proceed to Phase 4 (Finalize)
- FAIL → Downgrade to REVISE, add DRYRUN issues to feedback, re-enter planner
- WARN → Proceed to Finalize with warnings noted
Update state.json accordingly
Emit <bishx-plan-done>

Special Flags

NEEDS_RE_RESEARCH: Triggers researcher re-run with targeted RESEARCH-REQ-NNN items
NEEDS_HUMAN_INPUT: Pauses pipeline for human interaction
NEEDS_SPLIT: Plan too complex, recommend decomposition into sub-plans

Issue Lifecycle (across iterations)

All actors use unified {PREFIX}-NNN issue IDs. Prefixes: SKEPTIC, TDD, COMPLETENESS, INTEGRATION, SECURITY, PERF, DRYRUN, REGRESS (critic regression), EXEC (critic executability checklist). Issues track across iterations:

OPEN → FIXED (planner addressed) → VERIFIED (critic confirmed)
OPEN → REBUTTED (planner argued) → ACCEPTED (critic agrees) | REJECTED (stays OPEN)
OPEN → DEFERRED (planner postponed) → only allowed for MINOR severity
VERIFIED → REGRESSED (broken again) → auto-escalated severity

Repeated Issue Escalation:

Seen in 1 iteration → severity as-is
Seen in 2 iterations → severity +1 (MINOR→IMPORTANT, IMPORTANT→BLOCKING)
Seen in 3+ iterations → BLOCKING + flag NEEDS_HUMAN_INPUT

Phase 4: Finalize (triggered when Critic approves; or after Dry-Run passes if `+dry-run` enabled)

Read the approved plan from {SESSION}/iterations/NN/PLAN.md
Read the Critic report from {SESSION}/iterations/NN/CRITIC-REPORT.md
Write {SESSION}/APPROVED_PLAN.md (copy of approved plan)

Generate Execution Readiness Package — append to APPROVED_PLAN.md:

---
## Execution Readiness Summary

### Pre-flight Checklist
- [ ] [Dependencies to install]
- [ ] [Migrations to run]
- [ ] [Environment variables to set]
- [ ] [Pre-existing test suite passes]

### Execution Order
Wave 1 (parallel): Tasks X, Y — [description]
Wave 2 (parallel): Tasks A, B — depends on Wave 1
...

### Risk Hotspots
⚠ Task N (confidence: LOW) — [why risky]
⚠ Task M (security-sensitive) — [extra review needed]

### Rollback Strategy
[Per-task or per-wave rollback approach]

### Estimated Complexity
Total tasks: N | Waves: M | TDD cycles: P
Estimated: [SMALL/MEDIUM/LARGE]

### Known Limitations
[Deferred issues, low-confidence areas, assumptions to validate]

### Iteration History
[Score progression, what improved, what was accepted as-is]

Generate datetime filename: plan-YYYY-MM-DD-HHmmss.md
Write to ~/.claude/plans/{datetime-filename}
Present to human:
- Final plan summary
- Iteration count and score progression
- Confidence map (HIGH/MEDIUM/LOW per task)
- Risk hotspots
- Path to approved plan: {SESSION}/APPROVED_PLAN.md
Update state.json: phase → "finalize"
Emit <bishx-plan-done> (hook will tell you to emit <bishx-plan-complete>)

State.json Schema

{
  "active": true,
  "session_id": "string",
  "session_dir": "string",
  "task_description": "string (FROZEN after interview Step 7 — update only via Edit, never rewrite)",
  "iteration": 1,
  "max_iterations": 10,
  "tdd_enabled": true,
  "dry_run_enabled": false,
  "complexity_tier": "trivial|small|medium|large|epic",
  "phase": "interview|research|pipeline|dry-run|finalize|complete|max_iterations",
  "interview_round": 0,
  "interview_must_resolve_total": 0,
  "interview_must_resolve_closed": 0,
  "pipeline_actor": "planner|parallel-review|critic|\"\"",
  "parallel_actors": ["skeptic", "tdd-reviewer", "completeness-validator", "integration-validator"],
  "conditional_actors": ["security-reviewer", "performance-auditor"],
  "active_conditional": [],
  "critic_verdict": "APPROVED|REVISE|REJECT|\"\"",
  "critic_score_pct": 0,
  "scores_history": [
    {
      "iteration": 1,
      "score_pct": 72,
      "weighted_total": 36.0,
      "weighted_max": 50.0,
      "blocking_issues": 2,
      "total_issues": 8,
      "breakdown": {
        "correctness": {"score": 3, "weight": 3.0, "ceiling_from": "skeptic"},
        "completeness": {"score": 4, "weight": 2.5, "ceiling_from": "completeness-validator"},
        "executability": {"score": 3, "weight": 2.5, "ceiling_from": "integration-validator"},
        "tdd_quality": {"score": 4, "weight": 1.5, "ceiling_from": "tdd-reviewer"},
        "security": {"score": 3, "weight": 1.5, "ceiling_from": "security-reviewer"},
        "performance": {"score": 0, "weight": 0, "ceiling_from": "n/a"},
        "code_quality": {"score": 4, "weight": 0.5, "ceiling_from": "own"}
      }
    }
  ],
  "issue_registry": [
    {"id": "SKEPTIC-001", "severity": "BLOCKING", "status": "OPEN", "seen_in_iterations": [1]}
  ],
  "agent_pending": false,
  "flags": [],
  "detected_skills": [],
  "project_profile": {
    "type": "fullstack",
    "has_db": true,
    "has_auth": true,
    "has_frontend": true,
    "has_api": true,
    "has_perf_requirements": false
  },
  "started_at": "ISO",
  "updated_at": "ISO"
}

Subagent Availability Protocol

For ANY Task() call in the pipeline:

Try subagent_type as written (e.g., bishx:skeptic)
If unavailable, check {plugin_path}/agents/{actor}.md
If file exists, read it and inline its content as system prompt prefix in a general-purpose Task
If neither available: emit error, pause for human

Error Recovery

Actor Failure

If any actor crashes or returns garbage:

Retry ONCE with the same prompt
If still fails → mark actor as SKIPPED in state.json
Critic receives {ACTOR}-SKIP-REPORT.md (auto-generated note) instead of real report
Critic score ceiling drops: skipped actor's dimension = max 3/5
If ≥2 actors SKIPPED → auto REVISE (insufficient data for reliable verdict)

Parallel Phase Timeout

If any parallel actor runs >5 minutes:

Kill the slow actor
Proceed with available reports
Mark slow actor as TIMEOUT in state.json
Critic treats TIMEOUT same as SKIP

State Corruption

If state.json is malformed:

Read iteration directories to reconstruct state
Find highest-numbered iteration with complete files
Present to human: "State recovered from iteration N. Continue?"

Human Abort Mid-Pipeline

If user closes terminal during parallel phase:

On next bishx:plan invocation → detect partial state
Check which actors completed (by presence of report files)
Re-run only incomplete actors, don't re-run completed ones

Hook Not Firing

The "no signal detected" fallback in the hook will remind of the current phase.

Edge Cases

Empty/New Project (no existing code)

Skeptic: Skip codebase verification, focus on external claims and logic
TDD: No existing test patterns → use framework defaults, note in report
Integration: Skip inter-file checks, focus on task-to-task data flow
Completeness: Fully active (requirements still matter)

No Tests in Project

TDD Reviewer: Note absence, recommend test infrastructure as Task 0
Don't penalize plan for not following non-existent patterns
Still require TDD design for new code

Trivial Feature (TRIVIAL complexity tier)

Skip parallel review phase entirely
Run: Planner → Critic only (lite mode)
Critic uses simplified scoring (Correctness + Executability only)

Very Large Plan (20+ tasks)

Critic flags NEEDS_SPLIT — human decides how to decompose
Each sub-plan runs independently through the pipeline
Shared dependencies documented in a parent CONTEXT.md

Contradictory Requirements

If detected during Interview → resolve before proceeding (blocks Research)
If detected during Pipeline → Critic flags NEEDS_HUMAN_INPUT
Plan MUST NOT contain contradictory tasks

Score Drops Between Iterations

If total score DECREASES → auto-flag for human review
Present: "Score dropped from X% to Y%. Review changes?"

Post-Execution Feedback Loop

After bishx:run completes execution of a plan, it should generate:

## Post-Execution Feedback
### Plan Quality
- Tasks executed successfully: X/Y
- Tasks needing modification: N (list with reasons)
- First friction point: Task M, step S — [what was unclear]

### Lessons Learned
- PATTERN: [recurring lesson for future plans]
- GAP: [what interview/research should have caught]

This is saved to ~/.bishx/feedback/ and read by Researcher in future sessions for accumulated wisdom.

Rules

NEVER skip a required actor. Every iteration runs Planner + all always-on reviewers + Critic. Conditional actors run based on Project Profile.
ALWAYS update state.json before emitting signals. The hook depends on this. Use Edit (not Write) for state.json updates — read the file, change only the fields that need updating. task_description is FROZEN after Step 7 of the interview — never modify it in pipeline/research/iteration phases.
ALWAYS pass file contents in Task prompts. Subagents cannot read the orchestrator's files on their own. Reviewers write their own reports to disk via OUTPUT_PATH protocol — main agent does NOT re-write them, only verifies files exist.
Keep context compact. Each actor gets what it needs, nothing more. CONTEXT.md + PLAN.md for reviewers. All reports for Critic.
- ALWAYS re-read CONTEXT.md and RESEARCH.md from disk before every iteration. After compaction, your memory of these files is unreliable. The files on disk are the source of truth — read them fresh every time you pass them to Planner, Reviewers, or Critic.
Respect the human. During interview, wait for real answers. Don't auto-answer gray areas.
Parallel when possible. Launch ALL parallel reviewers in a SINGLE response with multiple Task calls.
ALWAYS set agent_pending: true in state.json BEFORE spawning any background agent, and agent_pending: false AFTER the agent completes. The Stop hook checks this field to avoid interfering with running agents.
Track progress. After EVERY phase, output a status line:

[bishx-plan] Phase | Status message

Status templates:

[bishx-plan] Interview done | N rounds, X/Y must-resolve closed, Z assumptions
[bishx-plan] Research done | N questions answered, M/P high-confidence
[bishx-plan] Complexity: MEDIUM | Standard pipeline activated
[bishx-plan] Iter K — Planner done | N tasks, M waves, P TDD cycles
[bishx-plan] Iter K — Parallel review | launching N actors...
[bishx-plan] Iter K — Reviews done | Skeptic: X issues, TDD: Y issues, Completeness: Z issues, Integration: W issues
[bishx-plan] Iter K — Critic: APPROVED 85% (42.5/50) | 0 blocking, 3 important
[bishx-plan] Iter K — Critic: REVISE 68% | 2 blocking, 4 important
[bishx-plan] Iter K — Dry-run: PASS | 3/3 tasks executable
[bishx-plan] FINAL | Approved after K iterations, score X% | N tasks, M waves

Replace values with actuals from agent output. ONE line per phase.

plan

Popularity

Invocation

Context Preview

SKILL.md

plan

Popularity

Invocation

Context Preview

SKILL.md

Bishx-Plan: Automated Iterative Planning

Pipeline Overview

Session Directory

Signal Protocol

Phase 0: Initialize

Phase 1: Interview (Multi-Round Adaptive Discovery)

Step 1: Codebase Exploration & Project Profiling

Step 2: Dimension Selection

Step 3: Priority Classification

Step 4: Multi-Round Interview Execution

Recommendation Protocol

Round 0: Codebase Briefing (Assumption Surfacing)

Round 1: Intent & Scope (Why, What, Who)

Round 2: User Journey & Scenarios (Happy + Failure Paths)

Round 3: Technical Decisions (How, With What)

Round 4: Quality, Constraints & Edge Cases (What If, How Well)

Round N: Dynamic Follow-up (Gap Filling)

Step 5: Exit Checklist & Resolution Matrix

Step 6: Write Expanded CONTEXT.md

Step 7: Update State & Signal

Complexity Gate (between Interview and Research)

Phase 2: Research (triggered by hook)

Skill-Library Lookup (between Research and Pipeline)

Selection Process

Output

Phase 3: Pipeline Loop (triggered by hook per actor)

Pipeline Topology

Sub-Phase 3a: Planner

Sub-Phase 3b: Parallel Review

Sub-Phase 3c: Critic

After Critic

Sub-Phase 3d: Dry-Run Simulation (opt-in, only with +dry-run flag)

Special Flags

Issue Lifecycle (across iterations)

Phase 4: Finalize (triggered when Critic approves; or after Dry-Run passes if +dry-run enabled)

State.json Schema

Subagent Availability Protocol

Error Recovery

Actor Failure

Parallel Phase Timeout

State Corruption

Human Abort Mid-Pipeline

Hook Not Firing

Edge Cases

Empty/New Project (no existing code)

No Tests in Project

Trivial Feature (TRIVIAL complexity tier)

Very Large Plan (20+ tasks)

Contradictory Requirements

Score Drops Between Iterations

Post-Execution Feedback Loop

Rules

Similar Skills

Bishx-Plan: Automated Iterative Planning

Pipeline Overview

Session Directory

Signal Protocol

Phase 0: Initialize

Phase 1: Interview (Multi-Round Adaptive Discovery)

Step 1: Codebase Exploration & Project Profiling

Step 2: Dimension Selection

Step 3: Priority Classification

Step 4: Multi-Round Interview Execution

Recommendation Protocol

Round 0: Codebase Briefing (Assumption Surfacing)

Round 1: Intent & Scope (Why, What, Who)

Round 2: User Journey & Scenarios (Happy + Failure Paths)

Round 3: Technical Decisions (How, With What)

Round 4: Quality, Constraints & Edge Cases (What If, How Well)

Round N: Dynamic Follow-up (Gap Filling)

Sub-Phase 3d: Dry-Run Simulation (opt-in, only with `+dry-run` flag)

Phase 4: Finalize (triggered when Critic approves; or after Dry-Run passes if `+dry-run` enabled)

Sub-Phase 3d: Dry-Run Simulation (opt-in, only with `+dry-run` flag)

Phase 4: Finalize (triggered when Critic approves; or after Dry-Run passes if `+dry-run` enabled)