Builds manifests to plan work, scope tasks, spec requirements, and break down complex tasks before coding. Converts needs into deliverables, invariants, and verification criteria for features, bugs, refactors.
npx claudepluginhub doodledood/manifest-dev --plugin manifest-dev

This skill uses the workspace's default tool permissions.
Included files:
- references/AMENDMENT_MODE.md
- references/WRITING-REFERENCE.md
- references/interview-modes/autonomous.md
- references/interview-modes/minimal.md
- references/interview-modes/thorough.md
- references/messaging/LOCAL.md
- tasks/BLOG.md
- tasks/BUG.md
- tasks/CODING.md
- tasks/DOCUMENT.md
- tasks/FEATURE.md
- tasks/PROMPTING.md
- tasks/REFACTOR.md
- tasks/WRITING.md
- tasks/references/research/README.md
- tasks/references/research/agentic-systems.md
- tasks/references/research/ai-failure-modes.md
- tasks/references/research/epistemology.md
- tasks/references/research/fact-checking.md
- tasks/references/research/information-science.md

Creates structured plans for multi-step tasks including software features, implementations, research, or projects. Deepens plans via interactive sub-agent reviews.
If thinking disciplines are not already active in this session, invoke the manifest-dev:thinking-disciplines skill. Do not begin the interview until disciplines are active. Apply throughout — every question, assessment, and synthesis.
Build a comprehensive Manifest that captures the task's intent, deliverables, invariants, and verification criteria.
Why thoroughness matters: Every criterion discovered NOW is one fewer rejection during implementation/review. The goal is a deliverable that passes review on first submission—no "oh, I also needed X" after the work is done.
Comprehensive means surfacing latent criteria—requirements the user doesn't know they have until probed. Users know their surface-level needs; your job is to discover the constraints and edge cases they haven't thought about.
Aim for high coverage. Amendments handle what emerges during implementation.
Output: /tmp/manifest-{timestamp}.md
$ARGUMENTS = task description, optionally with context/research, --interview <level>, --medium <type>, --amend <manifest-path>
Parse --interview from arguments (can appear anywhere). Valid values: minimal, autonomous, thorough. Default: thorough. Invalid value → error and halt: "Invalid interview style ''. Valid styles: minimal | autonomous | thorough"
Parse --medium from arguments (can appear anywhere). Currently only local is supported (default). Other mediums may be added in the future. If a non-local value is provided, error and halt: "Medium '' not yet supported. Currently supported: local". See Medium Routing section below.
Parse --amend <manifest-path> from arguments (can appear anywhere). --from-do flag (optional, used with --amend) — see references/AMENDMENT_MODE.md for behavior.
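A hypothetical invocation combining these arguments might look like the sketch below; the task description and flag values are invented for illustration:

```
/define Add rate limiting to the public API --interview minimal --medium local
```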
If no arguments provided, ask: "What would you like to build or change?"
Domain-specific guidance available in:
| Domain | Indicators | Guidance File |
|---|---|---|
| Coding | Any code change (base for Feature, Bug, Refactor) | tasks/CODING.md |
| Feature | New functionality, APIs, enhancements | tasks/FEATURE.md |
| Bug | Defects, errors, regressions, "not working", "broken" | tasks/BUG.md |
| Refactor | Restructuring, reorganization, "clean up", pattern changes | tasks/REFACTOR.md |
| Prompting | LLM prompts, skills, agents, system instructions | tasks/PROMPTING.md |
| Writing | Prose, articles, emails, marketing copy, social media, creative writing (base for Blog, Document) | tasks/WRITING.md |
| Document | Specs, proposals, reports, formal docs (base: Writing) | tasks/DOCUMENT.md |
| Research | Investigations, analyses, comparisons | tasks/research/RESEARCH.md |
| Blog | Blog posts, articles, tutorials (base: Writing) | tasks/BLOG.md |
Composition: Code-change tasks combine CODING.md (base quality gates) with domain-specific guidance. Text-authoring tasks combine WRITING.md (base prose quality) with content-type guidance—a "blog post" benefits from both WRITING.md and BLOG.md, a "technical proposal" from both WRITING.md and DOCUMENT.md. Research tasks compose RESEARCH.md (base research methodology) with source-type files—when web research is identified as relevant, load tasks/research/sources/SOURCE_WEB.md alongside tasks/research/RESEARCH.md. RESEARCH.md's Data Sources table lists available source files and probes which sources apply. Domains aren't mutually exclusive—a "bug fix that requires refactoring" benefits from both BUG.md and REFACTOR.md. Related domains compound coverage.
Exception: PROMPTING tasks do NOT compose with CODING.md unless the task also changes executable code. PROMPTING.md has its own quality gates (prompt-reviewer, clarity, structure, etc.). When a task changes both prompts AND code, apply both PROMPTING.md and CODING.md gates, scoping each to the relevant files.
Task file structures are presumed relevant. Task files contain quality gates, reviewer agents, risks, scenarios, and trade-offs. These are angles you won't think to check on your own — they exist precisely because they're easy to miss. Quality gates are auto-included; Resolvable structures (risks, scenarios, trade-offs) must be resolved: either resolved per the interview mode's decision authority, or explicitly skipped with logged reasoning (e.g., "CODING.md concurrency risk skipped: single-threaded CLI tool"). Silent drops are the failure mode — not over-asking.
Task file content types, each handled differently:
- Quality gates (the ## Quality Gates section — tables, bullet lists, or any format with thresholds/criteria): auto-include as INV-G*; omit clearly inapplicable ones with logged reasoning. The user reviews the manifest.
- Defaults (the ## Defaults section): included in the manifest without probing; the user reviews the manifest and removes entries that don't apply. These become PG-* in the manifest.
- Reference files (references/*.md): detailed lookup data for /verify agents. Do not load during the interview.

Encode quality gates and Defaults immediately after reading task files — before the interview. Log each as - [x] RESOLVED.
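As a sketch, a quality gate and a Default pulled from a task file might be encoded like this; the specific gate, Default, and wording are invented for illustration rather than quoted from the actual task files:

```markdown
- [INV-G2] New code paths have unit test coverage (from CODING.md quality gates) | Verify: subagent
- [PG-2] Follow the repo's existing commit-message convention (from CODING.md Defaults)
```

Each would also be logged in the discovery file as `- [x] RESOLVED`, per the rule above.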
Probing beyond task files is adaptive — driven by the specific task, user responses, and what you discover. Task files don't cap what to ask; they set a floor.
If input references a previous manifest: treat it as source of truth. It contains validated decisions — default to building on it, preserving what's settled. Confirm approach with user if unclear.
When --amend <manifest-path> is present: read references/AMENDMENT_MODE.md for amendment rules.
When the task spans multiple repositories, capture the repo context during intent (starting points):
Scope deliverables and verification to repo context. Cross-repo invariants get explicit verification checking both sides.
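For example, a cross-repo invariant might be sketched like this; the repo names and the check itself are assumptions for illustration:

```markdown
- [INV-G5] Client types in web-app match the schema published by api-server | Verify: bash (contract tests run in both repos)
```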
Verifiable - Every Invariant and AC has an automated verification method. Constraints that can't be verified from output go in Process Guidance. Manual only as last resort.
Validated - Generate concrete candidates; learn from user reactions. The interview mode file defines behavioral specifics.
Domain-grounded - Understand the domain before probing. Latent criteria emerge from domain understanding — you can't surface what you don't know.
Complete - Surface hidden requirements through five coverage goals. Understanding from any source counts equally.
Directed - For complex tasks, establish initial implementation direction (Approach). Architecture defines starting direction, not step-by-step script.
Efficient - Each question must: materially change the manifest, lock an assumption, or choose between meaningful trade-offs. If it fails all three, don't ask. One missed criterion costs more than one extra question — err toward asking, never ask trivia.
Five goals that must be met before convergence. Each defines WHAT must be true and a convergence test. Items resolved from any source (conversation, prior research, task files, exploration) count equally. The interview probes gaps, not territory already covered. The active interview mode defines how gaps are probed and decisions are made.
| Goal | Convergence test |
|---|---|
| Domain Understanding | Can you generate project-specific (not generic) failure scenarios? |
| Reference Class | Can you name the task type and its common failure modes? |
| Failure Modes | All scenarios have dispositions (encoded, scoped out, or mitigated)? |
| Positive Dependencies | Load-bearing assumptions surfaced and each has a disposition? |
| Process Self-Audit | Scope-creep risks identified and resolved? (skip if straightforward) |
What must be true: You understand the affected area well enough to generate project-specific failure scenarios — not generic ones. You know existing patterns, structure, constraints, and prior decisions relevant to the task.
Understanding comes from any source — conversation context, prior research, code exploration, documentation, user-provided arguments, task files. Don't re-discover what's already known. When understanding is insufficient, fill gaps through whatever means fits the domain — explore code, search docs, ask the user what exploration can't reveal. Scope to what's relevant, not the entire domain.
What to assess (starting points — adapt to the task):
Convergence test: Can you generate failure scenarios that reference specific components, patterns, or conventions in this context? If yes, sufficient. If only generic failures, gaps remain.
What must be true: You know what type of task this is, what typically fails in that class, and those base-rate failures inform your failure mode coverage.
Ground the reference class in domain understanding — "refactor of a tightly-coupled module with no tests" is useful; "refactor" is too generic. The reference class should be specific enough that its failure patterns are actionable. Task file warnings are a source.
Convergence test: Can you name the reference class and its most common failure modes? Often satisfiable in a single assessment step.
What must be true: Failure modes have been anticipated with concrete scenarios, and each has a disposition — encoded as criterion, explicitly scoped out, or mitigated by approach. No dangling scenarios. Mental model alignment checked — your understanding of "done" matches the user's expectation.
Failure dimensions — starting lenses for generating scenarios when gaps exist. Use these and any others relevant to the task:
| Dimension | What to imagine |
|---|---|
| Technical | What breaks at the code/system level? |
| Integration | What breaks at boundaries? |
| Stakeholder | What causes rejection even if technically correct? |
| Timing | What fails later that works now? |
| Edge cases | What inputs/conditions weren't considered? |
| Dependencies | What external factors cause failure? |
Task files add domain-specific failure scenarios. Scenarios grounded in domain understanding are higher signal than generic templates.
Scenario disposition — every scenario resolves to one of: encoded as a criterion, explicitly scoped out, or mitigated by the approach.
The active interview mode defines how scenarios are presented and dispositions resolved.
Convergence test: Relevant failure dimensions considered, all scenarios have dispositions, and user confirms no major failure modes were missed.
What must be true: Load-bearing assumptions — what must go right for the task to succeed — are surfaced and each is resolved: verified, encoded as invariant, or logged as Known Assumption.
Where failure mode coverage asks "what broke?", positive dependencies ask "what held?" This reveals assumptions you haven't examined.
What to assess (starting points — the task may surface others):
The active interview mode defines how dependencies are presented and resolved.
Convergence test: Load-bearing assumptions surfaced and each has a disposition.
What must be true: Process self-sabotage patterns — decisions that look reasonable individually but compound into failure — are identified and resolved. Skip for simple tasks.
Common patterns (not exhaustive — the task may have its own):
For each pattern, resolve its disposition — add as Process Guidance, encode as verifiable Invariant, accept as low risk, or note it's already covered. The active interview mode defines how patterns are presented and resolved.
Convergence test: Tasks with scope-creep risk have process risks identified and resolved. Skip when the task is straightforward enough that process sabotage is unlikely.
Resolve interview style from --interview argument → default thorough.
Load the interview mode file for behavioral specifics:
- thorough (default): read references/interview-modes/thorough.md
- minimal: read references/interview-modes/minimal.md
- autonomous: read references/interview-modes/autonomous.md

Follow the loaded interview mode's rules for question format, flow structure, checkpoint behavior, finding-sharing, and convergence for the remainder of this /define run.
Auto-decided items: When interview style causes an item to be auto-decided (agent picks recommended option instead of asking), encode it normally as INV/AC/PG with an "(auto)" annotation, AND list it in the Known Assumptions section with the reasoning for the chosen option.
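A sketch of how an auto-decided item might appear in both places; the item and wording are invented for illustration:

```markdown
- [INV-G3] Keep the existing JSON log format (auto) | Verify: codebase

## 5. Known Assumptions
- [ASM-1] Existing JSON log format is retained | Default: no format change | Impact if wrong: amend the manifest and adjust the formatter deliverable
```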
Style is dynamic: The --interview flag sets the starting posture, not a rigid lock. Shift when the user's behavior signals a different mode. After a style shift, follow the new mode's rules from that point forward. Log any style shift to the discovery file.
Decisions lock through structured options — Questions that lock manifest content present 2-4 concrete options, one marked "(Recommended)". The messaging file defines the tool; the interview mode defines when and how.
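For instance, a locking question might be presented like this; the topic and options are invented for illustration:

```markdown
Where should rate-limit counters live?
1. In-memory per instance (Recommended): simplest; acceptable if limits are per-node
2. Shared Redis counter: needed if limits must hold across instances
3. Database table: durable, but the slowest option
```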
Resolve all Resolvable task file structures — After reading task files, extract every Resolvable table and checklist (risk lists, scenario prompts, trade-offs) and log each. Items already resolved in conversation context are logged as - [x] RESOLVED (from context) with source — not re-probed. Remaining items are logged as - [ ] PENDING. Resolve each per the interview mode's decision authority, or skip with logged justification. Don't defer to synthesis — these are structural decisions that compound when missed.
Discoverable unknowns — search first — Don't ask the user about facts you could discover through exploration. Only ask when: multiple plausible candidates exist, searches yield nothing, or the ambiguity is about intent not fact.
Preference unknowns — ask early — Trade-offs, priorities, scope decisions cannot be discovered. Ask directly with concrete options and a recommended default.
Confirm before encoding — Exploration-discovered constraints require confirmation per the interview mode before becoming invariants. This does not apply to task-file quality gates and Defaults (auto-included per Domain Guidance rules).
Encode explicit constraints — User-stated preferences, requirements, and constraints must map to an INV or AC. Don't let them get lost in the interview log.
Probe for approach constraints — Beyond WHAT to build, ask HOW it should be done. Tools to use or avoid? Methods required or forbidden? Automation vs manual? These become process invariants.
Probe input artifacts — When input references external documents, determine whether they should be verification sources. If yes, encode as Global Invariant.
Discovery log — Write to /tmp/define-discovery-{timestamp}.md immediately after each discovery. The log is the source of truth — another agent reading only the log could resume the interview.
Seed with a Context Assessment before probing — what's already understood and what's missing:
## Context Assessment
ALREADY UNDERSTOOD:
- [x] RESOLVED (from context): [item] — [source]
GAPS IDENTIFIED:
- [ ] PENDING: [what's missing and why it matters]
The interview begins at the gaps. Before marking a coverage goal as met from context, verify with concrete evidence — vague confidence doesn't count.
Every actionable item gets logged with resolution status:
- [ ] PENDING — needs resolution
- [x] RESOLVED — encoded as INV/AC/PG/ASM, confirmed, or answered
- [~] SKIPPED — explicitly scoped out with reasoning

Read full log before synthesis. Unresolved - [ ] items must be addressed first. This is a memento-pattern discipline — the model will skip it without explicit instruction.
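A hypothetical log fragment using these statuses (the items themselves are invented for illustration):

```markdown
- [x] RESOLVED (from context): target framework is Express, stated in the task description
- [ ] PENDING: retry policy for failed webhook deliveries
- [~] SKIPPED: CODING.md concurrency risk, single-threaded CLI tool
```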
Batch related questions — Group related questions into a single turn. Each batch covers a coherent topic area.
Convergence — The interview mode defines probing aggressiveness. Convergence requires all five coverage goal convergence tests passing, plus:
- No unresolved - [ ] items in the log

Low-impact unknowns become Known Assumptions. User can signal "enough" to override.
Insights become criteria — Every discovery must be encoded as INV-G*, AC-*, or explicitly scoped out. Unencoded insights are aspirational, not enforced.
Automate verification — When a criterion seems to require manual verification, push back: suggest how it could be automated, or ask the user for ideas. Manual only as last resort or when user explicitly requests it.
Verification phases — Each criterion's verify block has an optional phase: field (numeric, default 1). The principle: group by iteration speed — faster feedback loops run first. Fast checks (agent reviewers, bash) stay in default phase. Slow checks (e2e tests, deploy-dependent) go in later phases. Manual verification goes last. Omit phase: for phase 1. Non-contiguous phases are valid.
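For instance, a slow end-to-end check might be deferred to a later phase like this; the command is an assumption for illustration, and the block follows the verify format defined in the manifest structure below:

```yaml
verify:
  method: bash
  phase: 2                      # runs only after phase-1 checks pass
  command: "npm run test:e2e"   # assumed e2e command, for illustration
```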
After defining deliverables, probe for initial implementation direction. Skip for simple tasks with obvious approach.
Why "initial": Approach provides starting direction, not a rigid plan. Plans break when hitting reality—unexpected constraints, better patterns discovered, dependencies that don't work as expected. The goal is enough direction to start confidently, with trade-offs documented so implementation can adjust autonomously when reality diverges.
Architecture - Generate concrete options based on existing patterns. "Given the intent, here are approaches: [A], [B], [C]. Which fits best?" Architecture is direction (structure, patterns, flow), not step-by-step script. When a choice affects multiple deliverables, surface which deliverables depend on it and what would need to change if the choice proves wrong during implementation.
Execution Order - Propose order based on dependencies. "Suggested order: D1 → D2 → D3. Rationale: [X]. Adjust?" Include why (dependencies, risk reduction, etc.).
Risk Areas - Pre-mortem outputs. "What could cause this to fail? Candidates: [R1], [R2], [R3]." Each risk has detection criteria. Not exhaustive—focus on likely/high-impact.
Trade-offs - Decision criteria for competing concerns. "When facing [tension], priority? [A] vs [B]?" Format: [T-N] A vs B → Prefer A because X. Enables autonomous adjustment during /do.
When to include Approach: Multi-deliverable tasks, unfamiliar domains, architectural decisions, high-risk implementations. The interview naturally reveals if it's needed.
Architecture vs Process Guidance: Architecture = structural decisions (components, patterns, structure). Process Guidance = methodology constraints (tools, manual vs automated). "Add executive summary section covering X, Y, Z" is Architecture. "No bullet points in summary sections" is Process Guidance.
| File | Owns |
|---|---|
| Interview mode files (references/interview-modes/) | Question format, flow structure, checkpoint behavior, finding-sharing, convergence aggressiveness |
| Messaging files (references/messaging/) | Interaction tooling (which tool to use, format constraints) |
| Task files (tasks/) | Domain-specific quality gates, risks, scenarios, trade-offs, defaults |
| Amendment mode (references/AMENDMENT_MODE.md) | Rules for modifying existing manifests |
| Execution mode files (../do/references/execution-modes/) | Verification loop behavior (how many cycles, whether to run verifier) |
Three categories, each covering output or process:
# Definition: [Title]
## 1. Intent & Context
- **Goal:** [High-level purpose]
- **Mental Model:** [Key concepts to understand]
- **Mode:** efficient | balanced | thorough *(optional, default: thorough — controls verification intensity during /do)*
- **Interview:** minimal | autonomous | thorough *(optional, default: thorough — recorded so --amend can inherit the original interview style)*
- **Medium:** local *(optional, default: local — currently only local is supported)*
## 2. Approach (Complex Tasks Only)
*Initial direction, not rigid plan. Provides enough to start confidently; expect adjustment when reality diverges.*
- **Architecture:** [High-level HOW - starting direction, not step-by-step]
- **Execution Order:**
- D1 → D2 → D3
- Rationale: [why this order - dependencies, risk reduction, etc.]
- **Risk Areas:**
- [R-1] [What could go wrong] | Detect: [how you'd know]
- [R-2] [What could go wrong] | Detect: [how you'd know]
- **Trade-offs:**
- [T-1] [Priority A] vs [Priority B] → Prefer [A] because [reason]
- [T-2] [Priority X] vs [Priority Y] → Prefer [Y] because [reason]
## 3. Global Invariants (The Constitution)
*Rules that apply to the ENTIRE execution. If these fail, the task fails.*
- [INV-G1] Description: ... | Verify: [Method]
```yaml
verify:
method: bash | codebase | subagent | research | manual
phase: "[numeric, optional, default 1 — higher phases run after lower phases pass]"
command: "[if bash]"
agent: "[if subagent]"
model: "[if subagent, default inherit]"
prompt: "[if subagent or research]"
```
## 4. Process Guidance (Non-Verifiable)
*Constraints on HOW to work. Not gates—guidance for the implementer.*
- [PG-1] Description: ...
## 5. Known Assumptions
*Low-impact items where a reasonable default was chosen without explicit user confirmation. If any assumption is wrong, amend the manifest.*
- [ASM-1] [What was assumed] | Default: [chosen value] | Impact if wrong: [consequence]
## 6. Deliverables (The Work)
*Ordered by execution order from Approach, or by dependency then importance.*
### Deliverable 1: [Name]
*[If multi-repo: specify repo scope]*
**Acceptance Criteria:**
- [AC-1.1] Description: ... | Verify: ...
```yaml
verify:
method: bash | codebase | subagent | research | manual
phase: "[numeric, optional, default 1]"
[details]
```
### Deliverable 2: [Name]
...
| Type | Format | Example | Used By |
|---|---|---|---|
| Global Invariant | INV-G{N} | INV-G1, INV-G2 | /verify (verified) |
| Process Guidance | PG-{N} | PG-1, PG-2 | /do (followed) |
| Risk Area | R-{N} | R-1, R-2 | /do (watched) |
| Trade-off | T-{N} | T-1, T-2 | /do (consulted) |
| Known Assumption | ASM-{N} | ASM-1, ASM-2 | /verify (audited) |
| Acceptance Criteria | AC-{D}.{N} | AC-1.1, AC-2.3 | /verify (verified) |
After writing the manifest, check the manifest's mode: field and load the execution mode file from ../do/references/execution-modes/ for the resolved mode (default: thorough). Follow the mode's "Manifest Verification (/define)" section for whether to run the manifest-verifier and how many cycles.
When running the verifier, pass only the file paths — no summary, framing, or commentary:
Invoke the manifest-dev:manifest-verifier agent with: "Manifest: /tmp/manifest-{timestamp}.md | Log: /tmp/define-discovery-{timestamp}.md"
The verifier returns CONTINUE or COMPLETE:
Repeat until COMPLETE or user signals "enough".
Do not add context, justification, or steering to the verifier invocation. The verifier sees what you may have missed; let it assess independently. When relaying verifier output, do not paraphrase, filter, or editorialize.
Digest the manifest into a scannable summary the user can approve at a glance. The summary answers "do you understand and agree with this plan?" — not "review every acceptance criterion." The manifest has the details; the summary is the human-readable version.
Voice: Plain language. No manifest codes (D1, AC-1.1, INV-G3), no YAML blocks, no structured-document vocabulary.
Default structure (adapt if the task calls for something different):
Include an ASCII architecture diagram when the task has multiple components with inter-component flow. Skip for single-deliverable tasks.
The test: If the summary reads like a compressed manifest, rewrite it. If it reads like something you'd say to a colleague, it's right.
Anti-patterns:
After presenting the summary, wait for the user's response. User responses mean:
Load the messaging file for the resolved medium:
- local (default): read references/messaging/LOCAL.md

The messaging file defines HOW to interact (tool, format, polling). The interview mode file defines WHAT to interact about (questions, flow, convergence).
The medium is encoded in the manifest's Intent section as Medium: <value> so downstream skills know the communication channel.
/define ends here. Output the manifest path and stop.
Manifest complete: /tmp/manifest-{timestamp}.md
To execute: /do /tmp/manifest-{timestamp}.md [log-file-path if iterating]
If this was an iteration on a previous manifest that had an execution log, include the log file path in the suggestion.