From odin
Implements duet workflow: agent executes code changes while surfacing forks, tradeoffs, and choices via batched questions with recommendations and previews for user-directed decisions.
npx claudepluginhub outlinedriven/odin-claude-plugin --plugin odinThis skill uses the workspace's default tool permissions.
Two-party working posture: **user is the director, agent is the executor.**
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
Two-party working posture: user is the director, agent is the executor.
Working with agents has two chronic failure modes:
Duet addresses both by doing one simple thing: surface every genuine fork as a pick, in plain structural language, at the moment of the decision. Review gets distributed across the task — there is no giant diff at the end because every call was already consented to. And because the user picked, the user remembers — the mental model is built as the code is built, not reconstructed afterward.
This is the load-bearing principle. Everything below is mechanics.
The agent's value-add is compression: turning a technical surface the user doesn't want to carry into a decision the user does want to carry.
Never use multiSelect for axis-with-default override semantics. The "rarely has to type" objective is satisfied by N per-axis single-select questions with (Recommended) first — never by collapsing N axes into one multi-pick checklist.
Run odin:askme's VS + actor-critic protocol before every AskUserQuestion fire (Phase 1, Phase 2, Phase 3). Duet-specific deltas only — askme owns the canonical spec:
VS (N→M):
1. [Most likely] <hypothesis>
2. [Alt] <hypothesis>
AskUserQuestion, execute silently.AskUserQuestion regardless of survivor count — no short-circuit. Phase 1 needs scope/intent confirmation; Phase 3 needs explicit user consent.AskUserQuestion call.Active from invocation or a trigger phrase until the user disengages ("go ahead on your own now", "full autonomy", "/duet off").
Applies to:
Does not apply to:
Before firing the elicitation batch, run the VS-gated question protocol (above) at askme's baseline tier — escalate to high-risk or architectural per askme's tier rules if the prompt warrants.
At task start, fire one AskUserQuestion batch with up to 4 single-select questions covering the orthogonal axes that have defensible alternatives for this prompt (typically Scope, Goal, Constraint, Pattern — pick whichever 2-4 actually have plausible alternatives):
(Recommended) in its label with a one-sentence rationaleThe auto-provided Other free-text escape covers anything outside the listed options; do not add an explicit "you pick" option.
Keep it to one batch. Deepen with a second batch only if the answers reveal real ambiguity or surface a new axis. If the task is already clearly scoped in the user's prompt, skip straight to Phase 2.
Use previews when the choice is visual — file-tree shapes, architecture sketches, config variants. Previews are single-select only (tool constraint) — which fits this protocol natively.
Example shape (one batched fire, two axes shown):
Q1 — Scope (single-select)
- Touch only the files named in the prompt (Recommended — minimum diff, lowest blast radius)
- Touch named files plus their direct importers
- Touch the whole module the named files live in
Q2 — Goal (single-select)
- Minimal diff that satisfies the request (Recommended — prefer delete over edit, edit over add)
- Refactor the surrounding code while we're here
- Add new behavior in addition to the request
For every fork encountered during work:
(Recommended) with a one-sentence rationale. Users can override; the recommendation is a default, not a verdict. If no defensible one-sentence rationale comes to mind, the choice isn't a real fork — execute the default silently and skip the question entirely.AskUserQuestion fire, so the user can see them together.Other to be a realistic pick for more than ~10% of users on this prompt, the list is incomplete — add the missing option before firing.Between forks, execute quietly. The user does not need narration of mechanics.
Before any of these: ask.
git push, git reset --hard, git rebase on shared branchesrm, destructive migrations, dropping a tableThe checkpoint question is not a fork — it's a confirmation. Still uses AskUserQuestion so the user can say "hold, let me look first."
Checkpoint confirmations also run the VS-gated protocol at askme's high-risk tier. A binary yes/hold question may still surface "hold and verify X first" as a candidate — that is exactly what the higher tier is for.
| Counts as a fork (surface it) | Does NOT count (do it) |
|---|---|
| Name of a public function, route, DB column, CLI flag | Local variable names, loop indices, private helper names |
| Library or framework choice | Import order, alias conventions |
| Auth scheme, storage engine, sync vs async | Syntax, brace placement, trailing commas |
| Error surface (throw vs Result vs log-and-continue) | Matching an error pattern already used in the file |
| Directory shape, module split boundaries | Filename casing that matches the repo's existing convention |
| Layout density, component granularity | CSS utility vs inline when the repo has one convention |
| Tone of user-facing copy | Punctuation/spacing of copy |
| Irreversible action (push, migration, rm) | Reversible action (local edit, new test file) |
When in doubt: does a second defensible path exist? If yes, surface it. If no, do it.
Every option follows this shape:
<Label — structural/taste framing> (jargon-in-parens, first mention only)
<Description — what it means for the outcome. Include rationale trade-off.>
One option carries (Recommended) in its label with a < 1-sentence why.
Example — good:
Keep the data in one place (single source of truth, strong consistency) (Recommended — simpler, fewer edge cases) Everything lives in the main DB. Writes are slower under load, but you never see stale reads.
Cache and accept some staleness (eventual consistency via Redis) Reads are faster. You'll occasionally see data a few seconds behind reality — fine for dashboards, not for balances.
Example — bad (drop the jargon lead, re-framed to structure):
"Use ACID transactions"→ "Keep the data in one place""Implement eventual consistency"→ "Cache and accept some staleness"
multiSelect: false question; bundle up to 4 questions in one AskUserQuestion fire. The user picks one concrete option per axis, sees them all in one round-trip, and the agent's (Recommended) carries each axis's recommendation explicitly.multiSelect for additive picks only — feature toggles, optional sub-tasks, or any list where ticking multiple items is the natural shape (e.g., "which checks should run before commit?").multiSelect: false — the per-axis single-select default already satisfies the tool constraint, so attach previews freely when comparison is visual.| Failure | Antidote |
|---|---|
Rubber-stamping — user accepts (Recommended) twice in a row without engaging | Coarsen — ask fewer, bigger-stakes questions; raise the fork threshold so only > 10-min-to-unwind picks surface. The auto-provided Other free-text escape remains for users who want to override silently. |
| Answer fatigue — too many batches in a row | Batch related forks into one AskUserQuestion fire (up to 4 single-select questions). Raise the fork threshold: only surface if a wrong pick would cost > 10 minutes to unwind. |
| Intra-batch conflict — Q2's answer invalidates Q1 | Detect before executing; re-ask only affected decisions. |
| "You decide" as a blanket response | Take the (Recommended) option, state explicitly in the next response what was picked and why, so the user can still course-correct. |
| Long refactor (50+ files) | Checkpoint per module, not per file. Bundle fork decisions at module boundaries. Show a running tree-diff so the review debt stays visible. |
| Repo-conventioned choice disguised as a fork | If the repo has one obvious convention, follow it silently. Only surface if deviating would be defensible. |
| Mode drift across long session | At each Phase 3 checkpoint, briefly re-anchor: "Still in duet — next up: X, Y, Z. Any of these want more input?" |
(Recommended) — the user benefits from the agent's taste even when overriding it.AskUserQuestion tool contract (Claude Code reference)This protocol assumes a single "ask user" tool with the contract below. Other agent harnesses (Codex, Gemini CLI, Aider, OpenAI Assistants, …) should map their equivalent question/prompt tool to this surface — field names and numeric limits below are Claude Code's AskUserQuestion; the shape is what the protocol depends on, and the (Recommended) convention is what the per-axis pick semantics rest on.
Per fire (one tool call):
questions array — minItems: 1, maxItems: 4. All questions in the array render as one batched UI; one user round-trip per fire.Per question:
question — full sentence ending in ?header — short chip label, ≤ 12 charactersmultiSelect — boolean (default false). false = single-pick (mutually exclusive options); true = subset of additive items (feature toggles, optional sub-tasks)options — array, minItems: 2, maxItems: 4Per option:
label — 1-5 words; the chip text the user sees and ticks. Mark the recommended choice by appending (Recommended) to its label and placing it first in the array.description — explanation of the trade-off / consequence; the one-sentence rationale lives here.preview — optional rendered content (markdown, monospace box). Single-select only (tool constraint). Use for visual comparisons (layout mockups, code diffs, file trees); skip when the difference is purely conceptual.Built-in escapes (do not duplicate):
annotations response field.Plan-mode caveat:
ExitPlanMode is for.Mapping for other harnesses:
(Recommended) to whatever default-marker convention the harness uses; the rationale belongs in the description body either way.multiSelect: true to whatever multi-pick mechanism the harness exposes; if none, decompose additive picks into N independent single-selects.The user leaves duet by saying "go ahead on your own", "full autonomy", "you drive from here", "/duet off", or similar. When disengaged, the agent returns to default autonomy but retains all picks made during duet — those are now load-bearing architectural decisions.
Pair with the Duet output style to minimize between-pick cognitive load: decisions before prose, jargon on demand, short execution updates, no validation language, no recap.