From backend-skills
Generates structured work-brief Markdown files at docs/briefs/ from planning notes or task descriptions, with caveman-style body text and sections keyed to Conventional Commits types. Useful for creating executable handoff artifacts that route downstream coding agents without requester re-interview.
Install: `npx claudepluginhub buyoung/skills --plugin backend-skills`
Trigger: slash command `/backend-skills:task-brief-creator-caveman`
Produce a structured work-brief Markdown document under `docs/briefs/` that a downstream coding agent (or a human engineer) can pick up and execute without re-interviewing the requester.
Files: `LICENSE`, `THIRD_PARTY_NOTICES.md`, `examples/01-pm-paste-feat.md`, `examples/02-rough-typed-fix.md`, `examples/03-halt-ambiguous.md`, `examples/04-briefset-checkout-i18n.md`, `examples/05-stage-4-walkthrough.md`, `examples/06-caveman-style-feat.md`, `examples/README.md`, `references/bloat-decomposition.md`, `references/briefset.md`, `references/caveman-style.md`, `references/stage-4-interview.md`, `references/template.md`, `references/work-types.md`, `scripts/validate_brief.py`, `scripts/validate_briefset.py`.
The brief is the handoff artifact. Its job is to shrink the cost of the first hour a coding agent spends on the task — routing it to the right files, fixing the behavior envelope, and pre-answering the questions that would otherwise bounce back to the requester.
The brief is an executable work instruction — nothing else. It is not a scope-control memo, not a discussion summary, not a planning note, not a background briefing, not a rationale document. Every section must answer "what does the coding agent do next?" — if a section reads like meeting minutes, negotiation history, or context prose, rewrite it until it routes to files, decisions, or verifiable outcomes. A brief that makes the downstream agent re-interview the requester is a failed brief, regardless of how polished it reads.
"Executable, not discursive" is a prose style rule, not a content reduction rule. It tells you how each bullet should read — direct, action-routing, no rationale prose. It does not tell you to drop distinct concerns, merge unrelated bullets, or summarize the input down to its highlights. A brief that omits a concern from the input is also a failed brief, because the downstream agent will silently miss it. Tight prose, full enumeration: short bullets are fine and encouraged, but every distinct concern from the input and the codebase review must land somewhere in the brief. Caveman tightens prose further; it never tightens enumeration.
This skill operates in one of two output modes:
- … references/briefset.md; long input, many files, or many related edit points alone never trigger briefset mode.

Output-mode selection happens at Stage 1 alongside the ambiguity gate.
In briefset mode, follow the workflow below with the per-stage adaptations in references/briefset.md (parent template, naming, decomposition decision table, dual-validator save).
Stage 4 always runs as a user decision table: after codebase review, gather ambiguous or user-owned decisions and present them in a Markdown table with the columns 순번 (number), 내용 (decision), 수정 추천안 (recommended change), and 근거 (rationale).
Codebase-resolvable technical facts are probed instead of asked, while product intent, scope, acceptance thresholds, sequencing, and ownership decisions are tabled for the user.
The decision table is the default Stage 4 behavior, not a separate mode.
See references/stage-4-interview.md for the full decision classification, codebase-precedence, and termination rules.
Load references only when their decision point arrives:
- `references/caveman-style.md` before composing or patching saved brief prose.
- `references/work-types.md` during Stage 2 when the work type is not obvious or when the type changes downstream behavior.
- `references/briefset.md` during Stage 1 when multiple execution contexts are plausible.
- `references/bloat-decomposition.md` only after a candidate child brief is independently executable but still looks oversized or mixed.
- `references/stage-4-interview.md` before presenting user-owned decisions.
- `references/template.md` while composing the saved Markdown.

Do not re-open every reference by habit. The goal is to keep the live context focused on the next decision the coding agent must make. Caveman style applies to saved brief prose only, not to planning, validation, or chat.
… /task-brief-creator-caveman, or similar).

This skill is the caveman variant of task-brief-creator.
The saved brief is written in caveman full mode — fragments, dropped articles, short synonyms — as a register transform of the normal-mode brief.
Other intensity levels (lite, ultra, wenyan-*) are out of scope; do not switch.
Caveman compresses register, not content. A caveman brief and its hypothetical normal-mode equivalent must carry the same set of facts, the same number of bullets, and the same enumerate depth per section. Shorter sentences, dropped articles, and short synonyms are the only changes. If the caveman version has fewer bullets than the normal-mode version would have, that is content compression — not a caveman transform — and is forbidden. There is no token-count target; the goal is a primitive-sounding register over a fully enumerated work instruction, not a smaller document.
Two hard rules — both absolute:
- … `## Open Questions` section, which stays in normal prose (questions are read by humans deciding what to clarify; see Auto-Clarity carve-outs below).

What gets compressed inside the brief:

- Articles (a / an / the) dropped.
- Filler words (just / really / basically / actually / simply) dropped.
- … [thing] [action] [reason]. [next].
- Short synonyms (big over extensive, use over utilize, then over subsequently).

What stays unchanged inside the brief (Auto-Clarity carve-outs):
- Acceptance Criteria / Side Effect Checkpoints checklist items: caveman compression is OK on the prose, but order-sensitive multi-step verification stays disambiguated. If conjunction loss changes meaning, drop back to normal for that bullet.
- Reproduction steps where step order matters.
- Behavior Contract invariants and verification methods.
- Legal and API-shape constraints in Constraints.
- … `# [<type>] <title>`, and the `## Work Type` value (`feat`, `refactor`, …).
- `- N/A — <reason>` for type-conditional sections.
- `## Open Questions` section body.
Questions are read by humans deciding what to clarify; caveman fragments lose the nuance that makes a question answerable.
Write each bullet as a complete, naturally phrased question.

When in doubt (if compression creates technical ambiguity), write that bullet in normal prose. Caveman never wins over correctness.
See references/caveman-style.md for the full conversion rules, section-by-section guidance, and good/bad examples.
| Field | Value |
|---|---|
| Directory | `docs/briefs/` (relative to repository root) |
| Filename | `YYYY-MM-DD-<type>-<slug>.md` |
| `YYYY-MM-DD` | Today's date in the repository's local timezone |
| `<type>` | Conventional Commits type (see `references/work-types.md`) |
| `<slug>` | kebab-case short slug, ≤40 chars, derived from the brief title |
| Body format | Markdown, following `references/template.md` exactly |
Example filename: 2026-04-23-feat-global-hotkey-system.md
If docs/briefs/ does not exist, create it.
If a file with the same name already exists, append -v2, -v3, … until the path is unique — do not overwrite.
For briefset mode, the parent uses YYYY-MM-DD-briefset-<set-slug>.md and children use YYYY-MM-DD-<type>-<set-slug>-NN-<child-slug>.md.
See references/briefset.md for the parent template and naming rules.
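A minimal Python sketch of the naming rules above, assuming titles are English (the brief body is English) and that the `-vN` suffix goes before `.md`. All helper names are hypothetical, and two choices are interpretations rather than stated rules: truncating the slug on a hyphen boundary, and zero-padding `NN` to two digits.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import Path
from typing import Callable

MAX_SLUG_LEN = 40  # "<slug>: kebab-case short slug, ≤40 chars"


def slugify(title: str, max_len: int = MAX_SLUG_LEN) -> str:
    """Kebab-case slug from an English brief title, at most max_len characters."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if len(slug) > max_len:
        cut = slug[:max_len]
        # Assumption: drop a partially cut trailing word rather than keep a fragment.
        if slug[max_len] != "-" and "-" in cut:
            cut = cut.rsplit("-", 1)[0]
        slug = cut.strip("-")
    if not slug:
        raise ValueError(f"title yields an empty slug: {title!r}")
    return slug


def brief_filename(work_type: str, title: str, day: date) -> str:
    return f"{day.isoformat()}-{work_type}-{slugify(title)}.md"


def briefset_parent_filename(set_slug: str, day: date) -> str:
    return f"{day.isoformat()}-briefset-{set_slug}.md"


def briefset_child_filename(work_type: str, set_slug: str, nn: int, child_slug: str, day: date) -> str:
    # Assumption: NN is a 1-based, zero-padded two-digit index.
    return f"{day.isoformat()}-{work_type}-{set_slug}-{nn:02d}-{child_slug}.md"


def next_free_name(name: str, exists: Callable[[str], bool]) -> str:
    """Return name, or name with -v2, -v3, ... before .md, whichever is unused first."""
    if not exists(name):
        return name
    stem = name.removesuffix(".md")
    version = 2
    while exists(f"{stem}-v{version}.md"):
        version += 1
    return f"{stem}-v{version}.md"


def free_brief_path(repo_root: Path, name: str) -> Path:
    """Create docs/briefs/ if needed and return a path that overwrites nothing."""
    briefs = repo_root / "docs" / "briefs"
    briefs.mkdir(parents=True, exist_ok=True)
    return briefs / next_free_name(name, lambda n: (briefs / n).exists())
```

`date.today()` reads the clock of the machine running the agent, which may not be the repository's timezone; passing the date explicitly keeps that choice visible.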
The eight required H2 sections are: Work Type, Current State (As-Is), Desired Outcome (To-Be), Scope (with In Scope / Out of Scope H3s), Related Files / Entry Points, Side Effect Checkpoints, Acceptance Criteria, Open Questions.
Optional Constraints may appear between Scope and Related Files / Entry Points when task-specific constraints exist.
Three work types require an additional H2 section between Current State (As-Is) and Desired Outcome (To-Be):
- `fix` → `## Reproduction`
- `perf` → `## Baseline Measurement`
- `refactor` → `## Behavior Contract`

These exist because the work type changes the downstream agent's behavior (reproduction-first, measurement-first, behavior-preservation), and the brief must carry the type-specific input that behavior depends on.
The escape hatch when the section legitimately has nothing concrete to capture is a single bullet - N/A — <reason>.
See references/template.md and references/work-types.md for the per-section guidance.
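The section layout can be sketched as an ordered-heading check. This is a hypothetical illustration, not the logic of `scripts/validate_brief.py`: it only looks at H2 order, the type-conditional insert, and the position of the optional Constraints section, and it ignores the In/Out of Scope H3s, checklist format, and the N/A escape hatch.

```python
import re

REQUIRED_H2 = [
    "Work Type",
    "Current State (As-Is)",
    "Desired Outcome (To-Be)",
    "Scope",
    "Related Files / Entry Points",
    "Side Effect Checkpoints",
    "Acceptance Criteria",
    "Open Questions",
]
# Type-conditional section, inserted between Current State and Desired Outcome.
CONDITIONAL_H2 = {"fix": "Reproduction", "perf": "Baseline Measurement", "refactor": "Behavior Contract"}
OPTIONAL_H2 = "Constraints"  # allowed only between Scope and Related Files / Entry Points


def expected_h2(work_type: str) -> list:
    sections = list(REQUIRED_H2)
    extra = CONDITIONAL_H2.get(work_type)
    if extra:
        sections.insert(sections.index("Desired Outcome (To-Be)"), extra)
    return sections


def h2_headings(markdown: str) -> list:
    # "^## " does not match "### In Scope": the third character must be a space.
    return [m.group(1).strip() for m in re.finditer(r"^## (.+)$", markdown, flags=re.MULTILINE)]


def check_h2_layout(markdown: str, work_type: str) -> list:
    found = h2_headings(markdown)
    expected = expected_h2(work_type)
    problems = [f"missing section: ## {name}" for name in expected if name not in found]
    ordered = [h for h in found if h in expected]
    if not problems and ordered != expected:
        problems.append("required sections are out of order")
    if OPTIONAL_H2 in found and "Scope" in found and "Related Files / Entry Points" in found:
        pos = found.index(OPTIONAL_H2)
        if not found.index("Scope") < pos < found.index("Related Files / Entry Points"):
            problems.append("## Constraints must sit between ## Scope and ## Related Files / Entry Points")
    return problems
```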
Bullet count is not capped. The rule is cohesion plus completeness, not brevity:
Before full codebase review, check whether the input contains enough signal to ground the brief. Use the four-anchor heuristic:
| Anchor | What it answers | Maps to |
|---|---|---|
| PROBLEM | What is wrong or what is changing? | § Current State (As-Is) |
| GOAL | What should be true when it's done? | § Desired Outcome (To-Be) |
| SCOPE | Where does this apply (module, feature area, user surface)? | § In/Out of Scope |
| TARGET | Which part of the system is touched (file, subsystem, layer)? | § Related Files / Entry Points |
Count how many anchors are derivable from the input. Derivable = a reasonable engineer could answer the anchor from the user's input without inventing intent.
- 3 or 4 anchors present → CONTINUE to Stage 2. Missing detail gets filled in Stage 3 via codebase review or Stage 4 via user questions.
- Exception: PROBLEM + GOAL + SCOPE present but TARGET missing → run a narrow target probe before deciding. Use at most a few `rg` / glob queries to find likely files, directories, routes, commands, or modules. If a concrete entry point emerges, CONTINUE; if not, HALT and ask the user for the target area.
2 or fewer anchors present → HALT. Respond in the user's chat language naming exactly which anchors are missing, and ask for more input. Do NOT proceed through Stages 2–6 on an underspecified input. The briefset-mode check below also waits — never split an underspecified input into multiple equally underspecified child briefs. Example halt messages:
English:
I can't ground the brief from this input alone. Missing — PROBLEM (what is being fixed or changed) and TARGET (which area / file / subsystem is touched). Can you paste the spec or add one or two lines?
Korean:
입력만으로는 브리핑 만들기 어려워. 다음이 아직 확인 안 돼 — PROBLEM(뭘 고치거나 바꾸는지)과 TARGET(어느 영역/파일/시스템을 건드리는지). 더 얹어줄래? 기획서 붙여넣거나 한두 줄 더 써주면 돼.
Why halt instead of guess: an underspecified brief is worse than no brief — the downstream agent commits to the wrong problem framing and the rework cost eats the whole savings. Pushing back early is cheaper than producing a confident-looking but wrong document.
Edge case — pasted spec that looks long but is content-light: word count is not a proxy for the four anchors. A 2,000-word product narrative without a concrete PROBLEM or TARGET still halts. Judge by anchor coverage, not length.
See examples/03-halt-ambiguous.md for a worked halt case.
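Once someone has judged which anchors are derivable (the hard part, not modeled here), the Stage 1 gate reduces to a small decision function. A sketch under that assumption, with hypothetical names:

```python
from __future__ import annotations

from enum import Enum

ANCHORS = ("PROBLEM", "GOAL", "SCOPE", "TARGET")


class Gate(Enum):
    CONTINUE = "continue"
    PROBE_TARGET = "probe_target"  # narrow rg/glob probe, then decide
    HALT = "halt"


def ambiguity_gate(derivable: set[str]) -> tuple[Gate, list[str]]:
    """Return the gate decision plus the missing anchors (used in the halt message)."""
    unknown = derivable - set(ANCHORS)
    if unknown:
        raise ValueError(f"unknown anchors: {sorted(unknown)}")
    missing = [a for a in ANCHORS if a not in derivable]
    if len(derivable) <= 2:
        return Gate.HALT, missing
    if missing == ["TARGET"]:
        return Gate.PROBE_TARGET, missing
    return Gate.CONTINUE, missing


def after_probe(entry_points: list[str]) -> Gate:
    """CONTINUE if the probe surfaced a concrete entry point, else HALT and ask."""
    return Gate.CONTINUE if entry_points else Gate.HALT
```

`after_probe` stands in for the narrow `rg` / glob probe: the caller passes whatever concrete entry points that probe found.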
Briefset signal check (after CONTINUE): once anchors clear, also evaluate whether the input describes multiple execution contexts. Do not use file count, line count, input length, or several related edit points as triggers by themselves. Those are supporting evidence only.
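If briefset signals are strong, recommend briefset mode and ask the user to choose before Stage 2 instead of switching silently. Use the user's chat language and keep the question short. Korean example (English gloss: "Splitting into multiple briefs is recommended. The execution units look independent and seem to need ordering/parallel coordination. How should we proceed? Generate as multiple briefs / Keep as a single brief"):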
다중 브리프로 나누는 것이 권장됩니다. 실행 단위가 독립적이고 순서/병렬 조정이 필요해 보입니다. 어떻게 진행할까요?
- 다중 브리프로 생성
- 단일 브리프로 유지
If the user chooses briefset, continue with references/briefset.md.
If the user chooses single-brief, keep one cohesive brief and document the requested execution ordering in Constraints / Acceptance Criteria as needed.
If the evidence is unclear, default to single-brief mode and let Stage 4 surface the question.
Determine the Conventional Commits type.
Consult references/work-types.md for the full list and per-type behavior hints.
See references/work-types.md for the full type-confirmation routing table (explicit-agree / explicit-conflict / high-confidence implicit / low-confidence implicit).
The goal is enough context to fill Current State (As-Is) and Related Files / Entry Points, not exhaustive exploration.
Use whatever code search / read / symbol tooling fits the host environment — default Grep / Read / Glob, semantic tools where available (Serena MCP, ast-grep, language servers), or a short-lived subagent (e.g. Explore) when parallel lookups or main-context isolation is worth it.
Tool choice is the runtime's call; this stage only fixes the purpose and budget of the review.
Review budget (soft limits):
- … Open Questions so the downstream agent inherits them rather than having them silently dropped.

Strategy:
Do not:
- … Current State (As-Is) or Related Files / Entry Points, skip it.
- … Open Questions instead.

After Stage 3 has gathered enough codebase context, collect the remaining ambiguous or user-owned decisions into a Markdown decision table. Stage 4 is not a pre-review guessing interview: ask only after the codebase has been checked enough to state the uncertainty, the recommended change, and the evidence behind it.
Use this exact table shape for user-decision questions:
| 순번 | 내용 | 수정 추천안 | 근거 |
|---|---|---|---|
| 1 | <decision the user must make> | <recommended change to apply to the brief> | <codebase/input evidence and risk> |
Keep these four headers exactly as written, even when the surrounding conversation is not Korean. They are the stable decision-table contract: number, decision content, recommended change, and rationale.
Required gaps to close before drafting:
Decision-table rule. Each row must be a real decision, not a vague status note.
내용 states what the user must decide.
수정 추천안 states the concrete brief change you recommend.
근거 cites the input, codebase finding, existing pattern, or risk.
After the user answers, patch the draft plan in memory before composing the brief.
Full decision classification, table rules, and termination rules live in references/stage-4-interview.md.
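A sketch of rendering Stage 4 rows under the four fixed headers, assuming each row's text is already written. `Decision` and `render_decision_table` are hypothetical names; escaping `|` and flattening newlines keeps a single cell from breaking the Markdown table.

```python
from __future__ import annotations

from dataclasses import dataclass

HEADERS = ("순번", "내용", "수정 추천안", "근거")  # fixed contract; never translated


@dataclass
class Decision:
    content: str         # 내용: what the user must decide
    recommendation: str  # 수정 추천안: concrete brief change you recommend
    rationale: str       # 근거: input, codebase finding, pattern, or risk


def _cell(text: str) -> str:
    # An unescaped pipe would add a column; a newline would end the row early.
    return text.replace("|", "\\|").replace("\n", " ").strip()


def render_decision_table(rows: list[Decision]) -> str:
    lines = ["| " + " | ".join(HEADERS) + " |", "|" + "---|" * len(HEADERS)]
    for number, row in enumerate(rows, start=1):
        cells = (row.content, row.recommendation, row.rationale)
        if any(not c.strip() for c in cells):
            raise ValueError(f"row {number}: every column needs a real value")
        lines.append("| " + " | ".join([str(number), *map(_cell, cells)]) + " |")
    return "\n".join(lines)
```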
Once Stage 4 closes, compose the final Markdown internally and write it straight to disk — do not paste the full brief into chat first. The user reviews the file in their editor in Stage 6, where real markdown rendering and diff tooling are available.
Compute the filename per the Output Contract above.
Ensure docs/briefs/ exists; create it if not.
Resolve filename collisions by appending -v2, -v3, ….
Render the complete template from references/template.md and write the file (English section headers, English body in caveman full mode — see references/caveman-style.md for conversion rules and the Auto-Clarity carve-outs that stay in normal prose).
Run the structural validator — a fast smoke test for the template contract:
```bash
python3 skills/task-brief-creator-caveman/scripts/validate_brief.py \
  docs/briefs/<filename>.md
```
The validator only checks structural conformity (section presence, checklist format, filename pattern, type coherence). It does not judge content quality — that's what the Stage 5.5 self-check and the human review in Stage 6 are for. Passing validator ≠ good brief; failing validator = malformed brief.
The structural validator confirms the file has the required sections. It does not confirm the file is a complete work instruction. Before handing off in Stage 6, re-read the saved brief from disk and run a content-coverage self-check against the original input plus Stage 3 / Stage 4 findings.
The brief is a work instruction, not a summary. Caveman compresses how the brief reads, never what it contains — so this self-check is identical to the normal-mode skill's check, plus one caveman-only parity item. Run this checklist:
- … Out of Scope as [hard] or [deferred], or in Open Questions when the user must decide.
- Two unrelated implementation or verification obligations are never merged into one bullet.
- … Related Files / Entry Points, and every uncertainty raised by the review either appears in Open Questions or was explicitly resolved during Stage 4.
- … only on cold start, ≤ 5KB gzipped, iOS Safari 17+, after move end).

"Executable, not discursive" is a prose rule, not a content rule.

If any check fails, fix the brief in place with Edit, then re-run scripts/validate_brief.py to confirm structural conformity still holds.
Loop the self-check until every item passes.
The self-check outcome is a separate signal from the structural validator — both are reported in Stage 6. A brief can pass structural validation and still fail this self-check; in that case the file is incomplete even though it is well-formed.
For briefset mode, run the self-check on the parent and on every child independently. The parent's coverage check asks whether every input-implied execution context maps to a child; each child's coverage check uses the same six items above.
The Stage 5.5 self-check is self-evaluated — the same agent that wrote the brief grades it for cold-pickup readiness. That is biased. An untouched sub-agent reading only the original input and the saved brief is the truthful version of the cold-pickup test.
Stage 5.6 runs unconditionally in both single-brief and briefset modes. This skill's contract is the explicit authorization to spawn the sub-agent — do not skip Stage 5.6 based on host-environment defaults like "only spawn sub-agents when the user explicitly requests one", "be conservative about sub-agent cost", or "don't run extra verification unless asked". The user invoking this skill IS the explicit request. The only valid reasons to skip Stage 5.6:
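- The Stage 5 structural validator failed (Stage 5.5 and Stage 5.6 are then both skipped).
- The user opted out, using one of the opt-out forms listed in the `## Cold-Pickup Verification (Stage 5.6)` section below.

Any other reason (token budget, latency, "the brief looks fine", inferred host policy) is not a valid skip reason. If Stage 5.6 is skipped without one of the two valid reasons, the Stage 6 banner is wrong and the loop is broken.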
Mechanism:
Spawn an Explore or general-purpose sub-agent.
Hand it only the original user input or planning notes plus the brief path — no Stage 3 uncertainty register, no Stage 4 decisions, no suspected gaps, no decomposition rationale, and no Stage 5.5 result. Do not include hints such as what to inspect, what might be missing, or which split you expect the sub-agent to prefer. For briefset mode, hand the parent and every child path one at a time; each file runs its own cold-pickup pass.
Ask the sub-agent to return the YAML report below (the caveman variant adds over_terse_bullets on top of the standard schema).
Free-form prose is not accepted — the report is parsed deterministically.
```yaml
verdict: clean | needs_changes | blocked
first_actions:
  - <file to open, search to run, or hypothesis to test — one bullet each>
ask_backs:
  - id: a1
    question: <what it would ask the requester before starting>
    evidence: "<direct quote from the brief or the original input>"
    source_of_uncertainty: user_input_ambiguity | unverifiable_fact | minor_default
    affects_direction: true | false
missing_concerns:
  - id: m1
    description: <concern absent or specified too thinly>
    evidence: "<direct quote from the original input>"
over_terse_bullets:
  - id: t1
    bullet: "<direct quote of the bullet from the brief>"
    reason: <why caveman compression made intent ambiguous>
```
Rules enforced on the sub-agent:
- `ask_backs[*]`, `missing_concerns[*]`, and `over_terse_bullets[*]` must include a direct-quote `evidence` / `bullet`. Paraphrases are not accepted; if no quote applies, drop the item.
- `ask_backs[*]` must classify `source_of_uncertainty`:
  - `user_input_ambiguity`: the input is ambiguous; the brief picked one interpretation but others are equally reasonable.
  - `unverifiable_fact`: an external fact (API behavior, library version, data shape) the sub-agent cannot confirm from the two inputs alone.
  - `minor_default`: a reasonable default for something the user did not specify; alternative values would not change the brief's direction.
- `verdict: clean` is only valid when `ask_backs`, `missing_concerns`, and `over_terse_bullets` are all empty.

Diff the YAML report against the original input + Stage 3 uncertainty register + Stage 4 decisions.
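Before routing, the main agent can mechanically reject reports that break the rules above. This sketch assumes the YAML has already been parsed into a dict (for example with a YAML library, not shown) and treats "direct quote" as an exact substring match, which is stricter than a human reader might be. `check_report` is a hypothetical name.

```python
from __future__ import annotations

VERDICTS = {"clean", "needs_changes", "blocked"}
SOURCES = {"user_input_ambiguity", "unverifiable_fact", "minor_default"}


def _quoted(text: object, *sources: str) -> bool:
    """True when text is a non-empty verbatim substring of at least one source."""
    quote = str(text or "").strip()
    return bool(quote) and any(quote in src for src in sources)


def check_report(report: dict, original_input: str, brief: str) -> list[str]:
    errors: list[str] = []
    verdict = report.get("verdict")
    if verdict not in VERDICTS:
        errors.append(f"unknown verdict: {verdict!r}")

    ask_backs = report.get("ask_backs") or []
    missing = report.get("missing_concerns") or []
    terse = report.get("over_terse_bullets") or []

    for item in ask_backs:
        tag = item.get("id", "?")
        # Ask-back evidence may quote either the brief or the original input.
        if not _quoted(item.get("evidence"), brief, original_input):
            errors.append(f"{tag}: evidence is not a verbatim quote")
        if item.get("source_of_uncertainty") not in SOURCES:
            errors.append(f"{tag}: invalid source_of_uncertainty")
        if not isinstance(item.get("affects_direction"), bool):
            errors.append(f"{tag}: affects_direction must be true or false")
    for item in missing:
        if not _quoted(item.get("evidence"), original_input):
            errors.append(f"{item.get('id', '?')}: evidence is not a verbatim quote from the input")
    for item in terse:
        if not _quoted(item.get("bullet"), brief):
            errors.append(f"{item.get('id', '?')}: bullet is not quoted verbatim from the brief")

    if verdict == "clean" and (ask_backs or missing or terse):
        errors.append("verdict clean requires empty ask_backs, missing_concerns, over_terse_bullets")
    return errors
```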
Drift handling. When the report's verdict is needs_changes or blocked, or when any unrejected ask_backs / missing_concerns / over_terse_bullets survive routing — Edit the saved brief in place to close the gap, re-run validate_brief.py, and re-run cold-pickup.
Loop until a termination trigger fires (see below) or the hard cap of 5 passes is reached.
Termination triggers (evaluated in priority order at the end of every pass):
| # | Trigger | Category | Definition | Action |
|---|---|---|---|---|
| 1 | Regression | Defensive | This pass's report has more unrejected ask_backs + missing_concerns + over_terse_bullets than the previous pass. | Roll back the brief to the previous pass's saved version, stop. |
| 2 | Oscillation | Convergence | The same finding has been accepted → rejected → accepted (or vice versa) across passes (uses the rejection log from routing). | Adopt the brief from the pass where the oscillating finding was last rejected, stop. |
| 3 | Stable findings | Convergence | The set of unrejected ask_backs + missing_concerns + over_terse_bullets is semantically identical to the previous pass (yes/no judgement — no similarity scores; if ambiguous, treat as not-equivalent and continue). | Stop. Surface residuals as Stage 6 comments. |
| 4 | Clean pass | Positive | verdict: clean with empty ask_backs, missing_concerns, and over_terse_bullets. | Stop. Adopt the current brief. |
| 5 | No-op pass | Convergence | Routing produced zero accepted items this pass (everything rejected as disagreement / scope / weak evidence). | Stop. Adopt the current brief. |
| 6 | Hard cap | Fallback | Pass count reached 5. | Stop. Surface residuals as Stage 6 comments. |
Regression is evaluated first because rolling back must outrank optimistic "one more pass might help" instinct. Hard cap is the fallback — not a preferred outcome.
Pass condition (normal termination): trigger 4 (Clean pass). Triggers 1, 2, 3, 5, 6 stop the loop but signal residual concerns that Stage 6 must surface.
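The priority order can be sketched as a function over the pass history. Assumptions: findings are identified by stable keys, the routing log records `"accepted"` / `"rejected"` per finding per pass, and exact set equality stands in for the yes/no "semantically identical" judgement the skill actually asks for. Names are hypothetical; the returned strings match the banner trigger names used in Stage 6.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

HARD_CAP = 5


@dataclass(frozen=True)
class PassResult:
    verdict: str           # "clean" | "needs_changes" | "blocked"
    reported: int          # total ask_backs + missing_concerns + over_terse_bullets in the report
    unrejected: frozenset  # keys of findings that survived routing this pass
    accepted: int          # findings routing accepted (patched) this pass


def oscillating(routing_log: dict[str, list[str]]) -> bool:
    """A finding flipped accepted -> rejected -> accepted (or the reverse)."""
    for history in routing_log.values():
        flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
        if flips >= 2:
            return True
    return False


def termination_trigger(passes: list[PassResult], routing_log: dict[str, list[str]]) -> Optional[str]:
    """Return the first trigger that fires, in priority order, or None to run another pass."""
    current = passes[-1]
    previous = passes[-2] if len(passes) > 1 else None
    if previous is not None and len(current.unrejected) > len(previous.unrejected):
        return "regression"  # roll back to the previous pass's brief
    if oscillating(routing_log):
        return "oscillation"  # adopt the brief from when the finding was last rejected
    # Two empty sets are not treated as "stable" so a clean pass is still reported as clean.
    if previous is not None and current.unrejected and current.unrejected == previous.unrejected:
        return "stable_findings"
    if current.verdict == "clean" and current.reported == 0:
        return "clean_pass"
    if current.accepted == 0:
        return "no_op"
    if len(passes) >= HARD_CAP:
        return "hard_cap"
    return None
```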
Routing ask_backs. Classify before deciding to patch:
| `source_of_uncertainty` | `affects_direction` | Action |
|---|---|---|
| `user_input_ambiguity` | `true` | Surface in Open Questions for the user: chat-only while the brief is in flight, decision-table row when already saved. Never invent the answer in Edit. |
| `user_input_ambiguity` | `false` | State the default assumption in the relevant section; patch in place. |
| `unverifiable_fact` | (any) | Main verifies directly (codebase check, doc read) or rewrites the bullet as a hedge. Never ask the user; this is the main agent's job. |
| `minor_default` | (any) | Patch in place with the assumption stated. |
over_terse_bullets[*] are caveman-register findings; they never match a Stage 4 row 내용, so they are always treated as drift — patch in place under the Auto-Clarity carve-out, never as disagreement.
Disagreement vs drift. The sub-agent sees the original input and the brief but not the Stage 3 register or Stage 4 decisions, so it cannot know which items the user locked.
Before applying the routing table above, if an ask-back's subject matches the 내용 of a row in the Stage 4 decision table the user already answered, treat it as disagreement — chat-only comment, no patch.
Otherwise route per the table.
Cold-pickup never overrides user decisions, never invents Acceptance Criteria, never silently rewrites Open Questions.
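The routing rules above, including the disagreement check that runs first, as a hypothetical Python sketch. Whether an ask-back matches an answered Stage 4 row is a judgement call, so the caller passes it in as `matches_answered_row`. Missing concerns are shown as plain patches; rejecting them for scope or weak evidence is left to that same judgement.

```python
from enum import Enum


class Action(Enum):
    DISAGREEMENT = "chat-only comment; no patch"
    ASK_USER = "surface as an Open Question / decision-table row; never invent the answer"
    PATCH_WITH_ASSUMPTION = "patch in place, stating the default assumption"
    MAIN_VERIFIES = "verify in codebase/docs or rewrite the bullet as a hedge; never ask the user"
    PATCH = "patch in place"


def route(kind: str, item: dict, matches_answered_row: bool = False) -> Action:
    """kind is the report list name: 'ask_backs', 'missing_concerns', or 'over_terse_bullets'."""
    if kind == "over_terse_bullets":
        # Register findings never match a Stage 4 row, so they are always drift
        # and are rewritten in normal prose under the Auto-Clarity carve-out.
        return Action.PATCH
    if kind == "missing_concerns":
        return Action.PATCH
    if kind != "ask_backs":
        raise ValueError(f"unknown finding list: {kind}")
    if matches_answered_row:
        # The user already locked this decision; the sub-agent could not know that.
        return Action.DISAGREEMENT
    source = item["source_of_uncertainty"]
    if source == "user_input_ambiguity":
        return Action.ASK_USER if item["affects_direction"] else Action.PATCH_WITH_ASSUMPTION
    if source == "unverifiable_fact":
        return Action.MAIN_VERIFIES
    if source == "minor_default":
        return Action.PATCH_WITH_ASSUMPTION
    raise ValueError(f"unknown source_of_uncertainty: {source}")
```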
Reporting. The cold-pickup outcome integrates into the Stage 6 save banner alongside the structural validator and the Stage 5.5 self-check.
For briefset mode, the banner uses the collapsed parent + K/N children format defined under ## Cold-Pickup Verification (Stage 5.6) below — one summary line plus details only on flagged children, not one line per child.
The brief is on disk. Hand off to the user for review.
Report the path, a one-line summary (work type + title), the structural validator result, the Stage 5.5 self-check result, and the Stage 5.6 cold-pickup result. Use the user's chat language. All three signals are reported together so the user can see whether the file is well-formed, complete, and cold-pickup-ready.
English (validator + self-check + cold-pickup passed):
Saved — `docs/briefs/2026-04-23-feat-dark-mode-settings.md` (feat: Dark mode toggle in Settings; structural validation passed; content self-check passed — N input concerns covered, caveman parity OK; cold-pickup terminated with `clean_pass` after 1 pass (no ask-backs, no missing concerns, no over-terse bullets)). Open it and let me know if anything needs editing.
Banner termination trigger reflects the actual loop outcome — clean_pass (normal), regression, oscillation, stable_findings, no_op, or hard_cap. Any non-clean_pass trigger means residual concerns must follow in the banner as bullet items.
Korean (validator + self-check + cold-pickup passed):
저장 완료 — `docs/briefs/2026-04-23-feat-dark-mode-settings.md` (feat: Dark mode toggle in Settings; 구조 검증 통과; 내용 자체 검증 통과 — 입력 항목 N개 모두 매핑, 문체 변환 동등성 확인; cold-pickup `clean_pass`로 1회 만에 종료 (ask-back 없음, missing 없음, 과압축 지적 없음)). 파일 열어보고 고칠 부분 있으면 알려줘.
English (validator failed):
Saved — `docs/briefs/2026-04-23-feat-dark-mode-settings.md`, but the structural validator flagged 2 issue(s):
✗ `<first failure message, verbatim>`
✗ `<second failure message, verbatim>`
The file is on disk. Want me to patch these, or will you edit directly?
Korean (validator failed):
저장 완료 — `docs/briefs/2026-04-23-feat-dark-mode-settings.md`, 다만 구조 검증에서 2건 지적:
✗ <첫 번째 실패 메시지 그대로>
✗ <두 번째 실패 메시지 그대로>
파일은 디스크에 있음. 내가 패치할까, 직접 고칠래?
When the structural validator fails, Stage 5.5 and Stage 5.6 are skipped — the brief is not yet well-formed enough to run content or cold-pickup checks against.
The banner stays as shown; do not append self-check skipped / cold-pickup skipped lines in this case.
If the structural validator passed but the Stage 5.5 self-check surfaced gaps that you fixed in place, mention what you patched so the user knows the brief was tightened before handoff (e.g., "self-check found 2 input concerns missing from In Scope and one bullet that had been merged for caveman compression; restored them, re-validated").
If Stage 5.6 patched the brief after cold-pickup drift, report it the same way (e.g., cold-pickup flagged 2 gap(s) and 1 over-terse bullet; patched in place).
If the user opted out, report cold-pickup skipped per user request.
If the user requests changes, apply them with Edit against the on-disk file.
Do not re-render the full brief into chat — that defeats the point of save-then-review.
Re-run the validator after each edit pass and report the delta.
If the saved single brief contains Open Questions that require a user decision, present them immediately after the save report using the same four-column decision table from Stage 4:
| 순번 | 내용 | 수정 추천안 | 근거 |
|---|---|---|---|
| 1 | <Open Question requiring user decision> | <recommended patch to apply to the brief> | <why this cannot be delegated safely> |
After the user answers, patch the saved brief in place, move resolved questions into the appropriate sections, leave only genuinely unresolved or delegated questions in Open Questions, and re-run the validator plus Stage 5.5 self-check.
Chat stays normal prose; only saved brief body prose uses caveman full mode.
The user owns "done." Do not stage or commit the file. Loop on Stage 6 until they explicitly stop.
Why save-then-review: an earlier iteration rendered the full brief in chat for approval before writing to disk.
In hands-on use that flooded the conversation with markdown that renders poorly inside a code fence and was awkward to edit conversationally.
Writing to disk first lets the user review in their editor (real markdown, real diff tools, real inline edits) and lets the validator surface structural issues immediately.
The tradeoff — a file briefly on disk before approval — is neutral: docs/briefs/ is the intended home for these files, and the commit step stays with the user.
See references/template.md for:
The emitted brief is in English. Chat interaction language follows the user's input.
See references/work-types.md for:
See examples/ for worked end-to-end scenarios (input → codebase review → interview → output).
Start with examples/README.md for the index.
scripts/validate_brief.py is a stand-alone Python 3 script (no external deps) that verifies structural conformity of a saved brief.
It is wired into Stage 6 but can also be run ad-hoc against any existing brief:
```bash
python3 skills/task-brief-creator-caveman/scripts/validate_brief.py \
  docs/briefs/2026-04-23-feat-global-hotkey-system.md
```
Exit codes: 0 pass, 1 structural failure, 2 file I/O error.
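If the validator is driven from code rather than a shell, a thin wrapper can map the documented exit codes. A sketch, assuming the repository-relative script path used above; it returns the raw output because the validator's message format is not specified here. The same wrapper should work for `validate_briefset.py`, which uses the same exit codes.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

VALIDATOR = Path("skills/task-brief-creator-caveman/scripts/validate_brief.py")
EXIT_MEANING = {0: "pass", 1: "structural failure", 2: "file I/O error"}


def interpret_exit(code: int) -> str:
    return EXIT_MEANING.get(code, f"unexpected exit code {code}")


def run_validator(brief: Path, script: Path = VALIDATOR) -> tuple[str, str]:
    """Run a validator script on one file; return (meaning, combined output)."""
    # Python itself exits with 2 when it cannot open the script, which would look
    # identical to the validator's own I/O-error code, so check the path first.
    if not script.is_file():
        raise FileNotFoundError(script)
    proc = subprocess.run(
        [sys.executable, str(script), str(brief)],
        capture_output=True,
        text=True,
        check=False,
    )
    return interpret_exit(proc.returncode), proc.stdout + proc.stderr
```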
Scope of the validator (deliberately structural only):
- … In Scope / Out of Scope H3s.
- Type-conditional section (Reproduction / Baseline Measurement / Behavior Contract) present and populated for the matching type.
- … `- [ ]` format in checklist sections; populated Open Questions with `- None — <reason>` when no questions remain.
- … Related Files / Entry Points resolve on disk (skipped when the bullet carries a `(proposed)` marker).
- … Constraints heading shape.
- … Out of Scope bullets without `[hard]` or `[deferred]` classification. The validator does not judge whether the classification is semantically correct.

Out of scope (still on the human): concreteness of bullets, whether Out-of-Scope entries are real guardrails vs. filler, whether entry points are good (the path-existence check only catches fabricated paths, not poorly-chosen ones), whether Acceptance Criteria are measurable, whether the type-conditional section's content is sufficient.
For briefset mode, use scripts/validate_briefset.py on the parent file — it validates the parent structure and re-runs validate_brief.py's checks transitively on every referenced child brief, so one invocation covers the whole set:
```bash
python3 skills/task-brief-creator-caveman/scripts/validate_briefset.py \
  docs/briefs/2026-04-30-briefset-checkout-i18n.md
```
Same exit codes.
See references/briefset.md for what the parent validator checks and what stays on the human reviewer.
Stage 5.6 spawns an Explore or general-purpose sub-agent that reads only the original user input or planning notes plus the saved brief, then returns the YAML report defined in Stage 5.6 (verdict, first_actions, ask_backs with evidence + source_of_uncertainty + affects_direction, missing_concerns with evidence, and the caveman-specific over_terse_bullets).
The main agent classifies and routes each ask_backs[*] per the routing table in Stage 5.6, treats every over_terse_bullets[*] as drift (Auto-Clarity carve-out), patches the brief in place if drift survives routing, and re-runs the structural validator.
No numeric confidence score is used — verdict: clean (with empty ask_backs, missing_concerns, and over_terse_bullets) is the pass condition.
Default behavior: unconditionally ON. Cold-pickup runs automatically after Stage 5.5 passes, in both single-brief and briefset modes. The skill invocation itself is the authorization — host-level "only spawn sub-agents on explicit request" defaults do not override this. The only valid skip paths are user opt-out (below) or a Stage 5 structural-validator failure.
Opt-out. The user can skip Stage 5.6 with any of:
- skip cold-pickup, cold-pickup off, no cold-pickup, 콜드픽업 건너뛰기 ("skip cold-pickup"), 콜드픽업 끄기 ("turn cold-pickup off"), cold-pickup 생략 ("omit cold-pickup").
- … `--no-cold-pickup` or equivalent.

When the user opts out, Stage 5.6 is bypassed cleanly and the Stage 6 save banner reports cold-pickup skipped per user request.
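A hedged sketch of matching the listed opt-out forms in a user message. Matching is case-insensitive and lets Korean particles follow a phrase; it cannot recognize the "or equivalent" cases and does not handle negation ("don't skip cold-pickup" would still match), so it is at most a first filter ahead of the agent's own reading. The function name is hypothetical.

```python
import re

# Forms listed in the skill text; "or equivalent" wording still needs judgement.
OPT_OUT_PHRASES = (
    "skip cold-pickup",
    "cold-pickup off",
    "no cold-pickup",
    "콜드픽업 건너뛰기",
    "콜드픽업 끄기",
    "cold-pickup 생략",
    "--no-cold-pickup",
)
_PATTERNS = [
    # Not preceded by a word character or hyphen; not followed by an ASCII letter,
    # digit, or hyphen, so Korean particles may follow (e.g. "끄기해줘").
    re.compile(r"(?<![\w-])" + re.escape(p) + r"(?![a-z0-9-])")
    for p in OPT_OUT_PHRASES
]


def wants_cold_pickup_opt_out(message: str) -> bool:
    text = message.casefold()
    return any(p.search(text) for p in _PATTERNS)
```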
Loop cap. A maximum of 5 cold-pickup passes per brief. The loop terminates earlier on any of the triggers defined in Stage 5.6 (Regression, Oscillation, Stable findings, Clean pass, No-op pass). If the hard cap fires, surface the residual gaps in Stage 6 as comments for the user rather than continuing to patch.
Briefset cost note. In briefset mode every file (the parent and each of the N children) gets its own cold-pickup loop. One pass over the set therefore costs 1 + N spawns. In the worst case, when every file hits the hard cap, the cost is 5 × (1 + N).
In practice most files terminate earlier (Clean pass on pass 1, or Stable findings / No-op on pass 2–3), so the average is closer to 1.5–2 passes per file.
For a wide briefset (≥ 5 children), this is still the most expensive Stage 5.6 case. When cost matters, recommend that the user either opt out for that briefset or run Stage 5.6 only on the parent and a sample of children.
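The cost arithmetic above can be turned into a quick estimator. This sketch assumes one sub-agent spawn per file per pass. It takes the 1.5 to 2 passes-per-file average from the text. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HARD_CAP = 5  # maximum cold-pickup passes per file


@dataclass(frozen=True)
class SpawnEstimate:
    files: int                     # parent + checked children
    minimum: int                   # every file clean on pass 1
    typical: tuple[float, float]   # 1.5x to 2x passes per file
    worst_case: int                # every file hits the hard cap
    wide: bool                     # >= 5 children: consider opting out or sampling


def estimate_spawns(children: int, sampled_children: Optional[int] = None) -> SpawnEstimate:
    """Estimate Stage 5.6 spawns for a briefset, optionally checking only a sample of children."""
    checked = children if sampled_children is None else min(sampled_children, children)
    files = 1 + checked
    return SpawnEstimate(
        files=files,
        minimum=files,
        typical=(files * 1.5, files * 2.0),
        worst_case=files * HARD_CAP,
        wide=children >= 5,
    )
```

For a briefset with five children, this gives 6 spawns at minimum, roughly 9 to 12 in a typical run, and 30 in the worst case.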
Briefset reporting (Stage 6 banner). Per-child cold-pickup status is collapsed to one summary line plus details only on flagged children, not one line per child:
When every file is clean: cold-pickup: 1/1 parent + N/N children verdict:clean (no ask-backs, no missing concerns, no over-terse bullets).
When any child is flagged: cold-pickup: 1/1 parent clean, K/N children clean, M flagged — see chat for details, then list the flagged child paths and the specific drift items below.
Caveman extension. The sub-agent prompt adds one item to the standard four: are any bullets so terse they hide intent? Caveman is a register transform. If compression made a bullet ambiguous, the sub-agent flags it, and the bullet is rewritten in normal prose under the Auto-Clarity carve-out before the brief passes Stage 5.6.
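The two Briefset reporting formats above can be generated from per-file results, as in the sketch below. The input shape, a clean/flagged boolean per child path, is hypothetical. The documented format only shows a clean parent, so the 0/1 parent wording for a flagged parent is an extrapolation.

```python
def briefset_banner(parent_clean: bool, children: dict[str, bool]) -> tuple[str, list[str]]:
    """Return the Stage 6 summary line and the flagged child paths to detail in chat."""
    n = len(children)
    flagged = [path for path, clean in children.items() if not clean]
    k = n - len(flagged)
    if parent_clean and not flagged:
        line = (
            f"cold-pickup: 1/1 parent + {n}/{n} children verdict:clean "
            "(no ask-backs, no missing concerns, no over-terse bullets)."
        )
    else:
        parent = "1/1" if parent_clean else "0/1"  # 0/1 wording is an assumption
        line = (
            f"cold-pickup: {parent} parent clean, {k}/{n} children clean, "
            f"{len(flagged)} flagged — see chat for details"
        )
    return line, flagged
```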
What cold-pickup never does:
Open Questions — drift fixes either resolve a question into another section or leave the question intact for the user.
Related Files / Entry Points is mandatory because it is the downstream agent's starting route.
If the codebase review does not surface at least one concrete file, directory, route, command, module, related brief, or confirmed proposed path, ask the user to provide or confirm the entry point before saving the brief.
PaymentService interface" is a real guardrail.
Out of Scope tells the downstream coding agent what not to do.
Put bounded implementation choices in Constraints, and user-owned unresolved choices in Open Questions.
references/briefset.md).
Briefset mode is the supported way to handle multi-context work — do not stuff multiple unrelated tasks into a single brief unless the user explicitly chooses single-brief after the recommendation, and do not nest briefsets (a child cannot become a parent).
references/stage-4-interview.md for the table rules and termination conditions.
Self-check before invoking Write in Stage 5.
The structural validator catches format errors after the fact; this list catches content gaps it cannot see.
YYYY-MM-DD-<type>-<slug>.md.
<type> is one of the ten Conventional Commits types.
[<type>].
Current State (As-Is) and Desired Outcome (To-Be) are both populated and distinguishable.
fix / perf / refactor, the type-conditional section (Reproduction / Baseline Measurement / Behavior Contract) is present and populated, or holds - N/A — <reason> if genuinely none.
Out of Scope has at least one specific entry (or an explicit "None — self-contained." with rationale).
Use [hard] for must-not-touch guardrails and [deferred] for follow-up work when the distinction matters.
Acceptance Criteria are measurable (checkable, not aspirational).
Related Files / Entry Points entries are existing repo paths, verified references, or confirmed proposed paths.
Paths under inline-code that are not yet created carry a (proposed) marker so the structural validator does not flag them as fabricated.
Each entry routes the agent's first read or first edit, not just "related file" context.
Open Questions uses - None — <reason> only if the brief is genuinely unambiguous; otherwise populate it with real questions.
## Cold-Pickup Verification (Stage 5.6)). A silent skip on cost / host-policy grounds is not acceptable — the Stage 6 banner must reflect what actually ran.
## Open Questions is in caveman full mode (articles/filler/pleasantries dropped, fragments OK, short synonyms).
## Open Questions bullets stayed in normal prose — each question is complete, naturally phrased, and unambiguous.
## Work Type value were not caveman-rewritten.
- N/A — <reason> escape hatch (where used) keeps the literal N/A — token; only the reason after the em dash is caveman.
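Two of the mechanical checklist items can be approximated in code: the filename shape and the (proposed) marker on Related Files / Entry Points paths. Several details in this sketch are assumptions. The allowed type names are passed in rather than hard-coded, because references/work-types.md is the authority on the ten types. The kebab-case slug rule and the one-entry-per-line format are also assumed, and the function names are hypothetical. The structural validator remains the real check.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed shape: date, type, then a lowercase kebab-case slug.
_FILENAME = re.compile(r"^(\d{4}-\d{2}-\d{2})-([a-z]+)-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$")
# Paths written as inline code inside a brief entry.
_INLINE_PATH = re.compile(r"`([^`]+)`")


def filename_problems(name: str, allowed_types: set[str]) -> list[str]:
    """Check YYYY-MM-DD-<type>-<slug>.md against a caller-supplied set of types."""
    match = _FILENAME.match(name)
    if match is None:
        return ["name does not match YYYY-MM-DD-<type>-<slug>.md"]
    date, work_type, _slug = match.groups()
    problems = []
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        problems.append(f"not a real date: {date}")
    if work_type not in allowed_types:
        problems.append(f"type not allowed: {work_type}")
    return problems


def unmarked_missing_paths(entry: str, repo_root: Path) -> list[str]:
    """Inline-code paths in one entry line that do not exist and lack a (proposed) marker."""
    if "(proposed)" in entry:
        return []  # simplification: one marker covers every path on the line
    return [p for p in _INLINE_PATH.findall(entry) if not (repo_root / p).exists()]
```

The briefset parent name in the earlier validator command puts briefset in the type slot. A caller checking parent files would therefore add briefset to allowed_types.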