TDD-first feature development — crystallise API as a demo test, drive implementation to pass it, run quality stack and progressive review loop.
npx claudepluginhub borda/ai-rig --plugin develop
<objective>
TDD-first feature development. Crystallise the API as a demo use-case test, drive implementation to pass it, then close quality gaps with review, documentation, and the quality stack.
NOT for: bug fixes (use /develop:fix); .claude/ config changes (use /manage).
Foundry plugin check — run:
ls ~/.claude/plugins/cache/ 2>/dev/null | grep -q foundry
Exit code 0 means installed. If the check fails or you are uncertain, proceed as if foundry is available — it is the common case; fall back only if an agent dispatch explicitly fails.
When foundry is not installed, substitute foundry:X references with general-purpose and prepend the role description plus model: <model> to the spawn call:
| foundry agent | Fallback | Model | Role description prefix |
|---|---|---|---|
| foundry:sw-engineer | general-purpose | opus | You are a senior Python software engineer. Write production-quality, type-safe code following SOLID principles. |
| foundry:qa-specialist | general-purpose | opus | You are a QA specialist. Write deterministic, parametrized pytest tests covering edge cases and regressions. |
| foundry:doc-scribe | general-purpose | sonnet | You are a documentation specialist. Write Google-style docstrings and keep README content accurate and concise. |
Skills with --team mode: team spawning with fallback agents still works but produces lower-quality output.
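The check-and-fallback choice can be sketched in shell. This is a minimal sketch assuming the cache path from the check above; `pick_agent` and its output handling are illustrative, and the actual spawn call depends on your runtime:

```shell
# Pick the engineer agent: foundry if installed, otherwise general-purpose.
# When falling back, the caller must prepend the role description from the
# table above and add "model: opus" to the spawn call.
pick_agent() {
  if ls ~/.claude/plugins/cache/ 2>/dev/null | grep -q foundry; then
    echo "foundry:sw-engineer"
  else
    echo "general-purpose"
  fi
}
pick_agent
```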
| Temptation | Reality |
|---|---|
| "The feature is clear — I can skip the demo and go straight to code" | Without a crystallized API contract, implementation drifts. The demo is the spec. |
| "I know this library — no need to check docs" | Training data contains deprecated patterns. One fetch prevents hours of rework. |
| "I'll write tests after the implementation is stable" | Tests drive design. Writing them first reveals API problems before they are baked in. |
| "The existing suite still passes — the feature is good" | The existing suite doesn't cover the new feature. The demo and edge-case tests do. |
| "Step 1 analysis is unnecessary for a small addition" | Scope analysis reveals reuse opportunities and blast radius. Small additions regularly grow. |
Task hygiene: Before creating tasks, call TaskList. For each found task:
- completed if the work is clearly done
- deleted if orphaned / no longer relevant
- in_progress only if genuinely continuing

Task tracking: immediately after Step 1 (scope is known), create TaskCreate entries for all steps of this workflow before doing any other work. Mark each step in_progress when starting it, completed when done.
Gather full context before writing any code:
# If issue number: fetch the full issue with comments
gh issue view <number> --comments
If a free-text description was provided: use the Grep tool (pattern <keyword>, glob **/*.py, path src/) to search for related code before spawning the analysis agent.
Structural context (codemap, if installed) — soft PATH check, silently skip if scan-query not found:
PROJ=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null) || PROJ=$(basename "$PWD")
if command -v scan-query >/dev/null 2>&1 && [ -f ".cache/scan/${PROJ}.json" ]; then
scan-query central --top 5
fi
If results are returned: prepend a ## Structural Context (codemap) block to the foundry:sw-engineer spawn prompt with the hotspot JSON. This gives the analysis agent immediate complexity awareness without cold Glob/Grep exploration. If scan-query is not found or index is missing: proceed silently — do not mention codemap to the user.
Spawn a foundry:sw-engineer agent to analyse the codebase and produce:
Gate: If complexity smell was flagged, present the scope concern to the user before proceeding to Step 2.
Present the analysis summary before proceeding.
Trigger: the feature calls an external library API — a new framework feature, a third-party SDK method, or a stdlib function that has changed in a recent Python version. Skip this step if none of these apply.
DETECT → FETCH → CITE pipeline:
DETECT — read pyproject.toml or requirements*.txt for the exact version and output:
STACK DETECTED:
- <library> <exact-version> (from pyproject.toml)
→ Fetching official docs for the relevant API.
FETCH — use WebFetch to retrieve the specific relevant docs page (not the homepage). Source priority: official docs > official changelog/migration guide > web standards (MDN). Never cite Stack Overflow, blog posts, or AI training data.
CITE — when implementing, embed a comment with the source URL and the key quoted passage:
# Docs: https://docs.example.com/v2/api/method
# "The recommended pattern for X is Y" (v2.1 docs)
Conflict — if docs describe a pattern that conflicts with how the codebase currently uses the library:
CONFLICT DETECTED:
Existing code uses <old pattern>.
<library> <version> docs recommend <new pattern> for this use case.
Options:
A) Use the documented pattern (may require updating existing call sites)
B) Match existing code (works but not idiomatic for this version)
→ Which approach?
Before crystallising the API, surface any non-obvious design decisions:
ASSUMPTIONS I'M MAKING:
- [assumption about API shape, e.g. "returning a list not a generator"]
- [assumption about caller context, e.g. "called once per batch, not per item"]

→ Correct me now or I'll proceed with these.
Do not proceed to the demo if any assumption would materially change the API shape.
Crystallise the intended API contract before any implementation exists. Choose the form based on scope:
Unit function / simple API -> inline doctest:
def predict(self, x: Tensor) -> Tensor:
"""
>>> model = Classifier()
>>> model.predict(torch.zeros(1, 3))
tensor([0])
"""
Complex feature (setup required, side effects, multi-step flow) -> minimal example script:
# examples/demo_<feature>.py — throwaway script, run manually
from mypackage import Classifier
model = Classifier.from_pretrained("tiny")
result = model.predict_batch(["hello", "world"])
print(result) # expected: [label, label]
The example script captures what the feature should feel like to use. It becomes a formal pytest test once the implementation is complete and the API is stable (end of Step 3).
Both forms must fail before any implementation exists:
# Doctest form: confirm it fails
python -m pytest --doctest-modules src/<module>.py -v 2>&1 | tail -10
# Script form: confirm it errors (ImportError, AttributeError, NotImplementedError)
python examples/demo_<feature>.py 2>&1 | tail -5
Gate: demo must fail or error. If it passes, the feature may already exist — revisit Step 1.
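The gate can be sketched as a failure check on the demo script's exit status; the script path here is illustrative:

```python
# Step 2 gate sketch: the demo must exit non-zero before implementation begins.
import subprocess
import sys


def demo_fails(script: str) -> bool:
    """True if running the demo script exits with a non-zero status."""
    result = subprocess.run([sys.executable, script], capture_output=True)
    return result.returncode != 0


assert demo_fails("examples/demo_feature.py"), (
    "Demo passed: the feature may already exist; revisit Step 1"
)
```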
Before proceeding to implementation, critically evaluate the demo itself:
Does the demo assert concrete expected values, or merely print-and-inspect? If any issue is found: revise the demo and re-run the gate. Do not proceed to Step 3 with a flawed API contract — the entire TDD loop is anchored to this.
Drive the implementation by making tests pass, one cycle at a time:
# Baseline: confirm existing suite is green before adding any new code
python -m pytest --tb=short -q <target_test_dir> 2>&1 | tail -20
Gate: all existing tests must pass before proceeding. If any fail, stop — do not add new code on a broken baseline. Use /develop:fix to address pre-existing failures first, then return here.
Start from the Step 2 demo — it is already failing and becomes the first target. For each piece of functionality:
python -m pytest --tb=short -q <target_test_dir> 2>&1 | tail -20
# doctest form
python -m pytest --doctest-modules src/<module>.py --tb=short -v 2>&1 | tail -10
# pytest form
python -m pytest --tb=short <test_file>::<test_name> -v
# script form
python examples/demo_<feature>.py 2>&1 | tail -5
python -m pytest --tb=short -q <target_test_dir>
Repeat until all feature tests pass and the Step 2 demo passes.
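One red→green cycle, sketched with a hypothetical normalize() helper: the doctest is written first (red), then the minimal body that makes it pass (green):

```python
def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace.

    >>> normalize("  Hello ")
    'hello'
    """
    # Red: the body was `raise NotImplementedError` and the doctest failed.
    # Green: the minimal implementation that makes the doctest pass.
    return text.strip().lower()
```

After each green step, re-run the doctest command above to confirm, then the full suite to catch regressions.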
If Step 2 produced an example script: promote it into a formal pytest test now that the API is stable. Delete the script once the test is in place.
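Promotion can look like the sketch below. `Classifier` is a minimal stub so the example is self-contained; the real test would instead import it from your package, mirroring the earlier example script:

```python
# tests/test_demo_feature.py — the Step 2 script, promoted to a formal test.
class Classifier:
    """Stand-in for the real class (normally: from mypackage import Classifier)."""

    @classmethod
    def from_pretrained(cls, name: str) -> "Classifier":
        return cls()

    def predict_batch(self, texts: list[str]) -> list[str]:
        return ["label" for _ in texts]


def test_predict_batch_demo() -> None:
    # Same flow as examples/demo_<feature>.py, but asserting, not printing.
    model = Classifier.from_pretrained("tiny")
    result = model.predict_batch(["hello", "world"])
    assert result == ["label", "label"]
```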
Full review of the implementation. This is a loop — review -> fix -> re-review until only nits remain. Maximum 3 cycles.
Each cycle:
5-axis quality scan — before the full criteria evaluation, assess the implementation on each axis:
Use this scan to prioritize which criteria below get deepest scrutiny.
Evaluate against all criteria:
For every gap found: implement the fix immediately — add missing tests, remove dead code, revert out-of-scope edits. Return to Step 3 for any substantive implementation gap that needs a new TDD cycle.
Re-run the full suite to confirm nothing regressed:
python -m pytest --tb=short -q <target_test_dir> 2>&1 | tail -20
If only nits remain (style, cosmetic naming, minor formatting): document in Follow-up and exit the loop.
If substantive gaps remain: start the next cycle (max 3 total).
After 3 cycles: if substantive issues remain, stop — surface them to the user before proceeding to Step 5.
Spawn a foundry:doc-scribe agent to update all affected documentation:
- CHANGELOG.md with a one-line entry under Unreleased
- README.md usage examples

# Verify doctests pass after doc updates
python -m pytest --doctest-modules <target_module> -v 2>&1 | tail -20
Read .claude/skills/_shared/quality-stack.md and execute the Branch Safety Guard, Quality Stack, Codex Pre-pass, Progressive Review Loop, and Codex Mechanical Delegation steps.
## Feature Report: <feature name>
### Purpose
[1-2 sentence description of what was built and why]
### Codebase Analysis
- Reused: [list of existing utilities/patterns leveraged]
- Modified: [files changed and why]
- New files: [list]
### Demo Use-Case
- Location: <file>::<test or doctest>
- API: [the function/class signature exposed]
### TDD Cycle
- Tests written: N
- Tests passing: N/N
- Regressions introduced: 0
### Quality
- Lint: clean / N issues fixed
- Types: clean / N issues fixed
- Doctests: passing
- Review: pass / N issues fixed (N cycles)
### Follow-up
- [any deferred items, known limitations, or suggested next steps]
## Confidence
**Score**: 0.N — [high ≥0.9 | moderate 0.8–0.9 | low <0.8 ⚠]
**Gaps**: [e.g., review cycle incomplete, edge cases not fully explored]
**Refinements**: N passes.
When to use team mode: feature spans 3+ modules, OR changes a public API, OR involves auth/payment/data scope.
Coordination:
{feature: <desc>, scope: <modules>, API: <proposed signature>}

Spawn prompt template:
You are a [role] teammate implementing: [feature].
Read ~/.claude/TEAM_PROTOCOL.md — use AgentSpeak v2 for inter-agent messages.
Your task: [specific responsibility].
[If QA]: include security checks for any auth/payment/data-handling code.
Compact Instructions: preserve file paths, test results, API signatures. Discard verbose tool output.
Task tracking: do NOT call TaskCreate or TaskUpdate — the lead owns all task state. Signal your completion in your final delta message: "Status: complete | blocked — <reason>".