Search everything...

Skill

spec-work

Executes work from plans, task packs, or prompts: triages input complexity, scans repos for patterns and tests, builds task lists, implements features while following conventions and maintaining quality.

developer-tools

automation

npx claudepluginhub sunrain520/spec-first

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Execute work efficiently while maintaining quality and finishing features.

Supporting Assets

references/shipping-workflow.mdreferences/tracker-defer.md

SKILL.md

Similar Skills

ce:work

13.2k

Executes coding tasks from plans, specs, or prompts: triages input, scans codebase for patterns/tests, assesses complexity, implements systematically to ship complete features with quality.

compound-engineering

ce-work

Executes coding tasks from plan documents or prompts: triages input complexity, builds task lists, implements systematically following patterns, verifies with tests. Use to ship complete features efficiently.

atv-starter-kit

spec-work-beta

Executes development work from plans, tasks, or prompts with optional experimental Codex delegation for token-conserving code implementation. Beta of spec-work; manual invocation for delegation trials.

3 files

spec-first

Stats

Stars23

Forks4

Last CommitApr 27, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

spec-work | spec-first | ClaudePluginHub

Back to Skills

Skill

spec-work

From spec-first

developer-tools

automation

npx claudepluginhub sunrain520/spec-first

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Execute work efficiently while maintaining quality and finishing features.

Supporting Assets

references/shipping-workflow.mdreferences/tracker-defer.md

SKILL.md

Work Execution Command

Execute work efficiently while maintaining quality and finishing features.

Introduction

This command takes a work document (plan, task pack, or specification) or a bare prompt describing the work, and executes it systematically. The focus is on shipping complete features by understanding requirements quickly, following existing patterns, and maintaining quality throughout.

CRG Work Anchors

When a CRG graph is available, use spec-first crg hook before-work --plan=<plan.md> --repo=<repo> before implementation, or spec-first crg hook before-work --task-pack=<tasks.md> --repo=<repo> when executing a derived task pack. If opened at a parent workspace root, first run spec-first crg workspace context --root=<workspace> --task="<task>"; require an explicit child repo choice from the advisory candidates before editing files or running repo-local hooks. For multi-child tasks, decompose into explicit sequential repo-local work runs; do not create one hidden combined workspace work-run. After implementation, use spec-first crg hook after-work --work-run=<id> --repo=<repo> or pass an explicit base with --since=<base>. Hook output is advisory context for comparing planned surface and actual blast radius; it must not override the plan or replace LLM judgment. If graph state is unavailable, continue with targeted direct repo reads and do not read old Stage-0 docs as fallback.

Input Document

<input_document> #$ARGUMENTS </input_document>

Execution Workflow

Phase 0: Input Triage

Determine how to proceed based on what was provided in <input_document>.

Plan or task-pack document (input is a file path to an existing plan, task pack, or specification) → skip to Phase 1.

Bare prompt (input is a description of work, not a file path):

Scan the work area
- Identify files likely to change based on the prompt
- Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
- Note local patterns and conventions in the affected areas

Assess complexity and route

Complexity	Signals	Action
Trivial	1-2 files, no behavioral change (typo, config, rename)	Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code
Small / Medium	Clear scope, under ~10 files	Build a task list from discovery. Proceed to Phase 1 step 2
Large	Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations	Inform the user this would benefit from `/spec:brainstorm` or `/spec:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2

Phase 1: Quick Start

Read Plan and Clarify (skip if arriving from Phase 0 with a bare prompt)
- Read the work document completely
- Treat the plan as a decision artifact, not an execution script
- If the work document includes sections such as Implementation Units, Work Breakdown, Requirements Trace, Files, Test Scenarios, or Verification, use those as the primary source material for execution
- If the work document is a task pack, also use Task Graph, Execution Waves, Task Cards, Validation Notes, and Regeneration Rules as the primary source material for execution
- If the work document is a task pack, validate it before creating execution tasks:
  - read its frontmatter and confirm type: task-pack, generated_by: spec-write-tasks, status: derived, and mode: derived
  - read source_plan and treat that plan as the single source of truth for scope, requirements, and non-goals
  - read spec_id from the task pack and source plan. If the task pack lacks spec_id, stop as missing identity; if both are present, they must match; if they mismatch, reject the task pack as wrong-chain handoff before implementation
  - if the source plan lacks spec_id, treat task-pack identity as unverifiable weak trace and stop for executable task-pack handoff; ask to return to spec-plan to add plan frontmatter or rerun spec-write-tasks
  - confirm source_plan_hash is a concrete canonical source plan body sha256:<64-hex> hash, not pending-tooling, unknown, empty, or a draft marker
  - compare the task pack hash against the current source plan using spec-first tasks validate <task-pack-path> --json; if that tooling is unavailable, treat the task pack as unverifiable and stop
  - confirm the validator accepted the Task Pack Contract JSON block; do not infer executable task structure from free-form Markdown task cards
  - reject draft, transient, missing-source, missing-spec-id, spec-id-mismatch, missing-hash, unavailable-hash-tooling, unverifiable-hash, or hash-mismatch task packs before implementation
  - when rejecting, stop and ask to rerun spec-write-tasks from the source plan or return to spec-plan; do not silently fall back to executing stale task cards
  - during execution, honor each task's stop_if; if triggered, stop and return to spec-plan or regenerate the task pack instead of expanding scope in place
- If the work document is a plan path, and validated task-pack consumption is available, run the optional task-pack suitability check before before-work --plan, before creating a work-run, and before creating the internal task tracker:
  - offer the diversion once only when the plan has strong signals: 3+ implementation units, multiple phases, cross-module files, foundation tasks, dependency chains, parallel waves, 6+ likely core files, or verification across unit/smoke/integration layers
  - do not offer it for 1-2 file changes, docs-only/config-only/narrow bugfix plans, plans whose units are already small enough for the internal tracker, or when the user explicitly says to execute the plan directly
  - if the user chooses task compilation, pause plan execution, run spec-write-tasks <plan-path>, and re-enter only after it returns deterministic handoff with semantic_posture: generated-this-run | reviewed-existing
  - if the user chooses direct execution, continue with before-work --plan and the internal tracker, and do not prompt again in this work run
- Check for Execution note on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
- Check for a Deferred to Implementation or Implementation-Time Unknowns section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
- Check for a Scope Boundaries section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
- Review any references or links provided in the plan
- If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no Execution note
- If anything is unclear or ambiguous, ask clarifying questions now
- If clarifying questions were needed above, get user approval on the resolved answers. If no clarifications were needed, proceed without a separate approval step — plan scope is the plan's authority, not something to renegotiate
- Do not skip this - better to ask questions now than build the wrong thing
- Do not edit the plan body during execution. The plan is a decision artifact; progress lives in git commits and the task tracker. The only plan mutation during spec-work is the final status: active → completed flip at shipping (see references/shipping-workflow.md Phase 4 Step 2). Legacy plans may contain - [ ] / - [x] marks on unit headings — ignore them as state; per-unit completion is determined during execution by reading the current file state.
Setup Environment

First, check the current branch:
```
current_branch=$(git branch --show-current)
default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')

# Fallback if remote HEAD isn't set
if [ -z "$default_branch" ]; then
  default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
fi
```
If already on a feature branch (not the default branch):

First, check whether the branch name is meaningful — a name like feat/crowd-sniff or fix/email-validation tells future readers what the work is about. Auto-generated worktree names (e.g., worktree-jolly-beaming-raven) or other opaque names do not.

If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
```
git branch -m <meaningful-name>
```
Derive the new name from the plan title or work description (e.g., feat/crowd-sniff). Present the rename as a recommended option alongside continuing as-is.

Then ask: "Continue working on [current_branch], or create a new branch?"
- If continuing (with or without rename), proceed to step 3
- If creating new, follow Option A or B below
If on the default branch, choose how to proceed:

Option A: Create a new branch
```
git pull origin [default_branch]
git checkout -b feature-branch-name
```
Use a meaningful name based on the work (e.g., feat/user-authentication, fix/email-validation).

Option B: Use a worktree (recommended for parallel development)
```
skill: git-worktree
# The skill will create a new branch from the default branch in an isolated worktree
```
Option C: Continue on the default branch
- Requires explicit user confirmation
- Only proceed after user explicitly says "yes, commit to [default_branch]"
- Never commit directly to the default branch without explicit permission
Recommendation: Use worktree if:
- You want to work on multiple features simultaneously
- You want to keep the default branch clean while experimenting
- You plan to switch between branches frequently
Create Task List (skip if Phase 0 already built one, or if Phase 0 routed as Trivial)
- Use the platform's task tracking tool (TaskCreate/TaskUpdate/TaskList in Claude Code, update_plan in Codex, or the equivalent on other harnesses) to break the plan into actionable tasks
- If the input is a validated task pack, derive the task list from Task Cards and preserve task_id, dependencies, wave, files, test_focus, done_signal, and stop_if
- If the input is a task pack, do not create execution tasks until the task-pack validation checks above have passed
- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
- When the plan defines U-IDs for Implementation Units, preserve the unit's U-ID as a prefix in the task subject (e.g., "U3: Add parser coverage"). This keeps blocker references, deferred-work notes, and final summaries anchored to the same identifier the plan uses, so progress and traceability remain unambiguous across plan edits
- When the work document has spec_id, keep it as trace context for blockers, deferred-work notes, task summaries, and final verification when it helps distinguish related requirements/plan/task-pack artifacts. Do not treat it as execution state or completion status
- Carry each unit's Execution note into the task when present
- For each unit, read the Patterns to follow field before implementing — these point to specific files or conventions to mirror
- Use each unit's Verification field as the primary "done" signal for that task
- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
- Include dependencies between tasks
- Prioritize based on what needs to be done first
- Include testing and quality check tasks
- Keep tasks specific and completable

Choose Execution Strategy

After creating the task list, decide how to execute based on the plan's size and dependency structure:

Strategy	When to use
Inline	1-2 small tasks, or tasks needing user interaction mid-flight. Default for bare-prompt work — bare prompts rarely produce enough structured context to justify subagent dispatch
Serial subagents	3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios)
Parallel subagents	3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata

Parallel Safety Check — required before choosing parallel dispatch:

Build a file-to-unit mapping from every candidate unit's Files: section (Create, Modify, and Test paths)
Check for intersection — any file path appearing in 2+ units means overlap
Use the host capability matrix below before deciding whether overlap is allowed. If reliable isolation is unavailable, downgrade overlapping units to serial subagents and log the reason (e.g., "Units 2 and 4 share config/routes.rb — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory write races.

Even with no file overlap, parallel subagents sharing the orchestrator's working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). Reliable isolation eliminates both; the shared-directory fallback constraints below mitigate them.

Host capability matrix

Host path	Isolation model	Parallel overlap rule	Commit/test ownership
Claude Code `Agent` with worktree isolation	Pass `isolation: "worktree"` and `run_in_background: true`; the harness creates a per-subagent worktree under `.claude/worktrees/agent-<id>` on its own branch. Verify `.claude/worktrees/` is gitignored before relying on this.	Overlap is allowed only as a predicted merge conflict handled by the worktree-isolated post-batch flow. Log the predicted overlap before dispatch.	Subagents may stage, commit, and run their unit tests inside their own worktree branch.
Claude Code `Agent` without worktree isolation, or any shared-directory subagent	Subagents write in the orchestrator's working directory.	Overlap is not safe. Downgrade overlapping units to serial.	Subagents must not stage, commit, or run the project test suite.
Codex `spawn_agent` / forked workspace	Use Codex's fork workspace semantics when available. Do not pass or claim Claude's `isolation: "worktree"` parameter.	Prefer disjoint write sets. If files overlap, dispatch serially unless the harness provides an explicit diff/merge handoff you can inspect before integration.	The orchestrator owns final integration, staging, commits, and project-level verification.
No subagent support	Inline execution only.	Not applicable.	The current agent owns all work.

Subagent dispatch uses your available subagent or task spawning mechanism. For each unit, give the subagent:

The full plan file path (for overall context)
The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
Any resolved deferred questions relevant to that unit
Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests

Shared-directory fallback constraints — apply when reliable isolation is unavailable:

Instruct each subagent: "Do not stage files (git add), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
These constraints prevent git index contention and test interference between concurrent subagents.
With Claude Code worktree isolation active, omit these constraints — subagents may stage, commit, and run their unit tests within their own worktree branch.

Permission mode: Omit the mode parameter when dispatching subagents so the user's configured permission settings apply. Do not pass mode: "auto" — it overrides user-level settings like bypassPermissions.

After each subagent completes (serial mode):

Review the subagent's diff — verify changes match the unit's scope and Files: list
Run the relevant test suite to confirm the tree is healthy
If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
Update the task list (do not edit the plan body — progress is carried by the commit)
Dispatch the next unit

After all parallel subagents in a batch complete (worktree-isolated mode):

Wait for every subagent in the current parallel batch to finish.
For each completed subagent, in dependency order: review the worktree's diff against the orchestrator's branch. If the subagent did not commit its own work, stage and commit it inside that worktree.
Merge each subagent's branch into the orchestrator's branch sequentially in dependency order. If a merge conflict surfaces, abort the merge (git merge --abort) and re-dispatch the conflicting unit serially against the now-merged tree — hand-resolving silently picks a side and discards one unit's intent. Predicted overlap from the Parallel Safety Check surfaces here as a conflict, not as silent data loss in shared-directory mode.
After each merge, run the relevant test suite. If tests fail, diagnose and fix before merging the next branch.
Update the task list (progress is carried by the merge commits).
After merging, remove each subagent's worktree and delete its branch. Use the absolute path and branch name returned in the subagent's result.
- Unlock the worktree first when the harness locks per-subagent worktrees: git worktree unlock <absolute-path>
- Remove the worktree: git worktree remove <absolute-path>
- Delete the branch: git branch -d <branch-name> (-d refuses to delete unmerged branches; if it fails, investigate before forcing)
Dispatch the next batch of independent units, or the next dependent unit.

After all parallel subagents in a batch complete (shared-directory or fork-workspace handoff):

Wait for every subagent in the current parallel batch to finish before acting on any of their results
Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared Files: lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe what not how. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
Update the task list (do not edit the plan body — progress is carried by the commits just made)
Dispatch the next batch of independent units, or the next dependent unit

Phase 2: Execute

Task Execution Loop

For each task in priority order:

while (tasks remain):
  - Mark task as in-progress
  - Read any referenced files from the plan or discovered during Phase 0
  - **If the unit's work is already present and matches the plan's intent** (files exist with the expected capability, or the unit's `Verification` criteria are already satisfied by the current code), the work has likely shipped on a prior branch or session. Verify it matches, mark the task complete, and move on. Do not silently reimplement.
  - Look for similar patterns in codebase
  - Find existing test files for implementation files being changed (Test Discovery — see below)
  - Implement following existing conventions
  - Add, update, or remove tests to match implementation changes (see Test Discovery below)
  - Run System-Wide Test Check (see below)
  - Run tests after changes
  - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
  - Mark task as completed
  - Evaluate for incremental commit (see below)

When a unit carries an Execution note, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an Execution note, proceed pragmatically.

Guardrails for execution posture:

Do not write the test and implementation in the same step when working test-first
Do not skip verifying that a new test fails before implementing the fix or feature
Do not over-implement beyond the current behavior slice when working test-first
Skip test-first discipline for trivial renames, pure configuration, and pure styling work

Test Discovery — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.

Test Scenario Completeness — Before writing tests for a feature-bearing unit, check whether the plan's Test scenarios cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:

Category	When it applies	How to derive if missing
Happy path	Always for feature-bearing units	Read the unit's Goal and Approach for core input/output pairs
Edge cases	When the unit has meaningful boundaries (inputs, state, concurrency)	Identify boundary values, empty/nil inputs, and concurrent access patterns
Error/failure paths	When the unit has failure modes (validation, external calls, permissions)	Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle
Integration	When the unit crosses layers (callbacks, middleware, multi-service)	Identify the cross-layer chain and write a scenario that exercises it without mocks

System-Wide Test Check — Before marking a task done, pause and ask:

Question	What to do
What fires when this runs? Callbacks, middleware, observers, event handlers — trace two levels out from your change.	Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, `after_*` hooks.
Do my tests exercise the real chain? If every dependency is mocked, the test proves your logic works in isolation — it says nothing about the interaction.	Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact.
Can failure leave orphaned state? If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates?	Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent.
What other interfaces expose this? Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods).	Grep for the method/behavior in related classes. If parity is needed, add it now — not as a follow-up.
Do error strategies align across layers? Retry middleware + application fallback + framework error handling — do they conflict or create double execution?	List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises.

When to skip: Leaf-node changes with no callbacks, no state persistence, no parallel interfaces. If the change is purely additive (new helper method, new view partial), the check takes 10 seconds and the answer is "nothing fires, skip."

When this matters most: Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.

Incremental Commits

After completing each task, evaluate whether to create an incremental commit:

Commit when...	Don't commit when...
Logical unit complete (model, service, component)	Small part of a larger unit
Tests pass + meaningful progress	Tests failing
About to switch contexts (backend → frontend)	Purely scaffolding with no behavior
About to attempt risky/uncertain changes	Would need a "WIP" commit message

Heuristic: "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."

If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.

Commit workflow:

# 1. Verify tests pass (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.

# 2. Stage only files related to this logical unit (not `git add .`)
git add <files related to this logical unit>

# 3. Commit with conventional message
git commit -m "feat(scope): description of this unit"

Handling merge conflicts: If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.

Note: Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.

Parallel subagent mode: Commit ownership is split by isolation mode (see Phase 1 Step 4):

Claude Code worktree-isolated: subagents may stage and commit inside their own worktree branch; the orchestrator merges those branches in dependency order after the batch.
Shared-directory fallback or Codex fork-workspace handoff: subagents do not create final repo commits; the orchestrator stages, verifies, and commits each integrated unit after the batch.

Follow Existing Patterns
- The plan should reference similar code - read those files first
- Match naming conventions exactly
- Reuse existing components where possible
- Follow project coding standards (see AGENTS.md; use CLAUDE.md only if the repo still keeps a compatibility shim)
- When in doubt, grep for similar implementations
Test Continuously
- Run relevant tests after each significant change
- Don't wait until the end to test
- Fix failures immediately
- Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
- Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together. If your change touches callbacks, middleware, or error handling — you need both.
Simplify as You Go

After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.

Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.

If a simplify skill or equivalent capability is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
Figma Design Sync (if applicable)

For UI work with Figma designs:
- Implement components following design specs
- Use spec-figma-design-sync agent iteratively to compare
- Fix visual differences identified
- Repeat until implementation matches design
Track Progress
- Keep the task list updated as you complete tasks
- Note any blockers or unexpected discoveries
- Create new tasks if scope expands
- Keep user informed of major milestones
- When the plan defines U-IDs for Implementation Units, or the plan or origin document carries stable R-IDs (and optionally A/F/AE IDs), reference them in blockers, deferred-work notes, task summaries, and final verification — not routine status updates. U-IDs anchor units across plan edits; R/A/F/AE anchor product intent across the brainstorm-plan handoff. When available, include spec_id only as artifact-chain trace context, not as execution progress. Use the IDs the plan supplies and do not invent ones it does not. This preserves traceability without burying signal under noise.

Phase 3-4: Quality Check and Ship It

When all Phase 2 tasks are complete and execution transitions to quality check, read references/shipping-workflow.md for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.

Key Principles

Start Fast, Execute Faster

Get clarification once at the start, then execute
Don't wait for perfect understanding - ask questions and move
The goal is to finish the feature, not create perfect process

The Plan is Your Guide

Work documents should reference similar code and patterns
Load those references and follow them
Don't reinvent - match what exists

Test As You Go

Run tests after each change, not at the end
Fix failures immediately
Continuous testing prevents big surprises

Quality is Built In

Follow existing patterns
Write tests for new code
Run linting before pushing
Review every change — inline for simple additive work, full review for everything else

Ship Complete Features

Mark all tasks completed before moving on
Don't leave features 80% done
A finished feature that ships beats a perfect feature that doesn't

Common Pitfalls to Avoid

Analysis paralysis - Don't overthink, read the plan and execute
Skipping clarifying questions - Ask now, not after building wrong thing
Ignoring plan references - The plan has links for a reason
Testing at the end - Test continuously or suffer later
Forgetting to track progress - Update task status as you go or lose track of what's done
80% done syndrome - Finish the feature, don't move on early
Skipping review - Every change gets reviewed; only the depth varies
Re-scoping the plan into human-time phases - The plan's Implementation Units define the scope of execution. Do not estimate human-hours per unit, propose multi-day breakdowns, or ask the user to pick a subset of units for "this session". Agents execute at agent speed, and context-window pressure is addressed by subagent dispatch (Phase 1 Step 4), not by phased sessions. If a plan-file input is genuinely too large for a single execution, say so plainly and suggest the user return to /spec:plan to reduce scope — don't invent session phases as a workaround. For bare-prompt input, Phase 0's Large routing already handles oversized work

Similar Skills

ce:work

13.2k

Executes coding tasks from plans, specs, or prompts: triages input, scans codebase for patterns/tests, assesses complexity, implements systematically to ship complete features with quality.

compound-engineering

ce-work

atv-starter-kit

spec-work-beta

3 files

spec-first

Stats

Stars23

Forks4

Last CommitApr 27, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Work Execution Command

Execute work efficiently while maintaining quality and finishing features.

Introduction

CRG Work Anchors

Input Document

<input_document> #$ARGUMENTS </input_document>

Execution Workflow

Phase 0: Input Triage

Determine how to proceed based on what was provided in <input_document>.

Plan or task-pack document (input is a file path to an existing plan, task pack, or specification) → skip to Phase 1.

Bare prompt (input is a description of work, not a file path):

Scan the work area
- Identify files likely to change based on the prompt
- Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
- Note local patterns and conventions in the affected areas

Assess complexity and route

Complexity	Signals	Action
Trivial	1-2 files, no behavioral change (typo, config, rename)	Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code
Small / Medium	Clear scope, under ~10 files	Build a task list from discovery. Proceed to Phase 1 step 2
Large	Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations	Inform the user this would benefit from `/spec:brainstorm` or `/spec:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2

Phase 1: Quick Start

Read Plan and Clarify (skip if arriving from Phase 0 with a bare prompt)
- Read the work document completely
- Treat the plan as a decision artifact, not an execution script
- If the work document includes sections such as Implementation Units, Work Breakdown, Requirements Trace, Files, Test Scenarios, or Verification, use those as the primary source material for execution
- If the work document is a task pack, also use Task Graph, Execution Waves, Task Cards, Validation Notes, and Regeneration Rules as the primary source material for execution
- If the work document is a task pack, validate it before creating execution tasks:
  - read its frontmatter and confirm type: task-pack, generated_by: spec-write-tasks, status: derived, and mode: derived
  - read source_plan and treat that plan as the single source of truth for scope, requirements, and non-goals
  - read spec_id from the task pack and source plan. If the task pack lacks spec_id, stop as missing identity; if both are present, they must match; if they mismatch, reject the task pack as wrong-chain handoff before implementation
  - if the source plan lacks spec_id, treat task-pack identity as unverifiable weak trace and stop for executable task-pack handoff; ask to return to spec-plan to add plan frontmatter or rerun spec-write-tasks
  - confirm source_plan_hash is a concrete canonical source plan body sha256:<64-hex> hash, not pending-tooling, unknown, empty, or a draft marker
  - compare the task pack hash against the current source plan using spec-first tasks validate <task-pack-path> --json; if that tooling is unavailable, treat the task pack as unverifiable and stop
  - confirm the validator accepted the Task Pack Contract JSON block; do not infer executable task structure from free-form Markdown task cards
  - reject draft, transient, missing-source, missing-spec-id, spec-id-mismatch, missing-hash, unavailable-hash-tooling, unverifiable-hash, or hash-mismatch task packs before implementation
  - when rejecting, stop and ask to rerun spec-write-tasks from the source plan or return to spec-plan; do not silently fall back to executing stale task cards
  - during execution, honor each task's stop_if; if triggered, stop and return to spec-plan or regenerate the task pack instead of expanding scope in place
- If the work document is a plan path, and validated task-pack consumption is available, run the optional task-pack suitability check before before-work --plan, before creating a work-run, and before creating the internal task tracker:
  - offer the diversion once only when the plan has strong signals: 3+ implementation units, multiple phases, cross-module files, foundation tasks, dependency chains, parallel waves, 6+ likely core files, or verification across unit/smoke/integration layers
  - do not offer it for 1-2 file changes, docs-only/config-only/narrow bugfix plans, plans whose units are already small enough for the internal tracker, or when the user explicitly says to execute the plan directly
  - if the user chooses task compilation, pause plan execution, run spec-write-tasks <plan-path>, and re-enter only after it returns deterministic handoff with semantic_posture: generated-this-run | reviewed-existing
  - if the user chooses direct execution, continue with before-work --plan and the internal tracker, and do not prompt again in this work run
- Check for Execution note on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
- Check for a Deferred to Implementation or Implementation-Time Unknowns section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
- Check for a Scope Boundaries section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
- Review any references or links provided in the plan
- If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no Execution note
- If anything is unclear or ambiguous, ask clarifying questions now
- If clarifying questions were needed above, get user approval on the resolved answers. If no clarifications were needed, proceed without a separate approval step — plan scope is the plan's authority, not something to renegotiate
- Do not skip this - better to ask questions now than build the wrong thing
- Do not edit the plan body during execution. The plan is a decision artifact; progress lives in git commits and the task tracker. The only plan mutation during spec-work is the final status: active → completed flip at shipping (see references/shipping-workflow.md Phase 4 Step 2). Legacy plans may contain - [ ] / - [x] marks on unit headings — ignore them as state; per-unit completion is determined during execution by reading the current file state.
Setup Environment

First, check the current branch:
```
current_branch=$(git branch --show-current)
default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')

# Fallback if remote HEAD isn't set
if [ -z "$default_branch" ]; then
  default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
fi
```
If already on a feature branch (not the default branch):

First, check whether the branch name is meaningful — a name like feat/crowd-sniff or fix/email-validation tells future readers what the work is about. Auto-generated worktree names (e.g., worktree-jolly-beaming-raven) or other opaque names do not.

If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
```
git branch -m <meaningful-name>
```
Derive the new name from the plan title or work description (e.g., feat/crowd-sniff). Present the rename as a recommended option alongside continuing as-is.

Then ask: "Continue working on [current_branch], or create a new branch?"
- If continuing (with or without rename), proceed to step 3
- If creating new, follow Option A or B below
If on the default branch, choose how to proceed:

Option A: Create a new branch
```
git pull origin [default_branch]
git checkout -b feature-branch-name
```
Use a meaningful name based on the work (e.g., feat/user-authentication, fix/email-validation).

Option B: Use a worktree (recommended for parallel development)
```
skill: git-worktree
# The skill will create a new branch from the default branch in an isolated worktree
```
Option C: Continue on the default branch
- Requires explicit user confirmation
- Only proceed after user explicitly says "yes, commit to [default_branch]"
- Never commit directly to the default branch without explicit permission
Recommendation: Use worktree if:
- You want to work on multiple features simultaneously
- You want to keep the default branch clean while experimenting
- You plan to switch between branches frequently
Create Task List (skip if Phase 0 already built one, or if Phase 0 routed as Trivial)
- Use the platform's task tracking tool (TaskCreate/TaskUpdate/TaskList in Claude Code, update_plan in Codex, or the equivalent on other harnesses) to break the plan into actionable tasks
- If the input is a validated task pack, derive the task list from Task Cards and preserve task_id, dependencies, wave, files, test_focus, done_signal, and stop_if
- If the input is a task pack, do not create execution tasks until the task-pack validation checks above have passed
- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
- When the plan defines U-IDs for Implementation Units, preserve the unit's U-ID as a prefix in the task subject (e.g., "U3: Add parser coverage"). This keeps blocker references, deferred-work notes, and final summaries anchored to the same identifier the plan uses, so progress and traceability remain unambiguous across plan edits
- When the work document has spec_id, keep it as trace context for blockers, deferred-work notes, task summaries, and final verification when it helps distinguish related requirements/plan/task-pack artifacts. Do not treat it as execution state or completion status
- Carry each unit's Execution note into the task when present
- For each unit, read the Patterns to follow field before implementing — these point to specific files or conventions to mirror
- Use each unit's Verification field as the primary "done" signal for that task
- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
- Include dependencies between tasks
- Prioritize based on what needs to be done first
- Include testing and quality check tasks
- Keep tasks specific and completable

Choose Execution Strategy

After creating the task list, decide how to execute based on the plan's size and dependency structure:

Strategy	When to use
Inline	1-2 small tasks, or tasks needing user interaction mid-flight. Default for bare-prompt work — bare prompts rarely produce enough structured context to justify subagent dispatch
Serial subagents	3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios)
Parallel subagents	3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata

Parallel Safety Check — required before choosing parallel dispatch:

Build a file-to-unit mapping from every candidate unit's Files: section (Create, Modify, and Test paths)
Check for intersection — any file path appearing in 2+ units means overlap
Use the host capability matrix below before deciding whether overlap is allowed. If reliable isolation is unavailable, downgrade overlapping units to serial subagents and log the reason (e.g., "Units 2 and 4 share config/routes.rb — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory write races.

Host capability matrix

Host path	Isolation model	Parallel overlap rule	Commit/test ownership
Claude Code `Agent` with worktree isolation	Pass `isolation: "worktree"` and `run_in_background: true`; the harness creates a per-subagent worktree under `.claude/worktrees/agent-<id>` on its own branch. Verify `.claude/worktrees/` is gitignored before relying on this.	Overlap is allowed only as a predicted merge conflict handled by the worktree-isolated post-batch flow. Log the predicted overlap before dispatch.	Subagents may stage, commit, and run their unit tests inside their own worktree branch.
Claude Code `Agent` without worktree isolation, or any shared-directory subagent	Subagents write in the orchestrator's working directory.	Overlap is not safe. Downgrade overlapping units to serial.	Subagents must not stage, commit, or run the project test suite.
Codex `spawn_agent` / forked workspace	Use Codex's fork workspace semantics when available. Do not pass or claim Claude's `isolation: "worktree"` parameter.	Prefer disjoint write sets. If files overlap, dispatch serially unless the harness provides an explicit diff/merge handoff you can inspect before integration.	The orchestrator owns final integration, staging, commits, and project-level verification.
No subagent support	Inline execution only.	Not applicable.	The current agent owns all work.

Subagent dispatch uses your available subagent or task spawning mechanism. For each unit, give the subagent:

The full plan file path (for overall context)
The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
Any resolved deferred questions relevant to that unit
Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests

Shared-directory fallback constraints — apply when reliable isolation is unavailable:

Instruct each subagent: "Do not stage files (git add), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
These constraints prevent git index contention and test interference between concurrent subagents.
With Claude Code worktree isolation active, omit these constraints — subagents may stage, commit, and run their unit tests within their own worktree branch.

After each subagent completes (serial mode):

Review the subagent's diff — verify changes match the unit's scope and Files: list
Run the relevant test suite to confirm the tree is healthy
If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
Update the task list (do not edit the plan body — progress is carried by the commit)
Dispatch the next unit

After all parallel subagents in a batch complete (worktree-isolated mode):

Wait for every subagent in the current parallel batch to finish.
For each completed subagent, in dependency order: review the worktree's diff against the orchestrator's branch. If the subagent did not commit its own work, stage and commit it inside that worktree.
Merge each subagent's branch into the orchestrator's branch sequentially in dependency order. If a merge conflict surfaces, abort the merge (git merge --abort) and re-dispatch the conflicting unit serially against the now-merged tree — hand-resolving silently picks a side and discards one unit's intent. Predicted overlap from the Parallel Safety Check surfaces here as a conflict, not as silent data loss in shared-directory mode.
After each merge, run the relevant test suite. If tests fail, diagnose and fix before merging the next branch.
Update the task list (progress is carried by the merge commits).
After merging, remove each subagent's worktree and delete its branch. Use the absolute path and branch name returned in the subagent's result.
- Unlock the worktree first when the harness locks per-subagent worktrees: git worktree unlock <absolute-path>
- Remove the worktree: git worktree remove <absolute-path>
- Delete the branch: git branch -d <branch-name> (-d refuses to delete unmerged branches; if it fails, investigate before forcing)
Dispatch the next batch of independent units, or the next dependent unit.

After all parallel subagents in a batch complete (shared-directory or fork-workspace handoff):

Wait for every subagent in the current parallel batch to finish before acting on any of their results
Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared Files: lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe what not how. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
Update the task list (do not edit the plan body — progress is carried by the commits just made)
Dispatch the next batch of independent units, or the next dependent unit

Phase 2: Execute

Task Execution Loop

For each task in priority order:

while (tasks remain):
  - Mark task as in-progress
  - Read any referenced files from the plan or discovered during Phase 0
  - **If the unit's work is already present and matches the plan's intent** (files exist with the expected capability, or the unit's `Verification` criteria are already satisfied by the current code), the work has likely shipped on a prior branch or session. Verify it matches, mark the task complete, and move on. Do not silently reimplement.
  - Look for similar patterns in codebase
  - Find existing test files for implementation files being changed (Test Discovery — see below)
  - Implement following existing conventions
  - Add, update, or remove tests to match implementation changes (see Test Discovery below)
  - Run System-Wide Test Check (see below)
  - Run tests after changes
  - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
  - Mark task as completed
  - Evaluate for incremental commit (see below)

Guardrails for execution posture:

Do not write the test and implementation in the same step when working test-first
Do not skip verifying that a new test fails before implementing the fix or feature
Do not over-implement beyond the current behavior slice when working test-first
Skip test-first discipline for trivial renames, pure configuration, and pure styling work

Category	When it applies	How to derive if missing
Happy path	Always for feature-bearing units	Read the unit's Goal and Approach for core input/output pairs
Edge cases	When the unit has meaningful boundaries (inputs, state, concurrency)	Identify boundary values, empty/nil inputs, and concurrent access patterns
Error/failure paths	When the unit has failure modes (validation, external calls, permissions)	Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle
Integration	When the unit crosses layers (callbacks, middleware, multi-service)	Identify the cross-layer chain and write a scenario that exercises it without mocks

System-Wide Test Check — Before marking a task done, pause and ask:

Question	What to do
What fires when this runs? Callbacks, middleware, observers, event handlers — trace two levels out from your change.	Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, `after_*` hooks.
Do my tests exercise the real chain? If every dependency is mocked, the test proves your logic works in isolation — it says nothing about the interaction.	Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact.
Can failure leave orphaned state? If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates?	Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent.
What other interfaces expose this? Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods).	Grep for the method/behavior in related classes. If parity is needed, add it now — not as a follow-up.
Do error strategies align across layers? Retry middleware + application fallback + framework error handling — do they conflict or create double execution?	List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises.

When this matters most: Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.

Incremental Commits

After completing each task, evaluate whether to create an incremental commit:

Commit when...	Don't commit when...
Logical unit complete (model, service, component)	Small part of a larger unit
Tests pass + meaningful progress	Tests failing
About to switch contexts (backend → frontend)	Purely scaffolding with no behavior
About to attempt risky/uncertain changes	Would need a "WIP" commit message

Heuristic: "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."

Commit workflow:

# 1. Verify tests pass (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.

# 2. Stage only files related to this logical unit (not `git add .`)
git add <files related to this logical unit>

# 3. Commit with conventional message
git commit -m "feat(scope): description of this unit"

Handling merge conflicts: If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.

Note: Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.

Parallel subagent mode: Commit ownership is split by isolation mode (see Phase 1 Step 4):

Claude Code worktree-isolated: subagents may stage and commit inside their own worktree branch; the orchestrator merges those branches in dependency order after the batch.
Shared-directory fallback or Codex fork-workspace handoff: subagents do not create final repo commits; the orchestrator stages, verifies, and commits each integrated unit after the batch.

Follow Existing Patterns
- The plan should reference similar code - read those files first
- Match naming conventions exactly
- Reuse existing components where possible
- Follow project coding standards (see AGENTS.md; use CLAUDE.md only if the repo still keeps a compatibility shim)
- When in doubt, grep for similar implementations
Test Continuously
- Run relevant tests after each significant change
- Don't wait until the end to test
- Fix failures immediately
- Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
- Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together. If your change touches callbacks, middleware, or error handling — you need both.
Simplify as You Go

After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.

Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.

If a simplify skill or equivalent capability is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
Figma Design Sync (if applicable)

For UI work with Figma designs:
- Implement components following design specs
- Use spec-figma-design-sync agent iteratively to compare
- Fix visual differences identified
- Repeat until implementation matches design
Track Progress
- Keep the task list updated as you complete tasks
- Note any blockers or unexpected discoveries
- Create new tasks if scope expands
- Keep user informed of major milestones
- When the plan defines U-IDs for Implementation Units, or the plan or origin document carries stable R-IDs (and optionally A/F/AE IDs), reference them in blockers, deferred-work notes, task summaries, and final verification — not routine status updates. U-IDs anchor units across plan edits; R/A/F/AE anchor product intent across the brainstorm-plan handoff. When available, include spec_id only as artifact-chain trace context, not as execution progress. Use the IDs the plan supplies and do not invent ones it does not. This preserves traceability without burying signal under noise.

Phase 3-4: Quality Check and Ship It

Key Principles

Start Fast, Execute Faster

Get clarification once at the start, then execute
Don't wait for perfect understanding - ask questions and move
The goal is to finish the feature, not create perfect process

The Plan is Your Guide

Work documents should reference similar code and patterns
Load those references and follow them
Don't reinvent - match what exists

Test As You Go

Run tests after each change, not at the end
Fix failures immediately
Continuous testing prevents big surprises

Quality is Built In

Follow existing patterns
Write tests for new code
Run linting before pushing
Review every change — inline for simple additive work, full review for everything else

Ship Complete Features

Mark all tasks completed before moving on
Don't leave features 80% done
A finished feature that ships beats a perfect feature that doesn't

Common Pitfalls to Avoid

Analysis paralysis - Don't overthink, read the plan and execute
Skipping clarifying questions - Ask now, not after building wrong thing
Ignoring plan references - The plan has links for a reason
Testing at the end - Test continuously or suffer later
Forgetting to track progress - Update task status as you go or lose track of what's done
80% done syndrome - Finish the feature, don't move on early
Skipping review - Every change gets reviewed; only the depth varies
Re-scoping the plan into human-time phases - The plan's Implementation Units define the scope of execution. Do not estimate human-hours per unit, propose multi-day breakdowns, or ask the user to pick a subset of units for "this session". Agents execute at agent speed, and context-window pressure is addressed by subagent dispatch (Phase 1 Step 4), not by phased sessions. If a plan-file input is genuinely too large for a single execution, say so plainly and suggest the user return to /spec:plan to reduce scope — don't invent session phases as a workaround. For bare-prompt input, Phase 0's Large routing already handles oversized work