From compound-engineering
[BETA] Execute work plans with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation.
npx claudepluginhub apollostreetcompany/codex-compound --plugin compound-engineering

This skill uses the workspace's default tool permissions.
Execute a work plan efficiently while maintaining quality and finishing features.
This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on shipping complete features by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
<input_document> #$ARGUMENTS </input_document>
Read Plan and Clarify
If the plan has structured sections such as Implementation Units, Work Breakdown, Requirements Trace, Files, Test Scenarios, or Verification, use those as the primary source material for execution. Also note:
- Execution note on each implementation unit -- these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
- Deferred to Implementation or Implementation-Time Unknowns section -- these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task.
- Scope Boundaries section -- these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work.

Setup Environment
First, check the current branch:
current_branch=$(git branch --show-current)
default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
# Fallback if remote HEAD isn't set
if [ -z "$default_branch" ]; then
default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
fi
If already on a feature branch (not the default branch), ask: "Continue on [current_branch], or create a new branch?"

If on the default branch, choose how to proceed:
Option A: Create a new branch
git pull origin [default_branch]
git checkout -b feature-branch-name
Use a meaningful name based on the work (e.g., feat/user-authentication, fix/email-validation).
Option B: Use a worktree (recommended for parallel development)
skill: git-worktree
# The skill will create a new branch from the default branch in an isolated worktree
Option C: Continue on the default branch
Recommendation: use a worktree if you expect to work on multiple branches in parallel (see Option B).
Create Todo List
When creating tasks from the plan:
- Carry the Execution note into the task when present.
- Check the Patterns to follow field before implementing -- these point to specific files or conventions to mirror.
- Treat the Verification field as the primary "done" signal for that task.

Choose Execution Strategy
After creating the task list, decide how to execute based on the plan's size and dependency structure:
| Strategy | When to use |
|---|---|
| Inline | 1-2 small tasks, or tasks needing user interaction mid-flight |
| Serial subagents | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit -- prevents context degradation across many tasks |
| Parallel subagents | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
Subagent dispatch uses your available subagent or task spawning mechanism. For each unit, give the subagent:
After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below, which uses agent teams.
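The parallel strategy's ordering rule can be sketched in shell terms. This is illustrative only -- `run_unit` is a hypothetical stand-in for dispatching one subagent; units A and B are independent, and C depends on both:

```shell
# Illustrative only: background jobs stand in for subagent dispatch.
run_unit() { echo "unit $1 done"; }   # hypothetical stand-in for one subagent

run_unit A &   # independent units dispatched simultaneously
run_unit B &
wait           # block until every independent unit finishes
run_unit C     # dependent unit runs only after its prerequisites complete
```

The same shape generalizes: group units by dependency level, dispatch each level concurrently, and `wait` between levels.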
Task Execution Loop
For each task in priority order:
while (tasks remain):
- Mark task as in-progress
- Read any referenced files from the plan
- Look for similar patterns in codebase
- Implement following existing conventions
- Write tests for new functionality
- Run System-Wide Test Check (see below)
- Run tests after changes
- Mark task as completed
- Evaluate for incremental commit (see below)
When a unit carries an Execution note, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an Execution note, proceed pragmatically.
Guardrails for execution posture:
System-Wide Test Check -- before marking a task done, pause and ask:
| Question | What to do |
|---|---|
| What fires when this runs? Callbacks, middleware, observers, event handlers -- trace two levels out from your change. | Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, after_* hooks. |
| Do my tests exercise the real chain? If every dependency is mocked, the test proves your logic works in isolation -- it says nothing about the interaction. | Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact. |
| Can failure leave orphaned state? If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates? | Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent. |
| What other interfaces expose this? Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods). | Grep for the method/behavior in related classes. If parity is needed, add it now -- not as a follow-up. |
| Do error strategies align across layers? Retry middleware + application fallback + framework error handling -- do they conflict or create double execution? | List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises. |
When to skip: Leaf-node changes with no callbacks, no state persistence, no parallel interfaces. If the change is purely additive (new helper method, new view partial), the check takes 10 seconds and the answer is "nothing fires, skip."
When this matters most: Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
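The "other interfaces" check can be done mechanically. A minimal sketch, where `send_message` is a hypothetical method name and `app/ lib/` are assumed source directories -- substitute the behavior you actually changed:

```shell
# Hypothetical method name and directories; substitute your own.
method="send_message"
# List every definition site so parity gaps across interfaces are visible.
grep -rn "def ${method}" app/ lib/ 2>/dev/null \
  || echo "no other definitions found"
```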
Incremental Commits
After completing each task, evaluate whether to create an incremental commit:
| Commit when... | Don't commit when... |
|---|---|
| Logical unit complete (model, service, component) | Small part of a larger unit |
| Tests pass + meaningful progress | Tests failing |
| About to switch contexts (backend → frontend) | Purely scaffolding with no behavior |
| About to attempt risky/uncertain changes | Would need a "WIP" commit message |
Heuristic: "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."
If the plan has Implementation Units, use them as a starting guide for commit boundaries -- but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.
Commit workflow:
# 1. Verify tests pass (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.
# 2. Stage only files related to this logical unit (not `git add .`)
git add <files related to this logical unit>
# 3. Commit with conventional message
git commit -m "feat(scope): description of this unit"
Handling merge conflicts: If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.
Note: Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
Follow Existing Patterns
Test Continuously
Simplify as You Go
After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities -- consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.
Don't simplify after every single unit -- early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.
If a /simplify skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
Figma Design Sync (if applicable)
For UI work with Figma designs:
Frontend Design Guidance (if applicable)
For UI tasks without a Figma design -- where the implementation touches view, template, component, layout, or page files, creates user-visible routes, or the plan contains explicit UI/frontend/design language:
use the frontend-design skill before implementing.

Track Progress
Run Core Quality Checks
Always run before submitting:
# Run full test suite (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.
# Run linting (per AGENTS.md)
# Use linting-agent before pushing to origin
Consider Reviewer Agents (Optional)
Use for complex, risky, or large changes. Read agents from compound-engineering.local.md frontmatter (review_agents). If no settings file, invoke the setup skill to create one.
Run configured agents in parallel with Task tool. Present findings and address critical issues.
Final Validation
- If the plan has a Requirements Trace, verify each requirement is satisfied by the completed work.
- If Deferred to Implementation questions were noted, confirm they were resolved during execution.

Prepare Operational Validation Plan (REQUIRED)
Add a ## Post-Deploy Monitoring & Validation section to the PR description for every change. If the change has no operational impact, state No additional operational monitoring required and a one-line reason.

Create Commit
git add .
git status # Review what's being committed
git diff --staged # Check the changes
# Commit with conventional format
git commit -m "$(cat <<'EOF'
feat(scope): description of what and why
Brief explanation if needed.
🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Compound Engineering v[VERSION]
Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
EOF
)"
Fill in at commit/PR time:
| Placeholder | Value | Example |
|---|---|---|
| [MODEL] | Model name | Claude Opus 4.6, GPT-5.4 |
| [CONTEXT] | Context window (if known) | 200K, 1M |
| [THINKING] | Thinking level (if known) | extended thinking |
| [HARNESS] | Tool running you | Claude Code, Codex, Gemini CLI |
| [HARNESS_URL] | Link to that tool | https://claude.com/claude-code |
| [VERSION] | plugin.json → version | 2.40.0 |
Subagents creating commits/PRs are equally responsible for accurate attribution.
Capture and Upload Screenshots for UI Changes (REQUIRED for any UI work)
For any design changes, new views, or UI modifications, you MUST capture and upload screenshots:
Step 1: Start dev server (if not running)
bin/dev # Run in background
Step 2: Capture screenshots with agent-browser CLI
agent-browser open http://localhost:3000/[route]
agent-browser snapshot -i
agent-browser screenshot output.png
See the agent-browser skill for detailed usage.
Step 3: Upload using imgup skill
skill: imgup
# Then upload each screenshot:
imgup -h pixhost screenshot.png # pixhost works without API key
# Alternative hosts: catbox, imagebin, beeimg
What to capture:
IMPORTANT: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
Create Pull Request
git push -u origin feature-branch-name
gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
## Summary
- What was built
- Why it was needed
- Key decisions made
## Testing
- Tests added/modified
- Manual testing performed
## Post-Deploy Monitoring & Validation
- **What to monitor/search**
- Logs:
- Metrics/Dashboards:
- **Validation checks (queries/commands)**
- `command or query here`
- **Expected healthy behavior**
- Expected signal(s)
- **Failure signal(s) / rollback trigger**
- Trigger + immediate action
- **Validation window & owner**
- Window:
- Owner:
- **If no operational impact**
- `No additional operational monitoring required: <reason>`
## Before / After Screenshots
| Before | After |
|--------|-------|
|  |  |
## Figma Design
[Link if applicable]
---
[![Compound Engineering v[VERSION]](https://img.shields.io/badge/Compound_Engineering-v[VERSION]-6366f1)](https://github.com/EveryInc/compound-engineering-plugin)
🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
EOF
)"
Update Plan Status
If the input document has YAML frontmatter with a status field, update it to completed:
status: active → status: completed
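A minimal way to flip the field in place, assuming the plan lives at a hypothetical `plan.md` and the field appears exactly as `status: active` on its own line (both assumptions):

```shell
# Demo setup: a hypothetical plan file with YAML frontmatter.
plan_file="plan.md"
printf -- '---\nstatus: active\n---\n# Plan body\n' > "$plan_file"

# Rewrite only the frontmatter status line; the .bak suffix keeps sed portable (GNU/BSD).
sed -i.bak 's/^status: active$/status: completed/' "$plan_file"
rm -f "${plan_file}.bak"
```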
Notify User
Swarm Mode
For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in Claude Code, multi-agent workflows in Codex).
Agent teams are typically experimental and require opt-in. Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
| Agent Teams | Subagents (standard mode) |
|---|---|
| Agents need to discuss and challenge each other's approaches | Each task is independent โ only the result matters |
| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead -- use them when the inter-agent communication genuinely improves the outcome.
External Delegation Mode (Beta)
For plans where token conservation matters, delegate code implementation to an external delegate (currently Codex CLI) while keeping planning, review, and git operations in the current agent.
This mode integrates with the existing Phase 1 Step 4 strategy selection as a task-level modifier - the strategy (inline/serial/parallel) still applies, but the implementation step within each tagged task delegates to the external tool instead of executing directly.
| External Delegation | Standard Mode |
|---|---|
| Task is pure code implementation | Task requires research or exploration |
| Plan has clear acceptance criteria | Task is ambiguous or needs iteration |
| Token conservation matters (e.g., Max20 plan) | Unlimited plan or small task |
| Files to change are well-scoped | Changes span many interconnected files |
External delegation activates when any of these conditions are met:
- The task's implementation unit carries Execution target: external-delegate in its Execution note (set by ce:plan-beta or ce:plan).

The specific delegate tool is resolved at execution time. Currently the only supported delegate is Codex CLI. Future delegates can be added without changing plan files.
Before attempting delegation, check whether the current agent is already running inside a delegate's sandbox. Delegation from within a sandbox will fail silently or recurse.
Check for known sandbox indicators:
- CODEX_SANDBOX environment variable is set
- CODEX_SESSION_ID environment variable is set
- Writes to .git/ are blocked (the Codex sandbox blocks git writes)

If any indicator is detected, print "Already running inside a delegate sandbox - using standard mode." and proceed with standard execution for that task.
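The environment-variable indicators can be checked with a short guard; a sketch using the names listed above (`${VAR:-}` keeps the test safe under `set -u`):

```shell
# Sandbox indicators from the list above.
if [ -n "${CODEX_SANDBOX:-}" ] || [ -n "${CODEX_SESSION_ID:-}" ]; then
  echo "Already running inside a delegate sandbox - using standard mode."
fi
```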
When external delegation is active, follow this workflow for each tagged task. Do not skip delegation because a task seems "small", "simple", or "faster inline". The user or plan explicitly requested delegation.
Check availability
Verify the delegate CLI is installed. If not found, print "Delegate CLI not installed - continuing with standard mode." and proceed normally.
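A sketch of the availability check, assuming the delegate binary is named `codex` (swap in whichever delegate is configured):

```shell
# Fall back to standard mode when the delegate binary is not on PATH.
if ! command -v codex >/dev/null 2>&1; then
  echo "Delegate CLI not installed - continuing with standard mode."
fi
```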
Build prompt -- For each task, assemble a prompt from the plan's implementation unit (Goal, Files, Approach, Conventions from compound-engineering.local.md). Include rules: no git commits, no PRs, run git status and git diff --stat when done. Never embed credentials or tokens in the prompt - pass auth through environment variables.
Write prompt to file -- Save the assembled prompt to a unique temporary file to avoid shell quoting issues and cross-task races. Use a unique filename per task.
Delegate -- Run the delegate CLI, piping the prompt file via stdin (not argv expansion, which hits ARG_MAX on large prompts). Omit the model flag to use the delegate's default model, which stays current without manual updates.
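The prompt-file and delegation steps can be sketched as below. `codex exec` reading the prompt on stdin is an assumption about the delegate's interface -- verify against the installed CLI's help output before relying on it; the prompt text is a placeholder:

```shell
assembled_prompt="Implement unit 1: ..."   # built in the Build prompt step

# Unique temp file per task: avoids shell quoting issues and cross-task races.
prompt_file=$(mktemp "${TMPDIR:-/tmp}/delegate-prompt.XXXXXX")
printf '%s\n' "$assembled_prompt" > "$prompt_file"

# Pipe via stdin (argv expansion hits ARG_MAX on large prompts).
# No model flag, so the delegate's default model is used.
if command -v codex >/dev/null 2>&1; then
  codex exec < "$prompt_file"
fi
rm -f "$prompt_file"
```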
Review diff -- After the delegate finishes, verify the diff is non-empty and in-scope. Run the project's test/lint commands. If the diff is empty or out-of-scope, fall back to standard mode for that task.
Commit -- The current agent handles all git operations. The delegate's sandbox blocks .git/index.lock writes, so the delegate cannot commit. Stage changes and commit with a conventional message.
Error handling -- On any delegate failure (rate limit, error, empty diff), fall back to standard mode for that task. Track consecutive failures - after 3 consecutive failures, disable delegation for remaining tasks and print "Delegate disabled after 3 consecutive failures - completing remaining tasks in standard mode."
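The consecutive-failure rule can be sketched as a small helper. The function name and the `ok`/`fail` result protocol are hypothetical -- call it once after each delegated task:

```shell
failures=0
delegation_enabled=true

# Hypothetical helper: call with "ok" or "fail" after each delegated task.
record_result() {
  if [ "$1" = fail ]; then
    failures=$((failures + 1))
  else
    failures=0   # any success resets the streak
  fi
  if [ "$failures" -ge 3 ]; then
    delegation_enabled=false
    echo "Delegate disabled after 3 consecutive failures - completing remaining tasks in standard mode."
  fi
}
```

Note that the counter resets on success, so only an unbroken run of three failures disables delegation.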
When some tasks are executed by the delegate and others by the current agent, use the following attribution in Phase 4:
Use Generated with [CURRENT_MODEL] + [DELEGATE_MODEL] via [HARNESS] and note which tasks were delegated in the PR description.

Before creating the PR, verify:
Don't use by default. Use reviewer agents only when:
For most features: tests + linting + following patterns is sufficient.