autopilot-codex

AutoPilot Codex

A Claude Code plugin where OpenAI Codex (gpt-5.4 / gpt-5.5) plays senior engineering manager, while Claude Code is the implementer. Codex decomposes the goal, writes acceptance criteria, reviews every diff, and decides "next" / "done"; Claude reads, writes, and runs tests.

Status: v0.1.x — full loop. Plan, automatic per-edit review, and a Stop-hook driven autopilot loop that keeps going until Codex says done. Bash safety rails block a small list of irreversible commands.

Why two models

Cross-checking. GPT and Claude have different blind spots; review by a different model catches more real issues than self-review.
Cost. The manager only emits short prompts and structured JSON. With a ChatGPT subscription it's effectively free.
Honest scope discipline. The implementer never decides what to do next, so it can't drift the spec.

Prerequisites

Codex CLI ≥ 0.101:

npm i -g @openai/codex
# or
brew install --cask codex

Login (one of):
- ChatGPT subscription (recommended): codex login — OAuth flow. This plugin does not require OPENAI_API_KEY.
- API key: printenv OPENAI_API_KEY | codex login --with-api-key.
Claude Code ≥ 2.x with plugins enabled.
Node.js ≥ 20.
Windows only: Git for Windows (provides bash.exe + cygpath). The hooks ship with a polyglot run-hook.cmd wrapper so the same .sh files run on macOS / Linux / Windows. Default Git path C:\Program Files\Git\bin\bash.exe — edit hooks/run-hook.cmd if Git is installed elsewhere.

Verify:

codex --version
codex login status     # "Logged in using ChatGPT" or similar
node --version

Local development

git clone https://github.com/egurin/autopilot-codex
cd autopilot-codex
npm test                       # 185 tests, runs in ~2s
claude plugin validate .       # manifest + frontmatter check

The top-level package.json is for the test harness only; the plugin itself is consumed via Claude Code, not npm.

Install

Local development:

git clone https://github.com/egurin/autopilot-codex
claude --plugin-dir ./autopilot-codex
> /reload-plugins

Marketplace (once published):

> /plugin marketplace add egurin/autopilot-codex
> /plugin install autopilot-codex@autopilot-codex-marketplace

Quickstart

> /autopilot-codex:autopilot Add a /healthz endpoint that returns 200 OK with the build SHA

That single command:

Asks Codex to decompose the goal into ordered tasks (T1, T2, …) with files, steps, and acceptance criteria. Saves a markdown plan.
Hands you task T1. You implement it (TDD when applicable).
Each Edit / Write triggers a Codex review against the current task's acceptance. If fix or reject, the hook returns decision:"block" so you address the issues before continuing.
When you would otherwise stop, the Stop hook asks Codex done?. If not, it hands you the next task and blocks Stop, so the loop continues until done, abort, or the iteration cap (default 30).

Slash commands

Command	What it does
`/autopilot-codex:autopilot <goal>`	Full loop: plan + run until done.
`/autopilot-codex:codex-plan <goal>`	Plan only. Saves a markdown plan; no implementation.
`/autopilot-codex:codex-review [T<n>]`	Manually review the current (or specified) task.
`/autopilot-codex:codex-status`	Show goal, current task, base SHA, last verdict, iteration.
`/autopilot-codex:codex-next`	Show the current task contract from state.json — no Codex turn, no token spend. Quick reminder of what to implement.
`/autopilot-codex:codex-stop [reason]`	Disable the autopilot. Hooks become no-ops until `codex-plan` runs again.

How the pieces fit

AutoPilot Codex

Status: v0.1.x — full loop. Plan, automatic per-edit review, and a Stop-hook driven autopilot loop that keeps going until Codex says done. Bash safety rails block a small list of irreversible commands.

Why two models

Cross-checking. GPT and Claude have different blind spots; review by a different model catches more real issues than self-review.
Cost. The manager only emits short prompts and structured JSON. With a ChatGPT subscription it's effectively free.
Honest scope discipline. The implementer never decides what to do next, so it can't drift the spec.

Prerequisites

Codex CLI ≥ 0.101:

npm i -g @openai/codex
# or
brew install --cask codex

Login (one of):
- ChatGPT subscription (recommended): codex login — OAuth flow. This plugin does not require OPENAI_API_KEY.
- API key: printenv OPENAI_API_KEY | codex login --with-api-key.
Claude Code ≥ 2.x with plugins enabled.
Node.js ≥ 20.
Windows only: Git for Windows (provides bash.exe + cygpath). The hooks ship with a polyglot run-hook.cmd wrapper so the same .sh files run on macOS / Linux / Windows. Default Git path C:\Program Files\Git\bin\bash.exe — edit hooks/run-hook.cmd if Git is installed elsewhere.

Verify:

codex --version
codex login status     # "Logged in using ChatGPT" or similar
node --version

Local development

git clone https://github.com/egurin/autopilot-codex
cd autopilot-codex
npm test                       # 185 tests, runs in ~2s
claude plugin validate .       # manifest + frontmatter check

The top-level package.json is for the test harness only; the plugin itself is consumed via Claude Code, not npm.

Install

Local development:

git clone https://github.com/egurin/autopilot-codex
claude --plugin-dir ./autopilot-codex
> /reload-plugins

Marketplace (once published):

> /plugin marketplace add egurin/autopilot-codex
> /plugin install autopilot-codex@autopilot-codex-marketplace

Quickstart

> /autopilot-codex:autopilot Add a /healthz endpoint that returns 200 OK with the build SHA

That single command:

Asks Codex to decompose the goal into ordered tasks (T1, T2, …) with files, steps, and acceptance criteria. Saves a markdown plan.
Hands you task T1. You implement it (TDD when applicable).
Each Edit / Write triggers a Codex review against the current task's acceptance. If fix or reject, the hook returns decision:"block" so you address the issues before continuing.
When you would otherwise stop, the Stop hook asks Codex done?. If not, it hands you the next task and blocks Stop, so the loop continues until done, abort, or the iteration cap (default 30).

Slash commands

Command	What it does
`/autopilot-codex:autopilot <goal>`	Full loop: plan + run until done.
`/autopilot-codex:codex-plan <goal>`	Plan only. Saves a markdown plan; no implementation.
`/autopilot-codex:codex-review [T<n>]`	Manually review the current (or specified) task.
`/autopilot-codex:codex-status`	Show goal, current task, base SHA, last verdict, iteration.
`/autopilot-codex:codex-next`	Show the current task contract from state.json — no Codex turn, no token spend. Quick reminder of what to implement.
`/autopilot-codex:codex-stop [reason]`	Disable the autopilot. Hooks become no-ops until `codex-plan` runs again.

Popularity

What's Inside

Confidence

README

AutoPilot Codex

Why two models

Prerequisites

Local development

Install

Quickstart

Slash commands

How the pieces fit

Similar Plugins

open-agent-hub

caveman

claude-mem

llm-council-plugin

nanobanana

ui-design

More by evgenygurin

claude-mem0-project-memory

dj-music

cryptozavr

codegen-bridge

codex-bridge

AutoPilot Codex

Why two models

Prerequisites

Local development

Install

Quickstart

Slash commands

How the pieces fit

Popularity

Health & Quality

More by evgenygurin

claude-mem0-project-memory

dj-music

cryptozavr

codegen-bridge

codex-bridge

Similar Plugins

open-agent-hub

caveman

claude-mem

llm-council-plugin

nanobanana

ui-design