# claude-codex
Routes mechanical coding tasks to OpenAI Codex, cutting Anthropic token usage by 60–85% on eligible work. You don't change how you use Claude Code.
## The problem it solves
Claude Sonnet 4.6 costs ~$3/M input tokens and ~$15/M output tokens. OpenAI's Codex (via gpt-5.3-codex) costs a fraction of that. For tasks like "write a unit test for this function" or "add JSDoc to this class", the output quality is virtually indistinguishable, but the cost difference is 20–40x.
The tricky part is figuring out which tasks qualify. "Write unit tests for getUserById" clearly does. "Why is my authentication middleware slow?" clearly doesn't — it needs reasoning, context, and judgment. Most tasks fall somewhere in between, and classifying them correctly in real time (before any tokens are spent) requires a scoring approach that weighs competing signals.
That's what this plugin does. A bash hook fires on every message before Claude processes it. If the task scores high enough on mechanical-work signals and low enough on reasoning signals, it gets routed to Codex instead. Claude receives the pre-computed answer and relays it, spending maybe 5% of the tokens it would have used computing the same answer itself.
## How it works

### The full pipeline
```
User types a message
 │
 ▼
UserPromptSubmit hook fires (bash, zero Claude tokens)
 │
 ├── Parse hook JSON input (message, cwd, session_id)
 │
 ├── Check budget-aware threshold (is context window filling up?)
 │
 ├── Check codex CLI availability (if missing: fall through to Claude)
 │
 ├── classify.sh scores the prompt with weighted pattern matching
 │
 │    DELEGATE (score ≥ 20 AND category matched)
 │    ├── codex-exec.sh picks an expert persona for the category
 │    ├── Detects project language from CWD (package.json, go.mod, etc.)
 │    ├── Builds full prompt: expert persona + project context + task
 │    ├── Runs `codex exec` with model + reasoning effort + sandbox
 │    │    (120s timeout; falls through to Claude on failure)
 │    ├── token-tracker.sh logs the delegation (fire-and-forget)
 │    └── Hook returns additionalContext with Codex output + relay instructions
 │
 │    CLAUDE (score ≤ -20, or UNSURE between -20 and 20)
 │    └── Fall through: Claude handles the task normally
 │
 ▼
Claude receives additionalContext (if DELEGATE) or the raw message (otherwise)
 │
 ▼
[DELEGATE path]: Claude reads the Codex answer and relays it with a [via Codex · category] prefix
[CLAUDE path]:   Claude answers normally
```
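The language-detection step in the pipeline above can be sketched as a probe for marker files in the working directory. This is a minimal illustration; the function name and the exact marker list are assumptions, not the plugin's real code.

```shell
# Hypothetical sketch: infer the project language from common marker files.
# The marker set here is illustrative; codex-exec.sh may check more or fewer.
detect_language() {
  local dir="$1"
  if   [ -f "$dir/package.json" ];   then echo "javascript"
  elif [ -f "$dir/go.mod" ];         then echo "go"
  elif [ -f "$dir/Cargo.toml" ];     then echo "rust"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; then
    echo "python"
  else
    echo "unknown"   # no marker found; the prompt omits language context
  fi
}
```

The detected language feeds the "project context" portion of the prompt sent to Codex, so a test-writing request in a Go repo gets Go-flavored output without the user saying so.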
### Why the hook approach works
Claude Code's `UserPromptSubmit` hook runs before the AI model is invoked. It receives the raw message as JSON and can inject `additionalContext`: extra text prepended to what Claude sees.

Classification runs entirely in bash, spending zero tokens. Codex execution runs outside the Claude inference loop. Claude's only job on delegated tasks is to clean up Markdown and add the attribution prefix: roughly 50 output tokens instead of 500–2000.

If the hook fails for any reason (`codex` not installed, timeout, network error), it exits without output. Claude Code treats an empty hook response as "no intervention" and falls through to Claude normally.
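A sketch of the delegation response, assuming the `hookSpecificOutput.additionalContext` JSON shape from Claude Code's hooks documentation; the `emit_delegation` helper and the relay wording are hypothetical stand-ins for what the real hook emits.

```shell
# Hypothetical DELEGATE-path output: wrap the Codex answer plus relay
# instructions in the UserPromptSubmit hook's JSON response shape.
emit_delegation() {
  local answer="$1" category="$2" ctx
  ctx="Codex has already answered this ${category} task. Relay the answer below, prefixed with [via Codex · ${category}]: ${answer}"
  jq -n --arg ctx "$ctx" \
    '{hookSpecificOutput: {hookEventName: "UserPromptSubmit", additionalContext: $ctx}}'
}

# Failure path (codex missing, timeout, network error): print nothing and
# exit 0, which Claude Code reads as "no intervention".
```

The empty-output fallback is what makes the plugin safe to install: any breakage in the bash pipeline degrades to ordinary Claude behavior rather than a blocked prompt.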
## The routing algorithm
### Scoring mechanics
`scripts/classify.sh` assigns a numeric score to every prompt. Positive scores push toward delegation; negative scores push toward Claude.
```
DELEGATE  if score ≥ 20 AND a category was matched
CLAUDE    if score ≤ -20
UNSURE    otherwise: Claude handles it (conservative fallback)
```
The UNSURE zone between -20 and 20 intentionally keeps borderline tasks with Claude. Routing a debugging task to Codex wastes money on a bad answer; keeping a test-writing task with Claude just costs a bit more. When in doubt, the system keeps things with Claude.
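A toy version of this decision logic, with illustrative patterns and weights rather than the plugin's real signal tables:

```shell
# Illustrative scorer: each matched signal adds or subtracts a weight.
# classify.sh uses a larger weighted pattern set; these two are examples.
score_prompt() {
  local p s=0
  p="$(tr '[:upper:]' '[:lower:]' <<<"$1")"
  # Positive signal: mechanical, well-specified work.
  grep -qE 'write .*tests?|add (jsdoc|docstrings?)' <<<"$p" && s=$((s + 25))
  # Negative signal: open-ended reasoning or debugging.
  grep -qE '^why |debug|investigate|slow' <<<"$p" && s=$((s - 30))
  echo "$s"
}

# Thresholds from the rules above: ≥20 delegates, ≤-20 stays with Claude,
# and the band in between falls through to Claude as UNSURE.
decide() {
  local s
  s="$(score_prompt "$1")"
  if   [ "$s" -ge 20 ];  then echo "DELEGATE"
  elif [ "$s" -le -20 ]; then echo "CLAUDE"
  else                        echo "UNSURE"
  fi
}
```

Running `decide "Write unit tests for getUserById"` lands in DELEGATE territory, while a "why is X slow" question scores negative and stays with Claude; anything unmatched scores 0 and falls into the UNSURE band.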
### Positive signals (add to score)