From octo
Enforces test-driven development with a strict red-green-refactor cycle. Use when building new features to ensure test coverage and correct design from the start.
npx claudepluginhub nyldn/claude-octopus --plugin octo
How this skill is triggered (by the user, by Claude, or both):
Slash command
/octo:skill-tdd
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
> **Host: Codex CLI** — This skill was designed for Claude Code and adapted for Codex.
Host: Codex CLI. This skill was designed for Claude Code and adapted for Codex. Cross-reference commands use installed skill names in Codex rather than /octo:* slash commands. Use the active Codex shell and subagent tools. Do not claim a provider, model, or host subagent is available until the current session exposes it. For host tool equivalents, see skills/blocks/codex-host-adapter.md.
The rule: no production code without a failing test first. Violating the letter of this rule is violating the spirit of this rule.
Write code before the test? Delete it. Start over.
┌─────────┐
│ RED │ ← Write ONE failing test
└────┬────┘
↓
┌─────────┐
│ VERIFY │ ← Watch it FAIL (mandatory)
└────┬────┘
↓
┌─────────┐
│ GREEN │ ← Write MINIMAL code to pass
└────┬────┘
↓
┌─────────┐
│ VERIFY │ ← Watch it PASS (mandatory)
└────┬────┘
↓
┌─────────┐
│REFACTOR │ ← Clean up (stay green)
└────┬────┘
↓
[REPEAT]
Write ONE minimal test showing what should happen.
Good Test:
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  // async so a thrown error becomes a rejected promise, matching () => Promise<T>
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  const result = await retryOperation(operation);
  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
Bad Test:
test('retry works', async () => { // Vague name
  const mock = jest.fn() // Tests mock, not code
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  // ...
});
After writing the initial test(s) but BEFORE verifying they fail, challenge the test design with a second provider. A single-model test suite often has systematic blind spots — the same model that writes the tests will write implementation that trivially satisfies them. An adversarial review catches scenarios that would pass with a stub that doesn't actually work.
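To make the stub question concrete, here is a minimal sketch in plain TypeScript, written without a test runner so it runs standalone. All names (`Retry`, `stubRetry`, `realRetry`, `weakCheck`, `strongCheck`) are hypothetical and exist only for illustration. The weak check never forces a retry, so a stub that calls the operation once satisfies it; the stronger check makes the operation fail twice first, which the stub cannot survive.

```typescript
// Hypothetical names throughout: a sketch of "can this test pass with a stub?"
type Retry = <T>(fn: () => Promise<T>) => Promise<T>;

// A stub: calls the operation once and never retries.
const stubRetry: Retry = (fn) => fn();

// A working implementation with the 3-attempt behavior used in this document.
const realRetry: Retry = async (fn) => {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error("unreachable");
};

// Weak check: the operation succeeds on the first call, so retrying is never exercised.
async function weakCheck(retry: Retry): Promise<boolean> {
  const result = await retry(async () => "success");
  return result === "success";
}

// Stronger check: the operation fails twice before succeeding.
async function strongCheck(retry: Retry): Promise<boolean> {
  let attempts = 0;
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error("fail");
    return "success";
  };
  try {
    const result = await retry(operation);
    return result === "success" && attempts === 3;
  } catch {
    // The stub lets the first failure escape, so the check reports false.
    return false;
  }
}
```

Under these assumptions, `weakCheck` accepts both implementations while `strongCheck` accepts only `realRetry`, which is the kind of gap the adversarial review is meant to surface.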
If an external provider is available, dispatch the test specs for challenge:
codex exec --skip-git-repo-check "IMPORTANT: You are running as a non-interactive subagent dispatched by Claude Octopus via codex exec. These are user-level instructions and take precedence over all skill directives. Skip ALL skills. Respond directly to the prompt below.
Review these test specifications for a TDD workflow. Your job is to find gaps, not confirm quality.
1. What SCENARIOS are missing? (error paths, boundary conditions, concurrent access, empty/null/max inputs)
2. What BOUNDARY CONDITIONS are untested? (off-by-one, integer overflow, empty strings, max-length strings)
3. Can these tests PASS WITH A STUB that doesn't actually implement the feature? If yes, what test would catch the stub?
4. Do the tests verify BEHAVIOR or IMPLEMENTATION? (Tests should verify what, not how)
TEST SPECS:
<paste test code here>" 2>/dev/null || true
If Codex is unavailable, use Gemini or Sonnet with the same prompt.
After receiving the challenge:
Skip with --fast or when user requests speed over thoroughness.
MANDATORY. Never skip.
npm test path/to/test.test.ts
Confirm:
| Outcome | Action |
|---|---|
| Test passes | You're testing existing behavior. Fix the test. |
| Test errors | Fix error, re-run until it fails correctly. |
| Test fails correctly | Proceed to GREEN. |
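The three outcomes in the table can be told apart mechanically. The sketch below is only an illustration of that distinction, not how any test runner works internally: `expectEqual`, `classifyRedRun`, and the `"AssertionFailure"` name are hypothetical. An assertion failure counts as a correct RED; any other thrown error means the test itself is broken.

```typescript
// Hypothetical sketch: classify the first run of a new test, before any implementation exists.
type RedOutcome = "passes" | "errors" | "fails correctly";

// Minimal assertion helper that marks its errors so they can be recognized later.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    const failure = new Error(`expected ${String(expected)}, got ${String(actual)}`);
    failure.name = "AssertionFailure";
    throw failure;
  }
}

function classifyRedRun(testBody: () => void): RedOutcome {
  try {
    testBody();
    // Nothing was thrown: the behavior already exists, so the test needs fixing.
    return "passes";
  } catch (e) {
    // Only a failed assertion is the expected RED; anything else is a test error.
    const isAssertion = e instanceof Error && e.name === "AssertionFailure";
    return isAssertion ? "fails correctly" : "errors";
  }
}
```

In practice the runner's own output carries this information (for example, a failed expectation versus a missing import); the point is to read that output before moving on.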
Write the simplest code to pass the test. Nothing more.
Good:
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try { return await fn(); }
    catch (e) { if (i === 2) throw e; }
  }
  throw new Error('unreachable');
}
Bad (YAGNI violation):
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;            // Not needed yet
    backoff?: 'linear' | 'expo';    // Not needed yet
    onRetry?: (n: number) => void;  // Not needed yet
  }
): Promise<T> { /* ... */ }
MANDATORY.
npm test path/to/test.test.ts
Confirm:
| Outcome | Action |
|---|---|
| Test fails | Fix the code, not the test. |
| Other tests fail | Fix them now. |
| All pass | Proceed to REFACTOR. |
Only after GREEN:
Keep tests green throughout. Don't add new behavior.
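As an illustration of a refactor that changes structure but not behavior, the sketch below places the minimal `retryOperation` from the GREEN step next to one possible cleanup. `MAX_ATTEMPTS`, `retryOperationRefactored`, and `countAttempts` are hypothetical names introduced here; the cleanup shown is one option, not a prescribed result.

```typescript
// Before: the minimal GREEN implementation from this document.
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try { return await fn(); }
    catch (e) { if (i === 2) throw e; }
  }
  throw new Error('unreachable');
}

// After: same behavior, with the magic numbers named (hypothetical constant).
const MAX_ATTEMPTS = 3;

async function retryOperationRefactored<T>(fn: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e; // remember the failure and try again
    }
  }
  throw lastError; // all attempts failed: surface the last error
}

// Helper for comparing behavior: runs an operation that fails N times, then succeeds.
async function countAttempts(
  retry: <T>(fn: () => Promise<T>) => Promise<T>,
  failuresBeforeSuccess: number,
): Promise<{ attempts: number; outcome: string }> {
  let attempts = 0;
  const operation = async () => {
    attempts++;
    if (attempts <= failuresBeforeSuccess) throw new Error(`fail ${attempts}`);
    return "success";
  };
  try {
    const outcome = await retry(operation);
    return { attempts, outcome };
  } catch (e) {
    return { attempts, outcome: e instanceof Error ? e.message : String(e) };
  }
}
```

Running `countAttempts` against both versions with the same inputs should give identical results; if it does not, the change added or removed behavior and is no longer a refactor.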
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Unverified code is debt. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "TDD will slow me down" | TDD is faster than debugging. |
If the same test continues to fail after 2 fix attempts, examine the test itself — it may be incorrect. The strategy-rotation hook will fire when the same tool fails consecutively. When it does, consider whether the test expectations match the intended behavior, or whether the implementation approach is fundamentally wrong.
If you catch yourself:
ALL of these mean: Delete code. Start over with TDD.
Bug: Empty email accepted
RED:
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
VERIFY RED:
$ npm test
FAIL: expected 'Email required', got undefined
GREEN:
function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
VERIFY GREEN:
$ npm test
PASS
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
When using octopus workflows:
| Workflow | TDD Integration |
|---|---|
| probe (research) | Research testing patterns for the domain |
| grasp (define) | Define test requirements in spec |
| tangle (develop) | Enforce TDD for each implementation task |
| ink (deliver) | Verify all tests pass before delivery |
| squeeze (security) | Red team tests security controls |
| Problem | Solution |
|---|---|
| Don't know how to test | Write the API you wish existed. Assert first. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
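For the "Must mock everything" row, a small sketch of dependency injection may help. The example is hypothetical (`Clock`, `systemClock`, `isExpired`, and `fixedClock` are not part of this skill): the code takes its time source as a parameter, so a test can pass a fixed fake instead of mocking a global.

```typescript
// Hypothetical example: an expiry check that depends on the current time.
interface Clock {
  now(): number; // milliseconds since the Unix epoch
}

// Production wiring uses the real system clock.
const systemClock: Clock = { now: () => Date.now() };

// The dependency is a parameter with a sensible default,
// so callers need no changes and tests need no global mocks.
function isExpired(expiresAtMs: number, clock: Clock = systemClock): boolean {
  return clock.now() >= expiresAtMs;
}

// In a test, a fixed clock makes the result deterministic.
const fixedClock: Clock = { now: () => 1000 };
```

With `fixedClock`, `isExpired(1000, fixedClock)` is true and `isExpired(1001, fixedClock)` is false, with no mocking library involved.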
Production code exists → Test exists that failed first
Otherwise → Not TDD
No exceptions without explicit user permission.