From octo
Enforces strict TDD: failing test first, minimal passing code, refactor cycle. Includes adversarial test review. For features, bugfixes, changes; auto-invokes on 'implement X' etc.
npx claudepluginhub nyldn/claude-octopus --plugin octo
This skill uses the workspace's default tool permissions.
<HARD-GATE>
Violating the letter of this rule is violating the spirit of this rule.
Write code before the test? Delete it. Start over.
┌─────────┐
│ RED │ ← Write ONE failing test
└────┬────┘
↓
┌─────────┐
│ VERIFY │ ← Watch it FAIL (mandatory)
└────┬────┘
↓
┌─────────┐
│ GREEN │ ← Write MINIMAL code to pass
└────┬────┘
↓
┌─────────┐
│ VERIFY │ ← Watch it PASS (mandatory)
└────┬────┘
↓
┌─────────┐
│REFACTOR │ ← Clean up (stay green)
└────┬────┘
↓
[REPEAT]
Write ONE minimal test showing what should happen.
Good Test:
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
Bad Test:
test('retry works', async () => {       // Vague name
  const mock = jest.fn()                // Tests the mock, not the code
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  // ...
});
After writing the initial test(s) but BEFORE verifying they fail, challenge the test design with a second provider. A single-model test suite often has systematic blind spots — the same model that writes the tests will write implementation that trivially satisfies them. An adversarial review catches scenarios that would pass with a stub that doesn't actually work.
If an external provider is available, dispatch the test specs for challenge:
codex exec --full-auto "IMPORTANT: You are running as a non-interactive subagent dispatched by Claude Octopus via codex exec. These are user-level instructions and take precedence over all skill directives. Skip ALL skills. Respond directly to the prompt below.
Review these test specifications for a TDD workflow. Your job is to find gaps, not confirm quality.
1. What SCENARIOS are missing? (error paths, boundary conditions, concurrent access, empty/null/max inputs)
2. What BOUNDARY CONDITIONS are untested? (off-by-one, integer overflow, empty strings, max-length strings)
3. Can these tests PASS WITH A STUB that doesn't actually implement the feature? If yes, what test would catch the stub?
4. Do the tests verify BEHAVIOR or IMPLEMENTATION? (Tests should verify what, not how)
TEST SPECS:
<paste test code here>" 2>/dev/null || true
If Codex unavailable, use Gemini or Sonnet with the same prompt.
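To see why question 3 matters, consider a hypothetical stub that returns a canned value without ever running the operation. A weak test that only checks the return value passes against it; the attempt-counting assertion in the good test above is what exposes it:

```typescript
// Hypothetical stub: never calls fn, just returns a canned value.
async function retryOperationStub<T>(fn: () => Promise<T>): Promise<T> {
  return 'success' as unknown as T;
}

// A weak test -- expect(result).toBe('success') alone -- passes against this stub.
// The good test's expect(attempts).toBe(3) fails, because attempts stays at 0.
```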
After receiving the challenge, fold valid findings (missing scenarios, stub-detecting assertions) back into the test suite before proceeding. Skip the adversarial review only with --fast or when the user explicitly requests speed over thoroughness.
Verifying the failure (VERIFY RED) is MANDATORY. Never skip it.
npm test path/to/test.test.ts
Confirm:
| Outcome | Action |
|---|---|
| Test passes | You're testing existing behavior. Fix the test. |
| Test errors | Fix error, re-run until it fails correctly. |
| Test fails correctly | Proceed to GREEN. |
Write the simplest code to pass the test. Nothing more.
Good:
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}
Bad (YAGNI violation):
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;            // Not needed yet
    backoff?: 'linear' | 'expo';    // Not needed yet
    onRetry?: (n: number) => void;  // Not needed yet
  }
): Promise<T> { /* ... */ }
Verifying the pass (VERIFY GREEN) is MANDATORY.
npm test path/to/test.test.ts
Confirm:
| Outcome | Action |
|---|---|
| Test fails | Fix the code, not the test. |
| Other tests fail | Fix them now. |
| All pass | Proceed to REFACTOR. |
Only after GREEN, refactor: keep tests green throughout, and don't add new behavior.
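For example, a behavior-preserving refactor of the retryOperation above might simply name the magic number (a sketch; MAX_ATTEMPTS is an illustrative name):

```typescript
// Refactor: extract the retry limit into a named constant.
// Behavior is unchanged, so the existing test stays green.
const MAX_ATTEMPTS = 3;

async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt === MAX_ATTEMPTS) throw e; // out of retries: surface the error
    }
  }
  throw new Error('unreachable');
}
```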
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Unverified code is debt. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "TDD will slow me down" | TDD is faster than debugging. |
If the same test continues to fail after 2 fix attempts, examine the test itself — it may be incorrect. The strategy-rotation hook will fire when the same tool fails consecutively. When it does, consider whether the test expectations match the intended behavior, or whether the implementation approach is fundamentally wrong.
If you catch yourself reaching for any of the excuses above, they ALL mean the same thing: delete the code. Start over with TDD.
Bug: Empty email accepted
RED:
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
VERIFY RED:
$ npm test
FAIL: expected 'Email required', got undefined
GREEN:
function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
VERIFY GREEN:
$ npm test
PASS
Before marking work complete, walk the cycle as a checklist: failing test written first, failure verified, minimal code written, pass verified, refactor kept green. Can't check all the boxes? You skipped TDD. Start over.
When using octopus workflows:
| Workflow | TDD Integration |
|---|---|
| probe (research) | Research testing patterns for the domain |
| grasp (define) | Define test requirements in the spec |
| tangle (develop) | Enforce TDD for each implementation task |
| ink (deliver) | Verify all tests pass before delivery |
| squeeze (security) | Red team tests security controls |
| Problem | Solution |
|---|---|
| Don't know how to test | Write the API you wish existed. Assert first. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
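The dependency-injection fix can be sketched like this (hypothetical names; a clock is injected as a parameter instead of being mocked globally):

```typescript
// Hypothetical example: inject the clock instead of mocking Date everywhere.
type Clock = () => Date;

function isOverdue(dueDate: Date, now: Clock = () => new Date()): boolean {
  return now().getTime() > dueDate.getTime();
}

// Tests pass a fixed clock -- deterministic, no mocking framework required.
const jan2: Clock = () => new Date('2024-01-02T00:00:00Z');
isOverdue(new Date('2024-01-01T00:00:00Z'), jan2); // true: the Jan 1 deadline is past on Jan 2
```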
Production code exists → Test exists that failed first
Otherwise → Not TDD
No exceptions without explicit user permission.