Red-Green-Refactor cycle for test-first development. Write a failing test, implement minimal code, then refactor safely. Use when developing new features or fixing bugs in test-driven projects.
Disciplined test-first development with the Red-Green-Refactor cycle. Tests shape API design; fearless refactoring becomes possible.
You are guiding the engineer through TDD. Your role is to help them:
TDD is not just about testing; it's about using tests as a design tool to improve code clarity and maintainability.
Based on Kent Beck, Bob Martin, and J.B. Rainsberger:
Before starting TDD, confirm:
Read requirements carefully. Ask clarifying questions:
Write down the acceptance criteria. These become test cases.
Write a single test that describes one piece of desired behavior. Make it fail.
Example (Python):

```python
def test_user_email_validation_rejects_missing_at_symbol():
    # Arrange
    validator = EmailValidator()

    # Act
    is_valid = validator.is_valid("userexample.com")

    # Assert
    assert is_valid is False
```
Rules for this phase:
Name tests with the pattern `test_<subject>_<condition>_<expected_outcome>`.

Write the simplest code that passes the test. No optimization, no extra features.
Example (Python):

```python
class EmailValidator:
    def is_valid(self, email: str) -> bool:
        return "@" in email
```
This is deliberately naive. It passes the test. Now the test can guide the next iteration.
Rules for this phase:
Run the test. It should pass.
```shell
python -m pytest test_email_validator.py::test_user_email_validation_rejects_missing_at_symbol -v
```
If it fails, debug the implementation, not the test. The test defines correctness.
Run the full test suite. No regressions.
```shell
python -m pytest tests/
```
If an existing test breaks, you've changed behavior unintentionally. Fix the implementation.
Now improve the code: better names, extract methods, reduce duplication. Tests protect you.
Example: The naive implementation passes one test, and a positive case like `is_valid("user@example.com")` would already pass too, since any string containing `@` is accepted. Write the next test for a case the naive check gets wrong:

```python
def test_user_email_validation_rejects_domain_without_dot():
    validator = EmailValidator()
    assert validator.is_valid("user@example") is False
```

This test fails (the naive implementation doesn't validate the domain). Now extend the implementation:
```python
class EmailValidator:
    def is_valid(self, email: str) -> bool:
        if not email or "@" not in email:
            return False
        local, domain = email.rsplit("@", 1)
        if not local or not domain or "." not in domain:
            return False
        return True
```
Run all tests. Both pass. The implementation evolved because tests revealed missing behavior.
Start again at step 2. Each cycle improves the implementation and test coverage.
Example test sequence for email validator:
Each test adds one behavioral requirement. The implementation grows steadily, staying simple.
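One plausible sequence, sketched as runnable tests against a condensed copy of the evolved validator (test names and cases are illustrative):

```python
# Condensed version of the evolved email validator, repeated so the
# snippet is self-contained.
class EmailValidator:
    def is_valid(self, email: str) -> bool:
        if not email or "@" not in email:
            return False
        local, domain = email.rsplit("@", 1)
        return bool(local) and bool(domain) and "." in domain

# Each test pins down exactly one behavioral requirement.
def test_rejects_missing_at_symbol():
    assert EmailValidator().is_valid("userexample.com") is False

def test_rejects_empty_string():
    assert EmailValidator().is_valid("") is False

def test_rejects_empty_local_part():
    assert EmailValidator().is_valid("@example.com") is False

def test_rejects_domain_without_dot():
    assert EmailValidator().is_valid("user@example") is False

def test_accepts_valid_email():
    assert EmailValidator().is_valid("user@example.com") is True
```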
If a cycle takes >15 minutes:
When using this skill, deliver:
Example output structure:
```markdown
## Red: Write Test
[test code showing failure]

## Green: Implement
[minimal implementation]

## Verify
- Test passes: ✓
- Suite passes: ✓
- No regressions: ✓

## Refactor (if needed)
[improved code with better names/structure]

## Next Cycle
The next test should verify [specific behavior], because [reason].
```
Feature: User password strength validator
Requirement: Passwords must be 8+ characters, contain uppercase, lowercase, digit, and special character.
Red: Test
```python
def test_password_rejects_too_short():
    validator = PasswordValidator()
    assert validator.is_strong("Pass1!") is False  # 6 chars, needs 8
```
Green: Implement
```python
class PasswordValidator:
    def is_strong(self, password: str) -> bool:
        return len(password) >= 8
```
Verify: Test passes. Suite passes.
Red: Test
```python
def test_password_rejects_missing_uppercase():
    validator = PasswordValidator()
    assert validator.is_strong("password1!") is False  # no uppercase
```
Green: Implement
```python
class PasswordValidator:
    def is_strong(self, password: str) -> bool:
        if len(password) < 8:
            return False
        return any(c.isupper() for c in password)
```
Verify: Both tests pass.
Red: Test
```python
def test_password_rejects_missing_lowercase():
    validator = PasswordValidator()
    assert validator.is_strong("PASSWORD1!") is False  # no lowercase
```
Green: Implement
```python
class PasswordValidator:
    def is_strong(self, password: str) -> bool:
        if len(password) < 8:
            return False
        if not any(c.isupper() for c in password):
            return False
        return any(c.islower() for c in password)
```
Verify: All three tests pass.
Red: Test
```python
def test_password_rejects_missing_digit():
    validator = PasswordValidator()
    assert validator.is_strong("Password!") is False  # no digit
```
Green: Implement
```python
class PasswordValidator:
    def is_strong(self, password: str) -> bool:
        if len(password) < 8:
            return False
        if not any(c.isupper() for c in password):
            return False
        if not any(c.islower() for c in password):
            return False
        return any(c.isdigit() for c in password)
```
Verify: All four tests pass.
Red: Test
```python
def test_password_rejects_missing_special_char():
    validator = PasswordValidator()
    assert validator.is_strong("Password1") is False  # no special char
```
Green: Implement
```python
class PasswordValidator:
    SPECIAL_CHARS = "!@#$%^&*()-_=+[]{}|;:',.<>?/`~"

    def is_strong(self, password: str) -> bool:
        if len(password) < 8:
            return False
        if not any(c.isupper() for c in password):
            return False
        if not any(c.islower() for c in password):
            return False
        if not any(c.isdigit() for c in password):
            return False
        return any(c in self.SPECIAL_CHARS for c in password)
```
Verify: All five tests pass.
Notice the repetition in the validation checks. Extract helpers:
```python
class PasswordValidator:
    SPECIAL_CHARS = "!@#$%^&*()-_=+[]{}|;:',.<>?/`~"

    def is_strong(self, password: str) -> bool:
        return (
            self._has_min_length(password, 8)
            and self._has_uppercase(password)
            and self._has_lowercase(password)
            and self._has_digit(password)
            and self._has_special_char(password)
        )

    def _has_min_length(self, password: str, length: int) -> bool:
        return len(password) >= length

    def _has_uppercase(self, password: str) -> bool:
        return any(c.isupper() for c in password)

    def _has_lowercase(self, password: str) -> bool:
        return any(c.islower() for c in password)

    def _has_digit(self, password: str) -> bool:
        return any(c.isdigit() for c in password)

    def _has_special_char(self, password: str) -> bool:
        return any(c in self.SPECIAL_CHARS for c in password)
```
Verify: All tests still pass. Refactoring complete.
Red: Test
```python
def test_password_accepts_valid_strong_password():
    validator = PasswordValidator()
    assert validator.is_strong("MyPass1!") is True
```
Green: Already implemented; test passes immediately.
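Taken together, the six cycles leave the validator covered by a small accumulated suite. A self-contained sketch (the class is condensed and repeated here so the snippet runs on its own):

```python
# Condensed version of the refactored validator from the cycles above.
class PasswordValidator:
    SPECIAL_CHARS = "!@#$%^&*()-_=+[]{}|;:',.<>?/`~"

    def is_strong(self, password: str) -> bool:
        return (
            len(password) >= 8
            and any(c.isupper() for c in password)
            and any(c.islower() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in self.SPECIAL_CHARS for c in password)
        )

# The accumulated suite: one test per cycle.
def test_rejects_too_short():
    assert PasswordValidator().is_strong("Pass1!") is False

def test_rejects_missing_uppercase():
    assert PasswordValidator().is_strong("password1!") is False

def test_rejects_missing_lowercase():
    assert PasswordValidator().is_strong("PASSWORD1!") is False

def test_rejects_missing_digit():
    assert PasswordValidator().is_strong("Password!") is False

def test_rejects_missing_special_char():
    assert PasswordValidator().is_strong("Password1") is False

def test_accepts_valid_strong_password():
    assert PasswordValidator().is_strong("MyPass1!") is True
```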
When writing tests, watch for these recurring mistakes:
Mistake: Write code, then write tests afterward.
Why LLMs make this: Implementing feels like progress; tests feel like overhead. Code-first pressure is high.
Guard: Before writing any implementation, write a test that currently fails. Run it; confirm failure message is clear.
Example:
- Bad: write the `EmailValidator.is_valid()` implementation, then write the test
- Good: write the test for `is_valid()` first

Mistake: Implement feature; then write tests to verify it works.
Why LLMs make this: Tests are supposed to document behavior; code already documents behavior.
Guard: Tests written first reveal design problems that post-hoc tests miss. Make it a workflow rule.
Example:
- Bad: implement `UserService.create_user()`, then write the test
- Good: write the test for `UserService.create_user()` first; implement after

Mistake: Implement "production-quality" code on first pass (optimization, generalization, error handling).
Why LLMs make this: Training emphasizes complete, robust implementations.
Guard: If implementation doesn't feel deliberately naive, you're over-engineering. Write simplest code that passes test; refactor later.
Example:
"@" in email on first test; evolve with each testMistake: Write multiple tests without running; run them all at once.
Why LLMs make this: Batching feels efficient.
Guard: Run tests after each Red-Green-Refactor cycle. Rapid feedback (minutes, not hours) reveals problems immediately.
Example:
Mistake: Refactor code, then run tests later; hope nothing broke.
Why LLMs make this: Large refactors feel productive.
Guard: Before refactoring, tests must pass. After each small refactoring step (rename, extract, move), run tests. If tests fail, revert and try smaller step.
Example:
Mistake: Write tests with names like `test_email()` or `test_password_1()`.
Why LLMs make this: Generating descriptive names takes extra effort.
Guard: Test name should describe the condition and expected outcome. Read the test name aloud; does it explain what's being tested?
Example:
- Bad: `test_email()`, `test_password_validation()`
- Good: `test_email_validation_rejects_missing_at_symbol()`, `test_password_validator_requires_min_8_chars()`

Before considering a test complete, verify: