From python-developer
Write a BDD feature specification in Gherkin with step definitions.
npx claudepluginhub hpsgd/turtlestack --plugin python-developer
Write a BDD feature spec for $ARGUMENTS.
Before writing any Gherkin:

1. Read existing features — match conventions:

```bash
find . -name "*.feature" | head -20
find . -name "test_*.py" -path "*/step_defs/*" | head -20
```

2. Check existing step definitions — reuse where possible:

```bash
grep -rn "@given\|@when\|@then" --include="*.py" | head -30
```

3. Identify the domain language: what terms do product and business people use? Use those, not technical terms.
Every feature file follows this structure:

```gherkin
# tests/features/<feature-name>.feature
Feature: <Feature name in business language>
  As a <role>
  I want <capability>
  So that <business value>

  Background:
    Given <common precondition shared by all scenarios>

  Scenario: <Happy path — most common success case>
    Given <specific precondition in business language>
    When <user action in business language>
    Then <expected outcome in business language>

  Scenario: <Edge case — boundary or unusual input>
    Given <precondition>
    When <action with edge-case input>
    Then <expected outcome>

  Scenario: <Error case — invalid input or failed precondition>
    Given <precondition>
    When <action that should fail>
    Then <expected error behaviour>

  Scenario Outline: <Parameterised scenario for multiple inputs>
    Given <precondition with <parameter>>
    When <action with <input>>
    Then <outcome with <expected>>

    Examples:
      | parameter | input | expected |
      | value1    | x     | y        |
      | value2    | a     | b        |
```
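As a concrete illustration, here is a hypothetical feature file built from this template, using the notification steps defined later in this document:

```gherkin
# tests/features/notifications.feature (hypothetical example)
Feature: Notification management
  As a registered user
  I want to manage my notifications
  So that my inbox stays relevant

  Scenario: Mark all notifications as read
    Given an active user named "Alice"
    And the user has 3 pending notifications
    When the user marks all notifications as read
    Then all notifications are marked as read
    And the unread count is 0
```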
Language rules (business language, not implementation detail):

- Avoid: Given the user table has a row with id=5 and status='active'. Prefer: Given an active user named "Alice"
- Avoid: When a POST request is sent to /api/login with body {"email": "..."}. Prefer: When Alice logs in with her email and password
- Avoid: Then the response status code is 200. Prefer: Then Alice sees her dashboard

Structure rules:

- Given/When/Then is ONE statement each. Use And for additional conditions.
- Given establishes state, not action. It describes what IS, not what happened.
- When is ONE user action or system event. Never multiple actions in one When.
- Then verifies ONE outcome. Multiple assertions use And.
- Background is for preconditions shared by ALL scenarios in the file, not just some.
- Name scenarios for behaviour: "Successful login", not "Test login endpoint".

Scenario completeness: every feature MUST include at minimum a happy path, an edge case, and an error case.
Anti-patterns in Gherkin:

- Given a user with email "test@example.com" and password "Pass123!" and role "admin" and created_at "2024-01-01" — only include details that matter for this scenario.
- When the user logs in and navigates to settings and changes their name — that's three steps.
- Then the user is created after When the user is created — test the EFFECT, not the action.
```python
# tests/step_defs/test_<feature>.py
import pytest
from pytest_bdd import given, when, then, scenarios, parsers

from tests.factories import NotificationFactory, UserFactory

# Link to feature file
scenarios('../features/<feature>.feature')


# GIVEN — establish state using fixtures
@given(parsers.parse('an active user named "{name}"'), target_fixture='user')
def active_user(name, db_session):
    """Create an active user in the test database."""
    return UserFactory.create(name=name, status='active')


@given(parsers.parse('the user has {count:d} pending notifications'),
       target_fixture='notifications')
def pending_notifications(user, count, db_session):
    """Create pending notifications for the user."""
    return NotificationFactory.create_batch(count, user=user, status='pending')


# WHEN — perform the action, capture the result
@when('the user marks all notifications as read', target_fixture='result')
def mark_all_read(user, notification_service):
    """Call the service to mark all notifications as read."""
    return notification_service.mark_all_read(user.id)


# THEN — assert the outcome
@then('all notifications are marked as read')
def all_notifications_read(notifications, db_session):
    """Verify all notifications are now read."""
    for notification in notifications:
        db_session.refresh(notification)
        assert notification.status == 'read'


@then(parsers.parse('the unread count is {expected:d}'))
def unread_count(user, expected, notification_service):
    """Verify the unread notification count."""
    count = notification_service.get_unread_count(user.id)
    assert count == expected
```
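The steps above assume a notification_service fixture; the service itself is application code and not shown in this document. A minimal in-memory sketch (hypothetical, dict-backed rather than ORM-backed) of the two methods the steps call:

```python
class NotificationService:
    """Hypothetical in-memory stand-in: maps user_id to a list of
    notification dicts, each with a 'status' key."""

    def __init__(self, notifications_by_user):
        self._by_user = notifications_by_user

    def mark_all_read(self, user_id):
        # Flip every notification for this user to 'read'.
        notifications = self._by_user.get(user_id, [])
        for notification in notifications:
            notification['status'] = 'read'
        return notifications

    def get_unread_count(self, user_id):
        # Count notifications whose status is anything other than 'read'.
        return sum(1 for n in self._by_user.get(user_id, [])
                   if n['status'] != 'read')
```

A real implementation would query the database through db_session; the interface the steps depend on is just these two methods.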
Step definition rules:

- Use target_fixture to pass data between steps — not global variables or module-level state.
- test_<feature>.py maps to <feature>.feature.
- Reusable steps go in conftest.py:
```python
# tests/step_defs/conftest.py
import pytest
from pytest_bdd import given, parsers

from myapp.services import NotificationService  # adjust to your app's layout
from tests.factories import UserFactory


# Shared fixtures
@pytest.fixture
def db_session(app):
    """Provide a database session for the test."""
    session = app.db.create_session()
    yield session
    session.rollback()
    session.close()


@pytest.fixture
def notification_service(db_session):
    """Provide the notification service."""
    return NotificationService(db_session)


# Shared steps — reusable across feature files
@given(parsers.parse('an active user named "{name}"'), target_fixture='user')
def active_user(name, db_session):
    return UserFactory.create(name=name, status='active')


@given(parsers.parse('an admin user named "{name}"'), target_fixture='admin')
def admin_user(name, db_session):
    return UserFactory.create(name=name, role='admin')
```
Reuse rules:

- Shared steps and fixtures live in conftest.py; feature-specific steps stay in test_<feature>.py.
- Use parsers.parse for typed parameters: @given(parsers.parse('a user with {count:d} items')) parses count as an integer.
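To see what a typed placeholder buys you, here is a rough stdlib-only approximation of parse-style matching (pytest-bdd actually delegates to the parse library; this hypothetical sketch only handles {name} and {name:d}):

```python
import re

def parse_step(pattern, text):
    """Match a parse-style pattern against a step line, converting
    {name:d} captures to int and bare {name} captures to str.
    Minimal sketch: assumes the literal text has no regex metacharacters."""
    types = {}

    def to_regex(match):
        name, spec = match.group(1), match.group(2)
        types[name] = int if spec == 'd' else str
        return rf'(?P<{name}>\d+)' if spec == 'd' else rf'(?P<{name}>.+)'

    regex = re.sub(r'\{(\w+)(?::(\w))?\}', to_regex, pattern)
    match = re.fullmatch(regex, text)
    return {k: types[k](v) for k, v in match.groupdict().items()} if match else None
```

For example, parse_step('a user with {count:d} items', 'a user with 5 items') yields {'count': 5}, so the step function receives an int and never calls int() itself.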
```python
# tests/factories.py
from datetime import UTC, datetime
from uuid import uuid4

import factory

from myapp.models import Notification, User


class UserFactory(factory.Factory):
    class Meta:
        model = User

    id = factory.LazyFunction(uuid4)
    name = factory.Faker('name')
    email = factory.Faker('email')
    status = 'active'
    role = 'user'
    created_at = factory.LazyFunction(lambda: datetime.now(UTC))


class NotificationFactory(factory.Factory):
    class Meta:
        model = Notification

    id = factory.LazyFunction(uuid4)
    user = factory.SubFactory(UserFactory)
    message = factory.Faker('sentence')
    status = 'pending'
    created_at = factory.LazyFunction(lambda: datetime.now(UTC))
```
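If factory_boy is not available, the same pattern — per-instance dynamic defaults, nested object creation, and batch creation — can be sketched with stdlib dataclasses (a simplified stand-in, not the factory_boy API; User and Notification here are hypothetical models):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4

@dataclass
class User:
    id: UUID = field(default_factory=uuid4)   # like factory.LazyFunction(uuid4)
    name: str = 'Test User'
    status: str = 'active'

@dataclass
class Notification:
    user: User = field(default_factory=User)  # like factory.SubFactory(UserFactory)
    message: str = 'A test notification.'
    status: str = 'pending'
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def create_batch(model, n, **overrides):
    """Like factory_boy's create_batch: n independent instances."""
    return [model(**overrides) for _ in range(n)]
```

Because default_factory runs once per instance, every Notification in a batch gets its own User with its own UUID, just as SubFactory does.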
Factory rules:

- factory.SubFactory for relationships.
- factory.LazyFunction for dynamic defaults (UUIDs, timestamps).
- create_batch(n) for generating multiple instances.

For logic-heavy features, add Hypothesis property-based tests alongside BDD scenarios:
```python
# tests/test_<feature>_properties.py
from decimal import Decimal

from hypothesis import given, settings, strategies as st

from myapp.currency import convert           # adjust imports to your layout
from myapp.models import Order, OrderItem


@given(
    amount=st.decimals(min_value=Decimal('0.01'), max_value=Decimal('10000'), places=2),
    currency=st.sampled_from(['USD', 'EUR', 'GBP']),
)
@settings(max_examples=200)
def test_conversion_roundtrip_preserves_value(amount, currency):
    """Converting to USD and back should preserve the original amount."""
    usd = convert(amount, currency, 'USD')
    back = convert(usd, 'USD', currency)
    assert abs(back - amount) < Decimal('0.01')


@given(items=st.lists(st.builds(
    OrderItem,
    quantity=st.integers(1, 100),
    price=st.decimals(Decimal('0.01'), Decimal('1000'), places=2),
)))
def test_order_total_equals_sum_of_line_items(items):
    """Order total should always equal the sum of item quantity * price."""
    order = Order(items=items)
    expected = sum(item.quantity * item.price for item in items)
    assert order.total == expected
```
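The roundtrip property only holds if convert rounds consistently at each hop. A hypothetical convert (invented rates, quantised to cents; the real implementation lives in the application) that satisfies it:

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical units-per-USD rates, for illustration only.
RATES = {'USD': Decimal('1'), 'EUR': Decimal('0.92'), 'GBP': Decimal('0.79')}

def convert(amount, src, dst):
    """Convert src -> dst via USD, quantising the result to cents."""
    usd = Decimal(amount) / RATES[src]
    result = usd * RATES[dst]
    return result.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
```

With quantisation at every hop, the accumulated roundtrip error stays within one cent, which is exactly the tolerance the Hypothesis property asserts.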
Property-based rules:

- max_examples=200 is a good default — enough to find edge cases, fast enough for CI.
- Use st.decimals(..., places=2) for monetary values, never st.integers() for prices.
```bash
# Run all BDD tests
pytest tests/ -v --tb=short

# Run a specific feature
pytest tests/step_defs/test_notifications.py -v

# Run with coverage
pytest tests/ --cov=myapp --cov-report=term-missing
```
Final checks before delivering:

- Given a row in the users table is infrastructure, not behaviour.
- When the user logs in and creates a post is two scenarios.
- New steps must not duplicate ones already defined in conftest.py.

Deliver:
- Feature file (tests/features/<feature>.feature) with happy path, edge case, and error scenarios.
- Step definitions (tests/step_defs/test_<feature>.py) with factories and fixtures.
- Shared steps in conftest.py if reusable.

/python-developer:write-schema — when the feature involves configuration or data contracts, write the schema after the feature spec defines the behaviour.