Test orchestration with behavior observation and iterative fixing
Runs comprehensive tests across unit, integration, and contract levels, then automatically diagnoses and fixes failures through iterative behavior observation. Use this when you need to validate that your API works correctly and matches the OpenAPI specification.
```
/plugin marketplace add claude-market/marketplace
/plugin install specforge@claude-market/marketplace
```

Run comprehensive tests with behavior observation, diagnose failures, and iterate until all tests pass.
The test orchestrator runs tests at multiple levels and uses behavior observation to diagnose and fix issues:
```
┌─────────────────────────────────────────────┐
│               /specforge:test               │
└──────────────────────┬──────────────────────┘
                       │
      ┌────────────┬───┴────────┬─────────────┐
      ▼            ▼            ▼             ▼
 ┌─────────┐ ┌─────────────┐ ┌──────────┐ ┌─────────────┐
 │  Unit   │ │ Integration │ │ Contract │ │  E2E Tests  │
 │  Tests  │ │    Tests    │ │  Tests   │ │ (optional)  │
 └────┬────┘ └──────┬──────┘ └────┬─────┘ └──────┬──────┘
      │             │             │              │
      └─────────────┴──────┬──────┴──────────────┘
                           │
                           ▼
              ┌──────────────────────┐
              │ Behavior Observation │
              │ & Failure Diagnosis  │
              └──────────────────────┘
                           │
                           ▼
              ┌──────────────────────┐
              │    Iterate & Fix     │
              │  (Max 3 iterations)  │
              └──────────────────────┘
```
Extract the tech stack configuration from CLAUDE.md:
```bash
# Read the SpecForge configuration block
grep -A 10 "SpecForge configuration" CLAUDE.md
```
Extract the configured plugin names (backend, database, codegen, and frontend, if present).
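As a rough illustration, the extraction step can be sketched in Python. The CLAUDE.md fragment and key names below are assumptions for the sketch; the real block is written by SpecForge and may differ:

```python
import re

# Hypothetical CLAUDE.md fragment -- the real block is written by SpecForge.
CLAUDE_MD = """\
## SpecForge configuration
backend: specforge-backend-rust-axum
database: specforge-db-postgresql
codegen: specforge-generate-rust-sql
"""

def parse_specforge_config(text: str) -> dict:
    """Collect `key: value` pairs following the configuration header."""
    config, in_block = {}, False
    for line in text.splitlines():
        if "SpecForge configuration" in line:
            in_block = True
            continue
        if in_block:
            match = re.match(r"^(\w+):\s*(\S+)$", line.strip())
            if match:
                config[match.group(1)] = match.group(2)
            elif not line.strip():
                break  # a blank line ends the configuration block
    return config

print(parse_specforge_config(CLAUDE_MD)["backend"])
```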
Run unit tests for the backend using the backend plugin's testing expert:
```bash
# Get the backend plugin name
BACKEND_PLUGIN=$(grep "backend:" CLAUDE.md | cut -d: -f2 | tr -d ' ')
```
Invoke the backend plugin's test agent:
```
Use {backend-plugin}/test-agent skill with:
- action: "run-unit-tests"
- test_dir: "backend/tests/unit"
- coverage: true
```
Checks:
Technology-Specific Commands:
Rust:

```bash
cd backend && cargo test --lib
cargo tarpaulin --out Html --output-dir coverage/
```

TypeScript/Node:

```bash
cd backend && npm test -- --coverage
```

Python:

```bash
cd backend && pytest tests/unit --cov=src --cov-report=html
```

Go:

```bash
cd backend && go test ./... -cover -coverprofile=coverage.out
```
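Whatever the stack, the reported coverage can be gated the same way before moving on. A minimal sketch, assuming the `total:` summary-line format printed by `go tool cover -func=coverage.out` (the 80% threshold is an arbitrary example, not a SpecForge default):

```python
def total_coverage(cover_summary: str) -> float:
    """Extract the percentage from the `total:` line of `go tool cover -func`."""
    for line in cover_summary.splitlines():
        if line.startswith("total:"):
            # Line looks like: "total:\t(statements)\t87.5%"
            return float(line.split()[-1].rstrip("%"))
    raise ValueError("no `total:` line found in coverage summary")

summary = "total:\t(statements)\t87.5%"
threshold = 80.0
coverage = total_coverage(summary)
print(f"coverage {coverage:.1f}% {'meets' if coverage >= threshold else 'is below'} the {threshold:.0f}% gate")
```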
Run integration tests that test full request-to-response flows:
```
Use {backend-plugin}/test-agent skill with:
- action: "run-integration-tests"
- test_dir: "tests/integration"
- database_url: test database URL
```
Test Pattern:
Each endpoint should have integration tests:
```rust
// Example Rust integration test
#[tokio::test]
async fn test_create_order_success() {
    // Setup test database
    let db = setup_test_db().await;
    let app = create_test_app(db.clone()).await;

    // Create test user
    let user = create_test_user(&db).await;

    // Test request
    let response = app
        .post("/api/orders")
        .json(&json!({
            "items": [{"product_id": 1, "quantity": 2}],
            "user_id": user.id
        }))
        .send()
        .await;

    // Assertions
    assert_eq!(response.status(), StatusCode::CREATED);
    let order: Order = response.json().await;
    assert_eq!(order.user_id, user.id);
    assert_eq!(order.status, OrderStatus::Pending);
}
```
Checks:
Verify that the running API matches the OpenAPI specification:
```
Use openapi-expert skill with:
- action: "contract-test"
- spec_path: "spec/openapi.yaml"
- api_url: "http://localhost:3000"
- test_report_path: "test-results/contract-tests.json"
```
Tools:
Schemathesis Example:
```bash
# Install schemathesis
pip install schemathesis

# Run contract tests
schemathesis run spec/openapi.yaml \
  --base-url http://localhost:3000 \
  --checks all \
  --hypothesis-max-examples=50
```
Checks:
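To make concrete what one such contract check verifies, here is a stripped-down, pure-Python illustration. Real tools like Schemathesis validate full JSON Schemas; the `order_schema` below is a made-up example, not part of the SpecForge spec:

```python
def missing_required_fields(response: dict, schema: dict) -> list:
    """Return required schema properties that are absent from a response body."""
    return [field for field in schema.get("required", []) if field not in response]

# Made-up response schema for POST /api/orders
order_schema = {
    "type": "object",
    "required": ["id", "user_id", "status", "created_at"],
}

response_body = {"id": 1, "user_id": 7, "status": "pending"}
print(missing_required_fields(response_body, order_schema))  # -> ['created_at']
```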
Monitor test execution to identify patterns of failure:
```typescript
const testResults = {
  unit: { passed: [], failed: [] },
  integration: { passed: [], failed: [] },
  contract: { passed: [], failed: [] },
};

// Collect all failures
const allFailures = [
  ...testResults.unit.failed,
  ...testResults.integration.failed,
  ...testResults.contract.failed,
];

if (allFailures.length > 0) {
  // Analyze failure patterns
  const patterns = analyzeFailurePatterns(allFailures);

  // Categorize failures
  const categories = {
    typeErrors: [],
    databaseErrors: [],
    validationErrors: [],
    businessLogicErrors: [],
    networkErrors: [],
  };

  // Group by category for targeted fixing
  for (const failure of allFailures) {
    const category = categorizeFailure(failure);
    categories[category].push(failure);
  }
}
```
Common Failure Patterns:
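The `categorizeFailure` helper is not spelled out in this document; one plausible shape, sketched in Python, is a set of regex patterns keyed by category (the patterns below are assumptions matched to the error messages shown in the troubleshooting examples, not the orchestrator's actual rules):

```python
import re

# Hypothetical patterns -- tune these to your test runner's actual messages.
FAILURE_PATTERNS = {
    "typeErrors": re.compile(r"type mismatch|expected \S+, (?:got|found)", re.I),
    "databaseErrors": re.compile(r"constraint failed|relation .* does not exist", re.I),
    "validationErrors": re.compile(r"missing required field|invalid value", re.I),
    "networkErrors": re.compile(r"connection refused|timed? ?out", re.I),
}

def categorize_failure(message: str) -> str:
    """Map a failure message to a category; unmatched messages go to business logic."""
    for category, pattern in FAILURE_PATTERNS.items():
        if pattern.search(message):
            return category
    return "businessLogicErrors"  # default bucket for everything else

print(categorize_failure("Type mismatch - expected i64, found String"))
```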
For each category of failures, invoke the appropriate diagnostics agent:
```typescript
const MAX_ITERATIONS = 3;
let iteration = 0;
let allTestsPassed = false;

while (!allTestsPassed && iteration < MAX_ITERATIONS) {
  iteration++;

  // Run tests
  const results = await runAllTests();
  if (results.allPassed) {
    allTestsPassed = true;
    break;
  }

  // Diagnose type/codegen issues
  if (results.typeErrors.length > 0) {
    await invokeAgent(`${codegenPlugin}/diagnostics-agent`, {
      model: "sonnet",
      errors: results.typeErrors,
      generated_code_path: "backend/src/generated",
      schema_path: "migrations/",
    });
  }

  // Diagnose handler/business logic issues
  if (results.businessLogicErrors.length > 0) {
    await invokeAgent(`${backendPlugin}/handler-agent`, {
      model: "sonnet",
      errors: results.businessLogicErrors,
      handlers_path: "backend/src/handlers",
    });
  }

  // Diagnose database/query issues
  if (results.databaseErrors.length > 0) {
    await invokeAgent(`${databasePlugin}/migration-agent`, {
      model: "haiku",
      errors: results.databaseErrors,
      migrations_path: "migrations/",
    });
  }

  // Re-run code generation if schema changed
  if (schemaChanged) {
    await regenerateCode();
  }
}

if (!allTestsPassed) {
  throw new Error(
    `Tests still failing after ${MAX_ITERATIONS} iterations. Manual intervention required.`
  );
}
```
Generate a comprehensive test report:
```markdown
# SpecForge Test Report

**Project**: my-api
**Date**: 2025-01-15
**Duration**: 45s
**Iterations**: 2

## Summary

- ✓ Unit Tests: 45/45 passed (100%)
- ✓ Integration Tests: 12/12 passed (100%)
- ✓ Contract Tests: 8/8 passed (100%)
- **Total**: 65/65 tests passed

## Coverage

- Line Coverage: 87%
- Branch Coverage: 82%
- Function Coverage: 95%

## Test Details

### Unit Tests (45 passed)

- ✓ handlers::users::create_user
- ✓ handlers::users::get_user
- ✓ handlers::orders::create_order
- ✓ handlers::orders::get_orders_by_user
- ... (41 more)

### Integration Tests (12 passed)

- ✓ POST /api/users - creates user successfully
- ✓ GET /api/users/:id - returns user
- ✓ POST /api/orders - creates order
- ✓ GET /api/users/:id/orders - returns user's orders
- ... (8 more)

### Contract Tests (8 passed)

- ✓ POST /api/users matches schema
- ✓ GET /api/users/:id matches schema
- ✓ POST /api/orders matches schema
- ... (5 more)

## Iterations

### Iteration 1

- 3 test failures
- Issues: Type mismatch in Order.created_at (expected DateTime, got String)
- Fix: Updated sql-gen type overrides
- Result: Re-ran codegen, 3 tests now pass

### Iteration 2

- All tests passed ✓

## Performance

- Average response time: 45ms
- Slowest endpoint: GET /api/users/:id/orders (120ms)
- Database queries: Average 15ms

## Recommendations

1. Add tests for error cases (400, 404, 500)
2. Add pagination tests for list endpoints
3. Add authentication/authorization tests
4. Consider adding load tests for critical endpoints
```
Run unit and integration tests only (skip contract tests):

```
/specforge:test
```

Run all test suites including contract tests:

```
/specforge:test --full
```

Run a specific test suite:

```
/specforge:test --suite unit
/specforge:test --suite integration
/specforge:test --suite contract
```

Run tests continuously on file changes (development mode):

```
/specforge:test --watch
```
Example GitHub Actions workflow:
```yaml
name: SpecForge Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432 # map the service port so localhost:5432 is reachable
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v3

      - name: Install SpecForge
        run: |
          /plugin install specforge
          /plugin install specforge-backend-rust-axum
          /plugin install specforge-db-postgresql
          /plugin install specforge-generate-rust-sql

      - name: Run Tests
        run: /specforge:test --full
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/test

      - name: Upload Coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info
```
Error: `Type mismatch - expected i64, found String`

Fix: Update codegen type overrides:

```toml
# sql-gen.toml
[sql-gen.settings]
type_overrides = { "user_id" = "i64" }
```

Re-run: `/specforge:build`
Error: `FOREIGN KEY constraint failed`

Fix: Ensure foreign key references exist in test setup:

```rust
// Create parent record first
let user = create_test_user(&db).await;

// Then create child record
let order = create_test_order(&db, user.id).await;
```
Error: `Response does not match schema - missing required field 'created_at'`

Fix: Update the handler to include all required fields:

```rust
// Ensure all schema fields are returned
Json(OrderResponse {
    id: order.id,
    user_id: order.user_id,
    total_cents: order.total_cents,
    status: order.status,
    created_at: order.created_at, // Don't forget this!
})
```
Error: `Endpoint GET /api/users/:id returned 500, expected 200`

Fix: Check handler error handling:

```rust
// Add proper error handling
let user = get_user_by_id(&db, id)
    .await?
    .ok_or(ApiError::NotFound)?;

Ok(Json(user))
```
If a frontend plugin is configured, run frontend tests:
```
Use {frontend-plugin}/test-expert skill with:
- action: "run-tests"
- test_dir: "frontend/tests"
```
Frontend Test Types:
Example E2E Test (Playwright):
```typescript
test("create order flow", async ({ page }) => {
  // Navigate to app
  await page.goto("http://localhost:5173");

  // Create order
  await page.click("text=New Order");
  await page.fill("[name=quantity]", "2");
  await page.click("text=Submit");

  // Verify order created
  await expect(page.locator(".order-list")).toContainText("Order #1");
});
```
Test orchestration ensures your SpecForge project works correctly at all levels - from unit tests to end-to-end workflows.