Scaffolded end-to-end test generation for integration seams — API endpoints, UI flows, and service boundaries — with framework detection and a durable coverage summary.
From workflow-orchestration
npx claudepluginhub mikecubed/agent-orchestration --plugin workflow-orchestration
This skill uses the workspace's default tool permissions.
Use this skill when a developer needs to generate end-to-end or integration tests for code that crosses service, API, or UI boundaries. It detects the project's E2E framework, identifies integration seams, and scaffolds test files covering happy paths, error paths, and auth boundaries.
This skill generates E2E and integration tests only. Unit test generation and test-driven development workflows are the domain of tdd-check and the TDD gate — do not use this skill for unit tests. The boundary is clear: if the test exercises a single function or class in isolation, it is a unit test; if it exercises an interaction across components, endpoints, or services, it belongs here.
Persistent team, squad, or fleet-style long-lived orchestration is out of scope for this skill. Use a separate orchestration layer if persistent coordination is needed.
Activate when the developer asks for things like:
Also activate when:
- parallel-implementation-loop or pr-review-resolution-loop has completed and the developer wants to verify integration boundaries;
- map-codebase has identified integration seams that lack test coverage.

Do not activate for:
- tdd-check);
- systematic-debugging);
- incident-rca).

Before you start, identify:
.agent/e2e-scaffold-summary.md).

If the target path is ambiguous in a monorepo, prompt the developer for the target service or default to the service containing recently changed files.
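The monorepo default can be sketched as a small selection function. This is only an illustration: `pickDefaultService`, the idea of passing in a list of changed files, and the "most changed files wins" tie-break are assumptions, not something the skill specifies.

```typescript
// Hypothetical helper: pick the service root that contains the most changed
// files. Returns undefined when no changed file falls under any root, in
// which case the developer should be prompted for the target instead.
function pickDefaultService(
  changedFiles: string[],
  serviceRoots: string[],
): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  for (const root of serviceRoots) {
    // Match on a directory boundary so "services/pay" does not match "services/payments".
    const prefix = root.endsWith('/') ? root : root + '/';
    const count = changedFiles.filter((file) => file.startsWith(prefix)).length;
    if (count > bestCount) {
      best = root;
      bestCount = count;
    }
  }
  return best;
}
```

The list of changed files could come from version control; how it is gathered is left open here.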
Use separate roles for:
Resolve the active model for each role using this priority chain:
1. Project config: look for the runtime-specific config file in the current project root:
   - .copilot/models.yaml
   - .claude/models.yaml

   Read the implementer, reviewer, and scout keys directly. If a key is absent, fall back to the baked-in default for that role.
2. Session cache: if models were already confirmed earlier in this session, reuse them.
3. Baked-in defaults: if neither config file nor session cache exists, use the defaults below, ask the developer to confirm or override once, then cache for the session.
| Runtime | Role | Default model |
|---|---|---|
| Copilot CLI | Implementer | claude-opus-4.6 |
| Copilot CLI | Reviewer | gpt-5.4 |
| Copilot CLI | Scout | claude-haiku-4.5 |
| Claude Code | Implementer | claude-opus-4.6 |
| Claude Code | Reviewer | claude-opus-4.6 |
| Claude Code | Scout | claude-haiku-4.5 |
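The priority chain and the defaults table can be read together as one resolution function. The sketch below is a minimal illustration: the runtime identifiers (`copilot-cli`, `claude-code`), the `resolveModels` name, and the `needsConfirmation` flag are assumptions introduced here. Parsing of models.yaml is assumed to happen elsewhere.

```typescript
type Role = 'implementer' | 'reviewer' | 'scout';
type Runtime = 'copilot-cli' | 'claude-code'; // hypothetical identifiers
type ModelMap = Record<Role, string>;

// Baked-in defaults, copied from the table above.
const DEFAULTS: Record<Runtime, ModelMap> = {
  'copilot-cli': { implementer: 'claude-opus-4.6', reviewer: 'gpt-5.4', scout: 'claude-haiku-4.5' },
  'claude-code': { implementer: 'claude-opus-4.6', reviewer: 'claude-opus-4.6', scout: 'claude-haiku-4.5' },
};

interface Resolution {
  models: ModelMap;
  source: 'project-config' | 'session-cache' | 'defaults';
  needsConfirmation: boolean; // true only when falling through to the defaults
}

function resolveModels(
  runtime: Runtime,
  projectConfig?: Partial<ModelMap>, // parsed models.yaml, if the file exists
  sessionCache?: ModelMap, // models confirmed earlier in this session
): Resolution {
  const defaults = DEFAULTS[runtime];
  if (projectConfig !== undefined) {
    // 1. Project config wins; absent keys fall back per role.
    return {
      models: {
        implementer: projectConfig.implementer ?? defaults.implementer,
        reviewer: projectConfig.reviewer ?? defaults.reviewer,
        scout: projectConfig.scout ?? defaults.scout,
      },
      source: 'project-config',
      needsConfirmation: false,
    };
  }
  if (sessionCache !== undefined) {
    // 2. Reuse what was already confirmed.
    return { models: sessionCache, source: 'session-cache', needsConfirmation: false };
  }
  // 3. Defaults must be confirmed once, then cached by the caller.
  return { models: { ...defaults }, source: 'defaults', needsConfirmation: true };
}
```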
The scout inspects project manifests and configuration files to detect the E2E or integration test framework:
If a framework is detected:
If no framework is detected:
Gate: Framework resolved
Write .agent/SESSION.md with current-phase: "framework-detection" after this phase completes.
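For a Node project, the detection step might look roughly like the sketch below. It covers only package.json, and the `KNOWN_FRAMEWORKS` list is an assumption made for illustration; the skill's actual set of manifests and frameworks is not reproduced here. `detectFramework` is a hypothetical name.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Assumed mapping from npm package name to framework label (illustrative only).
const KNOWN_FRAMEWORKS: Array<[string, string]> = [
  ['@playwright/test', 'Playwright'],
  ['cypress', 'Cypress'],
];

// Returns the first known framework found in the manifest, or undefined
// when none is present (the "no framework detected" branch).
function detectFramework(
  manifest: PackageManifest,
): { name: string; versionRange: string } | undefined {
  const deps: Record<string, string> = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
  };
  for (const [pkg, name] of KNOWN_FRAMEWORKS) {
    const versionRange = deps[pkg];
    if (versionRange !== undefined) {
      return { name, versionRange };
    }
  }
  return undefined;
}
```

An undefined result corresponds to the gap that must be documented before the gate can pass.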
Identify the target integration seams in the specified path:
Produce a factual brief of the identified boundaries:
Present the boundary list to the developer for confirmation before generating scaffolds. Allow the developer to add, remove, or reprioritize targets.
Write .agent/SESSION.md with current-phase: "boundary-identification" after this phase completes.
For each confirmed integration seam, generate test files with:
Follow language-specific naming conventions:
- tests/{name}.e2e.ts (or .e2e.js)
- tests/{name}_e2e_test.go
- test_{name}_e2e.py
- tests/{name}_e2e.rs

If the project has an existing test directory convention that differs from the defaults above, follow the project convention.
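The default patterns above translate directly into a lookup. In this sketch the language labels are inferred from the file extensions and `defaultE2eTestPath` is a hypothetical name; a project-specific convention, when one exists, would override the result.

```typescript
type Language = 'typescript' | 'javascript' | 'go' | 'python' | 'rust';

// Maps a seam name and language to the default scaffold path listed above.
function defaultE2eTestPath(name: string, language: Language): string {
  switch (language) {
    case 'typescript':
      return `tests/${name}.e2e.ts`;
    case 'javascript':
      return `tests/${name}.e2e.js`;
    case 'go':
      return `tests/${name}_e2e_test.go`;
    case 'python':
      return `test_${name}_e2e.py`;
    case 'rust':
      return `tests/${name}_e2e.rs`;
  }
}
```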
The reviewer validates each generated file for:
If a generated file fails syntactic validation, revise it. Allow up to 2 revision rounds per file before escalating.
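The bounded revision rule can be sketched as a loop with injected `validate` and `revise` callbacks. Both callbacks, the `Outcome` type, and the function name are hypothetical; how a file is actually parsed or revised depends on the language toolchain and is not shown.

```typescript
type Outcome =
  | { status: 'valid'; content: string; revisions: number }
  | { status: 'escalated'; content: string; errors: string[] };

function validateWithRevisions(
  content: string,
  validate: (c: string) => string[], // returns syntax errors; empty means valid
  revise: (c: string, errors: string[]) => string,
  maxRevisions = 2,
): Outcome {
  let current = content;
  let errors = validate(current);
  let revisions = 0;
  // Revise only while the file is invalid and the budget is not exhausted.
  while (errors.length > 0 && revisions < maxRevisions) {
    current = revise(current, errors);
    revisions += 1;
    errors = validate(current);
  }
  return errors.length === 0
    ? { status: 'valid', content: current, revisions }
    : { status: 'escalated', content: current, errors };
}
```

An `escalated` outcome is the point at which the file is handed back to the developer.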
Gate: Scaffolds valid
Write .agent/SESSION.md with current-phase: "scaffold-generation" after this phase completes.
Produce a durable scaffold summary artifact written to the confirmed output path. This artifact must contain:
"Durable" means written to a repository-appropriate sink — a committed document, a PR comment, or an issue — not only to chat. Chat-only summaries do not satisfy this requirement.
Gate: Summary produced
Write .agent/SESSION.md using the full schema defined in docs/session-md-schema.md. All five YAML frontmatter fields are required on every write:
- current-task: the E2E generation target description
- current-phase: the current phase name
- next-action: what happens next
- workspace: the active branch or PR reference
- last-updated: current ISO-8601 timestamp

Required sections: ## Decisions, ## Files Touched, ## Open Questions, ## Blockers, ## Failed Hypotheses.
Write SESSION.md after each phase gate. If the write fails, log a warning and continue.
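As a rough illustration, a SESSION.md body with the five frontmatter fields and the required sections could be rendered as follows. The exact layout is defined in docs/session-md-schema.md, which is not reproduced here, so the formatting below is an assumption. The `write` and `warn` callbacks are hypothetical stand-ins for file I/O and logging.

```typescript
interface SessionState {
  currentTask: string;
  currentPhase: string;
  nextAction: string;
  workspace: string;
  lastUpdated: Date;
}

const REQUIRED_SECTIONS = ['Decisions', 'Files Touched', 'Open Questions', 'Blockers', 'Failed Hypotheses'];

function renderSessionMd(state: SessionState): string {
  // JSON.stringify yields double-quoted scalars, which YAML also accepts.
  const frontmatter = [
    '---',
    `current-task: ${JSON.stringify(state.currentTask)}`,
    `current-phase: ${JSON.stringify(state.currentPhase)}`,
    `next-action: ${JSON.stringify(state.nextAction)}`,
    `workspace: ${JSON.stringify(state.workspace)}`,
    `last-updated: ${JSON.stringify(state.lastUpdated.toISOString())}`,
    '---',
  ];
  const sections = REQUIRED_SECTIONS.map((title) => `## ${title}\n`);
  return [...frontmatter, '', ...sections].join('\n');
}

// A failed write is reported through `warn` and never aborts the phase.
function writeSession(
  state: SessionState,
  write: (path: string, body: string) => void, // assumed to throw on failure
  warn: (message: string) => void,
): boolean {
  try {
    write('.agent/SESSION.md', renderSessionMd(state));
    return true;
  } catch (err) {
    warn(`SESSION.md write failed: ${String(err)}`);
    return false;
  }
}
```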
The framework must be detected or the gap must be documented with agnostic stubs planned. The skill cannot proceed to scaffold generation without resolving the framework question.
All generated test files must be syntactically valid (parseable by the language toolchain). Up to 2 revision rounds are allowed per file. After 2 failed rounds, the file is escalated to the developer.
The durable scaffold summary artifact must be written to disk. If both the primary path and the fallback path (.agent/e2e-scaffold-summary.md) fail, the gate fails.
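The primary-then-fallback rule for the summary artifact can be sketched like this. The `write` callback is a hypothetical stand-in for whatever file I/O the runtime provides, and `writeSummary` is an illustrative name.

```typescript
const FALLBACK_SUMMARY_PATH = '.agent/e2e-scaffold-summary.md';

type SummaryResult =
  | { ok: true; path: string }
  | { ok: false; errors: string[] };

function writeSummary(
  primaryPath: string,
  body: string,
  write: (path: string, body: string) => void, // assumed to throw on failure
): SummaryResult {
  const errors: string[] = [];
  // Try the confirmed output path first, then the fallback.
  for (const path of [primaryPath, FALLBACK_SUMMARY_PATH]) {
    try {
      write(path, body);
      return { ok: true, path };
    } catch (err) {
      errors.push(`${path}: ${String(err)}`);
    }
  }
  // Both writes failed, so the "Summary produced" gate fails.
  return { ok: false, errors };
}
```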
Before declaring E2E scaffold generation complete, confirm ALL of the following. Any failing item blocks the "scaffold complete" declaration.
If any item is FAIL: report the failing item(s) by name, state what must be done to resolve each, and do not advance past the gate.
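The completion check amounts to an all-or-nothing gate over named items. In the sketch below, the `ChecklistItem` shape and the item names used in the usage note are hypothetical; the skill's own checklist entries are not reproduced.

```typescript
interface ChecklistItem {
  name: string;
  passed: boolean;
  resolution: string; // what must be done if the item fails
}

// Any failing item blocks completion and is reported by name.
function evaluateCompletionGate(
  items: ChecklistItem[],
): { complete: boolean; report: string[] } {
  const failing = items.filter((item) => !item.passed);
  return {
    complete: failing.length === 0,
    report: failing.map((item) => `FAIL: ${item.name}. To resolve: ${item.resolution}`),
  };
}
```

For instance, passing one hypothetical item named `summary-written` with `passed: false` yields `complete: false` and a single report line naming that item.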
Before stopping, ensure any partial results are preserved as a durable artifact so work is not lost. A partial summary must still list files created (if any), coverage gaps, and the reason for stopping. When stopping due to a rescue failure (e.g., framework undetectable), document the rescue attempt and its outcome.
Developer: generate E2E tests for src/api/payments/
Phase 1 — Framework detection
The scout inspects package.json and finds:
```json
{
  "devDependencies": {
    "@playwright/test": "^1.44.0"
  }
}
```
Framework detected: Playwright 1.44. Configuration at playwright.config.ts.
Phase 2 — Boundary identification
The scout produces a factual brief of the integration seams in src/api/payments/:
- POST /api/payments/charge: creates a charge (requires API key);
- GET /api/payments/status/:id: retrieves charge status;
- DELETE /api/payments/refund/:id: initiates a refund (requires API key + admin role).

Developer confirms all three endpoints as targets.
Phase 3 — Scaffold generation
The implementer generates tests/payments.e2e.ts:
```typescript
import { test, expect } from '@playwright/test';

// Header values must be strings, so default to '' when TEST_API_KEY is unset.
// Relative URLs below assume `use.baseURL` is set in playwright.config.ts.
const API_KEY = process.env.TEST_API_KEY ?? '';

test.describe('POST /api/payments/charge', () => {
  test('creates a charge with valid input', async ({ request }) => {
    const response = await request.post('/api/payments/charge', {
      data: { amount: 1000, currency: 'usd', merchantId: 'test-merchant' },
      headers: { 'x-api-key': API_KEY },
    });
    expect(response.status()).toBe(201);
  });

  test('returns 400 for missing required fields', async ({ request }) => {
    const response = await request.post('/api/payments/charge', {
      data: {},
      headers: { 'x-api-key': API_KEY },
    });
    expect(response.status()).toBe(400);
  });

  // Auth boundary: no x-api-key header is sent.
  test('returns 401 without API key', async ({ request }) => {
    const response = await request.post('/api/payments/charge', {
      data: { amount: 1000, currency: 'usd', merchantId: 'test-merchant' },
    });
    expect(response.status()).toBe(401);
  });
});

test.describe('GET /api/payments/status/:id', () => {
  test('returns charge status for valid ID', async ({ request }) => {
    const response = await request.get('/api/payments/status/charge_123');
    expect(response.status()).toBe(200);
  });

  test('returns 404 for unknown charge ID', async ({ request }) => {
    const response = await request.get('/api/payments/status/nonexistent');
    expect(response.status()).toBe(404);
  });
});

test.describe('DELETE /api/payments/refund/:id', () => {
  test('initiates refund with admin role', async ({ request }) => {
    const response = await request.delete('/api/payments/refund/charge_123', {
      headers: { 'x-api-key': API_KEY, 'x-role': 'admin' },
    });
    expect(response.status()).toBe(200);
  });

  test('returns 400 for invalid refund ID', async ({ request }) => {
    const response = await request.delete('/api/payments/refund/invalid-id!', {
      headers: { 'x-api-key': API_KEY, 'x-role': 'admin' },
    });
    expect(response.status()).toBe(400);
  });

  // Auth boundary: valid API key but insufficient role.
  test('returns 403 without admin role', async ({ request }) => {
    const response = await request.delete('/api/payments/refund/charge_123', {
      headers: { 'x-api-key': API_KEY, 'x-role': 'viewer' },
    });
    expect(response.status()).toBe(403);
  });
});
```
Reviewer confirms: all files syntactically valid, all seams covered.
Phase 4 — Coverage summary
```markdown
# E2E Scaffold Summary: src/api/payments/

**Framework:** Playwright 1.44

**Files created:**
- tests/payments.e2e.ts

**Coverage:**

| Endpoint                        | Happy path | Error path | Auth boundary |
|---------------------------------|------------|------------|---------------|
| POST /api/payments/charge       | ✓          | ✓ (400)    | ✓ (401)       |
| GET /api/payments/status/:id    | ✓          | ✓ (404)    | n/a           |
| DELETE /api/payments/refund/:id | ✓          | ✓ (400)    | ✓ (403)       |

**Total:** 3 endpoints × 2–3 scenarios = 8 test cases scaffolded.

**Gaps:** None.
```
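The total in the summary can be derived from the coverage matrix rather than counted by hand. The sketch below encodes the three rows of the example; the `SeamCoverage` shape and `countScenarios` are hypothetical names introduced for illustration.

```typescript
interface SeamCoverage {
  seam: string;
  happyPath: boolean;
  errorPath: boolean;
  authBoundary: boolean;
}

// The three endpoints from the example, one row per seam.
const coverage: SeamCoverage[] = [
  { seam: 'POST /api/payments/charge', happyPath: true, errorPath: true, authBoundary: true },
  { seam: 'GET /api/payments/status/:id', happyPath: true, errorPath: true, authBoundary: false },
  { seam: 'DELETE /api/payments/refund/:id', happyPath: true, errorPath: true, authBoundary: true },
];

// Counts one scaffolded test case per covered scenario.
function countScenarios(rows: SeamCoverage[]): number {
  return rows.reduce(
    (total, row) =>
      total + [row.happyPath, row.errorPath, row.authBoundary].filter(Boolean).length,
    0,
  );
}

const totalCases = countScenarios(coverage); // 3 + 2 + 3 = 8
```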