From junjak-ai-harness
Agentic E2E testing layer (Phase 4.5). Use AFTER deterministic E2E passes and BEFORE human final review. An agent explores a goal via the stack's UI/API driver, verifies goal achievement, and crystallizes the path into a deterministic test. Stack-agnostic via project-profile adapters (web/TS base, Spring-Kotlin, Flutter).
Slash command: `/junjak-ai-harness:agentic-testing`
> **Tests enforce journeys. Agents verify goals. Explore once, regress forever.**
Position: Phase 4.5, after `team-tester` Phase 4 = PASS and before Phase 5. Complements (never replaces) deterministic E2E.
Two roles, one loop: at the late-stage checkpoint an agent (1) explores a goal to verify it (catching goal-level failures deterministic E2E missed → report to human) and (2) crystallizes the discovered path into a reusable deterministic test that joins the cheap CI layer.
- `.claude/project-profile/{index.md, stack.md, testing.md}`. If absent → ABORT: "Run /team-init first."
- `testing.md` → "Agentic Testing Adapter". If missing → require `/team-init --update`.
- `Profile-Generated-At` is far behind HEAD → require `/team-init --update` before running.

Resolve the adapter from `testing.md`'s "Agentic Testing Adapter" (derived from `stack.md`). The pipeline is fixed; the driver, emitter, and concurrency swap per surface. The emitter reuses each stack's existing testing skill as house style (link, don't duplicate).
| Surface | Explorer driver | Generator emitter (house-style skill) | Concurrency | Status |
|---|---|---|---|---|
| web/TS (base) | Playwright MCP (mcp__plugin_playwright_playwright__*) | .spec.ts ← e2e-testing | one shared browser → serialize Explorer | ready |
| Spring/Kotlin (backend API) | HTTP calls | WebTestClient/@SpringBootTest + Testcontainers ← springboot-tdd·kotlin-testing | stateless → true parallel (per-worker DB isolation) | ready |
| Flutter/Dart (mobile UI) | maestro · Patrol · mobile MCP | integration_test · maestro yaml | single device → serialize per device | driver-gated |
| Cross-journey (Flutter→Spring) | UI drive + backend assert | both layers | depends on above | later |
driver unavailable (no silent skip).

Source = plan/spec acceptance criteria. Express goals as outcomes (not UI steps), risk-ordered (auth/payment/data first).
Run a goal only if ALL hold; else log the skip reason:
- web-reviewer/impeccable).

(This harness runs on a Claude Code subscription, not metered API: gate on value/time/noise, not cost.)
- met?, the observed path, and evidence.
- met=false → no spec, escalate to human.

Generated tests MUST obey the emitter skill's conventions (e.g. e2e-testing: `getByRole` > … > `getByTestId`; `waitForResponse`/`waitFor`, never `waitForTimeout`).
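As a rough illustration of enforcing the "never `waitForTimeout`" rule before a generated spec joins CI, the sketch below scans spec source text for the forbidden call. The helper name `findConventionViolations` is hypothetical and not part of the harness or of Playwright, and the check covers only this one rule; the authoritative conventions live in the emitter skill.

```typescript
// Hypothetical pre-run check (an assumption for illustration, not a harness API):
// flag every line of a generated spec that calls waitForTimeout.
function findConventionViolations(specSource: string): string[] {
  const violations: string[] = [];
  specSource.split("\n").forEach((line, index) => {
    // Match a call such as `page.waitForTimeout(500)`.
    if (/\bwaitForTimeout\s*\(/.test(line)) {
      violations.push(
        `line ${index + 1}: waitForTimeout is forbidden; wait on a response or condition instead`
      );
    }
  });
  return violations;
}
```

A Generator could run a check like this after each generate or repair step and treat a non-empty result the same way as a failing run.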
The mode switch lives at the orchestration layer (team-workflow / team-leader) — a skill or spawned subagent cannot call the Workflow tool. agents/team-agentic-tester.md is the standard-mode executor.

    selectMode(ctx):
      IF NOT workflowCallable():      RETURN STANDARD   # hard fallback
      IF NOT ultracodeActive(ctx):    RETURN STANDARD
      IF derivedGoalCount(ctx) < 2:   RETURN STANDARD   # 1 goal → fan-out buys nothing
      IF NOT targetReachable(ctx):    RETURN STANDARD
      RETURN ULTRACODE

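The mode switch above could be written in TypeScript roughly as follows. This is a minimal sketch: the `ModeContext` shape and its field names are assumptions that stand in for the pseudocode's predicate calls, and the real orchestrator may compute them differently.

```typescript
type Mode = "STANDARD" | "ULTRACODE";

// Hypothetical context shape; each field mirrors one predicate in the pseudocode.
interface ModeContext {
  workflowCallable: boolean;
  ultracodeActive: boolean;
  derivedGoalCount: number;
  targetReachable: boolean;
}

function selectMode(ctx: ModeContext): Mode {
  if (!ctx.workflowCallable) return "STANDARD"; // hard fallback
  if (!ctx.ultracodeActive) return "STANDARD";
  if (ctx.derivedGoalCount < 2) return "STANDARD"; // one goal: fan-out buys nothing
  if (!ctx.targetReachable) return "STANDARD";
  return "ULTRACODE";
}
```

Every guard returns the standard mode, so ultracode is reached only when all four conditions hold.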
| Aspect | Standard (default/fallback) | Ultracode |
|---|---|---|
| Execution | single team-agentic-tester, goals sequential | orchestrator runs a Workflow pipeline() fan-out (adapter concurrency policy) |
| Verdict trust | single judgment | perspective-diverse verify (skeptic + criteria-judge, agree to accept) + verification-loop vacuity guard |
| Generated spec | generate→run→repair ×2, discard non-green | same; headless spec-runs fan out |
| Edge sweep | none | bounded completeness-critic (≤2 rounds) |
Shared-driver caveat: web's Playwright MCP is one browser — under ultracode, serialize the Explorer lane (mutex); only Generator + headless runs fan out. Backend HTTP is stateless → Explorer may fan out too.
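One way to picture the serialized Explorer lane is a promise-chain mutex around the shared browser, with generation left free to fan out. The sketch below is illustrative only: `Mutex`, `Lanes`, and `runGoals` are hypothetical names, and the `explore` and `generate` hooks stand in for the real Playwright MCP driver and spec runner.

```typescript
// Hypothetical promise-chain mutex: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

type Goal = { id: string };

// Hypothetical adapter hooks (assumptions, not harness APIs).
type Lanes = {
  explore: (goal: Goal) => Promise<string>; // drives the single shared browser
  generate: (goal: Goal, path: string) => Promise<string>; // headless, safe to fan out
};

async function runGoals(goals: Goal[], lanes: Lanes): Promise<string[]> {
  const browser = new Mutex();
  return Promise.all(
    goals.map(async (goal) => {
      // Only one Explorer holds the browser at a time.
      const path = await browser.run(() => lanes.explore(goal));
      // Generation starts as soon as this goal's exploration finishes.
      return lanes.generate(goal, path);
    })
  );
}
```

For a stateless backend surface, the same loop could skip the mutex and call `lanes.explore` directly.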
Per goal: id, outcome, met, trustworthy (ultracode verify), green, specPath|null, skipReason|null. Sections: Verified+crystallized / Verified-not-crystallizable / Unmet (→ human escalation) / Distrusted verdicts.
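The per-goal record and its report section might be modeled as below. The field names come from the list above; the precedence in `sectionFor` (distrust first, then unmet, then crystallization) is an assumption for illustration, and skipped goals are assumed to be filtered out before classification.

```typescript
interface GoalResult {
  id: string;
  outcome: string;
  met: boolean;
  trustworthy: boolean; // result of the ultracode verify step
  green: boolean;
  specPath: string | null;
  skipReason: string | null;
}

type Section =
  | "Verified+crystallized"
  | "Verified-not-crystallizable"
  | "Unmet"
  | "Distrusted verdicts";

// Hypothetical classifier; the ordering of the checks is an assumption.
function sectionFor(result: GoalResult): Section {
  if (!result.trustworthy) return "Distrusted verdicts";
  if (!result.met) return "Unmet"; // escalated to a human
  return result.green && result.specPath !== null
    ? "Verified+crystallized"
    : "Verified-not-crystallizable";
}
```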
- `skills/e2e-testing/SKILL.md`: deterministic layer + web emitter conventions
- `skills/verification-loop/SKILL.md`: vacuity guard (applied to "met" claims)
- `skills/team-workflow/SKILL.md`: Phase 4.5 + Orchestration Mode

- `/team-init` → confirm `testing.md` has the Agentic Testing Adapter (Surface: web).
- `team-agentic-tester`.
- `*.spec.ts` re-runs GREEN deterministically.
- `.claude/project-profile` → confirm ABORT with "Run /team-init first."

npx claudepluginhub junjak/ai-harness --plugin junjak-ai-harness