Skill

agentic-testing

Agentic E2E testing layer (Phase 4.5). Use AFTER deterministic E2E passes and BEFORE human final review. An agent explores a goal via the stack's UI/API driver, verifies goal achievement, and crystallizes the path into a deterministic test. Stack-agnostic via project-profile adapters (web/TS base, Spring-Kotlin, Flutter).

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/junjak-ai-harness:agentic-testing

User invocable

Model invocable

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

> **Tests enforce journeys. Agents verify goals. Explore once, regress forever.**

SKILL.md

91 lines · ~1.5k tokens

Stats

Language: HTML

Stars: 2

Maintenance: Excellent

Last Commit: Jun 25, 2026

Agentic Testing

> **Tests enforce journeys. Agents verify goals. Explore once, regress forever.**

Position: Phase 4.5, after team-tester Phase 4 = PASS and before Phase 5. Complements (never replaces) deterministic E2E.

Two roles, one loop: at the late-stage checkpoint an agent (1) explores a goal to verify it (catching goal-level failures deterministic E2E missed → report to human) and (2) crystallizes the discovered path into a reusable deterministic test that joins the cheap CI layer.

Precondition (MUST — abort if unmet)

MUST read .claude/project-profile/{index.md, stack.md, testing.md}. If absent → ABORT: "Run /team-init first."
MUST read testing.md → "Agentic Testing Adapter". If missing → require /team-init --update.
Staleness: if Profile-Generated-At is far behind HEAD → require /team-init --update before running.
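The precondition checks above can be sketched as a small pure function. This is a minimal illustration, not the harness's actual code: `checkPreconditions`, `PreconditionResult`, and the injected `readFile` callback are hypothetical names, and the wording of the update message is an assumption. The staleness check is left out because the source does not give a threshold for "far behind HEAD".

```typescript
// Hypothetical sketch of the precondition gate. Only the profile paths, the
// section title, and the "Run /team-init first." message come from the text.
type PreconditionResult =
  | { ok: true }
  | { ok: false; action: "ABORT" | "REQUIRE_UPDATE"; message: string };

const PROFILE_DIR = ".claude/project-profile";
const REQUIRED_FILES = ["index.md", "stack.md", "testing.md"];

// `readFile` returns the file contents, or null when the file is absent.
function checkPreconditions(
  readFile: (path: string) => string | null,
): PreconditionResult {
  const contents = new Map<string, string>();
  for (const name of REQUIRED_FILES) {
    const text = readFile(`${PROFILE_DIR}/${name}`);
    if (text === null) {
      // Any missing profile file aborts the whole run.
      return { ok: false, action: "ABORT", message: "Run /team-init first." };
    }
    contents.set(name, text);
  }
  const testing = contents.get("testing.md") ?? "";
  if (!testing.includes("Agentic Testing Adapter")) {
    // Profile exists but predates the adapter section (message text assumed).
    return {
      ok: false,
      action: "REQUIRE_UPDATE",
      message: "Run /team-init --update.",
    };
  }
  return { ok: true };
}
```

Injecting `readFile` keeps the check testable without touching the real filesystem; in a real run it would wrap whatever file access the agent has.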

Adapter resolution (stack-agnostic; base = web/TS)

Resolve the adapter from testing.md's "Agentic Testing Adapter" (derived from stack.md). The pipeline is fixed; the driver, emitter, and concurrency swap per surface. The emitter reuses each stack's existing testing skill as house style (link, don't duplicate).

| Surface | Explorer driver | Generator emitter (house-style skill) | Concurrency | Status |
| --- | --- | --- | --- | --- |
| web/TS (base) | Playwright MCP (`mcp__plugin_playwright_playwright__*`) | `.spec.ts` ← `e2e-testing` | one shared browser → serialize Explorer | ready |
| Spring/Kotlin (backend API) | HTTP calls | `WebTestClient`/`@SpringBootTest` + Testcontainers ← `springboot-tdd`·`kotlin-testing` | stateless → true parallel (per-worker DB isolation) | ready |
| Flutter/Dart (mobile UI) | maestro · Patrol · mobile MCP | `integration_test` · maestro yaml | single device → serialize per device | driver-gated |
| Cross-journey (Flutter→Spring) | UI drive + backend assert | both layers | depends on above | later |

Driver unavailable (e.g. mobile, no maestro/Patrol/MCP): do NOT run the goal — report driver unavailable (no silent skip).
CLI execution model is a non-goal (article reliability). MCP-first.
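One way to picture the adapter table and the "no silent skip" rule is a lookup plus an explicit unavailable result. The sketch below is an assumption-laden illustration: the surface keys, the `Adapter` fields, and `resolveAdapter` are hypothetical, and the cross-journey row is omitted because its status is "later".

```typescript
// Illustrative adapter registry. Keys and field names are assumptions; the
// driver, emitter, and concurrency values paraphrase the table above.
type Surface = "web" | "spring-kotlin" | "flutter";

interface Adapter {
  explorerDriver: string;
  emitter: string;
  serializeExplorer: boolean; // true when the driver is a shared resource
  status: "ready" | "driver-gated";
}

const ADAPTERS: Record<Surface, Adapter> = {
  web: {
    explorerDriver: "Playwright MCP",
    emitter: ".spec.ts via e2e-testing",
    serializeExplorer: true, // one shared browser
    status: "ready",
  },
  "spring-kotlin": {
    explorerDriver: "HTTP calls",
    emitter: "WebTestClient/@SpringBootTest + Testcontainers",
    serializeExplorer: false, // stateless, so workers may run in parallel
    status: "ready",
  },
  flutter: {
    explorerDriver: "maestro / Patrol / mobile MCP",
    emitter: "integration_test or maestro yaml",
    serializeExplorer: true, // single device
    status: "driver-gated",
  },
};

type Resolution =
  | { kind: "run"; adapter: Adapter }
  | { kind: "driver-unavailable"; report: string };

// Never returns a silent skip: a missing driver always produces a report.
function resolveAdapter(surface: Surface, driverAvailable: boolean): Resolution {
  const adapter = ADAPTERS[surface];
  if (!driverAvailable) {
    return {
      kind: "driver-unavailable",
      report: `driver unavailable for ${surface}: ${adapter.explorerDriver}`,
    };
  }
  return { kind: "run", adapter };
}
```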

Goal derivation

Source = plan/spec acceptance criteria. Express goals as outcomes (not UI steps), risk-ordered (auth/payment/data first).
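As a rough illustration of outcome-style goals and risk ordering, the sketch below models a goal and sorts by risk area. The `Goal` shape, the `RiskArea` labels, and the relative order among auth, payment, and data are assumptions; the source only says those three come first.

```typescript
// Hypothetical goal model. An outcome reads like "a signed-in user can pay
// for a cart and sees a receipt", not "click the Pay button".
type RiskArea = "auth" | "payment" | "data" | "other";

interface Goal {
  id: string;
  outcome: string;
  risk: RiskArea;
}

// Lower rank runs first. The order among the first three is an assumption.
const RISK_RANK: Record<RiskArea, number> = {
  auth: 0,
  payment: 1,
  data: 2,
  other: 3,
};

// Returns a new array; Array.prototype.sort is stable, so goals within the
// same risk area keep their original (spec) order.
function orderByRisk(goals: Goal[]): Goal[] {
  return [...goals].sort((a, b) => RISK_RANK[a.risk] - RISK_RANK[b.risk]);
}
```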

Run-at-all gate (autonomous, NOT dollar-gated)

Run a goal only if ALL hold; else log the skip reason:

VALUE: no overlap with an existing passing test for this flow.
TIME: bounded steps (~25) and target reachable.
NOISE: deterministically assertable (subjective/aesthetic → defer to web-reviewer/impeccable).

(This harness runs on a Claude Code subscription, not metered API — gate on value/time/noise, not cost.)
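The three gate conditions can be read as a function that either allows the run or returns the skip reason to log. This is a sketch under stated assumptions: `GateInput` and `runAtAllGate` are hypothetical names, the hard limit of 25 stands in for the source's approximate "~25", and how each flag gets computed is left to the caller.

```typescript
// Hypothetical gate input; each flag is assumed to be computed upstream.
interface GateInput {
  overlapsPassingTest: boolean;
  estimatedSteps: number;
  targetReachable: boolean;
  deterministicallyAssertable: boolean;
}

type GateDecision = { run: true } | { run: false; skipReason: string };

const MAX_STEPS = 25; // the text says "~25"; an exact cutoff is an assumption

// A goal runs only if VALUE, TIME, and NOISE all hold. Cost is not a factor.
function runAtAllGate(g: GateInput): GateDecision {
  if (g.overlapsPassingTest) {
    return { run: false, skipReason: "VALUE: flow already covered by a passing test" };
  }
  if (g.estimatedSteps > MAX_STEPS) {
    return { run: false, skipReason: `TIME: more than ${MAX_STEPS} steps` };
  }
  if (!g.targetReachable) {
    return { run: false, skipReason: "TIME: target not reachable" };
  }
  if (!g.deterministicallyAssertable) {
    return { run: false, skipReason: "NOISE: not deterministically assertable" };
  }
  return { run: true };
}
```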

Pipeline (Explorer → Generator)

Explorer (Sonnet + adapter driver): goal → adapt → verify. Record met?, the observed path, and evidence.
Generator (Opus): crystallize the path via the emitter house-style skill → RUN the generated test → keep ONLY if green (self-repair ≤2 attempts, else DISCARD). met=false → no spec, escalate to human.

Generated tests MUST obey the emitter skill's conventions (e.g. e2e-testing: getByRole > … > getByTestId; waitForResponse/waitFor, never waitForTimeout).
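The Generator's keep-only-if-green rule, with at most two repair attempts, might look like the loop below. Everything here is a simplified sketch: `crystallize` and its callbacks are hypothetical, the callbacks are synchronous for brevity (a real spec run would be asynchronous), and the spec is treated as an opaque string.

```typescript
type Crystallized =
  | { kept: true; spec: string; repairs: number }
  | { kept: false; reason: "unmet" | "not-green" };

// generate: emit a spec from the observed path (hypothetical callback)
// runSpec:  run the spec and report whether it is green (hypothetical)
// repair:   produce a revised spec after a failed run (hypothetical)
function crystallize(
  met: boolean,
  generate: () => string,
  runSpec: (spec: string) => boolean,
  repair: (spec: string) => string,
  maxRepairs = 2,
): Crystallized {
  // met=false: no spec is generated; the goal is escalated to a human.
  if (!met) return { kept: false, reason: "unmet" };

  let spec = generate();
  for (let repairs = 0; ; repairs++) {
    if (runSpec(spec)) return { kept: true, spec, repairs };
    // Still failing after the allowed repairs: discard the spec.
    if (repairs >= maxRepairs) return { kept: false, reason: "not-green" };
    spec = repair(spec);
  }
}
```

With `maxRepairs = 2` the spec is run at most three times: once as generated and once after each repair.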

Orchestration mode (standard vs ultracode)

The mode switch lives at the orchestration layer (team-workflow / team-leader) — a skill or spawned subagent cannot call the Workflow tool. agents/team-agentic-tester.md is the standard-mode executor.

selectMode(ctx):
  IF NOT workflowCallable():              RETURN STANDARD   # hard fallback
  IF NOT ultracodeActive(ctx):            RETURN STANDARD
  IF derivedGoalCount(ctx) < 2:           RETURN STANDARD   # 1 goal → fan-out buys nothing
  IF NOT targetReachable(ctx):            RETURN STANDARD
  RETURN ULTRACODE
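For readers who prefer typed code, the pseudocode above translates roughly as follows. The `ModeContext` shape is an assumption: the pseudocode's helper calls are folded into plain boolean and numeric fields so the sketch is self-contained.

```typescript
type Mode = "STANDARD" | "ULTRACODE";

// Hypothetical context; each field mirrors one helper in the pseudocode.
interface ModeContext {
  workflowCallable: boolean;
  ultracodeActive: boolean;
  derivedGoalCount: number;
  targetReachable: boolean;
}

function selectMode(ctx: ModeContext): Mode {
  if (!ctx.workflowCallable) return "STANDARD"; // hard fallback
  if (!ctx.ultracodeActive) return "STANDARD";
  if (ctx.derivedGoalCount < 2) return "STANDARD"; // 1 goal: fan-out buys nothing
  if (!ctx.targetReachable) return "STANDARD";
  return "ULTRACODE";
}
```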

| Aspect | Standard (default/fallback) | Ultracode |
| --- | --- | --- |
| Execution | single `team-agentic-tester`, goals sequential | orchestrator runs a Workflow `pipeline()` fan-out (adapter concurrency policy) |
| Verdict trust | single judgment | perspective-diverse verify (skeptic + criteria-judge, agree to accept) + `verification-loop` vacuity guard |
| Generated spec | generate→run→repair ×2, discard non-green | same; headless spec-runs fan out |
| Edge sweep | none | bounded completeness-critic (≤2 rounds) |

Shared-driver caveat: web's Playwright MCP is one browser — under ultracode, serialize the Explorer lane (mutex); only Generator + headless runs fan out. Backend HTTP is stateless → Explorer may fan out too.

Output (extends the team-tester report)

Per goal: id, outcome, met, trustworthy (ultracode verify), green, specPath|null, skipReason|null. Sections: Verified+crystallized / Verified-not-crystallizable / Unmet (→ human escalation) / Distrusted verdicts.
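The per-goal record and its report section could be modeled as below. The field names follow the list above, but the field types, the `sectionFor` helper, the precedence between sections, and the choice to leave skipped goals out of the four sections are all assumptions made for illustration.

```typescript
interface GoalResult {
  id: string;
  outcome: string;
  met: boolean;
  trustworthy: boolean | null; // null when no ultracode verify ran (assumption)
  green: boolean;
  specPath: string | null;
  skipReason: string | null;
}

type Section =
  | "Verified+crystallized"
  | "Verified-not-crystallizable"
  | "Unmet"
  | "Distrusted verdicts";

// Assumed precedence: skipped, then distrusted, then unmet, then spec state.
function sectionFor(r: GoalResult): Section | null {
  if (r.skipReason !== null) return null; // skipped goals only log the reason
  if (r.trustworthy === false) return "Distrusted verdicts";
  if (!r.met) return "Unmet"; // escalated to a human
  return r.green && r.specPath !== null
    ? "Verified+crystallized"
    : "Verified-not-crystallizable";
}
```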

Dry-run acceptance runbook (run in a real web/TS project)

/team-init → confirm testing.md has the Agentic Testing Adapter (Surface: web).
Pick one existing user-facing flow with acceptance criteria.
Standard mode: dispatch team-agentic-tester.
Confirm: (a) goal-verification report produced; (b) a generated *.spec.ts re-runs GREEN deterministically.
Negative: rename .claude/project-profile → confirm ABORT with "Run /team-init first."
