From arc
Adds characterization tests to legacy or under-tested code before refactoring. Captures current behavior through public interfaces and fills coverage gaps with unit, integration, or E2E tests.
npx claudepluginhub howells/arc --plugin arc
Slash command: /arc:testing
<tool_restrictions>
- AskUserQuestion: Preserve the one-question-at-a-time interaction pattern. In Claude Code, use the tool. In Codex, ask one concise plain-text question at a time unless a structured question tool is actually available in the current mode. Do not narrate missing tools or fallbacks to the user.
- EnterPlanMode: BANNED. Do NOT call this tool. This skill has its own structured testing workflow. Execute it directly.
- ExitPlanMode: BANNED. You are never in plan mode.
</tool_restrictions>
<arc_runtime>
This workflow requires the full Arc bundle, not a prompts-only install.
Paths in this skill use these conventions:
- agents/..., references/..., disciplines/..., templates/..., scripts/..., rules/..., skills/<name>/... are Arc-owned files at the plugin root. Resolve the plugin root from this skill's filesystem location: it's the directory containing agents/ and skills/.
- ./... is local to this skill's directory.
- .ruler/..., docs/..., src/..., or any project-relative path refers to the user's project repository.
</arc_runtime>
Backfill focused tests around existing code before a risky change. The goal is not "more tests" in the abstract; it is a trustworthy safety net around behavior that must survive a refactor, migration, or bug fix.
Use this skill when:
Do not use this skill as the normal new-feature workflow. For new work, use /arc:implement or a dedicated TDD skill so RED/GREEN/REFACTOR remains the governing loop.
<required_reading> Read before testing:
- references/testing-patterns.md: Test philosophy, vitest/playwright patterns
- references/testing-anti-patterns.md: What weak or misleading tests look like
- rules/testing.md: Arc testing conventions
- disciplines/change-impact-testing.md: Blast radius analysis for code changes
- references/llm-api-testing.md: If testing LLM integrations
- references/maintainability-review.md: If tests are being added before decomposing a god file or tangled module
- references/complexity-optimization.md: If tests are being added before optimizing algorithmic complexity, rendering churn, or N+1 behavior
</required_reading>
Use specialist agents only when the slice is large enough to justify delegation:
| Agent | Model | Purpose | Framework |
|---|---|---|---|
| unit-test-writer | sonnet | Characterize pure functions, hooks, or isolated components | vitest |
| integration-test-writer | sonnet | Characterize API, auth, state, and component integration behavior | vitest + MSW |
| e2e-test-writer | opus | Characterize critical browser journeys | Playwright |
| test-runner | haiku | Run unit/integration suites and analyze failures | vitest |
| e2e-runner | opus | Run Playwright, inspect screenshots/traces, iterate on failures | Playwright |
<rules_context> Check for project testing rules:
Use Glob tool: .ruler/testing.md
If it exists, read it for MUST/SHOULD/NEVER constraints.
Detect test framework:
| File | Framework |
|---|---|
| vitest.config.* | vitest |
| jest.config.* | jest |
| playwright.config.* | Playwright |
| package.json scripts | Project-specific test commands |
</rules_context>
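The detection table above can be sketched as a small lookup. This is a minimal illustration, not part of Arc: `detectFrameworks` and `CONFIG_PREFIXES` are hypothetical names, the function assumes it receives the file names found at the project root, and it ignores `package.json` scripts, which still need to be read by hand.

```typescript
// Hypothetical helper: maps config file name prefixes to the framework they imply.
type Framework = "vitest" | "jest" | "playwright";

const CONFIG_PREFIXES: ReadonlyArray<[string, Framework]> = [
  ["vitest.config.", "vitest"],
  ["jest.config.", "jest"],
  ["playwright.config.", "playwright"],
];

// Returns every framework whose config file appears in the given file names,
// in the same order as the table.
function detectFrameworks(fileNames: readonly string[]): Framework[] {
  return CONFIG_PREFIXES
    .filter(([prefix]) => fileNames.some((name) => name.startsWith(prefix)))
    .map(([, framework]) => framework);
}

const detected = detectFrameworks(["package.json", "vitest.config.ts", "playwright.config.ts"]);
console.log(detected); // ["vitest", "playwright"]
```

A project can match more than one row, which is why the sketch returns a list rather than a single framework.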
Ask one question only if the target is unclear:
AskUserQuestion:
question: "What existing code or behavior needs a safety net before we change it?"
header: "Test Target"
Then identify:
Gather evidence before writing tests:
Do not silently fix production behavior during baseline work. If you discover an obvious bug, capture it as either:
List behavior in terms of callers or users, not internal implementation details:
## Safety Net: [Target]
### Planned Change
- [Refactor / bug fix / migration / cleanup]
### Public Interfaces
- [Function/component/API route/page/CLI command]
### Current Observable Behavior
| Behavior | Evidence | Risk |
| ---------- | ---------------------------------------------- | ----------------- |
| [behavior] | [code path, existing test, manual observation] | [high/medium/low] |
### Test Slices
| Slice | Level | Why this level |
| -------------- | ---------------------- | ---------------------- |
| [one behavior] | [unit/integration/e2e] | [fastest useful proof] |
For each slice:
If existing code is hard to test:
Mocks are acceptable for true boundaries: network, time, filesystem, database, auth providers, payment providers, and external LLM APIs. Prefer real code inside the project boundary.
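As a rough sketch of that boundary rule, the example below characterizes a hypothetical legacy function, `invoiceLabel`, by replacing only the clock and letting the real formatting logic run. The function, its behavior, and the `expectEqual` helper are all invented for illustration; in a vitest suite the plain assertions would become `expect(...).toBe(...)` calls.

```typescript
// Hypothetical legacy code under test: formats an invoice label using the current date.
type Clock = () => Date;

function invoiceLabel(customer: string, amountCents: number, now: Clock = () => new Date()): string {
  const year = now().getUTCFullYear();
  // Current behavior: amounts are truncated (not rounded) and names are upper-cased.
  const dollars = Math.floor(amountCents / 100);
  return `${customer.toUpperCase()}-${year}-$${dollars}`;
}

// Only the clock, a true boundary, is replaced with a fixed value.
const fixedClock: Clock = () => new Date(Date.UTC(2024, 0, 15));

// Minimal assertion helper standing in for a test framework matcher.
function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
}

// Characterization checks: pin what the code does today through its public signature,
// including the truncation, which may look like a bug but is the current contract.
expectEqual(invoiceLabel("acme", 1999, fixedClock), "ACME-2024-$19");
expectEqual(invoiceLabel("acme", 99, fixedClock), "ACME-2024-$0");
```

The assertions record the truncation as-is rather than correcting it, so a later refactor that silently changes the rounding would fail the safety net.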
Run checks in widening order:
When E2E output is verbose or flaky, dispatch e2e-runner with the exact test file and failure evidence.
End with a concise report:
## Safety Net Result
**Target:** [code/feature]
**Reason:** [refactor/bug fix/legacy coverage/launch risk]
**Tests added:** [files]
**Behavior characterized:**
- [behavior]
**Verification:**
- [command] — [pass/fail]
**Remaining risk:**
- [untested behavior or reason it was deferred]
**Ready for next change:** [yes/no]
| Level | Use when | Avoid when |
|---|---|---|
| Unit | Pure functions, deterministic formatting, isolated hooks, small state transitions | Behavior depends on routing, browser, API, auth, or multiple components |
| Integration | Component + state, API routes, auth states, form submissions, data adapters | A single pure function is enough or only a real browser proves it |
| E2E | Critical user journeys, auth flows, checkout/signup, routing/browser behavior | The behavior can be proven faster below the browser |
| Feature Type | First Useful Backfill | Notes |
|---|---|---|
| Utility functions | Unit | Cover edge cases and invariants through exported functions |
| UI components | Integration | Prefer user-visible behavior over snapshots |
| Forms | Integration | Add E2E only for critical end-to-end flows |
| API routes | Integration | Exercise request/response behavior and error paths |
| Auth flows | Integration + selective E2E | Mock provider states below browser; use real/browser flow sparingly |
| Checkout/payment | Integration + E2E | Mock external provider below browser; keep one critical browser path |
| LLM integrations | Unit/integration with fixtures | Avoid live calls unless explicitly required |
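For the LLM row, one possible shape of a fixture-backed test is sketched below. It assumes the project reaches the provider through a narrow interface; `CompletionClient`, `classifySentiment`, and `fixtureClient` are hypothetical names, and the prompt and recorded reply are made up for the example.

```typescript
// Hypothetical boundary: the only surface through which project code reaches the LLM provider.
interface CompletionClient {
  complete(prompt: string): Promise<string>;
}

type Sentiment = "positive" | "negative" | "unknown";

// Hypothetical project code under test: builds the prompt and parses the reply.
async function classifySentiment(client: CompletionClient, text: string): Promise<Sentiment> {
  const reply = await client.complete(`Classify the sentiment of: ${text}`);
  const normalized = reply.trim().toLowerCase();
  if (normalized === "positive") return "positive";
  if (normalized === "negative") return "negative";
  return "unknown";
}

// Fixture client: replays recorded replies keyed by prompt, so no live call is made.
// An unrecorded prompt throws, which surfaces prompt drift instead of hiding it.
function fixtureClient(recorded: Record<string, string>): CompletionClient {
  return {
    async complete(prompt: string): Promise<string> {
      const reply = recorded[prompt];
      if (reply === undefined) {
        throw new Error(`no fixture recorded for prompt: ${prompt}`);
      }
      return reply;
    },
  };
}

// Usage: the recorded reply keeps its original whitespace and casing.
const client = fixtureClient({
  "Classify the sentiment of: great product": " Positive\n",
});

classifySentiment(client, "great product").then((result) => {
  console.log(result); // "positive"
});
```

Keying fixtures by the exact prompt means a refactor that changes the prompt text fails loudly, which is usually the behavior a safety net should have.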
Use this only when auth behavior is part of the safety net.
Clerk
Integration tests:
- useAuth and useUser hooks.
- getToken for API calls.
E2E tests:
- tests/auth.setup.ts for login flow.
- playwright/.auth/user.json.
- storageState in playwright.config.ts.
Common issues:
- ClerkProvider instead of hooks.
- isLoaded: false state.
- getToken mock.
WorkOS
Integration tests:
- getUser from @workos-inc/authkit-nextjs.
- organizationId, role, and permissions.
E2E tests:
- /api/auth/test-login for faster auth in test environments only.
Common issues:
- organizationId in org-level features.
For faster E2E tests, create a test-only auth endpoint:
// app/api/auth/test-login/route.ts
// ONLY available in test/development
export async function POST(request: Request) {
  if (process.env.NODE_ENV === "production") {
    return new Response("Not found", { status: 404 });
  }
  // Create session directly without SSO flow
}
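The route above blocks only `production`, so an unset or unexpected `NODE_ENV` would still expose it. A stricter variant, sketched here as an assumption rather than Arc's actual implementation, is an allow-list guard that fails closed. The name `isTestLoginEnabled` is hypothetical.

```typescript
// Hypothetical guard: decides whether the test-only login route may respond.
// Fails closed: only explicitly allowed environments pass, so an unset
// NODE_ENV or a value such as "staging" is rejected.
function isTestLoginEnabled(nodeEnv: string | undefined): boolean {
  return nodeEnv === "test" || nodeEnv === "development";
}

console.log(isTestLoginEnabled("test")); // true
console.log(isTestLoginEnabled("production")); // false
console.log(isTestLoginEnabled(undefined)); // false
```

In the route handler, the check would become `if (!isTestLoginEnabled(process.env.NODE_ENV))` followed by the same 404 response.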
Tests must fail fast. Never:
Playwright config:
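import { defineConfig } from "@playwright/test";

export default defineConfig({
  timeout: 30_000, // budget for each whole test
  expect: {
    timeout: 5_000, // budget for each expect() assertion
  },
  use: {
    actionTimeout: 10_000, // budget for each action such as click or fill
  },
});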
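The same fail-fast idea can be applied to the vitest suites. The fragment below is a sketch: `testTimeout` and `hookTimeout` are standard Vitest options, but the values are illustrative and are not taken from Arc's rules.

```typescript
// vitest.config.ts (sketch; timeout values are illustrative, not Arc defaults)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    testTimeout: 5_000, // fail a single test after 5 seconds
    hookTimeout: 10_000, // fail setup and teardown hooks after 10 seconds
  },
});
```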
<success_criteria> The safety-net pass is complete when: