Stack-agnostic autonomous security, consistency, and quality auditor. Crawls any running web app using Playwright, runs programmatic security and accessibility scans, cross-checks UI/API/DB values, attacks forms with wild data, and produces a Confidence Letter with a scored report that forge can consume to fix issues. Authors persistent test files in the project's native Playwright flavor (TS/JS/Python/Java/.NET) for repeatable checks (zero-token execution) and uses MCP only for exploratory discovery. Trigger on: recon, audit, security check, consistency check, confidence letter, quality audit, end-to-end check, verify the app, check everything, mutation propagation, cross-page consistency, state propagation, data flow audit.
Install: `npx claudepluginhub bishwas-py/forge --plugin forge`. This skill uses the workspace's default tool permissions.
Recon is a stack-agnostic auditor that crawls any running web application end-to-end. It combines **programmatic security tools** (deterministic, repeatable) with **AI-powered crawling and analysis** (judgment, reasoning) to produce a **Confidence Letter** — a scored, structured report of every issue found.
The Confidence Letter is designed to be consumed by /forge — the user runs `/forge read the confidence letter and gain 100%`, forge picks up every issue as a task, fixes them, and the user re-runs `/recon` until 100% confidence is reached.
Before executing ANY phase, determine which mode to run:
| User says | Mode | Jump to |
|---|---|---|
| `--per-pr`, "per PR", "each PR", "all PRs", "parallel", "isolated", "each branch", "test each", "parellely" | Per-PR mode | Phase 6 ONLY — skip Phases 1-5 entirely |
| Anything else (no per-PR keywords) | Standard mode | Phase 1 → 2 → 3 → 4B → 4C → 4A → 5 |
- In per-PR mode, do not run `pnpm dev` or `npm start` — environments are managed by `recon-env.sh`
- `recon-env.sh` — this script handles git archive extraction, Docker Compose startup, port allocation, health checks, and teardown
- If you catch yourself port-scanning or starting dev servers in per-PR mode, STOP — you are in the wrong mode.
Recon uses three approaches to verify an application. Each has a specific role — choosing the wrong one wastes tokens or misses bugs.
| Approach | When recon uses it | Token cost |
|---|---|---|
| MCP browser driving (`browser_navigate`, `browser_snapshot`, `browser_click`, etc.) | Initial page discovery (Phase 3), exploratory form attack analysis where outcomes are unpredictable, one-off auth flows | High — every interaction is an AI round-trip |
| Screenshot-based analysis (`browser_take_screenshot`) | Visual evidence for findings, layout checks | Medium — image processing per shot |
| Authored test files (Playwright specs run by the native runner) | All consistency checks, all mutation propagation tests, all repeatable form validation, all cross-page data verification, all real-time channel tests | Zero — native runner executes without AI tokens |
Rule of thumb: if you catch yourself using `browser_snapshot` to extract a value you could assert in a test file, stop and author the test instead.

| Step | Tools | What happens |
|---|---|---|
| Author & run test files | Write + Bash (native runner from Phase 1.6) | Persistent regression suite, zero-token execution |
| Programmatic scans | Bash (scripts in ${CLAUDE_SKILL_DIR}/scripts/) | Security headers, accessibility, DB integrity, API contracts |
| Navigate & interact with UI (discovery only) | Playwright (MCP-driven exploration ONLY — discovery, auth, interactive diagnosis. Not used for repeatable checks) | Click through pages during Phase 3, fill auth forms, diagnose test failures |
| Real-time channel sniffing | Bash (wscat, websocat, httpx for SSE) + Playwright's page.on('websocket') | Wire-level assertion of channel events |
| Query database | Bash (psql/mysql/sqlite via docker exec or direct) | Readonly SELECT queries |
| Hit API endpoints | Bash (curl) | Replay API calls, capture raw responses |
| Read project stack | Glob, Read, Grep | Detect frameworks, configs, manifests |
Before recon runs, it needs to detect the running targets. Unlike stack-specific tools, recon auto-detects everything.
Scan common local development ports. Run using Bash:
for port in 3000 4173 5173 5174 8000 8080 8888; do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port" 2>/dev/null)
[ "$STATUS" != "000" ] && echo "$STATUS -> localhost:$port is UP"
done
Read the project directory to fingerprint the stack. Use Glob to find manifest files:
- `package.json` → Node.js (check for svelte, react, vue, next, nuxt, angular in dependencies)
- `pyproject.toml` / `requirements.txt` → Python (check for django, fastapi, flask)
- `go.mod` → Go
- `Cargo.toml` → Rust
- `Gemfile` → Ruby/Rails
- `composer.json` → PHP/Laravel
- `pom.xml` / `build.gradle` → Java/Spring

This informs which programmatic tools to run and how to interpret results.
Find the database by scanning Docker containers and common local ports:
# Check Docker
docker ps --format '{{.Names}} {{.Ports}}' 2>/dev/null | grep -E '5432|3306|27017|6379' || true
# Check local ports
for port in 5432 3306 27017; do
(echo > /dev/tcp/localhost/$port) 2>/dev/null && echo "localhost:$port is UP"
done
Map port to engine: 5432 → postgres, 3306 → mysql, 27017 → mongodb, 6379 → redis.
If not detected, ask the user for: engine type, connection method (docker container name or direct host:port), user, database name.
The API server is detected during the crawl from network requests, not pre-configured. However, check common patterns first:
for port in 8000 3001 4000 5000 8080; do
for path in /docs /health /api /api/health; do
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$port$path" 2>/dev/null)
[ "$STATUS" != "000" ] && [ "$STATUS" != "404" ] && echo "$STATUS -> localhost:$port$path"
done
done
Print the detected configuration:
Recon targets:
UI: http://localhost:5173
Stack: SvelteKit + FastAPI + Postgres
API: http://localhost:8000 (detected)
DB: postgres @ docker container "hydra-db" (port 5432)
Ask the user to confirm or correct. Do not proceed until confirmed.
After stack detection, recon determines which Playwright flavor and file convention to use for authored tests. Do not hardcode any specific extension or runner — detect from the project.
Check in this priority order:
1. `package.json` contains `@playwright/test` (or recon will install it) → JS/TS (Node) — runner is `npx playwright test`
2. `pyproject.toml` / `requirements.txt` contains `playwright` or `pytest-playwright` → Python — runner is `pytest`
3. `pom.xml` / `build.gradle` references `com.microsoft.playwright` → Java — runner is `mvn test` or `gradle test`
4. `*.csproj` references `Microsoft.Playwright` → .NET — runner is `dotnet test`

If no binding is installed, recon picks the binding matching the detected stack from steps 1-2 (JS/TS for Node projects, Python for Python projects, etc.) and installs it.
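For illustration, a minimal sketch of this priority order in TypeScript — the manifest names are the real conventions above, but the function itself is hypothetical:

```ts
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

type Flavor = { language: string; runner: string };

// Hypothetical helper mirroring the priority order above: each check
// fingerprints a manifest file and maps it to a Playwright runner.
function detectPlaywrightFlavor(root: string): Flavor | undefined {
  const read = (f: string) =>
    existsSync(join(root, f)) ? readFileSync(join(root, f), 'utf8') : '';

  if (read('package.json').includes('@playwright/test'))
    return { language: 'JS/TS (Node)', runner: 'npx playwright test' };
  if (/playwright|pytest-playwright/.test(read('pyproject.toml') + read('requirements.txt')))
    return { language: 'Python', runner: 'pytest' };
  if ((read('pom.xml') + read('build.gradle')).includes('com.microsoft.playwright'))
    return { language: 'Java', runner: 'mvn test' };
  // *.csproj lookup omitted for brevity; same pattern with Microsoft.Playwright
  return undefined; // no binding installed — fall back to the detected stack
}
```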
Run a discovery command to learn the project's naming convention:
find . -type f \( -name "*.spec.*" -o -name "*.test.*" -o -name "test_*.py" -o -name "*Test.java" -o -name "*Tests.cs" \) \
-not -path "*/node_modules/*" -not -path "*/.venv/*" -not -path "*/target/*" -not -path "*/bin/*" -not -path "*/obj/*" \
| head -50
Tally the extensions and naming patterns found. The dominant convention wins. If there are no existing tests, fall back to the language default.
Recognised conventions:
| Language | Conventions to detect | Default if none |
|---|---|---|
| JS/TS | .spec.ts, .spec.js, .spec.mjs, .test.ts, .test.js, .e2e.ts | .spec.ts (or .spec.js if project is plain JS) |
| Python | test_*.py | test_*.py |
| Java | *Test.java | *Test.java |
| .NET | *Tests.cs | *Tests.cs |
Check for existing test directories in this order — first match wins:
1. `tests/recon/`, `e2e/recon/`, `tests/e2e/recon/` — reuse as-is
2. `tests/e2e/`, `e2e/`, `__tests__/`, `playwright/` → create a `recon/` subdirectory inside it
3. Neither → create `tests/recon/`

Then check for an existing Playwright config:

- JS/TS: `playwright.config.ts`, `playwright.config.js`, `playwright.config.mjs`
- Python: `conftest.py` with playwright fixtures, or `pytest.ini` / `pyproject.toml` with `[tool.pytest.ini_options]`
- Java: playwright config in `pom.xml`/`build.gradle`
- .NET: `playwright.config` in the test project

If none exists, recon generates a minimal one matching the detected language binding.
Print the detected profile alongside the targets:
Recon test profile:
Language: TypeScript (Node)
Runner: playwright test
Convention: *.spec.ts
Directory: tests/recon/
Config: playwright.config.ts (existing)
Or for Python:
Recon test profile:
Language: Python
Runner: pytest
Convention: test_*.py
Directory: tests/recon/
Config: conftest.py (will generate)
The user confirms before recon proceeds. From this point on, all references to "test files" in subsequent phases use this detected profile — recon must NOT assume any specific extension or runner.
Fingerprint real-time technology used by the application. Recon detects this by scanning dependencies and, later, observing network traffic during Phase 3.
Search the project's frontend dependencies for real-time client libraries:
- `socket.io-client` → Socket.IO
- `ws`, `isomorphic-ws` → raw WebSocket
- `@supabase/supabase-js` (with realtime usage) → Supabase Realtime
- `firebase` (with `onSnapshot`/`onValue`) → Firestore/RTDB listeners
- `@apollo/client` + subscription imports, `urql` + subscription exchange → GraphQL subscriptions
- `phoenix` → Phoenix Channels
- `@microsoft/signalr` → SignalR
- `pusher-js`, `ably`, `pubnub` → managed pub/sub services
- `eventsource` or native `EventSource` usage → SSE

Search the project's backend dependencies:
- `socket.io`, `ws`, `uWebSockets.js` (Node)
- `channels`, `channels-redux`, `django-channels` (Python/Django)
- `fastapi-websocket-pubsub`, `starlette.websockets` (FastAPI/Starlette)
- `flask-socketio` (Flask)
- `actioncable` (Rails)
- `phoenix` (Elixir)
- `tokio-tungstenite`, axum's `ws` (Rust)

If real-time dependencies are detected, print:
Real-time channels detected:
Type: Socket.IO
Frontend: socket.io-client (package.json)
Backend: socket.io (package.json)
Note: Network observation during Phase 3 will capture endpoint and events
If no real-time dependencies are found, print: "No real-time channel dependencies detected. Phase 4C will be skipped unless Phase 3 network observation reveals WebSocket/SSE traffic."
Network observation happens during Phase 3 — see Phase 3 step 5.
Run deterministic scripts BEFORE the AI crawl. These produce facts, not opinions. Execute each script from ${CLAUDE_SKILL_DIR}/scripts/ using Bash.
bash "${CLAUDE_SKILL_DIR}/scripts/security-scan.sh" "<ui_url>"
Also run against the API URL if different:
bash "${CLAUDE_SKILL_DIR}/scripts/security-scan.sh" "<api_url>"
Captures: HTTP security headers, CORS configuration, cookie flags, SSL/TLS status, open redirects, information disclosure.
bash "${CLAUDE_SKILL_DIR}/scripts/accessibility-audit.sh" "<ui_url>"
Runs axe-core, pa11y, and Lighthouse if installed. Falls back to Playwright-based checks if no tools available.
bash "${CLAUDE_SKILL_DIR}/scripts/db-integrity.sh" "<engine>" "<container>" "<user>" "<dbname>"
Runs: foreign key orphan detection, NULL checks on key columns, duplicate detection, table row counts. Supports postgres, mysql, sqlite.
bash "${CLAUDE_SKILL_DIR}/scripts/api-contract.sh" "<api_url>"
Discovers OpenAPI spec automatically, validates endpoint liveness, checks for undocumented endpoints.
Only run if real-time channels were detected in Phase 1.7.
bash "${CLAUDE_SKILL_DIR}/scripts/realtime-hygiene.sh" "<channel_endpoint>" "<auth_token>"
Its checks produce PASS/FAIL findings independent of AI crawling.
Capture ALL script output. These results feed into the final Confidence Letter. Record each [PASS], [FAIL], [WARN], [CRITICAL], [INFO], and [SKIP] finding.
After programmatic scans, crawl the running UI to build a complete page map. This phase uses Playwright exclusively.
Navigate to the root URL using browser_navigate. Take a browser_snapshot and extract every link target:
<a href="..."> valuesjavascript:void(0) hrefsFor each unvisited route:
1. `browser_navigate` to the URL
2. `browser_snapshot` to capture the page
3. `browser_click`, then `browser_snapshot`

Repeat until no new routes are discovered.
Revisit each page and look for hidden routes behind interactive elements.
Detect auth walls by checking for:
- `/login`, `/signin`, `/auth`

When detected:
- log in via the UI (`browser_fill_form` + `browser_click`)

Print discovered routes:
Discovered [N] pages:
1. / (public)
2. /login (public)
3. /dashboard (authenticated)
4. /items (authenticated)
5. /items/:id (authenticated)
Ask user to confirm, add missing pages, or exclude pages.
During the Phase 3 crawl, capture real-time traffic from browser_network_requests:
- `ws://` or `wss://` upgrades → WebSocket/Socket.IO endpoints
- requests with `Accept: text/event-stream` or responses with `Content-Type: text/event-stream` → SSE

For each detected WebSocket, log the endpoint URL and any events observed during the crawl. Update the real-time profile from Phase 1.7:
Real-time channels detected (updated after crawl):
Type: Socket.IO
Endpoint: ws://localhost:8000/socket.io/
Events seen: user.role.changed, presence.update, notification.new
Subscribers: /, /admin/stats, /reports/overview
If Phase 1.7 found no dependencies but Phase 3 observes WebSocket/SSE traffic, Phase 4C is now enabled.
Mutation propagation tests are end-to-end behavioral tests that simulate real multi-actor workflows. Recon checks what happens NEXT across every actor and surface affected — not just whether one value updates on one page. These are the highest-impact, hardest-to-spot defects in any web app. Recon authors them BEFORE plain consistency tests (Phase 4A).
A mutation propagation test verifies that a state change by one actor produces the correct downstream effect for every other actor on every affected surface — AND verifies that actors who should NOT see the change don't.
Pattern: setup-actors → baseline-everywhere → trigger-action → verify-direct-effect → verify-propagated-effects → verify-non-effects → cleanup
- setup-actors: one Playwright `BrowserContext` per actor (sender, recipient, observer, admin, outsider). Real concurrent sessions.
- baseline-everywhere: capture every relevant surface for every actor.
- trigger-action: actor A performs the action.
- verify-direct-effect: A sees their own action reflected.
- verify-propagated-effects: B, C, D see what they should — right place, right time, right form.
- verify-non-effects: actors who should NOT see it don't (privacy, permissions, tenant isolation).
- cleanup: revert so the test is idempotent.
For each mutation source identified from Phase 3's page map, recon answers who can trigger the change, which actors and surfaces must reflect it, and which must not — and writes tests for each:
To discover linkages programmatically:
1. Classify each page from the Phase 3 map as mutation-source (has forms / actions) or display-only
2. Inspect the schema (`\d table_name` for postgres, `DESCRIBE table_name` for mysql, equivalent for others) to identify which columns the mutation writes to
3. Grep the codebase for reads of those columns: `grep -rn "column_name" <api_dir>`
4. Record each linkage as a tuple: (mutation page, actor, action, [affected actors × surfaces], [excluded actors])

Recon does NOT ask the user whether to write propagation tests — every mutation source gets them by default.
The user opts OUT, not in. Recon's failure mode is under-authoring, not over-authoring.
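For concreteness, the linkage tuple can be modeled as a small record type — a sketch, with field names chosen here purely for illustration:

```ts
// Hypothetical shape for one entry in the HTTP linkage map.
type Linkage = {
  mutationPage: string;                           // e.g. '/perms'
  actor: string;                                  // who performs the action
  action: string;                                 // e.g. 'POST /api/users/:id/role'
  affected: { actor: string; surface: string }[]; // actors × surfaces that must change
  excluded: string[];                             // actors that must NOT see the change
};
```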
Recon picks the template matching Phase 1.6's detected profile. If Phase 1.6 detected a non-TS profile, recon translates these into the project's language using equivalent Playwright APIs (Python sync_playwright, Java/.NET equivalents). The structure is the contract; the syntax follows the detected profile.
Template 1 — Direct messaging (A sends to B, verify B receives on multiple surfaces, verify C does not):
// tests/recon/mutation-send-dm-propagates-to-recipient-inbox-and-badge.spec.ts
import { test, expect } from '@playwright/test';
import { loginAs } from './_helpers';
test('A sends DM to B: B inbox + badge update, C unaffected', async ({ browser }) => {
const [aCtx, bCtx, cCtx] = await Promise.all([
browser.newContext(), browser.newContext(), browser.newContext(),
]);
const [a, b, c] = await Promise.all([aCtx.newPage(), bCtx.newPage(), cCtx.newPage()]);
await Promise.all([loginAs(a, 'alice'), loginAs(b, 'bob'), loginAs(c, 'carol')]);
// baseline
await b.goto('/messages');
const bInboxBefore = await b.getByTestId('inbox-list').locator('[data-testid="thread-row"]').count();
await b.goto('/');
const bBadgeBefore = parseInt(await b.getByTestId('unread-badge').textContent() ?? '0', 10);
await c.goto('/messages');
const cInboxBefore = await c.getByTestId('inbox-list').locator('[data-testid="thread-row"]').count();
// trigger
const body = 'recon test ' + Date.now();
await a.goto('/messages/new');
await a.getByLabel('To').fill('bob');
await a.getByRole('option', { name: /^bob/i }).click();
await a.getByLabel('Message').fill(body);
await a.getByRole('button', { name: /send/i }).click();
await expect(a.getByText(/sent|delivered/i)).toBeVisible();
// verify-direct
await expect(a.getByText(body).last()).toBeVisible();
// verify-propagated — B's inbox, badge, and message content
await b.goto('/messages');
await expect(b.getByTestId('inbox-list').locator('[data-testid="thread-row"]'))
.toHaveCount(bInboxBefore + 1);
await b.goto('/');
await expect(b.getByTestId('unread-badge')).toHaveText(String(bBadgeBefore + 1));
await b.goto('/messages');
await b.getByText(body).first().click();
await expect(b.getByText(body)).toBeVisible();
// verify-non-effects — C sees nothing
await c.goto('/messages');
await expect(c.getByTestId('inbox-list').locator('[data-testid="thread-row"]'))
.toHaveCount(cInboxBefore);
await c.goto('/');
await expect(c.getByText(body)).toHaveCount(0);
// cleanup — B reads to clear unread
await b.goto('/messages');
await b.getByText(body).first().click();
});
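Both templates import `loginAs` (and Template 2 also imports `normalizeNumber`) from `./_helpers`. A minimal sketch of what that file might contain — the `/login` route, field labels, and credential lookup are assumptions about the target app, not part of recon's contract:

```ts
// tests/recon/_helpers.ts — sketch; adapt selectors and routes to the app under test
import { Page } from '@playwright/test';

// Hypothetical credential lookup; real runs should read these from env vars
const USERS: Record<string, { email: string; password: string }> = {
  alice: { email: 'alice@example.com', password: process.env.RECON_PASSWORD ?? '' },
  bob:   { email: 'bob@example.com',   password: process.env.RECON_PASSWORD ?? '' },
  carol: { email: 'carol@example.com', password: process.env.RECON_PASSWORD ?? '' },
  admin: { email: 'admin@example.com', password: process.env.RECON_PASSWORD ?? '' },
};

export async function loginAs(page: Page, username: string) {
  const user = USERS[username];
  await page.goto('/login');
  await page.getByLabel(/email|username/i).fill(user?.email ?? username);
  await page.getByLabel(/password/i).fill(user?.password ?? '');
  await page.getByRole('button', { name: /log ?in|sign ?in/i }).click();
  // Consider login complete once the app navigates away from /login
  await page.waitForURL((url) => !url.pathname.startsWith('/login'));
}

// Parses displayed counts like "1,234" into integers
export function normalizeNumber(text: string): number {
  return parseInt(text.replace(/[^\d-]/g, ''), 10) || 0;
}
```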
Template 2 — Role/permission change (admin promotes bob, verify admin count increments, bob gains admin nav, carol unaffected):
// tests/recon/mutation-promote-user-propagates-to-admin-views-and-not-others.spec.ts
import { test, expect } from '@playwright/test';
import { loginAs, normalizeNumber } from './_helpers';
test('promoting bob to admin: admin count increments, bob gains admin nav, carol unaffected', async ({ browser }) => {
const [adminCtx, bobCtx, carolCtx] = await Promise.all([
browser.newContext(), browser.newContext(), browser.newContext(),
]);
const [admin, bob, carol] = await Promise.all([
adminCtx.newPage(), bobCtx.newPage(), carolCtx.newPage(),
]);
await Promise.all([loginAs(admin, 'admin'), loginAs(bob, 'bob'), loginAs(carol, 'carol')]);
// baseline
await admin.goto('/');
const adminCountBefore = normalizeNumber(await admin.getByTestId('admin-count').textContent() ?? '');
await bob.goto('/');
await expect(bob.getByRole('link', { name: /admin/i })).toHaveCount(0);
await carol.goto('/');
const carolAdminLinkBefore = await carol.getByRole('link', { name: /admin/i }).count();
// trigger
await admin.goto('/perms');
await admin.getByRole('row', { name: /bob/i })
.getByRole('button', { name: /change role/i }).click();
await admin.getByRole('option', { name: 'Admin' }).click();
await expect(admin.getByText(/saved|updated/i)).toBeVisible();
// verify-direct
await expect(admin.getByRole('row', { name: /bob/i }).getByText(/admin/i)).toBeVisible();
// verify-propagated
await admin.goto('/');
await expect(admin.getByTestId('admin-count')).toHaveText(String(adminCountBefore + 1));
await bob.reload();
await expect(bob.getByRole('link', { name: /admin/i })).toBeVisible();
// verify-non-effects
await carol.reload();
await expect(carol.getByRole('link', { name: /admin/i })).toHaveCount(carolAdminLinkBefore);
// cleanup
await admin.goto('/perms');
await admin.getByRole('row', { name: /bob/i })
.getByRole('button', { name: /change role/i }).click();
await admin.getByRole('option', { name: 'Member' }).click();
});
One test file per (action, scenario) pair — never bundled. Names encode actors and surfaces:
- `mutation-send-dm-propagates-to-recipient-inbox-and-badge`
- `mutation-promote-user-propagates-to-admin-views-and-not-others`
- `mutation-delete-post-propagates-to-feed-and-author-profile-and-not-search`

File extensions follow the convention detected in Phase 1.6.
Recon authors a separate test file per scenario so the worker pool runs them in parallel, and so failure reports name exactly which actor-surface linkage broke.
Real-time channels — WebSocket, Socket.IO, Server-Sent Events, GraphQL subscriptions, Phoenix Channels, SignalR, Firestore listeners, Supabase realtime, anything that pushes updates to the client without a page reload — are the highest-failure-rate part of any modern webapp. A mutation that broadcasts over a channel has TWO propagation paths to verify:
Both can fail independently. Recon must test both.
Real-time propagation bugs are co-priority with HTTP mutation propagation. Author Phase 4C tests alongside 4B, before Phase 4A consistency tests. If the app has no real-time channels (per Phase 1.7 and Phase 3.5), skip this phase.
Extend the linkage heuristic from Phase 4B with a real-time variant:
- subscribed-page — every route that opens a connection to the channel
- emit-site — grep the backend for broadcast calls: `.emit(`, `.broadcast(`, `.to(...).emit(`, `socket.send(` (Node WebSocket/Socket.IO); `yield`/write patterns in stream handlers (SSE); `pubsub.publish(` (GraphQL subscriptions); `broadcast(`, `push(` (Phoenix); `Clients.All.SendAsync(`, `Clients.Group(` (SignalR)
- linkage — record each as: (mutation page, emitted event, [subscribed pages that should react])

The result is a real-time linkage map distinct from the HTTP linkage map.
Each real-time propagation test verifies both the wire and the UI in a single test, because they're two faces of the same propagation:
Pattern: subscribe → baseline → mutate → assert wire message → assert UI update without reload → cleanup
- Open the app in a browser context, subscribed via the channel
- Record baseline UI state on every subscribed page
- Open a sniffer connection to the channel directly (using a CLI client or Playwright's `page.on('websocket')`)
- Trigger the mutation in a separate browser context (or via API call)
- Assert the channel emitted the expected event with the expected payload
- Switch to the subscribed browser context — assert UI updated without `page.goto` or reload
- Cleanup: revert the mutation
The "without reload" part is critical — that's what distinguishes real-time propagation from HTTP propagation. If the test passes only when you reload the page, it's an HTTP test, not a real-time test, and it's hiding a real bug.
Recon adapts templates to the detected channel and the detected language from Phase 1.6. Only the template matching the detected profile is used.
WebSocket / Socket.IO — TypeScript template:
// tests/recon/realtime-role-change-broadcasts-to-dashboard.spec.ts
import { test, expect, BrowserContext } from '@playwright/test';
import { loginAs, normalizeNumber } from './_helpers';
test('role change broadcasts user.role.changed: watcher updates without reload, uninvolved user unaffected', async ({ browser }) => {
// Three contexts: one watches, one mutates, one should NOT receive the broadcast
const watcherCtx = await browser.newContext();
const mutatorCtx = await browser.newContext();
const uninvolvedCtx = await browser.newContext();
const watcher = await watcherCtx.newPage();
const mutator = await mutatorCtx.newPage();
const uninvolved = await uninvolvedCtx.newPage();
await loginAs(watcher, 'admin');
await loginAs(mutator, 'admin');
await loginAs(uninvolved, 'carol'); // different tenant / unprivileged user
// 1. Sniff WebSockets on both watcher and uninvolved
const wsMessages: any[] = [];
watcher.on('websocket', (ws) => {
ws.on('framereceived', (frame) => {
try { wsMessages.push(JSON.parse(frame.payload as string)); } catch {}
});
});
const uninvolvedWsMessages: any[] = [];
uninvolved.on('websocket', (ws) => {
ws.on('framereceived', (frame) => {
try { uninvolvedWsMessages.push(JSON.parse(frame.payload as string)); } catch {}
});
});
// 2. Watcher and uninvolved subscribe by visiting their dashboards, baseline counts
await watcher.goto('/');
const baseline = normalizeNumber(await watcher.getByTestId('dashboard-admin-count').textContent() ?? '');
await uninvolved.goto('/');
const uninvolvedBaseline = await uninvolved.getByTestId('dashboard-admin-count').textContent() ?? '';
// 3. Mutator promotes a user via /perms
await mutator.goto('/perms');
const testUser = mutator.getByRole('row', { name: /recon\.test\.user@example\.com/i });
await testUser.getByRole('button', { name: /change role/i }).click();
await mutator.getByRole('option', { name: 'Admin' }).click();
await expect(mutator.getByText(/saved|updated/i)).toBeVisible();
// 4. Wire assertion — watcher's socket received the broadcast
await expect.poll(() => wsMessages.find((m) => m?.event === 'user.role.changed'),
{ timeout: 5000 }).toMatchObject({
event: 'user.role.changed',
payload: { newRole: 'admin' },
});
// 5. UI assertion — watcher's dashboard updated WITHOUT reload
// Note: NO watcher.goto() and NO watcher.reload() here
await expect.poll(
async () => normalizeNumber(await watcher.getByTestId('dashboard-admin-count').textContent() ?? ''),
{ timeout: 5000 },
).toBe(baseline + 1);
// 6. Non-effect assertion — uninvolved user did NOT receive the broadcast
await new Promise((r) => setTimeout(r, 2000)); // grace period
expect(uninvolvedWsMessages.find((m) => m?.event === 'user.role.changed'),
'uninvolved user should NOT receive user.role.changed').toBeUndefined();
await expect(uninvolved.getByTestId('dashboard-admin-count')).toHaveText(uninvolvedBaseline);
// 7. Cleanup
await mutator.goto('/perms');
await testUser.getByRole('button', { name: /change role/i }).click();
await mutator.getByRole('option', { name: 'Member' }).click();
});
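One caveat if the channel is Socket.IO rather than raw WebSocket: Socket.IO event frames arrive with a numeric packet prefix (e.g. `42["user.role.changed",{...}]`), so a bare `JSON.parse` on the frame payload throws and the sniffer above would silently collect nothing. A small normalizer — a sketch — handles both shapes:

```ts
// Normalizes a WebSocket frame into { event, payload } for both raw-JSON
// and Socket.IO wire formats; returns undefined for non-event frames.
function parseFrame(raw: string): { event?: string; payload?: unknown } | undefined {
  try { return JSON.parse(raw); } catch { /* not plain JSON — try Socket.IO */ }
  const match = raw.match(/^\d+(\[.*\])$/); // e.g. 42["event.name",{...}]
  if (!match) return undefined;
  try {
    const [event, payload] = JSON.parse(match[1]); // Socket.IO: [eventName, data]
    return { event, payload };
  } catch { return undefined; }
}
```

With this helper, the sniffer's `try { wsMessages.push(JSON.parse(frame.payload as string)) } catch {}` line becomes `const m = parseFrame(frame.payload as string); if (m) wsMessages.push(m);`.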
SSE — Python template:
# tests/recon/test_realtime_notification_appears_without_reload.py
import json, threading
import httpx
from playwright.sync_api import Page, expect, BrowserContext
from ._helpers import login_as
def test_new_notification_streams_via_sse_and_appears_in_navbar_without_reload(page: Page, context: BrowserContext):
login_as(page, "admin")
page.goto("/")
# 1. Open a parallel SSE sniffer (auth cookie shared from browser context)
cookies = context.cookies()
cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in cookies)
sse_messages = []
def sniff():
with httpx.stream("GET", "http://localhost:8000/api/notifications/stream",
headers={"Accept": "text/event-stream", "Cookie": cookie_header},
timeout=10) as r:
for line in r.iter_lines():
if line.startswith("data:"):
sse_messages.append(json.loads(line[5:].strip()))
if len(sse_messages) >= 1:
break
sniffer = threading.Thread(target=sniff, daemon=True)
sniffer.start()
baseline = int(page.get_by_test_id("notif-badge").text_content() or "0")
# 2. Trigger a notification via API
httpx.post("http://localhost:8000/api/notifications",
json={"to": "admin", "text": "recon test notification"},
cookies={c["name"]: c["value"] for c in cookies}).raise_for_status()
# 3. Wire assertion — SSE delivered the message
sniffer.join(timeout=5)
assert any(m.get("type") == "notification.new" for m in sse_messages), \
"SSE channel did not deliver notification.new"
# 4. UI assertion — badge updated without reload
expect(page.get_by_test_id("notif-badge")).to_have_text(str(baseline + 1), timeout=5000)
For other channel types (Phoenix Channels, SignalR, GraphQL subscriptions, managed services like Pusher/Ably), recon adapts the same dual-verification structure using the appropriate sniffer:
- GraphQL subscriptions: a urql/Apollo client opens the subscription directly
- Phoenix Channels: a phoenix.js socket joins the channel
- SignalR: a `HubConnection` listens

Recon never mocks the channel — that defeats the purpose. If the channel requires special test infrastructure (e.g., a Redis pub/sub backend), recon notes the dependency and uses the real one.
For every detected channel, recon ALSO authors a reconnection test, because the most common real-time bugs hide here:
test('dashboard recovers and re-syncs after socket disconnect', async ({ page, context }) => {
await loginAs(page, 'admin');
await page.goto('/');
const baseline = normalizeNumber(await page.getByTestId('dashboard-admin-count').textContent() ?? '');
// Force-close all websockets on the page: if the app exposes a global socket
// reference, close it via page.evaluate; otherwise simulate a network blip
// with context.setOffline, as below.
await context.setOffline(true);
await page.waitForTimeout(2000);
await context.setOffline(false);
// While offline, mutate via API directly
// ... mutation ...
// Assert the page eventually re-syncs (either via reconnect+catchup or a refetch)
await expect.poll(
async () => normalizeNumber(await page.getByTestId('dashboard-admin-count').textContent() ?? ''),
{ timeout: 10_000 },
).toBe(baseline + 1);
});
If reconnect tests fail, the app is silently stale for any user whose connection blips — which on mobile is most users, most of the time. These are not edge cases.
This phase covers all remaining checks not covered by Phase 4B (mutation propagation) and Phase 4C (real-time propagation). Recon authors test files in the language and convention detected in Phase 1.6, in the recon test directory.
Before authoring any test, scan the recon test directory for an existing test covering the same behavior. If found, run it instead of rewriting it.
Author one consistency test file per route. Each test file:
- uses resilient selectors (`getByTestId`, `getByRole`, `getByLabel`)

Reference `references/checks.md` for detailed cross-check rules.
For pages with forms, author parameterised test files over the attack matrix from references/form-attacks.md. One test per (field, attack) pair so failures are surgical.
Author a DB helper in the recon test directory:
- JS/TS: `_helpers/db.ts` exporting a typed `getDb()` function
- Python: `_helpers/db.py` exporting `get_db()`

The helper reads connection details from environment variables set by the user (or the per-PR Docker env) and supports postgres/mysql/sqlite.
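A sketch of the TypeScript variant, postgres branch only — the `RECON_DB_*` env var names and the `pg` driver choice are assumptions for illustration:

```ts
// tests/recon/_helpers/db.ts — sketch; mysql/sqlite branches follow the same shape
import { Client } from 'pg';

export type Db = {
  query: (sql: string, params?: unknown[]) => Promise<any[]>;
  close: () => Promise<void>;
};

export async function getDb(): Promise<Db> {
  const engine = process.env.RECON_DB_ENGINE ?? 'postgres'; // hypothetical env contract
  if (engine !== 'postgres') throw new Error(`add a driver branch for ${engine}`);
  const client = new Client({
    host: process.env.RECON_DB_HOST ?? 'localhost',
    port: Number(process.env.RECON_DB_PORT ?? 5432),
    user: process.env.RECON_DB_USER,
    password: process.env.RECON_DB_PASSWORD,
    database: process.env.RECON_DB_NAME,
  });
  await client.connect();
  return {
    // Readonly SELECTs only, per the tool table above
    query: async (sql, params) => (await client.query(sql, params)).rows,
    close: () => client.end(),
  };
}
```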
Run all tests (from Phases 4B, 4C, and 4A) in one command using the runner from Phase 1.6:
- JS/TS: `npx playwright test <recon-dir> --reporter=json > /tmp/recon-results.json`
- Python: `pytest <recon-dir> --json-report --json-report-file=/tmp/recon-results.json`
- Java: `mvn test -Dtest='Recon*' -Dsurefire.reportFormat=json`
- .NET: `dotnet test --logger "json;LogFileName=/tmp/recon-results.json"`

Parse the JSON to populate findings for the Confidence Letter.
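For the JS/TS case, a sketch of that parsing step — it walks the `suites → specs → tests → results` nesting Playwright's JSON reporter emits (the `Finding` shape is recon's own, chosen here for illustration):

```ts
import { readFileSync } from 'node:fs';

type Finding = { title: string; file: string; status: string };

// Recursively walks suites (they can nest) and collects one entry per test result.
function collectFindings(path = '/tmp/recon-results.json'): Finding[] {
  const report = JSON.parse(readFileSync(path, 'utf8'));
  const findings: Finding[] = [];
  const walk = (suite: any) => {
    for (const spec of suite.specs ?? [])
      for (const t of spec.tests ?? [])
        for (const r of t.results ?? [])
          findings.push({ title: spec.title, file: suite.file, status: r.status });
    for (const child of suite.suites ?? []) walk(child);
  };
  for (const suite of report.suites ?? []) walk(suite);
  return findings;
}
```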
Fall back to MCP browser_* tools ONLY when a test fails in a way that needs interactive diagnosis (e.g., understanding why a selector didn't match, or what a page actually rendered). This is the exception, not the rule.
After running all tests, print status:
Test execution complete:
Phase 4B (mutation propagation): 12 tests — 10 passed, 2 failed
Phase 4C (real-time propagation): 5 tests — 4 passed, 1 failed
Phase 4A (consistency + forms): 38 tests — 35 passed, 2 failed, 1 skipped
MCP follow-up: 3 failures investigated interactively
After all pages are crawled and all tests are executed, produce the Confidence Letter. This is the primary output — structured markdown that both humans and forge can read.
Score each category out of 10 based on pass/fail ratio and severity weighting. The categories are the nine rows of the table below.

Overall Confidence = average of all category scores × 10 (0-100 scale)
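A sketch of one possible scoring function — the exact weights are illustrative, not prescribed by recon:

```ts
type Counts = { pass: number; fail: number; warn: number; critical: number };

// One possible severity weighting: criticals count 3x a fail, warns count half.
function categoryScore({ pass, fail, warn, critical }: Counts): number {
  const weightedFail = fail + 3 * critical + 0.5 * warn;
  const total = pass + weightedFail;
  return total === 0 ? 10 : Math.max(0, Math.round((pass / total) * 10));
}

// Overall Confidence = average of category scores × 10 → 0-100 scale
function overallConfidence(scores: number[]): number {
  return Math.round((scores.reduce((a, b) => a + b, 0) / scores.length) * 10);
}
```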
Write to recon-confidence-letter-YYYY-MM-DD.md in the project root:
# Recon Confidence Letter
**Date**: YYYY-MM-DD HH:MM
**Target**: http://localhost:5173
**Stack**: [detected stack]
**Overall Confidence: [score]/100**
## Category Scores
| Category | Score | Pass | Fail | Warn |
|----------|-------|------|------|------|
| Mutation Propagation | X/10 | N | N | N |
| Real-Time Propagation | X/10 | N | N | N |
| Security | X/10 | N | N | N |
| Data Consistency | X/10 | N | N | N |
| UI Cleanliness | X/10 | N | N | N |
| Accessibility | X/10 | N | N | N |
| Functional Correctness | X/10 | N | N | N |
| DB Integrity | X/10 | N | N | N |
| API Contracts | X/10 | N | N | N |
## Findings
### [SEVERITY] Short title (propagation findings)
- **Category**: Mutation Propagation or Real-Time Propagation
- **Actors**: alice (sender), bob (recipient), carol (uninvolved control)
- **Action**: POST /api/endpoint { relevant payload }
- **Direct effect**: ✓ / ✗ actor A sees their own action
- **Propagated effect**: ✓ / ✗ description per affected actor and surface
- **Non-effect**: ✓ / ✗ uninvolved actors correctly unaffected
- **Wire**: (real-time only) socket emitted event with payload ✓ / ✗
- **UI**: (real-time only) page updated without reload ✓ / ✗
- **Root cause hint**: what to fix and where
### [SEVERITY] Short title (non-propagation findings)
- **Category**: category name
- **Page**: /route (or "Global" for cross-cutting issues)
- **UI value**: exact value (or N/A)
- **API value**: exact value (or N/A)
- **DB value**: exact value (or N/A)
- **Evidence**: screenshot, endpoint, query used
- **Fix hint**: what to fix and where
...repeat for every finding...
## Actions Required
1. [CRITICAL] description — file hint
2. [CRITICAL] description — file hint
3. [FAIL] description — file hint
4. [WARN] description — file hint
Also write recon-report-YYYY-MM-DD.md with full evidence: all screenshots, raw API responses, raw DB query results, console logs, test runner JSON output, and script outputs. The confidence letter is the summary; the report is the evidence.
Print to the conversation:
Recon complete. {N} pages crawled. {M} issues found.
Overall Confidence: {score}/100
REAL-TIME PROPAGATION FAILURES ({count}) — fix these first, real-time bugs ship silently:
/perms → [user.role.changed] → / : UI did not update without reload
notification.new channel did not emit on POST /notifications
HTTP PROPAGATION FAILURES ({count}) — fix these next, they're integrity bugs:
/perms → /dashboard admin count did not increment after role change
CRITICAL ({count}):
{route} {description}
FAIL ({count}):
{route} {description}
WARNING ({count}):
{route} {description}
INFO ({count}):
{route} {description}
Confidence Letter written to: recon-confidence-letter-YYYY-MM-DD.md
Full report written to: recon-report-YYYY-MM-DD.md
To fix all issues: /forge read the confidence letter and gain 100%
STOP — If you reached this phase, you MUST be in per-PR mode. That means:
- You skipped Phases 1-5 (they don't apply here)
- You have NOT scanned localhost ports or started `pnpm dev`
- You are about to use `recon-env.sh` to spin up isolated Docker environments
- If any of the above is wrong, go back to the Mode Routing table at the top
Trigger phrases:
--per-pr, "per PR", "each PR", "all PRs", "parallel", "isolated", "each branch", "test each", "parellely", "each isolated"
When the user runs /forge:recon --per-pr or asks to "recon all open PRs" or "test each PR branch," recon audits every open PR branch in parallel using isolated Docker Compose environments managed by the recon-env.sh orchestration script.
Running recon against main only tests merged code. Open PRs contain unmerged changes that may introduce regressions, security issues, or data inconsistencies. Per-PR recon catches these before merge by checking out each branch, building it in an isolated Docker environment, and auditing the live result.
Each PR gets a fully isolated environment:
PR #42 ─── git archive ─── docker compose -p recon-42 ─── UI :5180, API :8010, DB :5442
PR #45 ─── git archive ─── docker compose -p recon-45 ─── UI :5181, API :8011, DB :5443
PR #48 ─── git archive ─── docker compose -p recon-48 ─── UI :5182, API :8012, DB :5444
Branch code is extracted via git archive into /tmp/recon-envs/recon-<id>/src/ — no git worktrees needed. Docker Compose's -p (project name) flag namespaces ALL resources — containers, networks, volumes — so environments are completely isolated from each other.
recon-env.sh scriptAll environment lifecycle operations use ${CLAUDE_SKILL_DIR}/scripts/recon-env.sh:
# Spin up an isolated environment for a PR
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" up <pr_number> <branch> <ui_port> <api_port> <db_port>
# Check if it's running
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" status <pr_number>
# Tear it down
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" down <pr_number>
# List all active environments
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" list
# Allocate port ranges for N PRs
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" ports <count>
# Destroy everything
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" nuke
The script handles: branch code extraction via git archive, Docker Compose startup with port injection via RECON_UI_PORT/RECON_API_PORT/RECON_DB_PORT environment variables, health polling, metadata tracking, and full cleanup.
List all open PRs:
gh pr list --state open --json number,title,headRefName --limit 20
Present the list and wait for user confirmation. They may exclude PRs or select specific ones.
The project needs a docker-compose.yml (or compose.yml) that uses environment variables for port binding. Check if one exists:
- If it exists, verify port bindings use `${RECON_UI_PORT:-5173}` syntax for default-with-override.
- If it doesn't, generate one that binds ports via the `RECON_UI_PORT`, `RECON_API_PORT`, and `RECON_DB_PORT` environment variables.

Example compose file structure for a typical stack:
services:
frontend:
build: ./frontend
ports:
- "${RECON_UI_PORT:-5173}:5173"
backend:
build: ./backend
ports:
- "${RECON_API_PORT:-8000}:8000"
environment:
- DATABASE_URL=postgresql://user:pass@db:5432/app
db:
image: postgres:16
ports:
- "${RECON_DB_PORT:-5432}:5432"
environment:
- POSTGRES_PASSWORD=pass
- POSTGRES_DB=app
Use the script to allocate ports and spin up each PR:
# See what ports will be assigned
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" ports 3
# Spin up each PR environment
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" up 42 feature/user-auth 5180 8010 5442
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" up 45 fix/pricing-display 5181 8011 5443
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" up 48 feature/dashboard-v2 5182 8012 5444
The script will:
- Extract the branch code via `git archive` into `/tmp/recon-envs/recon-<id>/src/`
- Start Docker Compose with `RECON_UI_PORT`, `RECON_API_PORT`, `RECON_DB_PORT` injected

Exit codes:

- `0` — fully healthy, ready for recon
- `1` — error (build failed, git error)
- `2` — code extracted but no compose file found (Claude should generate one)
- `3` — running but not all services are healthy

If exit code is 2, generate a `docker-compose.yml` in the extracted directory, then re-run the `up` command.
If a PR fails to start, log the failure and continue with the others.
Dispatch parallel agents — one per PR. Each agent runs the full recon workflow (Phase 2 through Phase 5) against its allocated URLs:
Agent 1: recon against http://localhost:5180 (API: http://localhost:8010, DB: localhost:5442) — PR #42
Agent 2: recon against http://localhost:5181 (API: http://localhost:8011, DB: localhost:5443) — PR #45
Agent 3: recon against http://localhost:5182 (API: http://localhost:8012, DB: localhost:5444) — PR #48
Each agent:
- writes its confidence letter to `recon-confidence-letter-YYYY-MM-DD-pr-{number}.md`

After all agents complete, clean up every environment:
# Tear down each PR individually
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" down 42
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" down 45
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" down 48
# Or nuke everything at once
bash "${CLAUDE_SKILL_DIR}/scripts/recon-env.sh" nuke
The script stops Docker Compose (with volume removal), kills any orphan processes on the allocated ports, and removes the extracted source directory.
After all agents finish, produce a combined summary:
Per-PR Recon Complete.
PR #42 (feature/user-auth): Confidence: 85/100 — 3 issues (1 critical, 2 warnings)
PR #45 (fix/pricing-display): Confidence: 92/100 — 1 issue (1 warning)
PR #48 (feature/dashboard-v2): BUILD FAILED — docker compose error (missing Dockerfile)
Individual confidence letters:
recon-confidence-letter-2026-03-15-pr-42.md
recon-confidence-letter-2026-03-15-pr-45.md
To fix issues in a specific PR:
git checkout feature/user-auth && /forge read the confidence letter and gain 100%
When the user wants to fix a specific PR's issues:
1. `git checkout feature/user-auth`
2. `/forge read the confidence letter and gain 100%` — forge reads `recon-confidence-letter-*-pr-42.md`
3. Re-run `/forge:recon --per-pr` or recon that single branch to verify

Additional rules:

- The compose file must bind ports via the `RECON_UI_PORT`, `RECON_API_PORT`, `RECON_DB_PORT` environment variables. If it doesn't, Claude will modify or generate one.
- A project with `test_*.py` files should never end up with a stray file in a different convention from recon.
- If a real-time test only passes after a `page.reload()`, delete the reload and let it fail. The failure is the bug. Real-time means real-time.