Patterns for defining and executing end-to-end user journeys via browser automation at major milestones. Catches integration bugs that component tests miss. Don't use for unit testing, API testing, or debugging a single component. Don't use mid-task — use at milestone boundaries only.
Component-level and unit tests pass while real user flows are broken. Test stubs (e.g., x-test-chat-stub) create false confidence by exercising rendering but not the real system pipeline. The most effective bug-finding technique in practice is testing as a real user would — end-to-end, through the actual system.
Journey tests exercise the real system end-to-end, with no stubs anywhere in the pipeline.
This is fundamentally different from E2E tests that use stubs for speed/reliability. Both are valuable, but only journey tests answer: "Does this actually work for a real person?"
Structure each journey as a sequence of actions and verifications from the user's perspective:

```
Journey: [Name — describes the user goal]
Preconditions: [What must be true before starting]
Steps:
  1. [Action]: Navigate to /path
     Verify: Page loads, expected elements visible
  2. [Action]: Type "message" in chat input, press Send
     Verify: Message appears in chat, agent responds
  3. [Action]: Click on [element]
     Verify: Expected result occurs
  ...
Success Criteria: [What "working" looks like at the end]
```
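For repeatable execution, the template can also be encoded as data. A minimal TypeScript sketch, assuming illustrative type names (`Journey`, `JourneyStep` are not part of the skill itself):

```typescript
// Illustrative types for a journey definition; the names are assumptions,
// not an API this skill actually ships.
interface JourneyStep {
  action: string; // what the user does, e.g. "Navigate to /chat"
  verify: string; // what must be observable afterwards
}

interface Journey {
  name: string;
  preconditions: string[];
  steps: JourneyStep[];
  successCriteria: string;
}

// The recipe example from this document, partially encoded as data:
const createAndEditRecipe: Journey = {
  name: "Create recipe in chat, then edit it",
  preconditions: ["Logged-in user with an active conversation"],
  steps: [
    { action: "Navigate to chat", verify: "Chat page loads" },
    {
      action: 'Send "Make me a pad thai recipe with shrimp"',
      verify: "Tool call indicator appears, then recipe card renders",
    },
  ],
  successCriteria: "Single recipe exists with tofu, edit history preserved",
};
```

Encoding journeys as data keeps the definition reviewable in a PR and lets the same list drive both manual walkthroughs and automated runs.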
Create and Edit a Recipe:

```
Journey: Create recipe in chat, then edit it
Preconditions: Logged-in user with an active conversation
Steps:
  1. Navigate to chat
  2. Send "Make me a pad thai recipe with shrimp"
  3. Verify: Tool call indicator appears, then recipe card renders in chat
  4. Click recipe card link
  5. Verify: Recipe detail page loads with title, ingredients, instructions
  6. Navigate back to chat
  7. Send "Actually, make it with tofu instead of shrimp"
  8. Verify: Agent updates the existing recipe (does not create a new one)
  9. Click updated recipe card
  10. Verify: Detail page shows tofu, not shrimp
Success Criteria: Single recipe exists with tofu, edit history preserved
```
Guest User Gating:

```
Journey: Guest user hits message limit
Preconditions: Fresh browser session (no auth)
Steps:
  1. Navigate to home page
  2. Start a conversation as guest
  3. Send 5 messages (with agent responses between each)
  4. Verify: Messages 1-5 all visible with agent replies
  5. Attempt to send message 6
  6. Verify: Gate modal appears (on attempt 6, not after message 5)
Success Criteria: User sees all 5 messages before being gated
```
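The boundary in step 6 is exactly the kind of off-by-one a journey test catches. A tiny TypeScript model of the check being verified (`GUEST_MESSAGE_LIMIT` and `shouldShowGate` are hypothetical names, not the app's actual API):

```typescript
// Hypothetical model of the guest gate; the real app's names may differ.
const GUEST_MESSAGE_LIMIT = 5;

// The gate should appear when the user *attempts* message N+1,
// not immediately after message N is sent.
function shouldShowGate(messagesAlreadySent: number): boolean {
  return messagesAlreadySent >= GUEST_MESSAGE_LIMIT;
}
```

Here `shouldShowGate(4)` is false (the 5th message goes through) and `shouldShowGate(5)` is true (the 6th attempt is gated), which is precisely what steps 4-6 of the journey assert through the UI.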
Use Agent Browser CLI or Playwright MCP to walk through each step:
1. Start dev services: `npm run deploy:development:all`
2. Navigate to the starting page
3. For each step:
   a. Perform the action (click, type, navigate)
   b. Wait for the expected result (use testId selectors, not timeouts)
   c. Take a screenshot as evidence
   d. If verification fails: stop, document the failure, fix it
4. After all steps pass: compile an evidence report
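The loop above can be sketched as a driver-agnostic runner. This is a sketch under assumptions: the `Driver` interface stands in for whatever the Agent Browser CLI or Playwright actually exposes, and its method names are illustrative:

```typescript
// Assumed abstraction over the real browser driver; method names are
// illustrative, not a real Playwright or Agent Browser CLI API.
interface Driver {
  perform(action: string): Promise<void>; // click / type / navigate
  verify(expectation: string): Promise<boolean>; // wait on testId selectors
  screenshot(label: string): Promise<void>; // capture evidence
}

interface StepResult {
  step: number;
  action: string;
  expected: string;
  status: "PASS" | "FAIL";
}

// Walk the journey step by step; stop at the first failed verification so
// the failure can be documented and fixed before continuing.
async function runJourney(
  driver: Driver,
  steps: { action: string; verify: string }[],
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const [i, step] of steps.entries()) {
    await driver.perform(step.action);
    const ok = await driver.verify(step.verify);
    await driver.screenshot(`step-${i + 1}`);
    results.push({
      step: i + 1,
      action: step.action,
      expected: step.verify,
      status: ok ? "PASS" : "FAIL",
    });
    if (!ok) break; // stop, document, fix
  }
  return results;
}
```

Stopping on the first failure mirrors step 3d: later steps usually depend on earlier ones, so running past a broken verification only produces misleading evidence.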
At major milestones, use this pattern:
"Commit the current work, then do the manual test yourself autonomously. Use the Agent Browser CLI or Playwright to simulate a real user walking through these journeys. Fix any issues you find. Only hand off to me after all journeys pass."
Each journey execution should produce:

```
## Journey Report: [Name]
**Date**: YYYY-MM-DD
**Result**: PASS / FAIL

| Step | Action | Expected | Actual | Status |
|------|--------|----------|--------|--------|
| 1 | Navigate to /chat | Page loads | Page loaded | PASS |
| 2 | Send message | Agent responds | Agent responded | PASS |
| 3 | Click recipe | Detail page | 404 error | FAIL |

**Screenshots**: [attached or linked]
**Failures fixed**: [description of what was wrong and how it was fixed]
```
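Producing this report from step results is mechanical. A TypeScript sketch, assuming a `ReportRow` shape that carries action/expected/actual/status per step (the names are illustrative):

```typescript
// Assumed per-step result shape; names are illustrative.
interface ReportRow {
  step: number;
  action: string;
  expected: string;
  actual: string;
  status: "PASS" | "FAIL";
}

// Render the markdown journey report: header, then one table row per step.
function renderReport(name: string, date: string, rows: ReportRow[]): string {
  const overall = rows.every((r) => r.status === "PASS") ? "PASS" : "FAIL";
  const lines = [
    `## Journey Report: ${name}`,
    `**Date**: ${date}`,
    `**Result**: ${overall}`,
    "| Step | Action | Expected | Actual | Status |",
    "|------|--------|----------|--------|--------|",
    ...rows.map(
      (r) =>
        `| ${r.step} | ${r.action} | ${r.expected} | ${r.actual} | ${r.status} |`,
    ),
  ];
  return lines.join("\n");
}
```

Deriving the overall result from the rows (any FAIL fails the journey) keeps the headline consistent with the evidence table.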
When a milestone is complete, define 3-5 journeys that cover its scope. Not every milestone needs all five; use judgment and cover the riskiest flows.
| Type | Speed | Pipeline | Purpose | When |
|---|---|---|---|---|
| Unit tests | Fast (~10s) | None | Logic correctness | Every commit |
| Integration tests | Medium (~2-3m) | Real DB | Service layer | Before push |
| E2E tests (stubbed) | Medium (~3-4m) | Stub | UI rendering | Before push |
| Journey tests | Slow (~5-10m) | Real | User experience | Milestones |
Journey tests are the most expensive but the most truthful. They are the final quality gate before declaring work complete.