When this command is invoked, YOU (Claude) must execute these steps immediately.
This is NOT documentation - these are COMMANDS to execute right now.
Use TodoWrite to track progress through multi-phase workflows.
🚨 EXECUTION WORKFLOW
Phase 1: Test Steps
Action Steps:
Review the reference documentation below and execute the detailed steps.
Phase 1: [Primary Test Objective]
Action Steps:
Objective: [What this phase validates]
Review the reference documentation below and execute the detailed steps.
Phase 5: Step 1: [Specific Action]
Action Steps:
Navigate: http://localhost:3002/[specific-path]
Expected: [What should happen]
Evidence: Screenshot → docs/[test-name]-step1-[status].png
API Check: Monitor for [specific API calls]
Console Check: No errors matching: ['TypeError', 'undefined', 'failed to fetch', '401', '500']
🔴 VALIDATION CHECKPOINT 1:
Screenshot captured and saved
API calls logged (present/absent as expected)
Console errors checked and logged
Priority assessment: 🚨 CRITICAL / ⚠️ HIGH / 📝 MEDIUM
🚨 MANDATORY: If CRITICAL found, STOP and implement fixes before proceeding
Phase 6: Step 2: [Specific Action]
Action Steps:
Action: [Specific user action to perform]
Expected: [What should happen]
Evidence: Screenshot → docs/[test-name]-step2-[status].png
Data Validation: Verify "[test_identifier]" appears in UI
Backend Logs: Check Flask logs for [expected activity]
🔴 VALIDATION CHECKPOINT 2:
Action completed successfully
Test data visible in UI (no hardcoded values)
Backend logs show expected activity
Priority assessment: 🚨 CRITICAL / ⚠️ HIGH / 📝 MEDIUM
🚨 MANDATORY: If CRITICAL found, STOP and implement fixes before proceeding
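A minimal browser-console sketch of the data validation above; the identifier value is illustrative and should be the test_identifier from the generated test data:
// Sketch: confirm the unique test identifier (not hardcoded demo data) is rendered in the UI
const testIdentifier = 'landing-test-20250101_1200'; // illustrative - use your generated test_identifier
const pageText = document.body.innerText;
if (pageText.includes(testIdentifier)) {
  console.log(`✅ Test data "${testIdentifier}" visible in UI`);
} else {
  console.warn(`🚨 CRITICAL candidate: test identifier "${testIdentifier}" not found in rendered UI`);
}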
Phase 7: Step 2.5: Service Integration Validation - CRITICAL LAYER
Action Steps:
Action: Verify service layers receive and use user data correctly
Expected: AI/content generation services incorporate user's specific data
Evidence: Screenshot of service output (content, emails, AI responses)
Service Validation: Check that generated content references user data, not placeholders
Integration Logs: Monitor service API calls and data payloads
🔴 VALIDATION CHECKPOINT 2.5:
Service output incorporates user test data
No hardcoded/placeholder content in service responses
Service APIs receive user context correctly
Priority assessment: 🚨 CRITICAL / ⚠️ HIGH / 📝 MEDIUM
🚨 MANDATORY: Service integration failures are CRITICAL - STOP and fix immediately
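One way to approximate this service-output check from the browser console; the placeholder markers below are assumptions and should be tuned to the content the services actually generate:
// Sketch: flag hardcoded/placeholder content in generated output (marker list is an assumption)
const placeholderMarkers = ['lorem ipsum', 'placeholder', 'demo user', 'sample campaign'];
const generatedText = document.body.innerText.toLowerCase();
const hits = placeholderMarkers.filter(marker => generatedText.includes(marker));
console.log(hits.length === 0
  ? '✅ No placeholder markers detected in service output'
  : `🚨 CRITICAL candidate: placeholder content detected: ${hits.join(', ')}`);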
🚀 GREEN PHASE IMPLEMENTATION (Update With Fix Code)
Action Steps:
// Implementation code will go here after issues identified
Phase 9: Step 1: Load Landing Page (User with Campaigns)
Action Steps:
Expected: Page should check for existing campaigns and show appropriate UI
Evidence:
Screenshot of page content
Network logs showing API call
Console logs for any errors
Priority Assessment:
IF no API call made → 🚨 CRITICAL - This is a core integration issue
IF API call fails → ⚠️ HIGH - Error handling problem
IF slow loading → 📝 MEDIUM - Performance issue
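A browser-console sketch of the API-call and performance checks above. It assumes the endpoint is /api/campaigns and uses the Resource Timing API, which does not expose response status, so the failed-call (⚠️ HIGH) case still needs the network logs:
// Sketch: verify a campaigns API request was made on page load (endpoint path is an assumption)
const campaignCalls = performance.getEntriesByType('resource')
  .filter(entry => entry.name.includes('/api/campaigns'));
if (campaignCalls.length === 0) {
  console.warn('🚨 CRITICAL candidate: no /api/campaigns request observed on page load');
} else if (campaignCalls.some(entry => entry.duration > 2000)) {
  console.warn('📝 MEDIUM: /api/campaigns responded slower than 2s');
} else {
  console.log('✅ Campaigns API called:', campaignCalls.map(entry => `${Math.round(entry.duration)}ms`));
}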
Phase 10: Step 2: Load Landing Page (User without Campaigns)
Action Steps:
Expected: Page should show "Create Your First Campaign"
Evidence:
Screenshot of page content
Confirmation of appropriate messaging
Priority Assessment:
IF same content shown → 🚨 CRITICAL - Not checking user state
IF confusing messaging → 📝 MEDIUM - UX improvement
🚨 MANDATORY STOP CONDITIONS:
1. Landing page doesn't call campaigns API
2. Same content shown regardless of user state
3. User cannot proceed to create or access campaigns
📊 RESULTS EVALUATION:
Based on findings, classify each issue and determine immediate actions needed.
📋 REFERENCE DOCUMENTATION
/generatetest - Intelligent Test Protocol Generator
Purpose: Generate execution-ready test protocols for both MCP and browser testing with automatic detection
Hybrid Tests: Detects both patterns → Asks user for clarification
Keywords for MCP Test Generation:
mcp, worldarchitect, campaign, create_campaign, get_campaigns_list, process_action,
get_campaign_state, firebase, firestore, gemini, api integration, server integration,
real services, production mode, mcp tools, mcp server
IF input contains MCP keywords → Generate MCP integration test
IF input contains browser keywords → Generate browser automation test
IF input contains both → Ask user: "Detected both MCP and browser elements. Generate: (1) MCP integration test, (2) Browser automation test, or (3) Hybrid test?"
IF unclear → Ask user: "Generate: (1) MCP integration test or (2) Browser automation test?"
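A sketch of this routing logic. The MCP keywords come from the list above; the browser keyword list is an assumption, since only MCP keywords are enumerated here:
// Sketch: route a /generatetest request to MCP, browser, or a clarification question
const MCP_KEYWORDS = ['mcp', 'worldarchitect', 'campaign', 'firebase', 'firestore', 'gemini', 'mcp server'];
const BROWSER_KEYWORDS = ['browser', 'ui', 'click', 'navigate', 'screenshot', 'landing page']; // assumed list

function classifyTestRequest(description) {
  const text = description.toLowerCase();
  const hasMcp = MCP_KEYWORDS.some(keyword => text.includes(keyword));
  const hasBrowser = BROWSER_KEYWORDS.some(keyword => text.includes(keyword));
  if (hasMcp && hasBrowser) return 'ask-user: MCP, browser, or hybrid?';
  if (hasMcp) return 'generate-mcp-integration-test';
  if (hasBrowser) return 'generate-browser-automation-test';
  return 'ask-user: MCP or browser?';
}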
Default Behavior (Focused & Execution-Ready):
Problem-First Design: Start with specific user problem to solve
Execution Commands: Include ready-to-run commands and URLs
Priority Matrix: Rank findings by actual user impact
Validation Checkpoints: Numbered steps with mandatory stop conditions
Results Integration: Template sections for actual findings
Learning Capture: Document expectations vs reality
Test Generation Protocol
1. Problem Definition
What specific issue is being tested?
What is the user impact if this issue exists?
What would "working correctly" look like?
CRITICAL ADDITION: What service integration boundaries need validation?
CRITICAL ADDITION: What user journey layers could fail independently?
2. Success Criteria Definition
Primary Goal: Main problem that must be solved
Secondary Goals: Nice-to-have improvements
Failure Conditions: What constitutes a test failure requiring immediate action
3. Priority Matrix Setup
🚨 CRITICAL: Blocks core user functionality
⚠️ HIGH: Significant user experience degradation
📝 MEDIUM: Minor UX issues or cosmetic problems
ℹ️ LOW: Documentation or edge case issues
4. Evidence Collection Requirements
Screenshots: Specific pages/states to capture
API Logs: Which endpoints to monitor
Network Traffic: Expected vs actual requests
Console Logs: Error messages to watch for
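For the network-traffic requirement, a lightweight fetch wrapper can record actual requests for comparison against expectations. This sketch targets the browser console and only covers fetch-based calls (XHR would need a separate hook):
// Sketch: record every fetch-based API call so actual traffic can be compared with expectations
window.testApiLog = [];
const originalFetch = window.fetch;
window.fetch = async function (...args) {
  const url = typeof args[0] === 'string' ? args[0] : (args[0].url || String(args[0]));
  const started = performance.now();
  const response = await originalFetch.apply(this, args);
  window.testApiLog.push({
    url,
    status: response.status,
    durationMs: Math.round(performance.now() - started),
    timestamp: new Date().toISOString(),
  });
  return response;
};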
🚨 MANDATORY STOP CONDITIONS PROTOCOL
STOP testing immediately and implement fixes if:
Any 🚨 CRITICAL condition found that blocks core user functionality
Data loss or corruption risk detected
User cannot complete primary workflow
System completely breaks or becomes unusable
CRITICAL ADDITION: Hardcoded content appears instead of user data (service integration failure)
CRITICAL ADDITION: User sees placeholder/demo content in final experience
CRITICAL ADDITION: Any service layer fails to receive user context
Critical Rule: Never continue testing with unresolved CRITICAL issues
🚨 PRIORITY-BASED ACTION PROTOCOL
For each finding, apply immediate action:
🚨 CRITICAL → STOP testing, implement fix, verify, then resume
⚠️ HIGH → Document thoroughly, add to immediate sprint backlog
📝 MEDIUM → Note for future improvement, continue testing
ℹ️ LOW → Brief documentation, no action blocking
Action Decision Matrix:
IF finding is 🚨 CRITICAL → HALT workflow, fix immediately, verify, resume
IF finding is ⚠️ HIGH → Document with timeline, continue testing
IF finding is 📝 MEDIUM → Log for backlog, continue testing
IF finding is ℹ️ LOW → Note briefly, continue testing
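The same matrix expressed as a small helper, useful when summarizing findings programmatically (a sketch; the strings mirror the actions above):
// Sketch: map a finding's priority to the immediate action defined by the decision matrix
function actionForFinding(priority) {
  switch (priority) {
    case 'CRITICAL': return 'HALT workflow, fix immediately, verify, resume';
    case 'HIGH': return 'Document with timeline, continue testing';
    case 'MEDIUM': return 'Log for backlog, continue testing';
    case 'LOW': return 'Note briefly, continue testing';
    default: return 'Classify the finding before continuing';
  }
}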
🔄 EXPECTATION VS REALITY PROTOCOL
When test findings don't match expectations:
Assess actual user impact (not expectation accuracy)
Re-classify findings by real-world priority using 🚨/⚠️/📝 system
If 🚨 CRITICAL issues found → Focus on those immediately
Update test expectations for future runs after fixes
NEVER dismiss real user problems because they weren't expected
Critical Rule: Real user problems = Priority, regardless of test assumptions
Example: Landing page not checking campaigns = API integration failure (not "minor UX")
📝 EVIDENCE DOCUMENTATION PROTOCOL
For each test step, MANDATORY collection:
Screenshots: Saved to docs/[test-name]-[step]-[priority].png
Network logs: Specific API calls with status codes and timing
Console logs: Error messages, warnings, and API success logs
Priority assessment: Applied using 🚨/⚠️/📝 criteria for each finding
Action taken: Document what was done for CRITICAL issues
Evidence naming convention: docs/[test-name]-[finding-priority]-[description].png
Log format: Include timestamps, request/response details, error traces
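A small helper for generating evidence filenames that follow this convention (the sanitization rules are assumptions):
// Sketch: build evidence filenames following the convention above
function evidenceFileName(testName, priority, description) {
  const slug = value => value.toLowerCase().trim().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
  return `docs/${slug(testName)}-${slug(priority)}-${slug(description)}.png`;
}
// Example: evidenceFileName('landing page integration', 'critical', 'no campaigns API call')
// → "docs/landing-page-integration-critical-no-campaigns-api-call.png"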
✅ TEST COMPLETION VALIDATION PROTOCOL
Test is NOT complete until:
All 🚨 CRITICAL issues resolved and verified with evidence
All ⚠️ HIGH issues documented with fix timeline and owner
Evidence collected and properly named for all findings
Next actions clearly defined with priorities and assignments
Verification screenshots showing fixes work end-to-end
No test can be marked "complete" with unresolved CRITICAL issues
🔍 CONSOLE ERROR MONITORING PROTOCOL
Automated error detection for each test step:
Setup console monitoring: Capture errors/warnings with timestamps
Define test-specific error patterns: Authentication, API, navigation, etc.
Critical error patterns: TypeError, undefined, failed to fetch, 401, 500, CORS
Validation function: Check for test-specific critical errors vs acceptable warnings
Clean console requirement: Zero critical errors, minimal non-critical warnings
Test Data Template:
{
"campaign_title": "[TEST_NAME] - $(date +%Y%m%d_%H%M)",
"character_name": "[Unique Name] the [Descriptor]",
"test_identifier": "[Unique value to track in UI]"
}
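If the test data is generated from JavaScript instead of the shell, a sketch like this produces the same timestamped, traceable values (names and values are illustrative):
// Sketch: generate unique, traceable test data matching the template above (values are illustrative)
const pad = n => String(n).padStart(2, '0');
const now = new Date();
const stamp = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}_${pad(now.getHours())}${pad(now.getMinutes())}`;
const testData = {
  campaign_title: `LANDING_PAGE_TEST - ${stamp}`,
  character_name: 'Testa the Tracer',
  test_identifier: `test-${stamp}`,
};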
Console Monitoring:
// Execute in browser console before testing
window.testErrorLog = [];
const originalConsoleError = console.error;
console.error = function(...args) {
  window.testErrorLog.push({type: 'error', timestamp: new Date().toISOString(), message: args.join(' ')});
  originalConsoleError.apply(console, args); // keep errors visible in the console while capturing them
};
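The validation function mentioned above could then look like this sketch, separating test-specific critical errors from acceptable warnings (the acceptable-pattern idea, e.g. favicon noise, is an assumption):
// Sketch: separate test-specific critical errors from acceptable warnings in the captured log
function validateConsole(criticalPatterns, acceptablePatterns = []) {
  const entries = window.testErrorLog || [];
  const critical = entries.filter(entry =>
    criticalPatterns.some(pattern => entry.message.includes(pattern)) &&
    !acceptablePatterns.some(pattern => entry.message.includes(pattern)));
  return { pass: critical.length === 0, criticalCount: critical.length, critical };
}
// Example: validateConsole(['TypeError', 'failed to fetch', '401', '500'], ['favicon']);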
📊 RESULTS DOCUMENTATION (Fill During Execution)
🚨 CRITICAL Issues Found (Update After Testing)
Issue 1: [Description]
Evidence: [Screenshot/log reference]
Impact: [How this blocks core functionality]
Action: [Immediate fix required]
✅ Working Correctly (Update After Testing)
Functionality: [What works as expected]
Evidence: [Screenshot/log reference]
Console: [Clean or acceptable warnings]
🎯 KEY LEARNINGS (Update After Testing)
Expected vs Reality:
Expected: [Original assumptions]
Reality: [What was actually found]
Learning: [Insights for future tests]
🚨 TEST EXECUTION FAILURE PROTOCOL
If ANY validation checkpoint fails:
IMMEDIATELY STOP the test execution
REPORT DEVIATION with exact details
DO NOT CONTINUE without explicit approval
PRIORITY ASSESSMENT: Real user impact overrides expectations
✅ COMPLETION CRITERIA
All CRITICAL issues resolved with evidence
HIGH issues documented with timeline
Test data appears correctly in final UI
Clean console (zero critical errors)
Learning section completed with insights
Implementation Examples
Example: Landing Page API Integration Test
🎯 PRIMARY PROBLEM: Landing page may not check user's existing campaigns
📋 SUCCESS CRITERIA:
Primary: Landing page shows different content based on user's campaign state
Secondary: Page loads quickly with good UX
🚨 CRITICAL FAILURE CONDITIONS:
Landing page always shows same content regardless of user state
No API calls made to check user campaigns
User cannot access their existing campaigns
⚠️ HIGH PRIORITY CONDITIONS:
Slow API response times (>2s)
API errors not handled gracefully
Confusing navigation flow
📝 EVIDENCE COLLECTION:
Screenshot: landing_page_with_campaigns.png
Screenshot: landing_page_no_campaigns.png
API Logs: Monitor GET /api/campaigns calls
Network: Verify campaigns API is called on page load
Console: Check for API errors or warnings
🔄 TEST EXECUTION:
Command Implementation
When user runs /generatetest landing_page_integration "Landing Page" "Dynamic content based on user campaigns":
Generate Focused Test: Create execution-ready test in roadmap/tests/
Include Project Specifics: URLs, commands, log paths, auth methods
Set Validation Checkpoints: Numbered steps with mandatory stops
Create Results Templates: Sections for actual findings integration
Add Learning Capture: Template for expectation vs reality analysis
Provide Implementation Guide: Code snippet placeholders for fixes
Key Improvements Over Previous Version
✅ Execution-Ready by Default:
Specific commands to run (curl health checks, log monitoring)
Project URLs and paths (localhost:3002, flask-server.log)
Ready-to-execute console monitoring setup
Concrete evidence naming conventions
✅ Results Integration:
Template sections that get filled during execution
Learning capture for expectation vs reality
Implementation code placeholders for fixes
Clear distinction between template and results sections
✅ Focused Structure:
RED/GREEN phase organization like the successful original
Numbered validation checkpoints with mandatory stops
Concise but comprehensive protocols
Balance of methodology and actionability
✅ Priority-First Design:
User impact assessment built into every step
Never dismiss real user problems as "minor"
CRITICAL issues trigger immediate stop and fix
Learning from previous testing failures integrated
This creates tests that are both methodologically sound AND immediately executable with real results integration.