When this command is invoked, YOU (Claude) must execute these steps immediately. This is NOT documentation - these are COMMANDS to execute right now. Use TodoWrite to track progress through multi-phase workflows.
🚨 EXECUTION WORKFLOW
Phase 1: Mandatory First Step
Action Steps:
Read the Entire Suite First: Before planning, checklist creation, or any execution, explicitly read every test specification in the testing_llm/ directory to internalize scope, dependencies, and evidence requirements.
Phase 2: Report Integrity Checklist (MANDATORY)
Action Steps:
Before submitting final report, verify:
Every claimed evidence file verified with ls -la command
No references to non-existent files or screenshots
Exit status tracked for all commands
Final SUCCESS/FAILURE aligned with actual exit codes
No contradictions between claims and evidence
All TodoWrite items have corresponding verified evidence
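The verification items above can be sketched as a short shell loop. The evidence paths here are illustrative stand-ins (created in a temp directory) for the files a report would claim; the point is that every claimed path is checked with `ls -la` and phantom references are counted before the report is submitted:

```shell
# Stand-in evidence directory with one real file and one phantom claim.
EVIDENCE_DIR="$(mktemp -d)"
touch "$EVIDENCE_DIR/login_page.png"

missing=0
for f in "$EVIDENCE_DIR/login_page.png" "$EVIDENCE_DIR/phantom.png"; do
    if ls -la "$f" >/dev/null 2>&1; then
        echo "VERIFIED: $f"
    else
        echo "PHANTOM:  $f"            # must be removed from the report
        missing=$((missing + 1))
    fi
done
echo "phantom count: $missing"
```

Any nonzero phantom count means the report references files that do not exist and must be corrected before submission.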
Phase 3: Pre-Execution Requirements
Action Steps:
CRITICAL: Before starting ANY test specification, ALWAYS follow this systematic protocol:
Read Specification Twice: Complete understanding before execution
Extract ALL Requirements: Convert every requirement to TodoWrite checklist
Identify Evidence Needs: Document what proof is needed for each requirement
Create Validation Plan: Map each requirement to specific validation method
Execute Systematically: Complete each requirement with evidence collection
Success Declaration: Only declare success with complete evidence portfolio
Phase 4: Step 1: Complete Test Discovery
Action Steps:
Read ALL test files in the specified directory before any execution
Catalog ALL test cases across all files in TodoWrite checklist
Identify test dependencies and execution order requirements
Verify test coverage spans all requested functionality
Document test matrix showing all scenarios to be validated
⚠️ GATE: Cannot proceed without complete test inventory from ALL files
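A minimal sketch of the inventory gate, assuming the suite lives in a `testing_llm/` directory of markdown specs (the sample file names below are illustrative and created in a temp directory):

```shell
# Stand-in suite directory with two illustrative test specs.
SUITE_DIR="$(mktemp -d)/testing_llm"
mkdir -p "$SUITE_DIR"
touch "$SUITE_DIR/01_auth.md" "$SUITE_DIR/02_campaign.md"

# The gate: every spec file must appear in the inventory before any test runs.
inventory="$(find "$SUITE_DIR" -name '*.md' | sort)"
count="$(printf '%s\n' "$inventory" | wc -l | tr -d ' ')"
echo "test files discovered: $count"
```

Each discovered file then becomes one or more TodoWrite checklist entries in the unified test matrix.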
Phase 5: Step 2: Comprehensive Test Planning
Action Steps:
Extract requirements from EACH test file into unified checklist
Map test interdependencies (authentication → campaign creation, etc.)
Plan execution sequence respecting prerequisites
Estimate total test duration for all cases combined
Document evidence collection needs for complete matrix
⚠️ GATE: Cannot start testing without unified execution plan
Phase 6: Step 3: Sequential Test Execution
Action Steps:
Execute ALL test files in logical dependency order
Complete each test matrix before moving to next file
Collect evidence for EVERY test case across all files
Track completion status for entire directory scope
Validate success criteria for combined test suite
⚠️ GATE: Cannot declare success without ALL files tested
Phase 7: Step 1: Systematic Requirement Analysis
Action Steps:
Read test specification completely (minimum twice)
Extract ALL requirements into explicit TodoWrite checklist items
Identify success criteria AND failure conditions for each requirement
Document evidence collection plan for each requirement
Create systematic validation approach before any execution
Phase 8: Step 2: Test Environment Setup
Action Steps:
Review run_local_server.sh to understand how the local environment should be launched
Detect whether the local server stack started by run_local_server.sh is already running
If servers are not running, execute run_local_server.sh and wait for successful startup
Ensure real authentication is configured (no test mode)
Validate Playwright MCP availability for browser automation
Confirm network connectivity for real API calls
Determine the current repository name (git rev-parse --show-toplevel | xargs basename) and active branch (git rev-parse --abbrev-ref HEAD) to construct result paths under /tmp/<repo_name>/<branch_name>/
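The path-construction step can be sketched as follows; a throwaway `demo_repo` (with a single empty commit so HEAD resolves) stands in for the actual repository under test:

```shell
# Throwaway repo standing in for the repository under test.
work="$(mktemp -d)"
cd "$work"
git init -q demo_repo
cd demo_repo
git checkout -q -b main
git -c user.email=t@example.com -c user.name=t commit -q --allow-empty -m init

# The two commands from the step above, combined into the result path.
repo_name="$(git rev-parse --show-toplevel | xargs basename)"
branch_name="$(git rev-parse --abbrev-ref HEAD)"
result_dir="/tmp/${repo_name}/${branch_name}"
echo "$result_dir"
```

In the real repository the same two `git rev-parse` calls yield the actual repo and branch names.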
Phase 9: Step 2.5: Result Output Directory Standard
Action Steps:
Create (if necessary) the directory /tmp/<repo_name>/<branch_name>/
Store all test outputs, logs, screenshots, and evidence artifacts inside this directory or its subdirectories
After execution, enumerate every created file and subdirectory so the user receives a complete inventory
Explicitly communicate the absolute path to the /tmp/<repo_name>/<branch_name>/ directory and its contents in the final summary
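A minimal sketch of the directory standard, with placeholder repo and branch names standing in for the values derived in Step 2 (the subdirectory layout is illustrative, not mandated):

```shell
# Placeholders for the values derived via git rev-parse in Step 2.
repo_name="demo_repo"; branch_name="demo_branch"
result_dir="/tmp/${repo_name}/${branch_name}"

# Create the standard output directory and illustrative subdirectories.
mkdir -p "$result_dir/screenshots" "$result_dir/logs"
: > "$result_dir/logs/server.log"

# Post-run inventory for the final summary: every file and subdirectory.
find "$result_dir" | sort
echo "absolute result path: $result_dir"
```

The `find` listing is what the final summary should surface so the user receives a complete inventory.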
Phase 10: Step 3: Test Execution
Action Steps:
Follow test instructions step-by-step with LLM reasoning
Use Playwright MCP for browser automation (headless mode)
Make real API calls to actual backend
Capture screenshots for evidence using proper file paths
Monitor console errors and network requests
Document findings with exact evidence references
Phase 11: Step 4: Results Analysis
Action Steps:
Assess findings against test success criteria
Classify issues as CRITICAL/HIGH/MEDIUM per test specification
Provide actionable recommendations
Generate evidence-backed conclusions
Phase 12: Execution Flow with Validation Gates
Action Steps:
1. Systematic Requirement Analysis (MANDATORY GATE)
├── Read test specification twice completely
├── Extract ALL requirements to TodoWrite checklist
├── Identify success criteria AND failure conditions
├── Document evidence needs for each requirement
├── Create systematic validation plan
└── ⚠️ GATE: Cannot proceed without complete requirements checklist
2. Environment Validation
├── Inspect `run_local_server.sh` for the expected services and health checks
├── Determine if the local server stack is already running; start it with `run_local_server.sh` if needed
├── Verify authentication configuration
├── Confirm Playwright MCP availability
├── Validate network connectivity
└── ⚠️ GATE: Cannot proceed without environment validation
3. Systematic Test Execution
├── Execute EACH TodoWrite requirement individually
├── Capture evidence for EACH requirement (screenshots, logs)
├── Test positive cases AND negative/failure cases
├── Update TodoWrite status: pending → in_progress → completed
├── Validate evidence quality before marking complete
└── ⚠️ GATE: Cannot proceed to next requirement without evidence
4. Comprehensive Results Validation
├── Verify ALL TodoWrite items marked completed with evidence
├── Cross-check findings against original specification
├── Validate that failure conditions were tested (not just success)
├── 🚨 MANDATORY: Run `ls -la /tmp/<repo_name>/<branch_name>/` to verify all claimed evidence files
├── 🚨 MANDATORY: Compare claimed evidence files against actual directory listing
├── 🚨 MANDATORY: Remove any phantom file references from report
├── Generate evidence-backed report with ONLY verified file references
├── Apply priority classification with specific evidence
├── 🚨 MANDATORY: Check exit status of all executed commands
├── 🚨 MANDATORY: Align final SUCCESS/FAILURE with actual exit codes
└── ⚠️ FINAL GATE: Success only declared with exit code 0 AND complete verified evidence portfolio
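The exit-code half of the final gate can be sketched as a shell loop: track each command's exit status and only declare SUCCESS when every tracked status is 0. The command list here is illustrative:

```shell
# Illustrative command list; in practice this covers every executed command.
overall=0
for cmd in "true" "ls /tmp"; do
    sh -c "$cmd" >/dev/null 2>&1
    status=$?
    echo "exit=$status  cmd: $cmd"
    if [ "$status" -ne 0 ]; then overall=1; fi
done

if [ "$overall" -eq 0 ]; then verdict="SUCCESS"; else verdict="FAILURE"; fi
echo "$verdict"
```

The final report's SUCCESS/FAILURE claim must match this verdict, never a narrative impression of how the run went.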
Phase 13: Command Execution Modes
Action Steps:
Review the reference documentation below and execute the detailed steps.
Phase 14: Execution Flow Selection Logic
Action Steps:
if not command_args:
    execute_directory_suite("testing_llm", mode="single_agent")
elif command_args == ["verified"]:
    execute_directory_suite("testing_llm", mode="dual_agent")
elif "verified" in command_args:
    execute_dual_agent_mode()
    spawn_testexecutor_agent()
    wait_for_evidence_package()
    spawn_testvalidator_agent()
    cross_validate_results()
else:
    execute_single_agent_mode()
    follow_systematic_validation_protocol()
📋 REFERENCE DOCUMENTATION
/testllm - LLM-Driven Test Execution Command
Purpose
Execute test specifications directly as an LLM without generating intermediate scripts or files. Follow test instructions precisely with real authentication and browser automation.
Usage Patterns
# Default Directory Suite (No Arguments)
/testllm
/testllm verified
# Single-Agent Testing (Traditional)
/testllm path/to/test_file.md
/testllm path/to/test_file.md with custom user input
/testllm "natural language test description"
# Dual-Agent Verification (Enhanced Reliability)
/testllm verified path/to/test_file.md
/testllm verified path/to/test_file.md with custom input
/testllm verified "natural language test description"
Default Behavior (No Arguments Provided)
Automatic Directory Coverage: When invoked without a specific test file or natural language specification, /testllm automatically executes the full testing_llm/ directory test suite using the 🚨 DIRECTORY TESTING PROTOCOL.
Verified Mode Support: /testllm verified with no additional arguments runs the same testing_llm/ directory workflow, but with the dual-agent verification architecture for independent validation.
Extensible Overrides: Providing any explicit file path, directory, or natural language description overrides the default and targets the requested scope.
Core Principles
LLM-Native Execution: Drive tests directly as Claude, no script generation
Real Mode Only: NEVER use mock mode, test mode, or simulated authentication
Precise Following: Execute test instructions exactly as written
Browser Automation: Use Playwright MCP for real browser testing
Real Authentication: Use actual Google OAuth with real credentials
Test Execution - Execute requirements with evidence collection
Results Compilation - Generate final report with findings
Dual-Agent Mode (Enhanced Verification)
When /testllm verified is invoked:
Phase 1: TestExecutor Agent Execution
Task(
    subagent_type="testexecutor",
    description="Execute test specification with evidence collection",
    prompt="Follow test specification methodically. Create evidence package with screenshots, logs, console output. NO success/failure judgments - only neutral documentation."
)
Phase 2: Independent Validation
Task(
    subagent_type="testvalidator",
    description="Independent validation of test results",
    prompt="Evaluate evidence package against original test specification. Fresh context assessment - no execution bias. Provide systematic requirement-by-requirement validation."
)
Phase 3: Cross-Verification
Compare Results - TestExecutor evidence vs TestValidator assessment
Resolve Disagreements - Validator decision takes precedence in conflicts
Final Report - Combined analysis with both perspectives