⚡ EXECUTION INSTRUCTIONS FOR CLAUDE
When this command is invoked, YOU (Claude) must execute these steps immediately:
This is NOT documentation - these are COMMANDS to execute right now.
Use TodoWrite to track progress through multi-phase workflows.
🚨 EXECUTION WORKFLOW
Phase 1: Step 1: Test Specification Resolution
Action Steps:
Based on the command arguments, resolve to appropriate test specification:
Test Type Mapping:
integration → testing_mcp/test_create_continue_mcp.md (comprehensive integration test)
performance → Run testing_mcp/run_mcp_tests.sh performance via /testllm
unit → Run testing_mcp/run_mcp_tests.sh unit via /testllm
mock → Run testing_mcp/run_mcp_tests.sh mock via /testllm
all or no args → Run testing_mcp/run_mcp_tests.sh all via /testllm
- Specific
.md file → Direct execution of test specification via /testllm
Phase 2: Step 2: /testllm Delegation
Action Steps:
Execute the resolved test specification using /testllm with appropriate mode:
Single-Agent Mode (default):
/testllm [resolved_test_spec] [additional_args]
Dual-Agent Mode (when verified keyword present):
/testllm verified [resolved_test_spec] [additional_args]
Phase 1: Argument Analysis
Action Steps:
- Parse command arguments to determine test type and mode
- Validate test specifications exist in
testing_mcp/ directory
- Check for
verified keyword to determine single vs dual-agent mode
- Resolve target test specification based on test type
Phase 2: Environment Validation
Action Steps:
- Check MCP server availability (production mode required)
- Verify test dependencies (pytest, browser automation tools)
- Validate authentication configuration for real API testing
- Confirm network connectivity for Firebase/Gemini integration
Phase 3: /testllm Execution
Action Steps:
- Delegate to
/testllm with resolved test specification
- Apply systematic validation protocol from
/testllm framework
- Execute with TodoWrite tracking for comprehensive requirement validation
- Capture evidence (screenshots, logs, API responses) in
docs/ directory
Phase 4: Results Analysis
Action Steps:
- Process test results using
/testllm analysis framework
- Generate evidence-backed conclusions with specific file references
- Classify findings as CRITICAL/HIGH/MEDIUM per MCP test specifications
- Provide actionable recommendations for MCP architecture improvements
Phase 7: Specific Test File Execution
Action Steps:
/testmcp test_create_continue_mcp.md
Result: Direct execution of specified test specification with systematic validation
📋 REFERENCE DOCUMENTATION
/testmcp - MCP Test Suite Execution Command
Purpose
Execute MCP (Model Context Protocol) test specifications using the comprehensive /testllm framework for systematic test validation with real authentication and integration testing.
Usage Patterns
# Run all MCP tests
/testmcp
# Run specific test type
/testmcp integration
/testmcp performance
/testmcp unit
/testmcp mock
# Run with verification (dual-agent mode)
/testmcp verified
/testmcp verified integration
# Run specific test file
/testmcp test_create_continue_mcp.md
/testmcp verified test_create_continue_mcp.md
Core Principles
- LLM-Native Execution: Uses
/testllm framework for intelligent test execution
- Real Integration Testing: Tests actual MCP server functionality with real Firebase/Gemini APIs
- Comprehensive Coverage: Unit, integration, and performance testing for MCP architecture
- Systematic Validation: Evidence-based testing with TodoWrite tracking and screenshot documentation
Implementation Method
This command delegates to /testllm for intelligent test orchestration of MCP test specifications in the test_mcp/ directory (override with $MCP_TEST_DIR, default: test_mcp/).
Execution Flow:
/testmcp [args] → /testllm [testing_mcp/test_spec] [args]
Test Specifications Available
Integration Test Specification
File: testing_mcp/test_mcp/test_create_continue_mcp.md
- Objective: Complete MCP workflow validation from campaign creation through story progression
- Coverage: Real Firebase integration, Gemini AI integration, character creation, story continuation
- Duration: 5-10 minutes
- Authentication: Real user authentication required
- Validation: Full game state persistence and AI-generated content verification
Shell Script Test Suite
File: testing_mcp/run_mcp_tests.sh
- Test Types: unit, integration, performance, mock, all, docker
- Features: Mock services, real API modes, Docker containerization, comprehensive reporting
- Timeout: Configurable (default 300 seconds)
- Output: JUnit XML results, HTML reports, detailed logging
Command Implementation
When /testmcp is executed, it follows this systematic protocol:
Test Environment Requirements
Production Mode Testing
- MCP Server: Must be running in production mode (
PRODUCTION_MODE=true)
- Authentication: Real Google OAuth for authentic user flows
- APIs: Real Firebase Firestore and Gemini API integration
- Browser: Playwright MCP for headless browser automation
Mock Mode Testing (Alternative)
- Mock Services: Automated mock server startup via
run_mcp_tests.sh
- Simulated APIs: Mock Firebase and Gemini responses
- Faster Execution: Reduced test duration for rapid feedback
- Development: Suitable for development workflow validation
Success Criteria
Integration Test Success
- ✅ Campaign creation with real Firebase document ID
- ✅ Character creation flow completion without errors
- ✅ Story progression with genuine AI-generated content
- ✅ Game state persistence across multiple interactions
- ✅ All MCP tool calls successful with proper validation
Shell Script Test Success
- ✅ All pytest test cases pass (unit, integration, performance)
- ✅ Mock services start and respond correctly
- ✅ Test reports generated with detailed metrics
- ✅ No timeout or connection failures
- ✅ Cleanup procedures execute successfully
Error Handling
Common Test Failures
- MCP Server Connection: Verify server is running and accessible
- Authentication Failures: Ensure real Google OAuth credentials configured
- API Rate Limits: Implement backoff strategies for Gemini API calls
- Test Environment: Check Python virtual environment and dependencies
- Browser Automation: Verify Playwright MCP is available and functional
Recovery Protocols
- Environment Reset: Clean test databases and restart services
- Dependency Check: Validate all required packages and tools installed
- Configuration Audit: Verify environment variables and API keys
- Network Validation: Test connectivity to external services
- Log Analysis: Review detailed test logs for specific failure points
Integration with /testllm Framework
This command leverages the complete /testllm infrastructure:
Systematic Validation Protocol
- Requirements Analysis: Extract ALL test requirements to TodoWrite checklist
- Evidence Collection: Screenshots, logs, console output for each requirement
- Success Declaration: Only with complete evidence portfolio
- Failure Analysis: Specific error categorization and recommendations
Dual-Agent Architecture (Optional)
- TestExecutor Agent: Pure execution and evidence collection
- TestValidator Agent: Independent validation with fresh context
- Cross-Verification: Both agents must agree for final success declaration
- Bias Elimination: Separate validation removes execution investment bias
Command Examples
Basic MCP Integration Test
/testmcp integration
Result: Executes comprehensive campaign creation and story progression test with real APIs
Verified Performance Testing
/testmcp verified performance
Result: Dual-agent performance benchmark execution with independent validation
Mock Mode Testing
/testmcp mock
Result: Fast execution with mock services for development workflow validation
Anti-Patterns to Avoid
- ❌ Bypassing /testllm: Never implement test execution logic directly
- ❌ Mock Mode for Production: Use real APIs for production readiness validation
- ❌ Incomplete Evidence: Must capture screenshots and logs for all test steps
- ❌ Manual Assumptions: All test results require specific evidence backing
- ❌ Single-Pass Testing: Must test both success and failure scenarios
Quality Assurance Integration
Evidence Requirements
- Screenshots: Saved to
docs/ with descriptive names for each test phase
- Test Logs: Detailed execution logs with timestamps and status codes
- API Responses: Captured request/response data for integration validation
- Error Documentation: Specific error messages and stack traces when failures occur
Reporting Standards
- TodoWrite Tracking: Complete requirement-by-requirement validation status
- Priority Classification: CRITICAL/HIGH/MEDIUM/LOW issue categorization
- Actionable Feedback: Specific recommendations with code references
- Evidence Portfolio: Complete documentation package for each test execution
This command provides comprehensive MCP architecture testing through intelligent delegation to the proven /testllm framework, ensuring systematic validation and evidence-based conclusions for production readiness assessment.