Intelligent Regression Testing with Playwright MCP
Executes comprehensive regression testing comparing Flask and React V2 frontends with AI-powered visual analysis.
/plugin marketplace add jleechanorg/claude-commands
/plugin install claude-commands@claude-commands-marketplace

When this command is invoked, YOU (Claude) must execute these steps immediately. This is NOT documentation; these are COMMANDS to execute right now. Use TodoWrite to track progress through multi-phase workflows.
Action Steps:
🧠 ALWAYS START WITH /think:
Execute /think to analyze:
Action Steps:
🎯 IF USER PROVIDED INSTRUCTIONS:
Adapt the plan based on user input after /testuif:
📋 DEFAULT COMPARISON STRATEGY: When no specific instructions are given:
1. Test ALL functionality from the Flask frontend in React V2
2. Ensure feature parity between the old and new sites
3. Validate that critical user journeys work identically
4. Document any missing or broken functionality
Action Steps:
β‘ ALWAYS USE /execute:
Use /execute with comprehensive testing plan:
/execute
1. MANDATORY: Generate complete field interaction matrix (Campaign Type × Character × Setting × AI Options)
2. Apply smart matrix testing strategy (pairwise combinations + high-risk prioritization)
3. Set up browser automation environment (Playwright MCP with headless=true)
4. Execute Phase 1: High-risk matrix combinations (25 critical tests)
5. Execute Phase 2: Feature coverage matrix (35 comprehensive tests)
6. Execute Phase 3: Edge case matrix (20 boundary tests)
7. Test Flask frontend functionality (baseline) - ALL matrix paths
8. Test React V2 equivalent functionality - ALL matrix paths with dynamic placeholder verification
9. Compare feature parity with matrix cell-by-cell evidence
10. Document matrix findings with clickable screenshot links
11. Perform adversarial testing on previously failing matrix cells
12. Generate matrix coverage report with bug pattern analysis
13. Post comprehensive matrix results to PR documentation
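The field interaction matrix in step 1 is a cartesian product of the form fields; the pairwise strategy in step 2 then selects a covering subset instead of running every cell. A minimal sketch of the enumeration, using hypothetical placeholder values for each field (the real value lists come from the application under test):

```shell
# Hypothetical field values -- substitute the real options from the app.
campaign_types=(solo coop pvp)
characters=(new imported)
settings=(fantasy scifi)
ai_options=(on off)

count=0
for ct in "${campaign_types[@]}"; do
  for ch in "${characters[@]}"; do
    for st in "${settings[@]}"; do
      for ai in "${ai_options[@]}"; do
        # Each line is one matrix cell to test in both frontends.
        printf '%s x %s x %s x %s\n' "$ct" "$ch" "$st" "$ai"
        count=$((count + 1))
      done
    done
  done
done
echo "full matrix: $count cells"
```

Even this small example yields 3 x 2 x 2 x 2 = 24 cells, which is why the phases above prioritize high-risk combinations rather than exhaustive enumeration.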
Action Steps:
📸 ENHANCED SCREENSHOT POSTING PROTOCOL: Always post screenshots to the PR using a structured directory with Claude vision analysis:
```
docs/pr[NUMBER]/
├── flask_baseline/
│   ├── 01_landing_page.png
│   ├── 02_campaign_list.png
│   ├── 03_campaign_creation.png
│   └── ...
├── react_v2_comparison/
│   ├── 01_landing_page.png
│   ├── 02_campaign_list.png
│   ├── 03_campaign_creation.png
│   └── ...
├── claude_vision_analysis/
│   ├── flask_vs_react_comparison.md
│   ├── hardcoded_values_detection.md
│   ├── ui_consistency_analysis.md
│   ├── functionality_gaps_identified.md
│   └── visual_regression_findings.md
├── accessibility_validation/
│   ├── flask_accessibility_tree.json
│   ├── react_accessibility_tree.json
│   ├── accessibility_comparison.md
│   └── wcag_compliance_check.md
├── technical_verification/
│   ├── dom_inspector_output.txt
│   ├── css_properties_extracted.json
│   ├── network_requests_log.json
│   ├── console_errors_warnings.txt
│   └── verification_evidence.md
├── issues_found/
│   ├── broken_functionality.png
│   ├── missing_features.png
│   ├── claude_vision_detected_issues.png
│   └── ...
└── testing_report.md
```
🔍 CLAUDE VISION INTEGRATION WORKFLOW:
## 📚 REFERENCE DOCUMENTATION
# Intelligent Regression Testing with Playwright MCP
**Purpose**: Perform comprehensive regression testing using `/think` and `/execute`, comparing old Flask site to new React V2 site with full functionality validation
**Action**: Always use `/think` to analyze context, then `/execute` for systematic testing. Adapts plan based on user input after command.
**Usage**:
- `/testuif` (automatic analysis and testing)
- `/testuif [specific instructions]` (adapts plan to user requirements)
## Execution Protocol
## 🚨 MANDATORY QUALITY ASSURANCE INTEGRATION
**Quality Assurance Protocol**: See [CLAUDE.md §Mandatory QA Protocol](/CLAUDE.md#-mandatory-quality-assurance-protocol). All browser testing MUST comply.
### 🔍 Pre-Testing Requirements (⚠️ MANDATORY)
- **Matrix Testing**: Apply full matrix methodology per canonical protocol
- **Evidence Collection**: Screenshot every test matrix cell with path labels
- **Completion Validation**: All validation gates from QA protocol required
- **Code Scanning**: Search for hardcoded values and patterns systematically
- **Adversarial Testing**: Attempt to break claimed fixes and verify related patterns
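The code-scanning requirement can be sketched as a plain `grep` over the V2 source for the hardcoded strings this document calls out later. The directory and fixture file below are hypothetical stand-ins for the real frontend source tree:

```shell
# Hypothetical scan location; point SRC at the real React V2 source tree.
SRC="/tmp/testuif_scan_demo"
mkdir -p "$SRC"
printf 'const name = "Ser Arion";\n' > "$SRC/demo.tsx"  # demo fixture only

# Flag any of the known hardcoded placeholder strings.
if grep -rn -e "Ser Arion" -e "Loading campaign details" "$SRC"; then
  echo "hardcoded values found"
else
  echo "clean"
fi
```

A non-empty result means the matrix tests for the affected screens should be treated as high-risk cells.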
🚨 MANDATORY HEADLESS CONFIGURATION:

```bash
# Environment variables for headless enforcement
export PLAYWRIGHT_HEADLESS=1
export BROWSER_HEADLESS=true

# Playwright MCP configuration with explicit headless flag
mcp__playwright-mcp__browser_navigate --headless=true --url="http://localhost:8081"
mcp__playwright-mcp__browser_take_screenshot --headless=true --filename="baseline.png"
```
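A small guard, sketched under the assumption that the test phases run from a shell wrapper, can fail fast when headless enforcement is missing so no phase silently opens a visible browser:

```shell
# Enforce the headless variables before any test phase runs.
export PLAYWRIGHT_HEADLESS=1
export BROWSER_HEADLESS=true

if [ "$PLAYWRIGHT_HEADLESS" != "1" ] || [ "$BROWSER_HEADLESS" != "true" ]; then
  echo "headless enforcement missing; refusing to run" >&2
  exit 2  # matches CRITICAL_ERROR in the failure-exit semantics below
fi
echo "headless enforcement verified"
```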
INTEGRATION: Use the following advanced validation methods alongside standard assertions:
```bash
# Capture screenshots to filesystem for Claude vision analysis
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
VALIDATION_DIR="/tmp/testuif_validation/$TIMESTAMP"
mkdir -p "$VALIDATION_DIR"

# Flask baseline screenshots
mcp__playwright-mcp__browser_take_screenshot --name="flask_dashboard" --path="$VALIDATION_DIR/flask_dashboard.png"
mcp__playwright-mcp__browser_take_screenshot --name="flask_campaign_creation" --path="$VALIDATION_DIR/flask_campaign_creation.png"

# React V2 comparison screenshots
mcp__playwright-mcp__browser_take_screenshot --name="react_dashboard" --path="$VALIDATION_DIR/react_dashboard.png"
mcp__playwright-mcp__browser_take_screenshot --name="react_campaign_creation" --path="$VALIDATION_DIR/react_campaign_creation.png"

# Claude analyzes screenshots directly using Read tool
echo "Screenshots saved to $VALIDATION_DIR for Claude vision analysis"

# Get structured accessibility data alongside screenshots
# (navigate to each frontend before taking its snapshot)
mcp__playwright-mcp__browser_snapshot > "$VALIDATION_DIR/flask_accessibility.json"
mcp__playwright-mcp__browser_snapshot > "$VALIDATION_DIR/react_accessibility.json"

# Cross-validate structured data with visual confirmation:
# Claude can read both JSON accessibility data AND screenshot images

# Create baselines if they don't exist
if [ ! -d "/tmp/testuif_baselines" ]; then
  mkdir -p /tmp/testuif_baselines
  echo "Creating baseline screenshots for future comparisons"
  mcp__playwright-mcp__browser_take_screenshot --path="/tmp/testuif_baselines/flask_baseline.png"
fi

# Compare current state against established baselines:
# use Claude vision to analyze differences between baseline and current screenshots
```
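Before handing a pair of captures to Claude vision, a cheap byte-level comparison can decide whether a vision diff is even needed. A sketch, using stand-in bytes instead of real PNGs and a hypothetical demo path so it does not touch the real baseline directory:

```shell
# Demo paths only -- real runs compare the actual PNG captures.
BASELINE="/tmp/testuif_baseline_demo/flask_baseline.png"
CURRENT="/tmp/testuif_demo_current.png"
mkdir -p /tmp/testuif_baseline_demo
printf 'old-bytes' > "$BASELINE"  # stand-in for the stored baseline
printf 'new-bytes' > "$CURRENT"   # stand-in for the fresh capture

# cmp -s exits 0 only when the files are byte-identical.
if cmp -s "$BASELINE" "$CURRENT"; then
  echo "no byte-level change; skip vision diff"
else
  echo "visual diff candidate: $CURRENT"
fi
```

Identical bytes guarantee no visual change; differing bytes only mean the pair is worth a Claude vision pass, since harmless rendering noise can also change pixels.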
🚨 FAILURE-EXIT SEMANTICS:

```bash
# Exit codes for parity check failures
PARITY_CHECK_PASSED=0  # All functionality matches
PARITY_CHECK_FAILED=1  # Feature parity failures detected
CRITICAL_ERROR=2       # Browser automation or system errors

# Bail-on-failure implementation
set -e           # Exit immediately on any command failure
set -o pipefail  # Fail on any pipe command failure

# Example parity validation with non-zero exit
validate_feature_parity() {
  local flask_result="$1"
  local react_result="$2"
  if [ "$flask_result" != "$react_result" ]; then
    echo "❌ PARITY FAILURE: Feature mismatch detected"
    echo "Flask: $flask_result"
    echo "React: $react_result"
    exit $PARITY_CHECK_FAILED
  fi
  echo "✅ PARITY VERIFIED: Feature behavior matches"
  return 0
}
```
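Because the validator above exits the whole run on the first mismatch, a suite that wants to collect several failures before bailing needs a returning variant. A sketch, where `check` is a hypothetical stand-in that returns the parity exit code instead of exiting:

```shell
PARITY_CHECK_FAILED=1

# Returns instead of exiting, so the caller decides when to bail.
check() { [ "$1" = "$2" ] || return "$PARITY_CHECK_FAILED"; }

if check "campaign_created" "campaign_created"; then
  echo "parity ok"
fi
# $? here is the code check just returned.
check "5 campaigns" "0 campaigns" || echo "parity failed with code $?"
```

In CI, the suite would tally the failures and finish with `exit $PARITY_CHECK_FAILED` if any were recorded, preserving the documented exit-code contract.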
🔄 FULL FUNCTIONALITY COMPARISON TESTING:
Flask Frontend (Baseline) - Test ALL of:
React V2 Frontend (New) - Validate SAME functionality:
```bash
for screenshot in "$VALIDATION_DIR"/*.png; do
  echo "Analyzing $(basename "$screenshot") with Claude vision..."
  # Claude reads and analyzes each screenshot for:
  # - Hardcoded values detection ("Ser Arion", "Loading campaign details...")
  # - UI consistency verification
  # - Feature parity validation
  # - Visual regression detection
done

echo "Screenshots ready for Claude vision analysis at: $VALIDATION_DIR"
echo "Use Read tool to analyze each screenshot with specific validation prompts"
```
**📊 AUTO-GENERATE ENHANCED PR COMMENT:**
Create structured comment on PR with Claude vision analysis integration:
- **Executive Summary**: Testing results with Claude vision insights
- **Claude Vision Analysis Summary**:
- Hardcoded values detection results
- UI consistency findings
- Visual regression analysis
- Feature parity visual verification
- **Technical Verification Summary:** DOM state, CSS properties, network requests, console logs
- **Accessibility Analysis:** Tree comparison and WCAG compliance
- Feature parity analysis (✅ Working, ❌ Broken, ⚠️ Different, 🔍 Claude Vision Detected)
- **Evidence-Based Confidence Score:** Percentage based on technical + visual verification completeness
- Performance comparison between frontends
- **Claude Vision Critical Issues:** Issues detected through AI screenshot analysis
- Links to all screenshot, accessibility, AND Claude vision analysis evidence
- **Anti-bias verification:** What was tested that should NOT work
- **Visual Regression Score:** Pixel-level difference analysis
- Recommendations for deployment readiness with visual confidence metrics
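The comment can be assembled from the generated analysis files and posted with the GitHub CLI. A sketch, assuming `gh` is available; the section list mirrors the bullets above, while the report path and the PR-number placeholder are hypothetical:

```shell
REPORT="/tmp/testuif_pr_comment.md"

# Build the comment skeleton; real runs append the per-section findings.
{
  echo "## /testuif Regression Report"
  echo "### Executive Summary"
  echo "### Claude Vision Analysis Summary"
  echo "### Technical Verification Summary"
  echo "### Accessibility Analysis"
  echo "### Evidence-Based Confidence Score"
} > "$REPORT"

# Post manually or from CI once the body is complete:
echo "run: gh pr comment <PR_NUMBER> --body-file $REPORT"
```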
## Example Execution Flows
### Basic Usage with Enhanced Validation
```
User: /testuif
Claude: /think [analyzes PR context and creates testing strategy with Claude vision integration]
Claude: /execute [comprehensive Flask vs React V2 comparison with enhanced screenshot validation]
Claude: [Posts results to docs/pr1118/ with Claude vision analysis, accessibility trees, and visual regression findings]
```
### Adapted Usage with Claude Vision Focus
```
User: /testuif and make sure you compare old site to the new site and all functionality in the old site fully tested in new site
Claude: /think [incorporates specific comparison requirements + Claude vision hardcoded value detection]
Claude: /execute [focused on complete feature parity with AI-powered visual validation]
Claude: [Detailed functionality mapping with Claude vision analysis of hardcoded "Ser Arion" and placeholder text]
```
### Custom Focus with Visual Regression
```
User: /testuif focus on campaign creation workflow and authentication only
Claude: /think [narrows scope to specified areas + progressive baseline comparison]
Claude: /execute [deep dive testing with accessibility tree + screenshot hybrid validation]
Claude: [Targeted testing report with Claude vision analysis of specific UI components]
```
### Enhanced Validation Prompt Templates
```markdown
## Claude Vision Analysis Prompts for /testuif
### Hardcoded Values Detection
"Read [screenshot] and check for:
1. Any hardcoded 'Ser Arion' character names
2. 'Loading campaign details...' placeholder text
3. Static 'intermediate • fantasy' labels
4. Non-functional button placeholders
Format as: ✅ PASS / ❌ FAIL with exact locations"
### Feature Parity Visual Verification
"Compare [flask_screenshot] with [react_screenshot]:
1. Identical UI layout and positioning
2. Same data displayed in both versions
3. Consistent button placement and functionality
4. Visual elements match expected design
Report differences with pixel-level precision"
### Visual Regression Analysis
"Analyze [current_screenshot] vs [baseline_screenshot]:
1. Layout shifts or element displacement
2. Color changes or styling differences
3. Missing or added UI elements
4. Text content modifications
Provide regression score (0-100%) with change summary"
```
🧠 MANDATORY SLASH COMMAND USAGE:
- `/think` for analysis and planning
- `/execute` for actual test implementation

🔄 COMPLETE FUNCTIONALITY VALIDATION:

📸 COMPREHENSIVE VISUAL DOCUMENTATION:

🚨 ENHANCED BROWSER AUTOMATION WITH AI VALIDATION:
- PLAYWRIGHT_HEADLESS=1 environment variable enforced
- Non-zero exit on parity failures (exit 1)
- --headless=true and --bail flags prevent silent partial passes

✅ ENHANCED DEPLOYMENT READINESS ASSESSMENT: Final output always includes:

🎯 AI-POWERED CONTEXTUAL ADAPTATION:

🔍 ADVANCED ISSUE DETECTION:

📊 AI-ENHANCED ACTIONABLE REPORTING:
🚀 2025 AI Testing Innovation:
This command provides intelligent, comprehensive regression testing that ensures your React V2 frontend is deployment-ready with full feature parity to the existing Flask frontend.