From claude-commands
Validates dice roll integrity end-to-end with real MCP services, covering Gemini code_execution, native_two_phase, distribution tests, and chi-squared authenticity checks.
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands
How this skill is triggered (by the user, by Claude, or both):
Slash command
/claude-commands:dice-real-mode-tests
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Use this when validating **dice integrity** end-to-end with real services.
Validates dice roll authenticity using chi-squared statistical tests and RNG code verification to detect LLM fabrication.
- testing_mcp/test_dice_rolls_comprehensive.py
- .claude/skills/evidence-standards.md (Three-Evidence Rule)
- .claude/skills/dice-authenticity-standards.md (Chi-squared + RNG verification)

python testing_mcp/test_dice_rolls_comprehensive.py \
--server-url https://<preview-app>.run.app/mcp \
--evidence \
--evidence-dir /tmp/<run-id> \
--models gemini-3-flash-preview,qwen-3-235b-a22b-instruct-2507
Notes:
- The roll_dice tool is unavailable on preview.
- Use a unique /tmp/<run-id>/ (timestamp or UUID) per run to avoid collisions.

python testing_mcp/test_dice_rolls_comprehensive.py \
--start-local --real-services --evidence --enable-dice-tool \
--models gemini-3-flash-preview,qwen-3-235b-a22b-instruct-2507 \
--evidence-dir /tmp/<run-id>
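The per-run evidence directory can be generated instead of typed by hand. The following is a minimal sketch, assuming a UTC timestamp plus a short UUID suffix is an acceptable run id; the helper name `make_run_dir` is hypothetical and not part of the test script.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def make_run_dir(base="/tmp"):
    """Build a collision-resistant evidence path (hypothetical helper).

    Combines a UTC timestamp with 8 hex characters from a random UUID,
    so two runs started in the same second still get distinct paths.
    The directory is not created here; only the path is returned.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_id = f"{stamp}-{uuid.uuid4().hex[:8]}"
    return Path(base) / run_id


if __name__ == "__main__":
    # Print a path suitable for passing to --evidence-dir.
    print(make_run_dir())
```

The printed path can then be passed as the value of `--evidence-dir`.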
Outputs:
- run.json (scenario results, tool_results, dice_audit_events, warnings)
- local_mcp_*.log (server logs)
- raw_*.txt (raw model responses when enabled)

- DICE_ROLLS_MISMATCH / DICE_AUDIT_MISMATCH warnings can appear; server overrides with tool_results.
- run.json should show aligned totals across dice_rolls, dice_audit_events, and tool_results.

After running distribution tests, validate authenticity:
| Chi-Squared | Sample Size | Verdict |
|---|---|---|
| < 30 | 100+ | PASS |
| 30-50 | 100+ | WARNING - Investigate |
| > 50 | 100+ | FAIL - Likely fabrication |
Reference: PR #2551 detected fabrication with chi-squared = 411.81
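To make the table concrete, here is a minimal sketch of how a chi-squared statistic could be computed for a batch of rolls and mapped to the verdicts above. It assumes a d20 tested against a uniform distribution, which the table does not state explicitly; the function names and the `INSUFFICIENT_SAMPLE` label are illustrative and not taken from the test suite.

```python
from collections import Counter


def chi_squared_uniform(rolls, sides=20):
    """Pearson chi-squared statistic against a uniform die with faces 1..sides."""
    if not rolls:
        raise ValueError("need at least one roll")
    if any(r < 1 or r > sides for r in rolls):
        raise ValueError("roll outside 1..sides")
    expected = len(rolls) / sides  # expected count per face under uniformity
    counts = Counter(rolls)
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, sides + 1)
    )


def verdict(chi_sq, sample_size, min_samples=100):
    """Map a statistic to the table's verdicts (thresholds 30 and 50)."""
    if sample_size < min_samples:
        return "INSUFFICIENT_SAMPLE"  # assumption: the table only covers 100+ rolls
    if chi_sq < 30:
        return "PASS"
    if chi_sq <= 50:
        return "WARNING"
    return "FAIL"


if __name__ == "__main__":
    rolls = list(range(1, 21)) * 5  # 100 rolls, every face exactly 5 times
    stat = chi_squared_uniform(rolls)
    print(stat, verdict(stat, len(rolls)))
```

For a d20 there are 19 degrees of freedom, and the 5% critical value is about 30.14, which is consistent with the table's PASS threshold of 30.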
- run.json: check the log and confirm override occurred.
- roll_dice tool is exposed.
- rng_verified field in evidence - may indicate fabrication.
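The expectation that run.json shows aligned totals across dice_rolls, dice_audit_events, and tool_results could be checked with a short script. This is only a sketch: the actual run.json schema is not shown here, so it assumes each of the three fields is a list of objects carrying a `total` key, and the function name `totals_aligned` is hypothetical.

```python
def totals_aligned(scenario):
    """Return True when the three sources report the same sequence of totals.

    Assumed (not confirmed) shape: scenario[key] is a list of dicts,
    each with a "total" entry. An empty tool_results list counts as
    not aligned, since there is nothing authoritative to compare against.
    """
    def totals(key):
        return [entry["total"] for entry in scenario.get(key, [])]

    rolls = totals("dice_rolls")
    audits = totals("dice_audit_events")
    tools = totals("tool_results")
    return bool(tools) and rolls == audits == tools


if __name__ == "__main__":
    # Hand-written illustration, not real evidence output.
    sample = {
        "dice_rolls": [{"total": 17}],
        "dice_audit_events": [{"total": 17}],
        "tool_results": [{"total": 17}],
    }
    print(totals_aligned(sample))
```

If the real field layout differs, only the inner `totals` helper would need to change.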