From claude-commands
Validates D&D 5e level-up flows: availability entry, complete recommended packages with auto-selection, and real LevelUpAgent evidence. Use during testing to confirm correctness for any class, subclass, multiclass, or custom class.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-commands:level-up-validationThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Use this skill when judging whether a level-up flow works correctly for any
Use this skill when judging whether a level-up flow works correctly for any D&D 5e class, subclass, multiclass, or campaign-defined custom class.
This is an evaluation protocol. Do not repair code from this skill unless the user explicitly asks for implementation.
testing_mcp/core/test_level_up_organic.py
evidence bundle.game_state.json, story transcript, server logs, and checksums.$PROJECT_ROOT/prompts/level_up_instruction.md and the LevelUpAgent final modal
override in $PROJECT_ROOT/agents.py.Fail closed. A level-up flow passes only when all of these are proven by real artifacts, not agent claims.
target_level > current_level.level_up_now.level_up_now opens or continues the modal only. It does not commit the new
level, advance time, roll HP, award rewards, or resolve story actions.The first modal response after level_up_now must visibly include
Recommended package: and must be finalizable immediately.
The package must audit the character's class/subclass/custom class, current level, target level, campaign mechanics, and prior character sheet. It must account for every automatic gain and every player-selectable decision, including when applicable:
For prepared casters, always-prepared or feature-granted spells are additive. They do not satisfy the ordinary prepared-spell decision unless the class rules explicitly say so.
For custom classes, the model must use the custom class description and campaign rules. If a required custom-class rule is genuinely absent, the modal must stay active and ask for that missing mechanical input instead of silently skipping it.
Every required decision category must have a pre-selected recommendation before the player finishes. The visible text must distinguish:
finish_level_up_return_to_game must be present as the final accept option
once the recommended package is complete.
Every player-selectable recommendation must be editable through planning-block choices.
Acceptable patterns:
level_up_* choice for the
category.level_up_* change choice opens a follow-up modal with
concrete legal options for that category.Unacceptable patterns:
finish_level_up_return_to_game appears while unresolved decisions
remain.Free-form text must work for every selectable category while the level-up modal is active.
Validate at least one free-form edit in evidence, such as:
The response must remain in LevelUpAgent, preserve modal state, reflect the
updated pending selection, and keep finish_level_up_return_to_game available
when all required decisions are complete.
On finish_level_up_return_to_game:
player_character_data.Run the canonical real test when real LLM proof is requested:
cd testing_mcp && ../vpython core/test_level_up_organic.py --server http://127.0.0.1:8001
Do not use pytest collection for testing_mcp/ script suites.
Do not set mock-mode toggles for evidence-bearing testing_mcp/ runs.
Record the evidence bundle path, PR head SHA, server URL, model/provider, and pass/fail result.
Inspect raw artifacts, not just summary prose:
metadata.jsonrun.jsongame_state.jsonCompare the evidence bundle SHA to the live PR head. If stale, declare exactly which later diffs are behavior-neutral before accepting it.
Use this structure:
Verdict: PASS | FAIL | PARTIAL
Evidence bundle:
PR head:
Model/provider:
Availability:
Recommended package completeness:
Auto-selection:
Click editing:
Free-form editing:
Finish commit:
Blocking gaps:
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commandsDefines architecture boundaries for level-up, rewards, and XP logic — file ownership, API contracts, and correction guard rules. Consult before any rewards/XP code change.
Pressure tests RPG campaigns and modules by simulating adversarial player actions to uncover plot holes, dead-ends, exploits, and edge cases like murder hobo behavior or clever spell use.