Stateful property-based testing for systems with mutable state. Language-agnostic concepts; load language skills for framework syntax.
Is the system under test a pure function?
- Yes -> Use stateless PBT (pbt-fundamentals skill)
- No -> Does it have mutable state that affects behavior?
  - Yes -> Does the state change through a sequence of operations?
    - Yes -> Use stateful PBT (this skill)
    - No -> Use stateless PBT with setup/teardown
  - No -> Use stateless PBT
Use stateful PBT for: databases | APIs | caches | queues | state machines | protocols | connection pools | file systems.
The model is a simplified representation of the system's expected state. It drives test generation and verification.
Rules for model design:
Each command defines four things:
| Component | Purpose | When It Runs |
|---|---|---|
| Precondition | Is this command valid in current state? | Generation + shrinking |
| Execution | Run operation on real system | Execution phase only |
| State transition | Update model | Both phases |
| Postcondition | Does real result match model? | Execution phase only |
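The four components above can be expressed as one command class. A minimal Python sketch, assuming a hypothetical `KVStore` system and a plain dict as the model:

```python
# Minimal sketch of one command as the four components above.
# `KVStore` is a hypothetical stand-in for the real system under test;
# the model is a plain dict.
class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def size(self):
        return len(self._data)

class PutCommand:
    MAX_ITEMS = 100  # illustrative bound

    def __init__(self, key, value):
        self.key, self.value = key, value

    def precondition(self, model):
        # Checked during generation and shrinking: model only, no real system.
        return len(model) < self.MAX_ITEMS

    def execute(self, system):
        # Execution phase only: runs the operation on the real system.
        system.put(self.key, self.value)

    def next_state(self, model):
        # Runs in both phases: updates the model, touches nothing else.
        model[self.key] = self.value

    def postcondition(self, model, system):
        # Execution phase only: does the real system agree with the model?
        return system.size() == len(model)

# Usage: one step of the execute/update/check loop.
model, system = {}, KVStore()
cmd = PutCommand("a", 1)
assert cmd.precondition(model)
cmd.execute(system)
cmd.next_state(model)
assert cmd.postcondition(model, system)
```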
Phase 1 -- Generation (abstract): Build command sequence using model only. No real system code runs. Return values are symbolic placeholders.
Phase 2 -- Execution (concrete): Run commands against real system. Replace symbolic values with real values. Check postconditions after each step.
During generation, commands returning values produce symbolic placeholders, resolved to real values during execution.
Rule: Treat all values from real system as opaque in model code. Store and pass them, never inspect during generation.
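One way frameworks implement this can be sketched with a hypothetical `Var` placeholder class and an environment map that binds placeholders to real values at execution time:

```python
# Hypothetical sketch of symbolic placeholders. During generation, a
# command that returns a value yields a Var; during execution, each Var
# is resolved to the concrete value the real system produced.
class Var:
    _next_id = 0

    def __init__(self):
        Var._next_id += 1
        self.id = Var._next_id

def resolve(value, env):
    # Model code must only store and pass Vars; resolution happens
    # exclusively in the execution phase.
    return env[value.id] if isinstance(value, Var) else value

# Generation phase: create() is not run; its result is a placeholder.
handle = Var()
# Execution phase: the real create() returned 42; bind it to the Var.
env = {handle.id: 42}
assert resolve(handle, env) == 42
assert resolve("literal", env) == "literal"
```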
Model: dictionary/map
Commands: create (adds to model + system), read (compare), update (modify both), delete (remove from both)
Invariant: model size == system size
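The key-value model above can be exercised with a hand-rolled sequential test loop. A minimal Python sketch; in practice a PBT framework generates, runs, and shrinks the sequence, and `KV` is a trivial stand-in for the real system:

```python
import random

# Minimal sketch of a sequential stateful test for the key-value model.
# `KV` is a deliberately trivial stand-in for the real system under test.
class KV:
    def __init__(self):
        self._d = {}

    def create(self, k, v):
        self._d[k] = v

    def read(self, k):
        return self._d.get(k)

    def delete(self, k):
        self._d.pop(k, None)

    def size(self):
        return len(self._d)

def run_sequence(seed, steps=50):
    rng = random.Random(seed)
    model, system = {}, KV()
    for _ in range(steps):
        op = rng.choice(["create", "read", "update", "delete"])
        if op in ("create", "update"):
            k, v = rng.randrange(5), rng.randrange(100)
            system.create(k, v)
            model[k] = v
        elif op == "read" and model:
            k = rng.choice(sorted(model))
            assert system.read(k) == model[k]    # postcondition
        elif op == "delete" and model:
            k = rng.choice(sorted(model))
            system.delete(k)
            del model[k]
        assert system.size() == len(model)       # invariant: sizes match

for seed in range(20):
    run_sequence(seed)
```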
Model: set of active resources
Commands: acquire (add to set), use (precondition: in set), release (remove from set)
Invariant: no use-after-release
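A minimal sketch of the lifecycle model, where the precondition on `use` keeps use-after-release sequences from ever being generated (names are illustrative):

```python
# Minimal sketch of the resource-lifecycle model: the model is the set
# of handles believed active.
active = set()          # the model

def acquire(handle):
    active.add(handle)

def use_precondition(handle):
    # Generation only proposes `use` when this returns True.
    return handle in active

def release(handle):
    active.discard(handle)

acquire("h1")
assert use_precondition("h1")
release("h1")
assert not use_precondition("h1")   # use-after-release filtered out
```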
Model: committed state + pending state + in_transaction flag
Commands: begin, write (update pending), commit (pending->committed), rollback (discard pending)
Invariant: reads outside transaction see committed state only
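A minimal Python sketch of this model's state transitions (the model only; checking it against a real transactional system is omitted, and names are illustrative):

```python
# Minimal sketch of the transactional model: committed state + pending
# state + in_transaction flag.
class TxModel:
    def __init__(self):
        self.committed, self.pending, self.in_tx = {}, {}, False

    def begin(self):
        self.pending, self.in_tx = dict(self.committed), True

    def write(self, k, v):
        # Inside a transaction, writes go to pending only.
        (self.pending if self.in_tx else self.committed)[k] = v

    def commit(self):
        self.committed, self.in_tx = self.pending, False

    def rollback(self):
        self.pending, self.in_tx = {}, False

    def read(self, k):
        # Invariant: reads outside a transaction see committed state only.
        return (self.pending if self.in_tx else self.committed).get(k)

m = TxModel()
m.begin(); m.write("k", 1); m.rollback()
assert m.read("k") is None      # pending write discarded
m.begin(); m.write("k", 2); m.commit()
assert m.read("k") == 2
```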
command CreateItem:
  Precondition:
    model.count < MAX_ITEMS
  Execution:
    item = random_item()
    result = system.create(item)
  StateTransition:
    model.items[result.id] = item
  Postcondition:
    assert result.id in system.list_all()
    assert model.count == system.count()
Linearizability testing checks that a concurrent execution is equivalent to some valid sequential execution.
Supported by: PropEr (parallel_commands) | CsCheck (SampleConcurrent) | Hedgehog | fast-check (scheduler) | ScalaCheck.
NOT supported by: Hypothesis.
Use linearizability testing only when concurrent correctness is primary risk. Sequential stateful testing catches most bugs at lower cost.
Making model as complex as system. If model bugs mirror system bugs, test is worthless. Fix: Model the what (interface contract), not the how (implementation).
Not monitoring which commands execute. Tests "pass" while never exercising critical paths. Fix: Collect distribution statistics. Weight commands to ensure coverage.
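Collecting distribution statistics during generation can be sketched with weighted choice; command names and weights here are illustrative:

```python
import collections
import random

# Sketch of monitoring command distribution: count what actually ran so
# a "passing" test that never hits a critical path is visible.
def pick_command(rng, weights):
    cmds = list(weights)
    return rng.choices(cmds, weights=[weights[c] for c in cmds], k=1)[0]

rng = random.Random(0)
weights = {"create": 4, "read": 4, "update": 2, "delete": 1}
stats = collections.Counter(pick_command(rng, weights) for _ in range(1000))

# Fail loudly if any command was never exercised.
assert all(stats[c] > 0 for c in weights), stats
```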
Missing preconditions allowing invalid commands, producing spurious failures. Fix: Preconditions should precisely capture system's documented contract.
Too many state variables creating combinatorial explosion (N booleans = 2^N states). Fix: Start minimal. Add complexity incrementally. Use preconditions to prune unreachable states.
I/O or external mutations in preconditions or state transitions. Fix: Model code must be pure. Only command execution touches real system.
Pattern-matching symbolic placeholders before real values exist. Fix: Never inspect values from the real system in next_state. Store and pass, don't transform.
Commands that work during generation but break during shrinking due to incomplete preconditions. Fix: Every command using a resource must have precondition verifying resource exists in model.
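Why incomplete preconditions break shrinking can be sketched with a tiny validity check over command sequences (names are illustrative):

```python
# Sketch: a sequence is valid only if every `use` is preceded by its
# `create`. The shrinker may drop a command only if the rest stays valid.
def valid(sequence):
    model = set()                       # ids believed to exist
    for op, arg in sequence:
        if op == "create":
            model.add(arg)
        elif op == "use" and arg not in model:
            return False                # precondition violated
    return True

seq = [("create", 1), ("create", 2), ("use", 1)]
assert valid(seq)
# The shrinker can drop ("create", 2): the remainder is still valid.
assert valid([("create", 1), ("use", 1)])
# It must keep ("create", 1): dropping it violates `use`'s precondition.
# Without that precondition, the shrinker would try this broken sequence
# and report a spurious one-command failure.
assert not valid([("use", 1)])
```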
Modeling internal structure (e.g., tree rotations) instead of observable behavior. Fix: Model the interface contract. For sorted set, model is sorted list.
When a stateful test fails, framework reports command sequence and failing postcondition. Read as a story:
| Symptom | Likely Cause |
|---|---|
| Failure only with 3+ commands | State-dependent bug triggered by specific sequence |
| Failure involves create-then-use | Resource lifecycle bug or symbolic reference issue |
| Non-deterministic failures | Race condition or external dependency |
| Failure disappears after shrinking removes commands | Missing precondition |
| Model and system disagree on count/size | Off-by-one or missing cleanup |
Shrunk sequence is minimal reproduction. If shrinking removes a command and test still fails, that command was irrelevant. Remaining commands are all necessary.
If shrinking produces an unexpectedly long minimal sequence (6+ commands), check the preconditions: the shrinker cannot remove a command when removal would violate a later command's precondition, so missing or imprecise preconditions distort which commands survive shrinking.
The model doesn't need to capture every system behavior; focus on the operations whose correctness matters most. A partial model testing 5 operations well beats a complete model testing 20 operations poorly.