Skill

nw-pbt-stateful

From nw

Provides stateful property-based testing patterns, model-based testing, and decision guidance for mutable state systems like databases, APIs, caches, queues, and state machines.

testing

npx claudepluginhub nwave-ai/nwave --plugin nw

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Stateful property-based testing for systems with mutable state. Language-agnostic concepts; load language skills for framework syntax.

SKILL.md

Similar Skills

nw-pbt-fundamentals

484

Provides property-based testing fundamentals: core concepts, property taxonomy with decision tables/trees, and selection strategies. Language-agnostic.

property-based-testing

5.1k

Guides property-based testing for serialization roundtrips, idempotence, invariants, parsing, validation, and smart contracts across languages.

8 files

property-based-testing

206

Guides property-based testing for serialization, validation, normalization, and pure functions with property catalog, pattern detection, and library references.

ed3d-house-style

Stats

Parent Repo Stars484

Parent Repo Forks49

Last CommitMar 20, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stateful PBT

Stateful property-based testing for systems with mutable state. Language-agnostic concepts; load language skills for framework syntax.

When to Use Stateful vs Stateless PBT

Is the system under test a pure function?
  Yes -> Use stateless PBT (pbt-fundamentals skill)
  No -> Does it have mutable state that affects behavior?
    Yes -> Does the state change through a sequence of operations?
      Yes -> Use stateful PBT (this skill)
      No  -> Use stateless PBT with setup/teardown
    No  -> Use stateless PBT

Core Concepts

The Model

Simplified representation of system's expected state. Drives test generation and verification.

Rules for model design:

Simpler than system under test (key-value store model is just a dict)
Captures observable behavior, not implementation details
Pure -- no side effects in model code
Source of authority during both generation and shrinking

Commands

Each command defines four things:

Component	Purpose	When It Runs
Precondition	Is this command valid in current state?	Generation + shrinking
Execution	Run operation on real system	Execution phase only
State transition	Update model	Both phases
Postcondition	Does real result match model?	Execution phase only

Two-Phase Execution

Phase 1 -- Generation (abstract): Build command sequence using model only. No real system code runs. Return values are symbolic placeholders.

Phase 2 -- Execution (concrete): Run commands against real system. Replace symbolic values with real values. Check postconditions after each step.

Symbolic References

During generation, commands returning values produce symbolic placeholders, resolved to real values during execution.

Rule: Treat all values from real system as opaque in model code. Store and pass them, never inspect during generation.

Command Design Patterns

Basic CRUD Model

Model: dictionary/map
Commands: create (adds to model + system), read (compare), update (modify both), delete (remove from both)
Invariant: model size == system size

Resource Lifecycle

Model: set of active resources
Commands: acquire (add to set), use (precondition: in set), release (remove from set)
Invariant: no use-after-release

Transaction Model

Model: committed state + pending state + in_transaction flag
Commands: begin, write (update pending), commit (pending->committed), rollback (discard pending)
Invariant: reads outside transaction see committed state only

CRUD Command Structure (Pseudocode)

command CreateItem:
  Precondition:
    model.count < MAX_ITEMS
  Execution:
    result = system.create(random_item())
  StateTransition:
    model.items[result.id] = item
  Postcondition:
    assert result.id in system.list_all()
    assert model.count == system.count()

Linearizability Testing

Tests that concurrent execution is equivalent to some valid sequential execution.

Define sequential model (same as above)
Generate sequential prefix to establish state
Generate parallel command branches
Execute branches concurrently
Check if results match any valid sequential interleaving

Supported by: PropEr (parallel_commands) | CsCheck (SampleConcurrent) | Hedgehog | fast-check (scheduler) | ScalaCheck. NOT supported by: Hypothesis.

Use linearizability testing only when concurrent correctness is primary risk. Sequential stateful testing catches most bugs at lower cost.

Anti-Patterns (8 Documented)

1. Model Too Complex

Making model as complex as system. If model bugs mirror system bugs, test is worthless. Fix: Model the what (interface contract), not the how (implementation).

2. Insufficient Action Coverage

Not monitoring which commands execute. Tests "pass" while never exercising critical paths. Fix: Collect distribution statistics. Weight commands to ensure coverage.

3. Weak Preconditions

Missing preconditions allowing invalid commands, producing spurious failures. Fix: Preconditions should precisely capture system's documented contract.

4. State Explosion

Too many state variables creating combinatorial explosion (N booleans = 2^N states). Fix: Start minimal. Add complexity incrementally. Use preconditions to prune unreachable states.

5. Side Effects in Model Code

I/O or external mutations in preconditions or state transitions. Fix: Model code must be pure. Only command execution touches real system.

6. Inspecting Opaque Values During Generation

Pattern-matching symbolic placeholders without real values yet. Fix: Never inspect values from real system in next_state. Store and pass, don't transform.

7. Not Shrinking-Aware

Commands that work during generation but break during shrinking due to incomplete preconditions. Fix: Every command using a resource must have precondition verifying resource exists in model.

8. Testing Implementation Details

Modeling internal structure (e.g., tree rotations) instead of observable behavior. Fix: Model the interface contract. For sorted set, model is sorted list.

Debugging Failed Stateful Tests

Reading a Failing Command Sequence

When a stateful test fails, framework reports command sequence and failing postcondition. Read as a story:

Identify failing command: Last command is where postcondition failed
Trace model state: Walk through each command, tracking model expectations
Find divergence: Where does real system diverge from model?
Check if test bug or system bug: Is model wrong, or system wrong?

Common Failure Patterns

Symptom	Likely Cause
Failure only with 3+ commands	State-dependent bug triggered by specific sequence
Failure involves create-then-use	Resource lifecycle bug or symbolic reference issue
Non-deterministic failures	Race condition or external dependency
Failure disappears after shrinking removes commands	Missing precondition
Model and system disagree on count/size	Off-by-one or missing cleanup

Shrinking Interpretation

Shrunk sequence is minimal reproduction. If shrinking removes a command and test still fails, that command was irrelevant. Remaining commands are all necessary.

If shrinking produces unexpectedly long sequence, check for missing preconditions -- shrinker can't remove commands when removal would violate later preconditions.

If minimal failing sequence needs 6+ commands, shrinking removed many dependencies -- check for missing preconditions.

Oracle Design

Building a Model

Start with simplest possible data structure (usually dict or list)
Add only state needed to express postconditions
Keep model transitions obvious and trivial
If model gets complex, you may be modeling too much

Handling Non-Determinism

Acceptable non-determinism (UUID generation): Track generated values in model, compare by stored reference
Timing-dependent behavior: Model set of acceptable outcomes, assert real result is in that set
Random internal choices: Use seeded system, or assert invariants rather than exact values

Model Completeness

Model doesn't need to capture every system behavior. Focus on:

Operations you want to test
State those operations depend on
Postconditions that verify correctness

A partial model testing 5 operations well beats a complete model testing 20 operations poorly.