Writes property-based tests using Hegel for Rust, Go, and C++ projects. Useful for testing invariants, round-trips, contracts via generative inputs, fuzzing, and shrinking.
Hegel is a family of property-based testing libraries supporting multiple languages, powered by Hypothesis. Tests integrate with standard language test runners. Hegel generates random inputs for your code and automatically shrinks failing cases to minimal counterexamples.
Even when PBTs add modest line coverage over unit tests, their value is in exercising combinations and boundary conditions that humans don't think to write by hand.
Code examples in this file use Python-like pseudocode to illustrate concepts. For exact API and syntax, load the language-specific reference (see step 1 of the workflow).
Follow these steps when writing property-based tests.
Determine the project language and load the corresponding reference from references/<language>/reference.md for API details and idiomatic patterns.
Before writing any test, understand what you're testing:
The goal is to find evidence for properties, not to invent them.
Look for properties that are:
Write one test per property. Don't cram multiple properties into one test.
See the Property Catalogue below for a taxonomy of what to look for.
Before writing tests from scratch, check what already exists.
Existing PBTs in another framework (proptest, quickcheck, rapid, gopter, etc.) should be ported to hegel. Load the language-specific porting reference (references/<language>/porting.md). Key things to know about hegel when porting:
- The test body calls tc.draw() whenever it needs a value: you can draw conditionally, in loops, and have later draws depend on earlier values without needing flat_map.
- Use plain assertions: no prop_assert! or return-a-bool pattern needed.

Unit tests and example-based tests can often be evolved into PBTs. Tests with hardcoded seeds, parameterized examples, or multiple similar test cases are prime candidates. Load references/evolving-tests.md for detailed guidance on recognizing what property a unit test is hiding. If you can't immediately see the right property, start by parameterizing the test: replace concrete values with generated ones and keep a simple oracle. You can refine the property later.
Tests that use rand with fixed seeds are especially good candidates — the randomness should come from hegel instead so failures produce shrinkable counterexamples.
When you evolve an existing test, modify the existing test file rather than creating a new one. Property-based tests are tests like any other and belong with the code they're testing. Do not create a separate file for hegel tests.
For each property:
Run the tests. When a test fails, ask:
Property-based tests aren't always the right tool. Prefer unit tests when:
- A test like assert render(doc) == "<html>..." depends on a specific output format: there's no general property to check.

Use this catalogue to identify what to test. Not every category applies to every function; pick the ones supported by evidence from the code.
The first five patterns are ordered by how often they've found real bugs in practice.
Model tests — For any data structure, the highest-value first test is a stateful model test: define rules for each operation (insert, remove, get, etc.), run them against both the library under test and a known-good reference (the "model"), and assert they agree after every operation. Use hegel's stateful testing support (see the language reference) rather than hand-rolling the operation loop.
The exact syntax varies significantly by language — check the language reference for the stateful testing API. Conceptually, a model test looks like:
```
state_machine MyMapTest:
    subject = MyMap()
    model = HashMap()

    rule insert():
        k = tc.draw(integers())
        v = tc.draw(integers())
        subject.insert(k, v)
        model.insert(k, v)

    rule remove():
        k = tc.draw(integers())
        subject.remove(k)
        model.remove(k)

    rule get():
        k = tc.draw(integers())
        assert subject.get(k) == model.get(k)

    invariant agrees:
        assert subject == model
```
Choose the right model: Vec for sequential containers, HashMap for hash maps, BTreeMap/sorted map for ordered maps, HashSet/set for unordered sets.
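To make the shape of a model test concrete without relying on any particular hegel binding, here is a plain-Python sketch. `ListMap` is a hypothetical subject under test (an association list), a `dict` plays the model, and `run_model_test` replays an operation sequence against both, checking agreement after every step. In a real test, hegel's stateful support would generate and shrink the operation sequence; here it is passed in by hand.

```python
from __future__ import annotations

from typing import Callable, Optional


class ListMap:
    """Hypothetical map under test, backed by a list of (key, value) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[int, int]] = []

    def insert(self, k: int, v: int) -> None:
        for i, (key, _) in enumerate(self._items):
            if key == k:
                self._items[i] = (k, v)  # overwrite the existing entry
                return
        self._items.append((k, v))

    def remove(self, k: int) -> None:
        self._items = [(key, val) for key, val in self._items if key != k]

    def get(self, k: int) -> Optional[int]:
        for key, val in self._items:
            if key == k:
                return val
        return None

    def to_dict(self) -> dict[int, int]:
        return dict(self._items)


def run_model_test(ops: list[tuple], make_subject: Callable[[], ListMap] = ListMap) -> None:
    """Apply each op to the subject and to a dict model, checking agreement after every step."""
    subject, model = make_subject(), {}
    for op in ops:
        if op[0] == "insert":
            _, k, v = op
            subject.insert(k, v)
            model[k] = v
        elif op[0] == "remove":
            _, k = op
            subject.remove(k)
            model.pop(k, None)
        elif op[0] == "get":
            _, k = op
            assert subject.get(k) == model.get(k), f"get({k}) disagrees after {op}"
        # Invariant: the full contents agree after every operation.
        assert subject.to_dict() == model, f"state diverged after {op}"


run_model_test([("insert", 1, 10), ("insert", 1, 20), ("get", 1), ("remove", 1), ("get", 1)])
```

Keeping both per-operation checks and a whole-state invariant matters: a buggy subclass whose `insert` appends duplicates instead of overwriting still matches the model under `to_dict()` (which keeps the last value per key), and only the `get` rule exposes it.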
Idempotence tests — Any normalization, case conversion, or formatting function should satisfy f(f(x)) == f(x). Use full Unicode text generators (not ASCII-only) because Unicode edge cases like ß -> SS and combining characters are where bugs hide.
```
s = tc.draw(text())
once = normalize(s)
twice = normalize(once)
assert once == twice
```
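A concrete illustration of why ASCII-only inputs are not enough: `capitalize_first` below is a hypothetical normalizer (not from any library) that upper-cases the first character and lower-cases the rest. Python's `str.upper()` applies full Unicode case mapping, so `"ß".upper()` is the two-character `"SS"`. The function is therefore idempotent on every ASCII sample but not on `"ßa"`.

```python
def capitalize_first(s: str) -> str:
    """Hypothetical normalizer: upper-case the first character, lower-case the rest."""
    if not s:
        return s
    # Bug: s[0].upper() can expand to more than one character ("ß" -> "SS"),
    # so a second pass sees a different "rest" and lower-cases part of it.
    return s[0].upper() + s[1:].lower()


def idempotence_failures(f, inputs):
    """Return every input where f(f(x)) != f(x)."""
    return [s for s in inputs if f(f(s)) != f(s)]


ascii_inputs = ["", "hello", "Hello", "HELLO", "a b"]
unicode_inputs = ascii_inputs + ["ßa", "éclair"]

print(idempotence_failures(capitalize_first, ascii_inputs))    # []
print(idempotence_failures(capitalize_first, unicode_inputs))  # ['ßa']
```

Here `capitalize_first("ßa")` is `"SSa"`, and a second application gives `"Ssa"`.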
Parse robustness — Parsers (from_str, parse, decode) should handle all input without panicking. The property is simple: it should never crash, even on garbage input.
```
s = tc.draw(text())
_ = MyType.parse(s) # should return an error, never panic
```
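The "constructor panics instead of returning an error" bug from the catalogue can be sketched in Python, where an exception other than the documented one plays the role of a panic. `Port` and `parse_port` are hypothetical, and the contract assumed here is that bad input raises `ValueError` and nothing else.

```python
class Port:
    """Hypothetical type whose constructor enforces its range by "panicking"."""

    def __init__(self, value: int) -> None:
        if not 0 <= value <= 65535:
            raise RuntimeError("port out of range")  # stands in for a panic
        self.value = value


def parse_port(s: str) -> Port:
    """Assumed contract: raise ValueError on invalid input."""
    n = int(s)  # non-numeric text raises ValueError, which is allowed
    return Port(n)  # bug: out-of-range numbers raise RuntimeError instead


def crashes(parse, inputs):
    """Return the inputs where parse raises anything other than ValueError."""
    bad = []
    for s in inputs:
        try:
            parse(s)
        except ValueError:
            pass  # an ordinary, documented error
        except Exception:
            bad.append(s)  # anything else counts as a crash
    return bad


print(crashes(parse_port, ["80", "", "abc", "-1", "70000"]))  # ['-1', '70000']
```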
Roundtrip tests — parse(format(x)) == x for any serialize/deserialize pair. Test with the full input domain. Bugs hide at zero (scientific notation edge cases), large integers (precision loss through f64 for values > 2^53), and unusual string content.
```
n = tc.draw(integers())
s = format(n)
assert parse(s) == n
```
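The f64 precision trap is easy to reproduce in Python, whose `float` is an IEEE 754 double. `format_via_float` and `parse_via_float` are a hypothetical serializer pair that routes integers through a double; the roundtrip holds for small values and for 2^53 itself, then silently breaks one step above it.

```python
def format_via_float(n: int) -> str:
    """Hypothetical serializer that routes integers through an IEEE 754 double."""
    return repr(float(n))


def parse_via_float(s: str) -> int:
    """Hypothetical matching parser."""
    return int(float(s))


def roundtrips(n: int) -> bool:
    return parse_via_float(format_via_float(n)) == n


print(roundtrips(0), roundtrips(2**53), roundtrips(2**53 + 1))  # True True False
print(format_via_float(2**53 + 1))  # 9007199254740992.0
```

2^53 + 1 lies exactly halfway between two adjacent doubles and rounds to the even one, 2^53. A generator bounded to "reasonable" values would never reach it.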
Boundary value tests — Integer boundary values (MIN, MAX, 0) are where overflow bugs hide. Don't add bounds to avoid them — they ARE the test. Negating MIN overflows, intermediate products overflow, GCD/LCM computations overflow on boundary inputs.
```
a = tc.draw(integers()) # includes MIN, MAX, 0
b = tc.draw(integers())
tc.assume(b != 0)
result = my_numeric_op(a, b) # should not overflow/panic
```
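Python integers never overflow, so this sketch instead checks the property a Rust, Go, or C++ `i64` implementation would need: that the mathematically correct result still fits in 64 bits. It uses `math.gcd` as the oracle over the boundary values named above; only pairs involving MIN fail, which is exactly what a bounded generator would hide.

```python
import math
from itertools import product

I64_MIN, I64_MAX = -(2**63), 2**63 - 1
BOUNDARY = [I64_MIN, I64_MIN + 1, -1, 0, 1, I64_MAX - 1, I64_MAX]


def fits_i64(x: int) -> bool:
    return I64_MIN <= x <= I64_MAX


def gcd_overflow_cases(values):
    """Pairs whose true (non-negative) gcd cannot be stored in an i64."""
    return [(a, b) for a, b in product(values, repeat=2) if not fits_i64(math.gcd(a, b))]


cases = gcd_overflow_cases(BOUNDARY)
print(len(cases))  # 3
# Each failing pair has gcd 2**63, one past I64_MAX.
assert cases == [(I64_MIN, I64_MIN), (I64_MIN, 0), (0, I64_MIN)]
```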
| Category | Description | Example |
|---|---|---|
| Commutativity | order of operations doesn't matter | a + b == b + a or f(g(x)) == g(f(x)) |
| Invariant preservation | an operation maintains a structural property | insert into BST preserves ordering |
| Oracle / reference impl | compare against a known-correct implementation | my_sort(xs) == std_sort(xs) |
| Monotonicity | more input means more (or equal) output | len(xs ++ ys) >= len(xs) |
| Bounds / contracts | output stays within documented limits | clamp(x, lo, hi) is in [lo, hi] |
| No-crash / robustness | function handles all valid inputs without panicking | parse(arbitrary_string) doesn't crash |
| Equivalence | two implementations produce the same result | iterative_fib(n) == recursive_fib(n) |
| Consistency | related APIs in the same library agree | string_width(s) == sum(char_width(c) for c in s) |
| Large input sizes | exercise deep structure paths that small inputs miss | draw size separately, force 50-200+ elements for trees/tries |
| Feature flag testing | non-default features are often less tested | enable SIMD, nightly, or experimental features and run tests |
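As a small instance of the Oracle and Equivalence rows, here is a plain-Python sketch comparing the table's `iterative_fib` and `recursive_fib` (both written here for illustration). A PBT library would draw `n` rather than enumerate a fixed range; memoizing the recursive version keeps it usable as an oracle for larger inputs.

```python
from functools import lru_cache


def iterative_fib(n: int) -> int:
    """Implementation under test: bottom-up Fibonacci."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@lru_cache(maxsize=None)
def recursive_fib(n: int) -> int:
    """Independent reference: the textbook recurrence, memoized."""
    return n if n < 2 else recursive_fib(n - 1) + recursive_fib(n - 2)


def equivalence_failures(inputs):
    """Inputs where the two implementations disagree."""
    return [n for n in inputs if iterative_fib(n) != recursive_fib(n)]


print(equivalence_failures(range(50)))  # []
```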
| Category | What to look for |
|---|---|
| Integer overflow | Boundary values (MIN, MAX, 0) in arithmetic, GCD, negation, display |
| Idempotence failure | Case conversion / normalization with Unicode (ß -> SS), word splitting on case transitions |
| Precision loss | Numbers routed through f64 lose precision for integers > 2^53 |
| Roundtrip failure | Format/parse on edge cases: zero, empty strings, unusual path components |
| Parse panic | from_str delegates to a constructor that panics instead of returning Err |
| Stale state | Update operations that modify one index but don't clean up the old entry in another |
| Unicode line breaks | \u{85} (NEL), \u{2028} (LS), \u{2029} (PS) treated inconsistently as line breaks |
| SIMD divergence | SIMD code path produces different results than the scalar fallback |
| Deep structure bugs | Traversal that only fails when data structure has multiple internal levels (50-200+ elements) |
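Python's own standard library shows the "Unicode line breaks" row in action: `str.splitlines()` treats NEL, LS, and PS as line boundaries, while `str.split("\n")` does not. Code that counts lines one way in one place and the other way elsewhere will disagree on text like the hypothetical sample below.

```python
text = "one\u0085two\u2028three\u2029four"

# str.splitlines() treats NEL (U+0085), LS (U+2028) and PS (U+2029) as line boundaries.
print(text.splitlines())  # ['one', 'two', 'three', 'four']

# str.split("\n") only splits on U+000A, so the same text is a single "line".
print(len(text.split("\n")))  # 1
```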
Properties must be evidence-based. Find evidence in:
- A signature like merge(a: List, b: List) -> List implies the output length might equal the sum of input lengths.

Err on the side of creating more properties rather than fewer, and if they fail, investigate whether the failure is legitimate behavior or not.
Beware of properties that seem universal but aren't. Read the docs carefully before asserting a property. Examples from real testing:
- Grapheme-aware string reversal is not always its own inverse: reverse(reverse("\n\r")) != "\n\r" because \r\n is one grapheme cluster while \n\r is two.
- difference might mean symmetric difference (A △ B), not set difference (A \ B); check the docs.

When a property fails, investigate whether it's a real bug or a genuine edge case in the domain. A weaker property often still holds.
The most common mistake when writing property-based tests is over-constraining generators. Broad generators find more bugs because they explore inputs the developer didn't anticipate. Constrained generators give a false sense of safety.
If the function accepts any integer, generate any integer:
```
n = tc.draw(integers()) # full range of the type, no min/max
```
Preemptively adding bounds like .min(0).max(100) means you'll never discover that the function overflows on large values, mishandles negatives, or breaks at the type's boundaries. Those are exactly the bugs PBT is designed to find.
Don't narrow ranges to "avoid edge cases." If a function claims to work on all integers, test it on all integers — including MIN, MAX, 0, -1, and 1. If it breaks, that's valuable information.
Unless the function's contract explicitly requires non-empty input, test with empty collections too. If a function panics on an empty collection, that might be a bug worth knowing about.
Assume it's a real bug unless you have strong evidence otherwise. If in doubt, ask the user.
MAX, that's a bug in the code, not in your test.

Add generator bounds only when:
When a constraint involves relationships between multiple generated values, you might use tc.assume():
```
a = tc.draw(integers())
b = tc.draw(integers())
tc.assume(a != b) # this is fine for simple constraints
```
But it's better to construct valid inputs directly when you can:
```
# Instead of tc.assume(a <= b), generate in order:
a = tc.draw(integers())
b = tc.draw(integers())
if a > b:
    a, b = b, a
```
This is particularly important when the rejection rate would be high. For example, integers().map(n -> n * 2) is much better than integers().filter(n -> n % 2 == 0) — the latter throws away ~50% of test cases.
Hegel's default collection size is small. If you need large collections (e.g., to exercise deep tree paths or multi-level node structures), draw the size separately:
```
# GOOD: can generate large collections, and hegel can shrink n to find the minimal size
n = tc.draw(integers(min=0, max=300))
keys = tc.draw(lists(integers(), min_size=n)) # no max_size: let hegel go bigger

# BAD: hegel's default size distribution rarely produces 100+ elements
keys = tc.draw(lists(integers()))
```
When testing maps/sets that need unique keys, use the unique option on collection generators. This avoids confusion about which value wins for duplicate keys. See the language-specific reference for syntax.
When the code under test requires an RNG, do not create a seeded RNG with a hegel-generated seed. Hegel can only shrink the seed integer, not the actual random decisions the RNG makes — so when a test fails, you get a meaningless minimal seed rather than a meaningful minimal sequence of random choices.
Instead, use hegel's random generator, which gives you an RNG that routes random decisions through hegel's shrinking engine. See the language-specific reference for the exact API.
How to choose: Start with the default. If tests hang or time out because the code does rejection sampling internally, switch to true randomness mode.
If the code under test takes a concrete RNG type rather than a trait/interface, consider whether it should be refactored to accept a generic RNG. This is both better API design and makes the code testable with hegel's random generator. Suggest this refactoring to the user.
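A sketch of that refactoring in Python: `shuffle_in_place` (hypothetical) takes the RNG as a parameter instead of reaching for the global `random` module. Production code can pass a `random.Random`; a test can pass any object with a compatible `randrange`, such as the hypothetical `ScriptedRng` below. That parameter is the seam where a framework-controlled, shrinkable RNG would plug in.

```python
import random
from typing import List, Protocol, Sequence


class SupportsRandrange(Protocol):
    def randrange(self, stop: int) -> int: ...


def shuffle_in_place(items: List[int], rng: SupportsRandrange) -> None:
    """Fisher-Yates shuffle that takes its randomness from the caller."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        items[i], items[j] = items[j], items[i]


class ScriptedRng:
    """Test double that replays fixed choices, reduced modulo the requested range."""

    def __init__(self, choices: Sequence[int]) -> None:
        self._choices = list(choices)
        self._pos = 0

    def randrange(self, stop: int) -> int:
        choice = self._choices[self._pos % len(self._choices)] % stop
        self._pos += 1
        return choice


deck = [1, 2, 3, 4]
shuffle_in_place(deck, ScriptedRng([0]))  # always pick index 0
print(deck)  # [2, 3, 4, 1]

other = [1, 2, 3, 4]
shuffle_in_place(other, random.Random())  # production-style RNG
print(sorted(other))  # [1, 2, 3, 4]
```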
Over-constraining generators — Adding bounds "just in case" means the test will never find bugs at boundary values or with unexpected inputs. The whole value of PBT is exploring the input space the developer didn't think to test by hand. See Generator Discipline above.
Testing trivial properties — assert x == x or assert len(vec) >= 0 test nothing useful. Every property should be falsifiable by a buggy implementation.
Using the implementation as the oracle — If your test calls the same function to compute the expected result, it can never fail. Use an independent reference implementation, a simpler algorithm, or a structural property.
High rejection rates — If .filter() or tc.assume() rejects most inputs, hegel will give up. Restructure generators to produce valid inputs directly (use .map() or dependent draws).
Creating a separate test file for hegel tests — Property-based tests belong alongside the existing tests for the same code. Add them to existing test files.
Using manually seeded RNGs — Use hegel's random generator so hegel controls the random decisions and can shrink them individually. See "Handling Randomness" above.
Overflowing in test code — When computing values from generated data (e.g., map.insert(k, k * 10)), your test code itself can overflow before the library has a chance to be buggy. Use wrapping arithmetic or draw a smaller type and widen it to prevent overflow in the test. Distinguish "this constraint protects the library's contract" (keep it) from "this constraint prevents my test from overflowing" (use wrapping arithmetic instead).
Restricting collection size for performance — If a test is slow with large collections, lower the test case count rather than restricting the input space. A slow test that finds bugs beats a fast test that can't. Many tree/trie bugs only manifest at 50-200+ elements.
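For the "Overflowing in test code" pitfall, here is a minimal Python sketch of both fixes. It emulates 32-bit signed integers because Python's own ints never overflow. `wrap_i32` and `wrapping_mul_i32` are hypothetical helpers that mirror two's-complement wraparound, the behavior of Rust's `i32::wrapping_mul`; the alternative fix is to draw a narrower type such as i16 so `k * 10` always fits in i32.

```python
I32_MIN, I32_MAX = -(2**31), 2**31 - 1
I16_MIN, I16_MAX = -(2**15), 2**15 - 1


def wrap_i32(x: int) -> int:
    """Reduce an arbitrary integer to the two's-complement i32 range."""
    return (x + 2**31) % 2**32 - 2**31


def wrapping_mul_i32(a: int, b: int) -> int:
    return wrap_i32(a * b)


# Fix 1: wrapping arithmetic keeps the expected value computable for every key.
print(wrapping_mul_i32(I32_MAX, 10))  # -10, matching i32::MAX.wrapping_mul(10) in Rust

# Fix 2: draw an i16 key and widen it; k * 10 then stays well inside i32.
assert I32_MIN <= I16_MIN * 10 and I16_MAX * 10 <= I32_MAX
```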