Marketplace

agentic-pbt

npx claudepluginhub mmaaz-git/agentic-pbt

1 Plugin

hypo-plugin

62 stars

A plugin to find bugs in a codebase using property-based testing

7mo

v1.0.0

mmaaz-git

Stats

Plugins: 1

Stars: 62

Updated: Jan 1, 2026

Agentic Property-Based Testing

Get a coding agent to find bugs in your codebase by mining properties and testing them via Hypothesis.

For the artifacts from the paper, including bug reports and rankings, see the paper directory. The code used in the paper is slightly older than the code in the main folder. See paper/README.md for more details.

To see all the bugs our agent found, see our website.

To read the blog post on the Hypothesis website, see here.

Running the agent

The agent is a Claude Code command. You will need to have Claude Code installed to run it. You will need a subscription to Claude Code, or an API key (we recommend an API key if you are running it over a large number of packages, or to reproduce the paper).

The command is contained in the hypo.md file. You will need to place this file in the .claude/commands/ directory, which can either be in ~ or in whichever directory you are running the agent from. The agent can then be invoked with /hypo <target>.

You will need pytest, hypothesis, and the package you are testing installed.

The agent takes one argument, which is the target to test. This can be a file, a function, or a module. If no argument is given, it will test the entire codebase, i.e., the current working directory. You can also pass any other arguments that Claude Code supports, such as the model or permissions.

Example usage:

claude "/hypo numpy"
claude "/hypo statistics.median" --model opus

You can also start Claude Code first, and then invoke the agent from within the session.

Agent runner

The run.py script is a wrapper around the agent that tests multiple packages in parallel. It is what was used in the paper. The script has no dependencies beyond the standard library (of course, you still need to have Claude Code installed). You need python3 and pip to be in your PATH.

Note that the runner operates at the module level.

The only required argument is the path to a JSON file listing the packages to test and which modules to test within each package. It looks like:

{
    "pathlib": {
        "type": "stdlib",
        "modules": ["pathlib"]
    },
    "numpy": {
        "type": "pypi",
        "modules": ["numpy"]
    }
}

The keys in the JSON file are the package names, either the standard library name or the PyPI name. Set "type" to "stdlib" for standard library packages and to "pypi" for PyPI packages. This is important so the runner knows how to set up the virtual environment.
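Since a malformed entry can waste a long run, it may help to check the file before starting. The following is a minimal sketch, not part of run.py: the function name `validate_packages` is hypothetical, and the rule that "modules" must be a non-empty list of strings is an assumption inferred from the example above.

```python
import json

# The two values for "type" that the README describes.
VALID_TYPES = {"stdlib", "pypi"}


def validate_packages(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks well-formed.

    Hypothetical helper: the checks below are assumptions based on the
    example file, not the runner's actual validation logic.
    """
    problems = []
    for name, entry in config.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if entry.get("type") not in VALID_TYPES:
            problems.append(f"{name}: 'type' must be 'stdlib' or 'pypi'")
        modules = entry.get("modules")
        if (
            not isinstance(modules, list)
            or not modules
            or not all(isinstance(m, str) for m in modules)
        ):
            problems.append(f"{name}: 'modules' must be a non-empty list of strings")
    return problems


example = json.loads("""
{
    "pathlib": {"type": "stdlib", "modules": ["pathlib"]},
    "numpy": {"type": "pypi", "modules": ["numpy"]}
}
""")

print(validate_packages(example))  # []
```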

The runner takes three optional arguments:

--max-workers: the number of parallel workers to use. Default is 20.
--model: the model to use. Default is "opus".
--preinstall-workers: the number of parallel workers to use for setting up the virtual environments. Default is 10.
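To make the interface concrete, here is a sketch of an equivalent command-line parser built with argparse. It mirrors only the flags and defaults listed above; the positional name `packages_json` and the help strings are assumptions, and this is not the actual source of run.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run.py")
    # Required: path to the JSON file describing packages and modules.
    # The name "packages_json" is hypothetical.
    parser.add_argument("packages_json", help="path to the packages JSON file")
    parser.add_argument("--max-workers", type=int, default=20,
                        help="number of parallel workers")
    parser.add_argument("--model", default="opus",
                        help="model to use")
    parser.add_argument("--preinstall-workers", type=int, default=10,
                        help="parallel workers for virtual environment setup")
    return parser


args = build_parser().parse_args(["packages.json", "--max-workers", "4"])
print(args.max_workers, args.model, args.preinstall_workers)  # 4 opus 10
```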

The runner will output all bug reports in the results/ directory.

Example usage:

python run.py packages.json

In the example_packages/ directory, there are some example package json files to test:

packages_mini.json: a mini set of modules to test (this took 6 minutes to run, with default settings)
packages_10k.json: the top 10,000 PyPI packages, with the main module and all submodules one level deep

The packages tested in the paper are in the paper/ directory.

How the agent works

The runner sets up a virtual environment, with venv, for each package. Standard library packages all share a single virtual environment, and each PyPI package gets its own. The runner also installs pytest and hypothesis in each virtual environment. Setup runs in parallel, with the degree of parallelism controlled by --preinstall-workers (see the CLI arguments above).
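The environment layout described above can be sketched as a small mapping from package name to environment directory. This is only an illustration: the function name `plan_venvs` and the directory names `stdlib` and `pypi/<name>` are hypothetical, and the real runner may organize its environments differently.

```python
from pathlib import Path


def plan_venvs(config: dict, root: Path) -> dict:
    """Map each package name to the virtual environment directory it would use.

    Hypothetical layout: one shared environment for all standard library
    packages, and one environment per PyPI package.
    """
    plan = {}
    for name, entry in config.items():
        if entry["type"] == "stdlib":
            plan[name] = root / "stdlib"       # shared by every stdlib package
        else:
            plan[name] = root / "pypi" / name  # one environment per PyPI package
    return plan


config = {
    "pathlib": {"type": "stdlib", "modules": ["pathlib"]},
    "json": {"type": "stdlib", "modules": ["json"]},
    "numpy": {"type": "pypi", "modules": ["numpy"]},
}
for package, venv_dir in plan_venvs(config, Path("venvs")).items():
    print(package, "->", venv_dir)
```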

It then sets up worker directories, up to the specified maximum number of workers (see the CLI arguments above). Each directory is a "sandbox" for the agent to run in, and the agent only has permission to edit files within it. Each worker directory also contains .claude/commands/hypo.md, so that the agent can run. The runner parallelizes across modules.

Note that the runner also checks if the module has already been tested, and skips it if so. So, you can easily resume a run by just running the runner again.
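The resume behavior amounts to filtering the work list against what has already been done. The sketch below assumes the set of finished modules is supplied by the caller; how the runner actually detects a tested module is not described here, so `already_tested` and the function name `pending_modules` are hypothetical.

```python
def pending_modules(config: dict, already_tested: set) -> list:
    """Return (package, module) pairs that still need a run.

    Since the runner parallelizes across modules, the unit of work is a
    module, not a package. How `already_tested` is computed is an assumption
    left to the caller.
    """
    work = []
    for package, entry in config.items():
        for module in entry["modules"]:
            if module not in already_tested:
                work.append((package, module))
    return work


config = {
    "pathlib": {"type": "stdlib", "modules": ["pathlib"]},
    "numpy": {"type": "pypi", "modules": ["numpy", "numpy.linalg"]},
}
print(pending_modules(config, already_tested={"numpy"}))
# [('pathlib', 'pathlib'), ('numpy', 'numpy.linalg')]
```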

Security

The runner calls the agent with restricted permissions. It only has permission to read/write/edit files in the sandbox in which it is called, and it also has read permission to the virtual environment, so that it can read the source code of the package. Furthermore, it can only write/edit .py and .md files. The only bash commands it can run are python and pytest. Note that because of how the virtual environments are set up, the Python command will be python. Lastly, it also has access to the Todo and WebFetch tools.
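The write rule above (only .py and .md files, only inside the sandbox) can be expressed as a short predicate. This is purely an illustration of the rule: the runner enforces it through the permissions it passes to Claude Code, not through code like this, and the name `may_write` is hypothetical.

```python
from pathlib import Path

# File types the agent is allowed to write or edit, per the rule above.
WRITABLE_SUFFIXES = {".py", ".md"}


def may_write(target: str, sandbox: str) -> bool:
    """Return True if `target` is a .py or .md file located inside `sandbox`.

    Illustrative only; requires Python 3.9+ for Path.is_relative_to.
    """
    sandbox_path = Path(sandbox).resolve()
    # Relative targets are interpreted relative to the sandbox; resolving
    # collapses ".." segments so escapes are detected.
    target_path = (sandbox_path / target).resolve()
    inside = target_path.is_relative_to(sandbox_path)
    return inside and target_path.suffix in WRITABLE_SUFFIXES


print(may_write("test_median.py", "/tmp/sandbox"))  # True
print(may_write("../escape.py", "/tmp/sandbox"))    # False
print(may_write("run.sh", "/tmp/sandbox"))          # False
```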
