npx claudepluginhub mmaaz-git/agentic-pbt
A plugin to find bugs in a codebase using property-based testing.
Get a coding agent to find bugs in your codebase by mining properties and testing them via Hypothesis.
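To make "mining properties and testing them" concrete, here is a tiny stdlib-only illustration of a property and a random search for counterexamples. The helper name `find_counterexample` is hypothetical. The agent itself writes Hypothesis tests instead, and Hypothesis generates inputs far more cleverly and shrinks any failure to a minimal example. This loop only shows the basic idea.

```python
import random
import statistics


def median_within_bounds(xs):
    # Property: the median of a non-empty list lies between its min and max.
    m = statistics.median(xs)
    return min(xs) <= m <= max(xs)


def find_counterexample(prop, trials=500, seed=0):
    """Try `prop` on random non-empty integer lists; return the first failing input, else None."""
    # Hypothetical helper: plain random generation, no shrinking (unlike Hypothesis).
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(1, 20))]
        if not prop(xs):
            return xs
    return None


print(find_counterexample(median_within_bounds))  # None: the property holds
# A false "property" (median always equals mean) is refuted quickly:
print(find_counterexample(lambda xs: statistics.median(xs) == statistics.mean(xs)))
```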
For the artifacts from the paper, including bug reports and rankings, see the paper directory. Note that the code that was used in the paper is slightly behind what is in the main folder. See paper/README.md for more details.
To see all the bugs our agent found, see our website.
To read the blog post on the Hypothesis website, see here.
The agent is a Claude Code command. You will need to have Claude Code installed to run it. You will need a subscription to Claude Code, or an API key (we recommend an API key if you are running it over a large number of packages, or to reproduce the paper).
The command is defined in the hypo.md file. Place this file in a .claude/commands/ directory, either in your home directory (~) or in the directory you run the agent from. The agent can then be invoked with /hypo <target>.
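Copying the file by hand works fine. If you script your setup, a minimal Python sketch could look like the one below. The helper name `install_hypo_command` is hypothetical. It only creates the .claude/commands/ directory and copies hypo.md into it, either user-wide (home directory) or for a single project.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_hypo_command(hypo_md: Path, base_dir: Optional[Path] = None) -> Path:
    """Copy hypo.md into <base_dir>/.claude/commands/ (hypothetical helper).

    base_dir defaults to the home directory (user-wide install); pass a
    project directory instead to make /hypo available only there.
    """
    base = Path.home() if base_dir is None else Path(base_dir)
    commands_dir = base / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    target = commands_dir / "hypo.md"
    shutil.copyfile(hypo_md, target)
    return target
```

For example, `install_hypo_command(Path("hypo.md"), Path.cwd())` installs the command for the current project only.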
You will need pytest, hypothesis, and the package you are testing installed.
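A quick way to check these prerequisites from the Python environment you will run in is sketched below. The helper `missing_modules` is hypothetical. It takes top-level *import* names, which can differ from PyPI names (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level import names from `names` that cannot be found here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(["pytest", "hypothesis", "numpy"])
if missing:
    print("Install first:", ", ".join(missing))
```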
The agent takes one argument: the target to test. This can be a file, a function, or a module. If no argument is given, it tests the entire codebase, i.e., the current working directory. You can also pass any other arguments Claude Code supports, such as the model or permissions.
Example usage:
claude "/hypo numpy"
claude "/hypo statistics.median" --model opus
You can also just start Claude Code, and then invoke the agent.
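When launching runs from a script, note that the slash command and its target form a single prompt argument, with Claude Code flags such as --model following separately. Below is a minimal sketch that builds that argument list. The helper `hypo_command` is hypothetical, and it does not launch anything unless you pass the result to `subprocess.run` yourself.

```python
import shlex


def hypo_command(target=None, model=None):
    """Build the argv for a one-shot /hypo run (hypothetical helper)."""
    # The slash command and its target must travel as ONE prompt argument.
    prompt = "/hypo" if target is None else f"/hypo {target}"
    argv = ["claude", prompt]
    if model is not None:
        argv += ["--model", model]
    return argv


argv = hypo_command("statistics.median", model="opus")
print(shlex.join(argv))  # claude '/hypo statistics.median' --model opus
# To launch it: import subprocess; subprocess.run(argv, check=True)
```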
The run.py script is a wrapper around the agent that tests multiple packages in parallel. It is what was used in the paper. The script has no dependencies beyond the standard library (of course, you still need to have Claude Code installed). You need python3 and pip to be on your PATH.
Note that the runner operates at the module level.
The only required argument is the path to a JSON file listing the packages to test and which modules to test within each package. It looks like:
{
"pathlib": {
"type": "stdlib",
"modules": ["pathlib"]
},
"numpy": {
"type": "pypi",
"modules": ["numpy"]
}
}
The keys in the JSON file are the package names: either the standard library name or the PyPI name. Set "type" to "stdlib" for standard library packages and to "pypi" for PyPI packages. This is important so the runner knows how to set up the virtual environment.
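Before starting a long run, you might want to sanity-check a packages file. The sketch below does that. `load_packages` is a hypothetical helper that is not part of run.py, and it checks only the shape described here: a "type" of "stdlib" or "pypi" and a list of module-name strings. run.py's own validation may differ.

```python
import json

VALID_TYPES = {"stdlib", "pypi"}


def load_packages(path):
    """Load a packages JSON file and check it has the shape described above."""
    with open(path, encoding="utf-8") as f:
        packages = json.load(f)
    for name, spec in packages.items():
        if spec.get("type") not in VALID_TYPES:
            raise ValueError(f"{name}: 'type' must be one of {sorted(VALID_TYPES)}")
        modules = spec.get("modules")
        if not isinstance(modules, list) or not all(isinstance(m, str) for m in modules):
            raise ValueError(f"{name}: 'modules' must be a list of strings")
    return packages
```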
The runner takes three optional arguments:
--max-workers: the number of parallel workers to use. Default is 20.
--model: the model to use. Default is "opus".
--preinstall-workers: the number of parallel workers to use for setting up the virtual environments. Default is 10.
The runner will output all bug reports in the results/ directory.
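For reference, the documented interface maps onto argparse as sketched below. This is a hypothetical mirror written from the option list above, not run.py's actual source. It shows how the defaults combine with any flags you pass.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented run.py options and defaults.
    parser = argparse.ArgumentParser(description="Run /hypo over many modules.")
    parser.add_argument("packages", help="path to the packages JSON file")
    parser.add_argument("--max-workers", type=int, default=20,
                        help="parallel agent workers (default: 20)")
    parser.add_argument("--model", default="opus", help='model to use (default: "opus")')
    parser.add_argument("--preinstall-workers", type=int, default=10,
                        help="parallel workers for venv setup (default: 10)")
    return parser


args = build_parser().parse_args(["packages.json", "--max-workers", "8"])
print(args.max_workers, args.model, args.preinstall_workers)  # prints: 8 opus 10
```

The corresponding command line would be `python run.py packages.json --max-workers 8`.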
Example usage:
python run.py packages.json
In the example_packages/ directory, there are some example package JSON files to test:
packages_mini.json: a mini set of modules to test (this took 6 minutes to run with default settings).
packages_10k.json: the top 10,000 PyPI packages, with the main module and all submodules one level deep.
The packages tested in the paper are in the paper/ directory.
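To build a "main module plus submodules one level deep" list for your own package, one option is `pkgutil.iter_modules` over the package's `__path__`, as sketched below. This is an assumption: it is not necessarily how packages_10k.json was generated. The helper `modules_one_level_deep` is hypothetical, and it imports the package, which runs the package's top-level code.

```python
import importlib
import pkgutil


def modules_one_level_deep(package_name):
    """Return the package itself plus its direct submodules (hypothetical helper)."""
    pkg = importlib.import_module(package_name)
    names = [package_name]
    # Plain modules have no __path__, so only packages contribute submodules.
    for info in pkgutil.iter_modules(getattr(pkg, "__path__", [])):
        names.append(f"{package_name}.{info.name}")
    return names


print(modules_one_level_deep("json"))
```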
The runner sets up virtual environments with venv. Standard library packages all share one virtual environment, and each PyPI package gets its own. The runner also installs pytest and hypothesis in each virtual environment. Setup runs in parallel, and the number of setup workers is controlled by --preinstall-workers.
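The layout described above (one shared venv for all standard library packages, one venv per PyPI package, pytest and hypothesis everywhere, set up in parallel) can be sketched as follows. The directory names (`_stdlib`, one folder per package) and the helpers `venv_plan`, `setup_venv`, and `setup_all` are hypothetical. run.py's actual layout may differ. Only `venv_plan` is side-effect free. The other two create environments and call pip, which needs network access.

```python
import subprocess
import sys
import venv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def venv_plan(packages, root):
    """Map each package name to the venv directory it should use."""
    shared = Path(root) / "_stdlib"  # one venv shared by all stdlib packages
    return {name: shared if spec["type"] == "stdlib" else Path(root) / name
            for name, spec in packages.items()}


def setup_venv(env_dir, pypi_name=None):
    """Create a venv and install pytest, hypothesis and (optionally) a PyPI package."""
    venv.create(env_dir, with_pip=True)
    win = sys.platform == "win32"
    python = Path(env_dir) / ("Scripts" if win else "bin") / ("python.exe" if win else "python")
    reqs = ["pytest", "hypothesis"] + ([pypi_name] if pypi_name else [])
    subprocess.run([str(python), "-m", "pip", "install", *reqs], check=True)


def setup_all(packages, root, preinstall_workers=10):
    """Create every venv in the plan exactly once, in parallel."""
    jobs = {}  # env_dir -> PyPI name to install (None for the shared stdlib venv)
    for name, env_dir in venv_plan(packages, root).items():
        jobs[env_dir] = None if packages[name]["type"] == "stdlib" else name
    with ThreadPoolExecutor(max_workers=preinstall_workers) as pool:
        # list() forces iteration so any setup error is raised here.
        list(pool.map(lambda job: setup_venv(*job), jobs.items()))
```

Keying the jobs by directory is what keeps the shared stdlib environment from being created more than once.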
It then sets up worker directories, up to the maximum number of workers (--max-workers), each of which is a "sandbox" for the agent to run in. The agent only has permission to edit files within its sandbox. Each worker directory also contains .claude/commands/hypo.md, so that the agent can run. The runner parallelizes across modules.
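One way to implement "a fixed pool of sandboxes, parallelized across modules" is a queue of free worker directories. Each module borrows a sandbox and returns it when done, as sketched below. The directory names, `make_worker_dirs`, `run_all`, and the `test_module` callback (which in a real runner would launch the agent inside the sandbox) are all hypothetical. This is not run.py's code.

```python
import queue
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_worker_dirs(root, hypo_md, max_workers):
    """Create root/worker_0 .. worker_{n-1}, each with .claude/commands/hypo.md."""
    dirs = []
    for i in range(max_workers):
        sandbox = Path(root) / f"worker_{i}"
        commands = sandbox / ".claude" / "commands"
        commands.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(hypo_md, commands / "hypo.md")
        dirs.append(sandbox)
    return dirs


def run_all(modules, worker_dirs, test_module):
    """Run test_module(module, sandbox) for every module, one module per free sandbox."""
    free = queue.Queue()
    for d in worker_dirs:
        free.put(d)

    def task(module):
        sandbox = free.get()  # waits until some sandbox is free
        try:
            return test_module(module, sandbox)
        finally:
            free.put(sandbox)  # hand the sandbox to the next module

    # One thread per sandbox, so a free sandbox is always available to a running task.
    with ThreadPoolExecutor(max_workers=len(worker_dirs)) as pool:
        return list(pool.map(task, modules))
```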
Note that the runner also checks if the module has already been tested, and skips it if so. So, you can easily resume a run by just running the runner again.
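A simple way to get this resume behavior is to write a completion marker only after a module's run finishes, and skip modules that already have one. The sketch below assumes a hypothetical `results/<module>/.done` marker. run.py may detect finished modules differently. Because the marker is written last, a run interrupted midway leaves no marker and is retried on the next invocation.

```python
from pathlib import Path

DONE_MARKER = ".done"  # hypothetical marker file name


def mark_done(results_dir, module):
    """Record that `module` finished; call only after the agent run completes."""
    out = Path(results_dir) / module
    out.mkdir(parents=True, exist_ok=True)
    (out / DONE_MARKER).touch()


def pending_modules(modules, results_dir):
    """Return modules without a completion marker, preserving order."""
    return [m for m in modules if not (Path(results_dir) / m / DONE_MARKER).exists()]
```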
The runner calls the agent with restricted permissions. The agent can only read, write, and edit files in the sandbox it is called in, and it also has read access to the virtual environment so that it can read the package's source code. Furthermore, it can only write or edit .py and .md files. The only bash commands it can run are python and pytest. Note that because of how the virtual environments are set up, the Python command will be python. Lastly, it also has access to the Todo and WebFetch tools.
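To make the rules above precise, here is a plain-Python model of the policy: sandbox read/write/edit, read-only access to the venv, writes limited to .py and .md, bash limited to python and pytest, plus the Todo and WebFetch tools. It is only an illustration. The real enforcement happens through the permissions run.py passes to Claude Code, and the names `is_allowed`, the action strings, and the first-word check on bash commands are hypothetical simplifications.

```python
from pathlib import Path

EDITABLE_SUFFIXES = {".py", ".md"}
ALLOWED_COMMANDS = {"python", "pytest"}


def _inside(path, root):
    """True if `path` resolves to a location under `root` (handles '..')."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


def is_allowed(action, target, sandbox, venv_dir):
    """Model of the described policy; not how Claude Code enforces it."""
    if action == "read":
        return _inside(target, sandbox) or _inside(target, venv_dir)
    if action in ("write", "edit"):
        return _inside(target, sandbox) and Path(target).suffix in EDITABLE_SUFFIXES
    if action == "bash":
        parts = target.split()
        return bool(parts) and parts[0] in ALLOWED_COMMANDS
    return action in ("todo", "webfetch")
```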