From NVIDIA BioNeMo Agent Toolkit
Drives the full complexa design pipeline for protein binder, ligand binder, and AME motif scaffolding with flow matching, search, refold, and diversity analysis.
How this skill is triggered — by the user, by Claude, or both
Slash command
/bionemo-agent-toolkit:complexa-designThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Drive the full four-stage `complexa design` pipeline: generate (flow matching +
Drive the full four-stage complexa design pipeline: generate (flow matching +
search) -> filter (top-N by reward) -> evaluate (refold with AF2/RF3/ESMFold) ->
analyze (success rate, FoldSeek/MMseqs diversity). Pick the right pipeline
config for the design intent, validate the run upfront so the user does not
discover a missing ckpt mid-folding, run it, and emit a replayable manifest +
per-design success CSV.
result_type thresholds.Always run the shared preflight before launching a design — generation needs the GPU and the right checkpoint, evaluation needs AF2/RF3 weights and tool binaries. Bail early if the host cannot run the chosen pipeline.
bash .claude/skills/_shared/scripts/preflight.sh
Read ./complexa_setup/preflight.json and bail if any of these are missing for
the chosen pipeline:
gpu.available: false -> all pipelines fail.gpu.vram_gb < 40 -> generation OOMs at default batch_size: 16; lower to 8.ckpts.complexa[.ckpt] -> required for protein binder.ckpts.complexa_ligand[.ckpt] -> required for ligand binder.ckpts.complexa_ame[.ckpt] -> required for AME.env.AF2_DIR missing -> protein binder generation AND eval both fail. AF2
is the af2folding search reward (loaded at generation time), not just the
colabdesign refold backend — so a missing AF2_DIR crashes at sample 0,
before any structure is produced. This is why complexa download --complexa
alone is insufficient; you also need --all (which brings AF2).env.RF3_CKPT_PATH or env.RF3_EXEC_PATH missing -> ligand binder / AME
generation reward and default eval (rf3_latest) fail.If a ckpt is missing, point at complexa-setup and have the user run
complexa download --complexa-<variant> first.
Complexa has one default pipeline (protein binder) and two extensions
(ligand binder, AME / enzyme). The pipeline is selected entirely by the
configs/search_*_pipeline.yaml you pass to complexa design — each YAML
pins its own model checkpoint, autoencoder, targets dict, default reward, and
default refold backend. Switching pipelines is just "swap the config path and
the target name comes from a different dict".
complexa design configs/search_binder_local_pipeline.yaml \
++run_name=pdl1_v1 ++generation.task_name=02_PDL1
Use this when the user says "design a binder for X", "PDL1 binder",
"de novo binder", "design proteins for a target", etc. — i.e. the target is
a protein surface. The config pins complexa.ckpt + complexa_ae.ckpt, reads
targets from configs/targets/targets_dict.yaml, rewards with AF2
(af2folding), inverse-folds with SolubleMPNN, and evaluates with
ColabDesign / AF2. If the user did not specify, this is what they want.
complexa design configs/search_ligand_binder_local_pipeline.yaml \
++run_name=v11_v1 ++generation.task_name=39_7V11_LIGAND \
++metric.binder_folding_method=rf3_latest
Switch to this when the user says "ligand binder", "small-molecule pocket",
"SMILES target", "ATP-binding protein", or names a target ending in
_LIGAND / from the FAD / SAM / OQO / 7V11 / etc. families. The config
also activates LoRA (r=32, lora_alpha=64) which is required for the
released ligand checkpoint — leave the lora: block alone.
complexa design configs/search_ame_local_pipeline.yaml \
++run_name=ame_chm ++generation.task_name=M0096_1chm
Switch to this when the user says "scaffold a motif near a ligand",
"active-site design", "enzyme scaffolding", "AME", or names a target like
M0024_1nzy, M0096_1chm (the M####_<pdb> AME task naming). The config sets
env_vars.USE_V2_COMPLEXA_ARCH=True (the CLI runner injects it into the
subprocess) and uses both MotifFeatures and LigandFeatures. Default search
is single-pass; switch to best-of-n only if you also enable the
CompositeRewardModel (commented out in ame_generate.yaml).
| Knob | Protein binder (default) | Ligand binder | AME (enzyme) |
|---|---|---|---|
| Pipeline YAML | configs/search_binder_local_pipeline.yaml | configs/search_ligand_binder_local_pipeline.yaml | configs/search_ame_local_pipeline.yaml |
| Model ckpt | complexa.ckpt | complexa_ligand.ckpt | complexa_ame.ckpt |
| Autoencoder ckpt | complexa_ae.ckpt | complexa_ligand_ae.ckpt | complexa_ame_ae.ckpt |
| Targets dict | configs/targets/targets_dict.yaml | configs/targets/ligand_targets_dict.yaml | configs/design_tasks/ame_dict_v2.yaml |
| Task-name pattern | <NN>_<NAME> (e.g. 02_PDL1, 22_DerF21) | <NN>_<PDB>_LIGAND (e.g. 39_7V11_LIGAND) | M####_<pdb> (e.g. M0096_1chm) |
USE_V2_COMPLEXA_ARCH | (unset → v1) | (unset → v1) | "True" (set in YAML) |
| LoRA | (none) | required (r=32, alpha=64) | required |
| Default search algo | best-of-n | best-of-n | single-pass |
| Reward model | AF2 (af2folding) | RF3 (rf3folding) | null (no reward at default) |
| Inverse folder | soluble_mpnn | ligand_mpnn | ligand_mpnn |
| Default refold backend | colabdesign (AF2) | rf3_latest | rf3_latest |
Analysis result_type | protein_binder | ligand_binder | motif_ligand_binder |
Required ckpts (complexa download) | --complexa --all (AF2 in community) | --complexa-ligand --all (RF3 in community) | --complexa-ame --all (RF3 in community) |
For the full per-pipeline breakdown (reward weights, success thresholds, analysis modes), see reference/pipelines.md.
Use AskUserQuestion to fill in the four parameters that vary every run. Default to sensible production settings if the user has no preference.
Target name — must be a key in the relevant dict (targets_dict.yaml for
protein binder, ligand_targets_dict.yaml for ligand, ame_dict_v2.yaml for
AME). Confirm it exists before running — do not let the pipeline fail with
"Unknown target" after generation starts. Check membership directly:
complexa target show <name> # prints the entry, or errors if absent
# or grep the dict: rg '^\s*<name>:' configs/targets/targets_dict.yaml
If the target is not in the dict, hand off to complexa-target first, then
come back here — do not stall on "Unknown target". The clean chain is:
complexa target show EGFR errors).complexa-target skill: gather the PDB source, chain/residue
spec (or ligand SMILES), hotspots, and binder length, then register the entry.complexa target show <name> and complexa validate target <config> --target <name>.Run name — a short identifier appended to the output dir (e.g. pdl1_v1).
Search algorithm — default to beam-search with beam_width=8 and
n_branch=4 for production. Use single-pass for a quick smoke test.
Evaluation refold backend — protein binder defaults to colabdesign
(AF2); ligand/AME default to rf3_latest. Use esmfold for fast iteration
(worse but seconds per sample). ESMFold weights ship inside complexa download --all, but that sub-download needs HF_TOKEN set in .env or it silently
rate-limits and leaves no ESMFold dir — confirm community_models/ckpts/ESMFold
exists (complexa download --status) before relying on the esmfold backend.
Validate before running. This is cheap (seconds) and catches missing ckpts, missing reward/eval weights, and a missing target entry — any of which would otherwise abort the pipeline mid-evaluation after hours of generation.
complexa validate design does NOT accept ++ Hydra overrides. The
validate subcommand only takes <type> <config> [--target NAME]; passing
++generation.task_name=… makes it exit with "unrecognized arguments". It also
reads the config YAML as written — overrides you plan to pass to complexa design are not applied during validation. So validate the bare config, and use
--target (only available on validate target) to check a target other than
the config's own generation.task_name:
# Validates ckpts + reward/eval weights, and the target named *in the config*.
complexa validate design configs/search_binder_local_pipeline.yaml
# To check a specific target (e.g. one you will pass via ++generation.task_name),
# validate it explicitly — this path DOES check dict membership:
complexa validate target configs/search_binder_local_pipeline.yaml --target 02_PDL1
The validator returns non-zero on errors and prints a pass/fail report. Re-run until it returns clean.
What validate design does not cover (verify these yourself):
++metric.binder_folding_method=esmfold, validation still checks whatever the
config's default backend is. Confirm your overrides match the weights you
downloaded.++generation.task_name= — validate it
separately with validate target … --target <name> (above), or confirm it is
in the dict directly (complexa target show <name>).complexa design is the right tool for the full 4-stage run — it orchestrates
generate → filter → evaluate → analyze as sequential subprocesses with a
shared run name, log dir, and multi-GPU split (see run_design_pipeline in
src/proteinfoundation/cli/cli_runner.py). Re-implementing that manually
loses the per-stage log routing and progress prints.
Use ++ (forced) Hydra overrides; they apply to all stages. The minimal
production protein-binder invocation:
complexa design configs/search_binder_local_pipeline.yaml \
++run_name=pdl1_v1 \
++generation.task_name=02_PDL1 \
++generation.search.algorithm=beam-search \
++generation.search.beam_search.beam_width=8 \
++metric.binder_folding_method=colabdesign
For ligand binder / AME, swap the pipeline YAML and target name per Step 2's cheat sheet — every other override above is pipeline-agnostic and can be reused as-is.
Add --verbose to stream logs to the terminal instead of ./logs/. The skill
does not poll progress — the user re-invokes if they want a status; point them
at complexa status and ./logs/design_pipeline_*/.
To debug a single stage without pipeline orchestration, invoke the underlying
Hydra module directly. complexa generate CONFIG is just a logged subprocess
wrapper around this:
python -m proteinfoundation.generate \
--config-path "$(realpath configs)" \
--config-name search_binder_local_pipeline \
++run_name=debug_pdl1 \
++generation.task_name=02_PDL1
Same pattern for proteinfoundation.{filter,evaluate,analyze}. Use this when
you want to attach ipdb, run under nsys, or skip the pipeline log dir.
Prefer complexa generate/filter/evaluate/analyze for normal one-shot runs
(you get logging and parallel job splitting for free).
AME-specific gotcha (when running AME with RF3 refold): RF3 will try to
complete missing atoms on the ligand based on its CCD code, which produces
shape errors in RMSD calculations and the wrong structure. Before RF3 sees the
PDB, rename the ligand residue to L:0 so RF3 treats it as a generic ligand
and skips atom completion. If your AME generation outputs already encode the
ligand this way (the canonical Complexa pipeline does), no extra step is
needed; otherwise patch each PDB with atomworks.io:
from atomworks.io import load_any, to_pdb_file
atom_array = load_any("my_design.pdb")[0]
ligand_mask = atom_array.chain_id == "A"
atom_array.res_name[ligand_mask] = "L:0"
to_pdb_file(atom_array, "my_design_rf3_ready.pdb")
Same applies if you're piping AME outputs into the complexa-evaluate-pdbs
skill with an RF3 backend. Skip the rename for AF2/colabdesign refold or for
non-AME pipelines.
Wall-clock at default (nsteps=400, beam_width=8, batch_size=16, 100
designs, colabdesign eval) is ~30–120 minutes on a single A100/H100.
Outputs land in two directories. Surface both:
ls ./inference/${CONFIG_STEM}_${TASK}_*${RUN_NAME}/ # generated PDBs + filter
ls ./evaluation_results/${RUN_NAME}/ # per-design CSV + analysis
Read the combined results CSV and summarize:
ls ./evaluation_results/*/binder_results_*_combined.csv
ls ./evaluation_results/*/motif_binder_results_*_combined.csv # AME
ls ./evaluation_results/*/res_filter_*_pass_*.csv # success rate
ls ./evaluation_results/*/res_div_foldseek_*.csv # FoldSeek diversity
Pull the success rate from res_filter_binder_pass_*.csv, the per-design
metrics (interface pAE, pLDDT, scRMSD) from the combined CSV, and FoldSeek
TM-score diversity from res_div_foldseek_*.csv.
Empty diversity ≠ zero diversity. FoldSeek diversity is only computed when the
foldseekbinary is installed (FOLDSEEK_EXECin.env). If it is missing, the analysis step silently emits the diversity column with empty values — it looks like a computed result but is a tool-missing failure. Before reporting diversity, confirmtools.foldseek.exists: truein./complexa_setup/preflight.json. If it is false, tell the user diversity was not computed (install foldseek and re-runcomplexa analyze), rather than reporting "0 diversity".
Report top-N designs by i_pAE (protein binder) or min_ipAE (ligand binder).
Drop a JSON manifest beside the results so the run is replayable. The shared helper captures the command, config, git SHA, and pointers to the result CSVs.
python3 .claude/skills/_shared/scripts/write_manifest.py \
--output-dir ./evaluation_results/${RUN_NAME} \
--command "complexa design configs/search_binder_local_pipeline.yaml ++run_name=${RUN_NAME} ++generation.task_name=${TASK}" \
--skill complexa-design \
--out ./run_manifest.json
Surface the manifest path and the result CSV to the user.
The 10 overrides that cover ~90% of runs. Full reference (every key, type, default) is in reference/overrides.md.
| Override | Default | What it controls |
|---|---|---|
++generation.task_name=<name> | (per config) | Which target / motif task to design for |
++run_name=<str> | (config stem) | Output dir suffix and CSV tag |
++generation.search.algorithm=beam-search | best-of-n (binder/ligand), single-pass (AME) | Search strategy |
++generation.search.beam_search.beam_width=8 | 4 | Beam-search width (more = better designs, slower) |
++generation.args.nsteps=200 | 400 | Diffusion steps (fewer = faster, lower quality) |
++generation.dataloader.batch_size=8 | 16 | Drop to 8 on a 40GB GPU |
++generation.filter.filter_samples_limit=500 | 1000 | Top-N samples to keep after filtering |
++metric.binder_folding_method=esmfold | colabdesign (binder), rf3_latest (ligand/AME) | Evaluation refold backend |
++metric.num_redesign_seqs=8 | 2 | ProteinMPNN/LigandMPNN/SolubleMPNN sequences per design |
++aggregation.success_thresholds.i_pAE.threshold=10.0 | 7.0 (protein binder) | Loosen / tighten success criteria |
| Resource | Minimum | Recommended |
|---|---|---|
| GPU | 1x CUDA GPU, 40 GB VRAM | A100 / H100 / L40S, 80 GB VRAM |
| CPUs | 16 | 24 (the ncpus_ default in every pipeline config) |
| Disk | 50 GB at ./inference/ + ./evaluation_results/ | 200 GB for sweep runs |
| RAM | 32 GB | 64 GB+ |
Typical wall-clock for 100 designs, beam_width=8, default nsteps=400:
Bumping gen_njobs=2 and eval_njobs=2 halves wall-clock on a 2-GPU host. See
.claude/skills/_shared/reference/hardware.md for per-pipeline VRAM tables.
| Symptom | Cause | Fix |
|---|---|---|
CUDA out of memory in generate | batch_size: 16 too big on 40GB GPU | ++generation.dataloader.batch_size=8 |
CUDA out of memory in evaluate | AF2 / RF3 batched too aggressively | ++eval_njobs=1 and ++metric.num_redesign_seqs=2 |
InterpolationKeyError: AF2_DIR | colabdesign eval but .env does not set AF2_DIR | Set AF2_DIR in .env or ++metric.binder_folding_method=esmfold |
InterpolationKeyError: RF3_CKPT_PATH | RF3 eval but RF3 not installed | complexa download --all or switch eval backend |
KeyError: 'task_name' not in target_dict_cfg | Target not in targets_dict.yaml / ligand_targets_dict.yaml / ame_dict_v2.yaml | Use complexa-target skill to add it |
| 0 designs pass success thresholds | Defaults too strict for this target | Loosen via ++aggregation.success_thresholds.* |
For the full list (chain-ID mismatches, hotspot residues, ligand residue renaming for RF3, missing inverse-folding models, etc.) see reference/troubleshooting.md.
For per-pipeline details (which model, reward, inverse folding, evaluation
backend, result_type, LoRA settings, USE_V2_COMPLEXA_ARCH toggle), see
reference/pipelines.md.
For the full override reference (every generation.*, metric.*,
aggregation.* key with type, default, example, and effect), see
reference/overrides.md.
For all troubleshooting cases (cause + fix + source), see reference/troubleshooting.md.
npx claudepluginhub nvidia-bionemo/bionemo-agent-toolkit --plugin bionemo-agent-toolkitEvaluates a directory of PDB files with Proteina-Complexa: refolds designs, computes interface metrics (i_pAE, i_pTM, scRMSD), and reports pass-rate summaries. Supports AF2, RF3, ESMFold, and Boltz2 backends.
Designs therapeutic proteins using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation with ESMFold/AlphaFold2. Useful for protein binders, scaffolds, enzyme variants, and miniprotein design.
Submits protein sequences via Python SDK for cell-free expression (10–100 µg) and binding affinity (KD) assays against targets. Tracks status and retrieves results for ML-guided directed evolution and antibody/nanobody optimization.