From NVIDIA BioNeMo Agent Toolkit
Evaluates a directory of PDB files with Proteina-Complexa: refolds designs, computes interface metrics (i_pAE, i_pTM, scRMSD), and reports pass-rate summaries. Supports AF2, RF3, ESMFold, and Boltz2 backends.
Slash command: /bionemo-agent-toolkit:complexa-evaluate-pdbs
Score a directory of pre-existing PDB files against the same metrics Proteina-Complexa uses internally. Wraps complexa analysis <evaluate_config> ++sample_storage_path=<dir>: the CLI runs the evaluate step (refold + interface metrics + monomer metrics) and then the analyze step (success thresholds, diversity, pass-rate CSVs). Do not run complexa generate here — the inputs already exist.
- Folding backends: AF2 (colabdesign), RF3 (rf3_latest), ESMFold (esmfold), or Boltz2 (boltz2_default).
- Metrics: i_pAE, min_ipAE, i_pTM, pLDDT, binder/complex scRMSD.
- Success thresholds: selected by result_type.

Always check GPU, disk, and tool binaries before launching a refold job. RF3 and ColabDesign-AF2 are large.
bash .claude/skills/_shared/scripts/preflight.sh
Surface from preflight.json:
- gpu.available and gpu.vram_gb: colabdesign/RF3 need ≥40 GB; ESMFold tolerates ≥24 GB.
- env.missing_required: the environment must include the keys for the chosen folding backend:
  - colabdesign → AF2_DIR
  - rf3_latest → RF3_CKPT_PATH, RF3_EXEC_PATH
  - esmfold → ESMFold weights resolvable
- tools.{foldseek,mmseqs}: required by aggregation.compute_diversity / compute_mmseqs_diversity (both default true).

If any required key is missing, route the user to the complexa-setup skill.
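The checklist above can be applied mechanically once preflight.json is loaded. The following is a minimal sketch, not the toolkit's own logic: it assumes the JSON nests the fields exactly as the dotted names suggest (gpu.available, gpu.vram_gb, env.missing_required as a list, tools.<name>.exists), and the function and table names are hypothetical. Boltz2 is left out of the VRAM table because no figure is given for it.

```python
import json
from pathlib import Path

# Assumed tables, copied from the checklist above. esmfold has no env key listed.
REQUIRED_ENV = {
    "colabdesign": ["AF2_DIR"],
    "rf3_latest": ["RF3_CKPT_PATH", "RF3_EXEC_PATH"],
}
MIN_VRAM_GB = {"colabdesign": 40, "rf3_latest": 40, "esmfold": 24}


def preflight_problems(report: dict, backend: str) -> list:
    """Return human-readable blockers for the chosen folding backend."""
    problems = []

    gpu = report.get("gpu", {})
    if not gpu.get("available", False):
        problems.append("no GPU available")
    elif backend in MIN_VRAM_GB and gpu.get("vram_gb", 0) < MIN_VRAM_GB[backend]:
        problems.append(f"{backend} needs >= {MIN_VRAM_GB[backend]} GB VRAM")

    # Keys reported as missing that the chosen backend depends on.
    missing = set(report.get("env", {}).get("missing_required", []))
    for key in REQUIRED_ENV.get(backend, []):
        if key in missing:
            problems.append(f"missing env key {key}")

    # Diversity tools: absence does not stop the run, but diversity stays empty.
    tools = report.get("tools", {})
    for tool in ("foldseek", "mmseqs"):
        if not tools.get(tool, {}).get("exists", False):
            problems.append(f"{tool} not found (diversity will be empty)")

    return problems


def load_preflight(path: str = "./complexa_setup/preflight.json") -> dict:
    """Read the preflight report (path taken from the output notes in this skill)."""
    return json.loads(Path(path).read_text())
```

An empty list means nothing in the report blocks the chosen backend; any entry is a reason to route the user to complexa-setup before launching.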
Like complexa design, evaluation has one default flow (protein binder) and two extensions (ligand binder, AME). The evaluate config you pass to complexa analysis decides everything else (which metrics, which refolder defaults, which thresholds the analyze step applies).
complexa analysis configs/evaluate_from_pdb_dir.yaml \
++sample_storage_path=/abs/path/to/pdbs \
++dataset.task_name=02_PDL1 \
++result_type=protein_binder \
++metric.binder_folding_method=colabdesign \
++metric.inverse_folding_model=soluble_mpnn \
++run_name=eval_pdl1_af2
Use this when the user's PDBs are protein-binder designs (multi-chain, binder is the last chain) or third-party outputs from BindCraft / AlphaProteo / RFdiffusion. Pulls thresholds for protein_binder (i_pAE * 31 ≤ 7.0, pLDDT ≥ 0.9, scRMSD_ca < 1.5 Å).
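To make the protein_binder thresholds concrete, here is a small sketch of the pass test as stated above. It is an illustration only: the analyze step applies its own configured thresholds, the dictionary keys used for each design are hypothetical, and reading the factor 31 as a rescaling of a normalized i_pAE is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinBinderThresholds:
    max_scaled_ipae: float = 7.0  # i_pAE * 31 <= 7.0
    min_plddt: float = 0.9        # pLDDT >= 0.9
    max_scrmsd_ca: float = 1.5    # scRMSD_ca < 1.5 Å (strict)


def passes_protein_binder(i_pae, plddt, scrmsd_ca, t=ProteinBinderThresholds()):
    """True when one design meets all three thresholds quoted above."""
    return (
        i_pae * 31 <= t.max_scaled_ipae
        and plddt >= t.min_plddt
        and scrmsd_ca < t.max_scrmsd_ca
    )


def pass_rate(designs):
    """Fraction of designs that pass; each design is a dict with hypothetical keys."""
    if not designs:
        return 0.0
    hits = sum(
        passes_protein_binder(d["i_pAE"], d["pLDDT"], d["scRMSD_ca"]) for d in designs
    )
    return hits / len(designs)
```

Note that the scRMSD bound is strict while the other two are inclusive, so a design sitting exactly at 1.5 Å fails.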
| Design type | Use the protein-binder default? | Evaluate config | Analyze config | result_type | Default backend |
|---|---|---|---|---|---|
| Protein binder | Yes (default) | configs/evaluate_from_pdb_dir.yaml | configs/analyze.yaml | protein_binder | colabdesign (AF2) |
| Ligand binder (binder + small-molecule) | Same evaluate config, swap 3 overrides | configs/evaluate_from_pdb_dir.yaml | configs/analyze.yaml | ligand_binder | rf3_latest |
| AME / motif + ligand (enzyme outputs) | No — needs motif-aware config | configs/evaluate_ame_from_pdb_dir.yaml | configs/analyze_motif_binder.yaml | motif_ligand_binder | rf3_latest |
Extending to ligand binder (same evaluate config as default, three override swaps):
complexa analysis configs/evaluate_from_pdb_dir.yaml \
++sample_storage_path=/abs/path/to/pdbs \
++dataset.task_name=39_7V11_LIGAND \
++result_type=ligand_binder \
++metric.binder_folding_method=rf3_latest \
++metric.inverse_folding_model=ligand_mpnn \
++run_name=eval_v11_rf3
Extending to AME (different config; ligand auto-completion gotcha — see Step 4):
complexa analysis configs/evaluate_ame_from_pdb_dir.yaml \
++sample_storage_path=/abs/path/to/pdbs \
++dataset.task_name=M0096_1chm \
++run_name=eval_ame_chm
See reference/eval_configs.md for the full matrix (every result_type, every threshold default, every supported folding backend).
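The three flows above differ only in the evaluate config and a handful of overrides, so the choice can be captured in a lookup table. The sketch below is a hypothetical helper, not part of the complexa CLI; it only assembles the same argument lists shown in the commands above and leaves execution to the caller.

```python
import shlex

# Config and overrides per design type, copied from the commands above.
# The AME command in this document passes no extra overrides.
PRESETS = {
    "protein_binder": {
        "config": "configs/evaluate_from_pdb_dir.yaml",
        "overrides": {
            "result_type": "protein_binder",
            "metric.binder_folding_method": "colabdesign",
            "metric.inverse_folding_model": "soluble_mpnn",
        },
    },
    "ligand_binder": {
        "config": "configs/evaluate_from_pdb_dir.yaml",
        "overrides": {
            "result_type": "ligand_binder",
            "metric.binder_folding_method": "rf3_latest",
            "metric.inverse_folding_model": "ligand_mpnn",
        },
    },
    "motif_ligand_binder": {
        "config": "configs/evaluate_ame_from_pdb_dir.yaml",
        "overrides": {},
    },
}


def build_analysis_command(design_type, pdb_dir, task_name, run_name):
    """Return the argv list for `complexa analysis` for one design type."""
    preset = PRESETS[design_type]
    args = [
        "complexa",
        "analysis",
        preset["config"],
        f"++sample_storage_path={pdb_dir}",
        f"++dataset.task_name={task_name}",
    ]
    args += [f"++{key}={value}" for key, value in preset["overrides"].items()]
    args.append(f"++run_name={run_name}")
    return args


if __name__ == "__main__":
    cmd = build_analysis_command(
        "ligand_binder", "/abs/path/to/pdbs", "39_7V11_LIGAND", "eval_v11_rf3"
    )
    print(shlex.join(cmd))  # printable form for logs or a manifest
```

Returning a list rather than a string keeps the arguments safe to hand to subprocess.run without shell quoting, while shlex.join gives a copy-pasteable line for the user.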
Ask in one batched AskUserQuestion:
- pdb_dir: absolute path to the directory of PDBs to evaluate.
- Folding backend: colabdesign (AF2, protein binders), rf3_latest (ligand / AME), esmfold (fast iteration), boltz2_default (alternative).
- task_name: an entry in configs/targets/targets_dict.yaml, configs/targets/ligand_targets_dict.yaml, or configs/design_tasks/ame_dict_v2.yaml. Required to identify the target reference (and, for AME, the motif contigs).
- Ligand naming (ligand / AME inputs): whether the ligand residue is named L:0 in every PDB (see Troubleshooting). If not, do that rename first.

Prefer complexa analysis (the evaluate→analyze chain): it reuses the same config for both steps and writes a single log dir.
# Protein binder PDB dir, AF2 refold
complexa analysis configs/evaluate_from_pdb_dir.yaml \
++sample_storage_path=/abs/path/to/pdbs \
++dataset.task_name=02_PDL1 \
++metric.binder_folding_method=colabdesign \
++metric.inverse_folding_model=soluble_mpnn \
++result_type=protein_binder \
++run_name=eval_pdl1_af2
For ligand binders flip binder_folding_method=rf3_latest, inverse_folding_model=ligand_mpnn, result_type=ligand_binder. For AME use configs/evaluate_ame_from_pdb_dir.yaml — see reference/eval_configs.md for full worked examples.
If you need to inspect output between stages, run them separately. The configs above are shared between evaluate and analyze:
complexa evaluate configs/evaluate_from_pdb_dir.yaml ++sample_storage_path=/abs/path/to/pdbs ++run_name=eval_pdl1_af2
complexa analyze configs/evaluate_from_pdb_dir.yaml ++run_name=eval_pdl1_af2
Dry-run first if the user is unsure (no GPU work happens; the planned file walk + invocation prints):
complexa analysis configs/evaluate_from_pdb_dir.yaml ++sample_storage_path=/abs/path/to/pdbs ++dryrun=true
complexa evaluate / analyze are subprocess wrappers around the Hydra
modules with logging + parallel job splitting bolted on. To attach a debugger
or run under a profiler, invoke the module directly:
python -m proteinfoundation.evaluate \
--config-path "$(realpath configs)" \
--config-name evaluate_from_pdb_dir \
++sample_storage_path=/abs/path/to/pdbs \
++dataset.task_name=02_PDL1 \
++metric.binder_folding_method=colabdesign \
++run_name=eval_debug
For normal one-shot runs prefer complexa analysis — you get the shared log
dir and a single replayable invocation, instead of having to thread the same
overrides through two python -m calls.
Output lands under ./evaluation_results/${run_name}/:
- Per-sample results: *_results_*.csv (one row per input PDB × sequence_types).
- Aggregates from the analyze step (e.g. res_designability.csv, res_filter_ligand_pass_*.csv, success_criteria_*.json).
- Diversity, when aggregation.compute_diversity=true (default). Empty diversity values ≠ low diversity: if foldseek/mmseqs are not installed, the analyze step emits the diversity column with empty values rather than erroring. Before reporting diversity, confirm tools.foldseek.exists: true (and tools.mmseqs.exists) in ./complexa_setup/preflight.json; if false, tell the user diversity was not computed (install the tool and re-run) instead of reporting a number.

Summarize to the user:
- Pass rate against the thresholds for the chosen result_type (e.g. for protein_binder: i_pAE*31 <= 7.0 AND pLDDT >= 0.9 AND scRMSD_ca < 1.5).
- The key metric for the design type (i_pAE for protein, min_ipAE for ligand, motif_rmsd_pred_all for AME).

Capture the resolved invocation + outputs for replay.
python3 .claude/skills/_shared/scripts/write_manifest.py \
--output-dir ./evaluation_results/${run_name} \
--command "complexa analysis configs/evaluate_from_pdb_dir.yaml ++sample_storage_path=<dir> ++dataset.task_name=<task> ++metric.binder_folding_method=<backend> ++run_name=<run>" \
--skill complexa-evaluate-pdbs \
--out ./eval_manifest.json
The manifest pins: resolved config, git SHA, ckpt SHA-256s, the result CSV paths, and the user-stated result_type for replay.
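For readers who want to cross-check a pinned checkpoint hash by hand, a SHA-256 digest can be computed with the standard library. This is a generic sketch and makes no claim about how write_manifest.py computes its values; the function name is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value with the one stored in the manifest is a quick way to confirm that a replay uses the same weights as the original run.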
| Override | Effect |
|---|---|
| ++sample_storage_path=<dir> | The directory of PDBs to evaluate (required). |
| ++dataset.task_name=<name> | Target / AME task name. Resolves target PDB + (for AME) motif contigs. |
| ++metric.binder_folding_method=<backend> | colabdesign / rf3_latest / esmfold / boltz2_default. |
| ++metric.inverse_folding_model=<model> | protein_mpnn / soluble_mpnn / ligand_mpnn. |
| ++metric.sequence_types=[self,mpnn,mpnn_fixed] | Which sequence flavors to refold. |
| ++metric.num_redesign_seqs=N | ProteinMPNN/LigandMPNN redesign count. |
| ++metric.compute_pre_refolding_metrics=true | Add bioinformatics/TMOL/HBPLUS metrics on the input structures. |
| ++metric.keep_folding_outputs=true | Save the refolded PDBs (large, but useful for inspection). |
| ++result_type=<type> | Override default thresholds: protein_binder / ligand_binder / motif_ligand_binder. |
| ++aggregation.success_thresholds.<…> | Tighten or loosen specific thresholds (see reference/eval_configs.md). |
| ++eval_njobs=N | Parallel GPUs for the evaluate step. |
| ++dryrun=true | Plan without running any folding. |
| ++file_limit=N | Cap input PDBs (handy for first-pass smoke tests). |
Hardware:
- GPU: AF2 (colabdesign) and RF3 (rf3_latest) need ≥40 GB VRAM (A100/H100/L40S). ESMFold runs on ≥24 GB. Multi-GPU via ++eval_njobs=N.
- CPU / disk: the config sets ncpus_: 24. Each refolded PDB + intermediate output is ~1–5 MB; keep_folding_outputs=true can balloon to tens of GB for thousands of inputs.
- See _shared/reference/hardware.md for per-backend wall-clock and VRAM tables.

Troubleshooting:
- Error: Config file not found. Paths are relative to the repo root; cd to the repo before invoking complexa analysis.
- compute_motif_binder_metrics=True but result_type=protein_binder: result_type and the underlying compute_*_metrics must agree. Use evaluate_ame_from_pdb_dir.yaml for AME inputs rather than mutating evaluate_from_pdb_dir.yaml.
- Ligand residue naming: the ligand must be named L:0 in every input PDB before evaluation; see the snippet in README.md (atom_array.res_name[ligand_mask] = "L:0").
- Empty diversity values: FOLDSEEK_EXEC/MMSEQS_EXEC not installed or not on PATH. The analyze step does not hard-fail; it just leaves the diversity values blank, which reads like a result. Confirm with preflight.sh (tools.foldseek.exists), then either fix .env (preferred) or explicitly disable so the absence is intentional: ++aggregation.compute_diversity=false ++aggregation.compute_mmseqs_diversity=false.
- Backend or target mismatch: check that binder_folding_method matches the target type (RF3 for ligand, AF2 for protein) and that ++dataset.task_name resolves to the correct reference PDB (complexa target show <name> to verify).

Full evaluate/analyze config matrix, every supported result_type, per-threshold defaults, and worked examples (protein binder / ligand binder / AME): see reference/eval_configs.md.
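Because blank diversity values can be mistaken for a result, it may help to test for them explicitly before summarizing. The sketch below is an assumption-laden illustration: the column name "diversity" and the CSV layout are hypothetical, so substitute whatever header the analyze step actually writes.

```python
import csv
import io


def column_is_empty(csv_text, column):
    """True when the column is absent or every row leaves it blank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or column not in rows[0]:
        return True
    return all((row[column] or "").strip() == "" for row in rows)


def describe_diversity(csv_text, column="diversity"):
    """Return a message that never reports a blank column as a number."""
    if column_is_empty(csv_text, column):
        return "diversity was not computed (check foldseek/mmseqs and re-run)"
    return "diversity values present"
```

Treating a missing column the same as a blank one keeps the summary conservative: in both cases there is no diversity number to report.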
npx claudepluginhub nvidia-bionemo/bionemo-agent-toolkit --plugin bionemo-agent-toolkit