Skill

tooluniverse-protein-therapeutic-design

Designs novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design: RFdiffusion for backbones, ProteinMPNN for sequences, and ESMFold/AlphaFold2 for validation. Use it for protein binder design or function engineering.

Python

ai-ml

Install

npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

CHECKLIST.md, DESIGN_PROCEDURES.md, EXAMPLES.md, TOOLS_REFERENCE.md, design_templates.md

SKILL.md

Stats

Stars: 1291

Forks: 199

Last Commit: Mar 29, 2026

Therapeutic Protein Designer

AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.

KEY PRINCIPLES:

Structure-first - Generate backbone geometry before sequence
Target-guided - Design binders with target structure in mind
Iterative validation - Predict structure to validate designs
Developability-aware - Consider aggregation, immunogenicity, expression
Evidence-graded - Grade designs by confidence metrics
Actionable output - Provide sequences ready for experimental testing
English-first queries - Always use English terms in tool calls

Therapeutic protein design starts with the target interaction: what binding surface do you need to cover? A small pocket favors a nanobody or peptide, while a large, flat surface favors a designed protein. Stability, immunogenicity, and manufacturability further constrain the design space.

LOOK UP, DON'T GUESS

When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do; execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

When to Use

Apply when user asks to:

Design a protein binder, therapeutic protein, or scaffold
Optimize a protein sequence for function
Design a de novo enzyme
Generate protein variants for target binding

Workflow Overview

Phase 1: Target Characterization
  Get structure (PDB, EMDB cryo-EM, AlphaFold), identify binding epitope

Phase 2: Backbone Generation (RFdiffusion)
  Define constraints, generate >= 5 backbones, filter by geometry

Phase 3: Sequence Design (ProteinMPNN)
  Design >= 8 sequences per backbone, sample with temperature control

Phase 4: Structure Validation (ESMFold/AlphaFold2)
  Predict structure, compare to backbone, assess pLDDT/pTM

Phase 5: Developability Assessment
  Aggregation, pI, expression prediction

Phase 6: Report Synthesis
  Ranked candidates, FASTA, experimental recommendations
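Phase 5 lists pI among the developability checks. As a rough illustration, the sketch below estimates pI by bisecting on the Henderson-Hasselbalch net charge. The pKa values are the ones commonly attributed to the EMBOSS scale, which is an assumption here; other scales shift the result slightly, and the function names are hypothetical, not part of any tool in this skill.

```python
# Side-chain and terminal pKa values (EMBOSS-style scale, an assumption).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a linear peptide at the given pH."""
    seq = sequence.upper()
    # Fraction of each basic group that is protonated (charge +1).
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    for aa, pka in POSITIVE_PKA.items():
        positive += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    # Fraction of each acidic group that is deprotonated (charge -1).
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa, pka in NEGATIVE_PKA.items():
        negative += seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 0.01) -> float:
    """Bisect on pH until the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive, so the pI lies at higher pH
        else:
            high = mid
    return round((low + high) / 2, 2)
```

Because net charge decreases monotonically with pH, bisection is enough here. For production work a maintained library implementation would be a safer choice than this sketch.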

Critical Requirements

Report-First Approach (MANDATORY)

Create [TARGET]_protein_design_report.md first with section headers
Progressively update as designs are generated
Output [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv
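One way to follow the report-first rule is to write the skeleton file before any design call and fill sections in as results arrive. The sketch below assumes section names taken from the six workflow phases and a `_Pending_` placeholder; both are illustrative choices, as are the function names and the example target label.

```python
import tempfile
from pathlib import Path

# Section names are an assumption, derived from the six workflow phases.
REPORT_SECTIONS = [
    "Target Characterization",
    "Backbone Generation",
    "Sequence Design",
    "Structure Validation",
    "Developability Assessment",
    "Ranked Candidates and Recommendations",
]
PLACEHOLDER = "_Pending_"


def init_report(target: str, out_dir: str = ".") -> Path:
    """Create [TARGET]_protein_design_report.md with empty section headers."""
    path = Path(out_dir) / f"{target}_protein_design_report.md"
    lines = [f"# {target} Protein Design Report", ""]
    for section in REPORT_SECTIONS:
        lines += [f"## {section}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path: Path, section: str, body: str) -> None:
    """Replace the placeholder under one section header with real content."""
    text = path.read_text(encoding="utf-8")
    stub = f"## {section}\n\n{PLACEHOLDER}"
    if stub not in text:
        raise KeyError(f"section not found or already filled: {section}")
    updated = text.replace(stub, f"## {section}\n\n{body}", 1)
    path.write_text(updated, encoding="utf-8")


if __name__ == "__main__":
    # Demo in a temporary directory; "EXAMPLE" is a made-up target label.
    with tempfile.TemporaryDirectory() as tmp:
        report = init_report("EXAMPLE", tmp)
        update_section(report, "Target Characterization", "Structure obtained.")
        print(report.read_text(encoding="utf-8"))
```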

Design Documentation (MANDATORY)

Every design MUST include: Sequence, Length, Target, Method, and Quality Metrics (pLDDT, pTM, MPNN score, binding prediction).
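The mandatory fields can be enforced with a small record type so that no design reaches the FASTA or CSV output without its metrics. This is a sketch only: the class name, the `design_id` field, and the FASTA header layout are assumptions, not a format defined by this skill.

```python
import csv
import io
from dataclasses import asdict, dataclass

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class DesignRecord:
    design_id: str            # hypothetical identifier, e.g. "bb01_seq03"
    sequence: str
    target: str
    method: str
    plddt: float
    ptm: float
    mpnn_score: float
    binding_prediction: str

    def __post_init__(self) -> None:
        # Reject empty sequences and non-standard residue codes early.
        if not self.sequence or set(self.sequence.upper()) - STANDARD_AA:
            raise ValueError(f"invalid sequence for {self.design_id}")

    @property
    def length(self) -> int:
        return len(self.sequence)

    def to_fasta(self) -> str:
        header = (
            f">{self.design_id}|target={self.target}|len={self.length}"
            f"|pLDDT={self.plddt:.1f}|pTM={self.ptm:.2f}"
        )
        return f"{header}\n{self.sequence}\n"


CSV_FIELDS = [
    "design_id", "sequence", "length", "target", "method",
    "plddt", "ptm", "mpnn_score", "binding_prediction",
]


def to_csv(records: list) -> str:
    """Serialize records to CSV text, adding the derived length column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["length"] = record.length
        writer.writerow(row)
    return buffer.getvalue()
```

Deriving the length from the sequence, rather than storing it, keeps the two from drifting apart when a sequence is edited.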

NVIDIA NIM Tools

Tool	Purpose	Key Parameter
`NvidiaNIM_rfdiffusion`	Backbone generation	`diffusion_steps` (NOT `num_steps`)
`NvidiaNIM_proteinmpnn`	Sequence design	`pdb_string` (NOT `pdb`)
`ESMFold_predict_structure`	Fast validation	`sequence` (NOT `seq`)
`NvidiaNIM_alphafold2`	High-accuracy validation	`sequence`, `algorithm`
`NvidiaNIM_esm2_650m`	Sequence embeddings	`sequences`, `format`

Common Parameter Mistakes

Tool	Wrong	Correct
`NvidiaNIM_rfdiffusion`	`num_steps=50`	`diffusion_steps=50`
`NvidiaNIM_proteinmpnn`	`pdb=content`	`pdb_string=content`
`ESMFold_predict_structure`	`seq="MVLS..."`	`sequence="MVLS..."`
`NvidiaNIM_alphafold2`	`seq="MVLS..."`	`sequence="MVLS..."`
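The mistakes in the table above can be caught before a request is sent. The following sketch checks keyword names against that table and builds a call payload. The `{"name": ..., "arguments": ...}` shape is an assumption about how calls are dispatched, and `build_call` is a hypothetical helper, not part of ToolUniverse.

```python
# Wrong-to-correct parameter names, copied from the table above.
WRONG_TO_CORRECT = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "ESMFold_predict_structure": {"seq": "sequence"},
    "NvidiaNIM_alphafold2": {"seq": "sequence"},
}


def build_call(tool_name: str, **arguments) -> dict:
    """Return a call payload, rejecting known-wrong parameter names."""
    renames = WRONG_TO_CORRECT.get(tool_name, {})
    bad = sorted(set(arguments) & set(renames))
    if bad:
        hints = ", ".join(f"{name} -> {renames[name]}" for name in bad)
        raise ValueError(f"{tool_name}: wrong parameter name(s): {hints}")
    # Payload shape is an assumption; adapt it to the actual dispatcher.
    return {"name": tool_name, "arguments": arguments}
```

For example, `build_call("NvidiaNIM_rfdiffusion", diffusion_steps=50)` returns a payload, while passing `num_steps=50` raises a `ValueError` that names the correct parameter.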

NVIDIA NIM Requirements

API key: the NVIDIA_API_KEY environment variable is required
Rate limit: 40 requests per minute (at least 1.5 seconds between calls)
Async behavior: AlphaFold2 may return HTTP 202, in which case poll for the result; RFdiffusion and ESMFold are synchronous
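The 1.5 second spacing and the 202 polling case can be handled by a small client-side helper. The sketch below is transport-agnostic: `submit` and `check` are hypothetical callables that you would wire to the actual tool calls, and both are assumed to return a `(status_code, payload)` pair.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive calls (1.5 s for 40 RPM)."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def poll_until_done(submit, check, limiter, max_polls=60):
    """Call submit(), then keep calling check(payload) while the status is 202.

    submit() and check(payload) are hypothetical callables that return
    (status_code, payload); every call is spaced out by the limiter.
    """
    limiter.wait()
    status, payload = submit()
    polls = 0
    while status == 202:
        if polls >= max_polls:
            raise TimeoutError("job still pending after max_polls checks")
        limiter.wait()
        status, payload = check(payload)
        polls += 1
    return status, payload
```

Sharing one limiter instance across all NIM calls keeps the combined request rate under the limit, rather than limiting each tool separately.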

Supporting Tools

Tool	Purpose	Key Parameters
`PDBe_get_uniprot_mappings`	Find PDB structures	`uniprot_id`
`RCSBData_get_entry`	Download PDB file	`pdb_id`
`alphafold_get_prediction`	Get AlphaFold DB structure	`accession`
`emdb_search`	Search cryo-EM maps	`query`
`emdb_get_entry`	Get entry details	`entry_id`
`UniProt_get_entry_by_accession`	Get target sequence	`accession`
`InterPro_get_protein_domains`	Get domains	`accession`

Evidence Grading

Tier	Criteria
T1 (best)	pLDDT >85, pTM >0.8, low aggregation, neutral pI
T2	pLDDT >75, pTM >0.7, acceptable developability
T3	pLDDT >70, pTM >0.65, developability concerns
T4	Failed validation or major developability issues
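The tier table can be applied mechanically once the developability assessment has been reduced to a single label. In the sketch below the four labels ("good", "acceptable", "concerns", "major_issues") are assumptions introduced for illustration, with "good" standing in for low aggregation and neutral pI; pLDDT is assumed to be on a 0 to 100 scale.

```python
# Hypothetical developability labels, ordered from best to worst.
DEVELOPABILITY_RANK = {"good": 0, "acceptable": 1, "concerns": 2, "major_issues": 3}

# (tier, pLDDT must exceed, pTM must exceed, worst developability rank allowed)
TIERS = [
    ("T1", 85.0, 0.80, 0),
    ("T2", 75.0, 0.70, 1),
    ("T3", 70.0, 0.65, 2),
]


def grade_design(plddt: float, ptm: float, developability: str) -> str:
    """Return the best tier whose thresholds the design strictly exceeds."""
    rank = DEVELOPABILITY_RANK[developability]
    for tier, min_plddt, min_ptm, max_rank in TIERS:
        if plddt > min_plddt and ptm > min_ptm and rank <= max_rank:
            return tier
    return "T4"  # failed validation or major developability issues
```

The comparisons are strict, matching the ">" in the table, so a design with pLDDT exactly 85 falls to T2 at best.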

Completeness Checklist

Target structure obtained (PDB or predicted)
Binding epitope identified
>= 5 backbones generated, top 3-5 selected
>= 8 sequences per backbone, MPNN scores reported
All sequences validated (ESMFold), pLDDT/pTM reported, >= 3 passing
Developability assessed (aggregation, pI, expression)
Ranked candidate list, FASTA file, experimental recommendations
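The numeric items in this checklist are easy to verify in code before the report is finalized. A minimal sketch, assuming the counts are tracked as plain integers and a dict keyed by backbone ID (the function and argument names are hypothetical):

```python
def unmet_requirements(n_backbones: int, seqs_per_backbone: dict, n_passing: int) -> list:
    """Return human-readable descriptions of checklist minimums not yet met.

    seqs_per_backbone maps a backbone ID to its number of designed sequences.
    """
    problems = []
    if n_backbones < 5:
        problems.append(f"only {n_backbones} backbones generated (need >= 5)")
    short = sorted(b for b, n in seqs_per_backbone.items() if n < 8)
    if short:
        problems.append(f"fewer than 8 sequences for: {', '.join(short)}")
    if n_passing < 3:
        problems.append(f"only {n_passing} designs passed validation (need >= 3)")
    return problems
```

An empty list means the numeric minimums are met; the qualitative items (epitope identified, developability assessed) still need a manual check.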

Reference Files

DESIGN_PROCEDURES.md - Phase-by-phase code examples, sampling parameters, fallback chains
TOOLS_REFERENCE.md - Complete tool documentation with code examples
EXAMPLES.md - Sample design workflows and outputs
CHECKLIST.md - Detailed phase checklists and quality metrics
design_templates.md - Report templates and output format examples