From mims-harvard-tooluniverse
Designs novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design with RFdiffusion for backbones, ProteinMPNN for sequences, and ESMFold/AlphaFold2 validation. For protein binder design or function engineering.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
Therapeutic protein design starts with the target interaction: what binding surface do you need to cover? A small pocket calls for a nanobody or peptide; a large flat surface calls for a designed protein. Stability, immunogenicity, and manufacturability constrain the design space.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when user asks to:
Phase 1: Target Characterization
Get structure (PDB, EMDB cryo-EM, AlphaFold), identify binding epitope
Phase 2: Backbone Generation (RFdiffusion)
Define constraints, generate >= 5 backbones, filter by geometry
Phase 3: Sequence Design (ProteinMPNN)
Design >= 8 sequences per backbone, sample with temperature control
Phase 4: Structure Validation (ESMFold/AlphaFold2)
Predict structure, compare to backbone, assess pLDDT/pTM
Phase 5: Developability Assessment
Aggregation, pI, expression prediction
Phase 6: Report Synthesis
Ranked candidates, FASTA, experimental recommendations
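Phase 5 names aggregation and pI as developability checks without saying how to compute them. The following is a minimal sketch, assuming a Henderson-Hasselbalch charge model for pI and the Kyte-Doolittle GRAVY score as a crude hydrophobicity proxy for aggregation risk. The pKa values are approximate assumptions (published sets differ by a few tenths of a unit), and the function names are illustrative rather than part of any ToolUniverse tool.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate pKa values (assumption: other published sets shift the result).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 8.0
PKA_C_TERM = 3.1


def net_charge(sequence: str, ph: float) -> float:
    """Estimated net charge of an unmodified chain at the given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence:
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so the pI lies higher
        else:
            high = mid  # neutral or negative, so the pI lies lower
    return (low + high) / 2.0


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy; expects standard residues only."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)
```

Both descriptors are sequence-only estimates. They are useful for ranking candidates against each other, not as a substitute for a dedicated aggregation or expression predictor.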
Create [TARGET]_protein_design_report.md first, with section headers, then write [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv.
Every design MUST include: Sequence, Length, Target, Method, and Quality Metrics (pLDDT, pTM, MPNN score, binding prediction).
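One way to make the required fields hard to forget is to carry each design in a single record type and derive the FASTA and CSV text from it. This is a sketch under assumptions: the `DesignRecord` class, its `design_id` field, the header layout, and the column names are all hypothetical choices, not a format the skill prescribes.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class DesignRecord:
    design_id: str           # hypothetical identifier, not required by the skill
    sequence: str
    target: str
    method: str
    plddt: float
    ptm: float
    mpnn_score: float
    binding_prediction: str  # free text; the skill does not fix a format

    @property
    def length(self) -> int:
        # Length is derived so it can never disagree with the sequence.
        return len(self.sequence)


def to_fasta(records, width: int = 60) -> str:
    """Render records as FASTA text with key metrics in the header line."""
    lines = []
    for r in records:
        lines.append(
            f">{r.design_id} target={r.target} method={r.method} "
            f"length={r.length} pLDDT={r.plddt:.1f} pTM={r.ptm:.2f}"
        )
        # Wrap the sequence at a fixed width.
        for start in range(0, len(r.sequence), width):
            lines.append(r.sequence[start:start + width])
    return "\n".join(lines) + "\n"


def to_csv(records) -> str:
    """Render records as CSV text, one row per design, including length."""
    columns = [f.name for f in fields(DesignRecord)] + ["length"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for r in records:
        row = {f.name: getattr(r, f.name) for f in fields(DesignRecord)}
        row["length"] = r.length
        writer.writerow(row)
    return buffer.getvalue()
```

The strings returned by `to_fasta` and `to_csv` can then be written to [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv respectively.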
| Tool | Purpose | Key Parameter |
|---|---|---|
| NvidiaNIM_rfdiffusion | Backbone generation | diffusion_steps (NOT num_steps) |
| NvidiaNIM_proteinmpnn | Sequence design | pdb_string (NOT pdb) |
| ESMFold_predict_structure | Fast validation | sequence (NOT seq) |
| NvidiaNIM_alphafold2 | High-accuracy validation | sequence, algorithm |
| NvidiaNIM_esm2_650m | Sequence embeddings | sequences, format |

| Tool | Wrong | Correct |
|---|---|---|
| NvidiaNIM_rfdiffusion | num_steps=50 | diffusion_steps=50 |
| NvidiaNIM_proteinmpnn | pdb=content | pdb_string=content |
| ESMFold_predict_structure | seq="MVLS..." | sequence="MVLS..." |
| NvidiaNIM_alphafold2 | seq="MVLS..." | sequence="MVLS..." |
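The wrong-to-correct parameter mapping above can be applied mechanically before a tool call is issued. The sketch below is a hypothetical helper: the tool and parameter names come from the table, but `normalize_arguments` is not part of ToolUniverse, and how the resulting dictionary is passed to a tool is left out.

```python
# Wrong -> correct parameter names, copied from the table above.
PARAM_FIXES = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "ESMFold_predict_structure": {"seq": "sequence"},
    "NvidiaNIM_alphafold2": {"seq": "sequence"},
}


def normalize_arguments(tool_name: str, arguments: dict) -> dict:
    """Return a copy of `arguments` with known-wrong names replaced."""
    fixes = PARAM_FIXES.get(tool_name, {})
    normalized = {}
    for key, value in arguments.items():
        correct_key = fixes.get(key, key)
        if correct_key in normalized:
            # Both the wrong and the correct spelling were supplied.
            raise ValueError(
                f"{tool_name}: '{correct_key}' given more than once"
            )
        normalized[correct_key] = value
    return normalized
```

For example, `normalize_arguments("NvidiaNIM_rfdiffusion", {"num_steps": 50})` yields `{"diffusion_steps": 50}`, while arguments for tools not listed in the table pass through unchanged.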

NVIDIA_API_KEY environment variable required

| Tool | Purpose | Key Parameters |
|---|---|---|
| PDBe_get_uniprot_mappings | Find PDB structures | uniprot_id |
| RCSBData_get_entry | Download PDB file | pdb_id |
| alphafold_get_prediction | Get AlphaFold DB structure | accession |
| emdb_search | Search cryo-EM maps | query |
| emdb_get_entry | Get entry details | entry_id |
| UniProt_get_entry_by_accession | Get target sequence | accession |
| InterPro_get_protein_domains | Get domains | accession |

| Tier | Criteria |
|---|---|
| T1 (best) | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | pLDDT >70, pTM >0.65, developability concerns |
| T4 | Failed validation or major developability issues |
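
The tier table can be read as a simple decision rule. The sketch below is one interpretation, not the skill's own scoring code: it assumes the qualitative developability criteria have already been summarized into one of four hypothetical labels (`"good"` for low aggregation and neutral pI, `"acceptable"`, `"concerns"`, `"major_issues"`), and it treats any design that misses the T3 thresholds as failed validation.

```python
DEVELOPABILITY_LABELS = ("good", "acceptable", "concerns", "major_issues")


def assign_tier(plddt: float, ptm: float, developability: str) -> str:
    """Map validation metrics and a developability label to T1..T4."""
    if developability not in DEVELOPABILITY_LABELS:
        raise ValueError(f"unknown developability label: {developability!r}")

    # T4: major developability issues, or below the T3 thresholds.
    if developability == "major_issues" or plddt <= 70 or ptm <= 0.65:
        return "T4"
    # T1: strongest metrics and a clean developability profile.
    if plddt > 85 and ptm > 0.8 and developability == "good":
        return "T1"
    # T2: solid metrics with at least acceptable developability.
    if plddt > 75 and ptm > 0.7 and developability in ("good", "acceptable"):
        return "T2"
    # T3: passes validation but has weaker metrics or developability concerns.
    return "T3"
```

Under this reading a design with excellent pLDDT and pTM but developability concerns lands in T3 rather than T1, which matches the table's intent that developability can demote an otherwise well-folded candidate.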