From mims-harvard-tooluniverse
Search and analyze cryo-EM maps, single particle structures, tomography datasets, and raw micrographs from EMDB, EMPIAR, CryoET. Cross-reference PDB structures and AlphaFold predictions for resolution-aware structural biology analysis.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
Pipeline for discovering and analyzing electron microscopy data across the full resolution spectrum: from 3D density maps (EMDB) to fitted atomic models (PDB), raw micrograph datasets (EMPIAR), and cryo-electron tomography volumes (CryoET Data Portal). Connects EM data to structural biology context via PDB and AlphaFold.
Guiding principles:
EM resolution determines what you can see. TEM resolves individual protein complexes (~2nm). Cryo-EM achieves near-atomic resolution (<4Å) for large complexes. SEM shows surface topology. Choose the right EM modality for the question.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Typical triggers:
Not this skill: For X-ray crystallography or NMR structures, use PDB search tools directly. For protein structure prediction, use tooluniverse-protein-structure.
| Database | Content | Best For |
|---|---|---|
| EMDB | 3D EM density maps (>40K entries) | Finding processed maps, resolution data, fitting info |
| EMPIAR | Raw micrograph/tilt series datasets | Accessing original image data for reprocessing |
| CryoET Data Portal | Cryo-electron tomography data | Tomographic volumes, cellular context, in-situ structures |
| PDB (RCSB) | Atomic models fitted to EM maps | Structural models derived from EM data |
| AlphaFold | AI-predicted protein structures | Complementary models when EM resolution is limited |
Phase 0: Query Parsing
Identify target protein/complex, method preference, resolution needs
|
Phase 1: Map & Image Search (EMDB)
Find EM density maps, resolution, method, sample details
|
Phase 2: Structure Fitting (EMDB + PDB)
Identify fitted atomic models, fitting quality
|
Phase 3: Raw Data Access (EMPIAR)
Find raw micrographs, tilt series, particle stacks
|
Phase 4: Tomography (CryoET Data Portal)
Search cryo-ET datasets, reconstructed volumes
|
Phase 5: Cross-Reference & Context (PDB + AlphaFold)
Connect to atomic models, predicted structures, literature
|
Phase 6: Report Synthesis
Integrated EM data landscape for the target
Identify from the user's request:
Objective: Find EM density maps matching the query.
Tools:
EMDB_search_structures -- search EMDB by keyword, organism, resolution
Parameters: query (search term), optional resolution_min, resolution_max, method, limit
EMDB_get_structure -- get full details for an EMDB entry
Parameters: emdb_id (e.g., "EMD-1234")
EMDB_get_map_info -- get map-specific info (resolution, contour, dimensions)
Parameters: emdb_id
EMDB_get_sample_info -- get sample preparation details
Parameters: emdb_id
Workflow:
Resolution interpretation:
Worse than 8.0 Å: overall shape and architecture only
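The Phase 1 search parameters can be assembled programmatically before the tool is invoked. The sketch below uses a hypothetical helper, `build_emdb_search_call`, that builds a request for `EMDB_search_structures` from the parameters listed above. The `{"name": ..., "arguments": ...}` payload shape is an assumption about how the tool runner accepts calls and should be checked against the ToolUniverse version in use.

```python
from typing import Any, Optional


def build_emdb_search_call(
    query: str,
    resolution_min: Optional[float] = None,
    resolution_max: Optional[float] = None,
    method: Optional[str] = None,
    limit: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request for EMDB_search_structures, omitting unset optionals."""
    if not query.strip():
        raise ValueError("query must be a non-empty search term")
    if (
        resolution_min is not None
        and resolution_max is not None
        and resolution_min > resolution_max
    ):
        raise ValueError("resolution_min must not exceed resolution_max")

    optional = {
        "resolution_min": resolution_min,
        "resolution_max": resolution_max,
        "method": method,
        "limit": limit,
    }
    arguments: dict[str, Any] = {"query": query.strip()}
    # Only send parameters the caller actually set.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    # Assumption: the tool runner accepts a {"name", "arguments"} payload.
    return {"name": "EMDB_search_structures", "arguments": arguments}


call = build_emdb_search_call("ribosome", resolution_max=3.5, limit=10)
print(call)
```

Leaving unset optionals out of the payload keeps the request minimal and avoids sending explicit nulls to a tool that may not accept them.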
Objective: Find atomic models fitted into EM maps and assess fitting quality.
Tools:
EMDB_get_validation -- get fitting/validation data for an EMDB entry
Parameters: emdb_id
RCSBData_get_entry -- get PDB entry details
Parameters: entry_id (PDB ID)
RCSBAdvSearch_search_structures -- advanced PDB search
Parameters: query (search term), optional experimental_method, resolution_max, limit
Workflow:
Fitting quality indicators:
Objective: Locate raw micrograph data for potential reprocessing.
Tools:
EMPIAR_search_entries -- search EMPIAR archive
Parameters: query (search term), optional limit
EMPIAR_get_entry -- get detailed entry information
Parameters: empiar_id (e.g., "EMPIAR-10028")
Workflow:
Data types in EMPIAR:
Objective: Find cryo-electron tomography datasets for cellular and in-situ structural biology.
Tools:
CryoET_list_datasets -- search CryoET Data Portal
Parameters: query (search term), optional organism, limit
CryoET_get_dataset -- get dataset details
Parameters: dataset_id
CryoET_list_runs -- search individual tomography runs
Parameters: dataset_id or query, optional limit
Workflow:
Tomography vs single particle: Tomography preserves cellular context (in situ) but typically achieves lower resolution. Single particle gives higher resolution but requires purified samples.
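The EMDB and EMPIAR tools in the earlier phases take accession strings such as "EMD-1234" and "EMPIAR-10028", while user requests often spell them loosely. A minimal sketch of normalizing such input before a tool call follows; `normalize_accession` is a hypothetical helper, and the accepted spellings are assumptions based only on the two example IDs given in this document.

```python
import re

# Accepted spellings are an assumption: optional hyphen, any letter case.
_ID_PATTERNS = {
    "EMDB": re.compile(r"^EMD-?(\d+)$", re.IGNORECASE),
    "EMPIAR": re.compile(r"^EMPIAR-?(\d+)$", re.IGNORECASE),
}
_PREFIX = {"EMDB": "EMD", "EMPIAR": "EMPIAR"}


def normalize_accession(raw: str) -> tuple[str, str]:
    """Return (archive, canonical_id) for an EMDB or EMPIAR accession."""
    cleaned = raw.strip()
    for archive, pattern in _ID_PATTERNS.items():
        match = pattern.match(cleaned)
        if match:
            # Digits are kept verbatim so leading zeros are preserved.
            return archive, f"{_PREFIX[archive]}-{match.group(1)}"
    raise ValueError(f"not an EMDB or EMPIAR accession: {raw!r}")


print(normalize_accession("emd-1234"))
print(normalize_accession(" EMPIAR10028 "))
```

Returning the archive name alongside the canonical ID lets the caller route the accession to the matching tool family.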
Objective: Connect EM data to broader structural biology context.
Tools:
alphafold_get_prediction -- get AlphaFold predicted structure
Parameters: qualifier (UniProt accession)
PubMed_search_articles -- find publications describing the EM work
Parameters: query (search term), optional limit
Workflow:
Don't just list maps — help the user choose the RIGHT map for their purpose.
Decision matrix: Which map should I use?
| Purpose | Best Resolution | Method | Priority Criteria |
|---|---|---|---|
| Atomic model building | < 3.5 Å | Single particle | Highest resolution with fitted PDB model |
| Drug binding site analysis | < 3.0 Å | Single particle | Must resolve side chains in binding pocket |
| Domain architecture | 4-8 Å | Single particle or subtomogram averaging | Large complexes where domains need fitting |
| Conformational states | < 4.5 Å | Single particle (multiple classes) | Look for entries with multiple maps from same dataset |
| Cellular context | 15-40 Å | Cryo-ET | Tomographic datasets showing in-situ arrangement |
| Reprocessing | Any | Any | Must have EMPIAR raw data; prefer recent datasets (better detectors) |
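The decision matrix can be expressed as a small lookup that reports which purposes a given map could serve. The sketch below encodes only the resolution and method columns of the table; the method labels (`single_particle`, `subtomogram_averaging`, `cryo_et`) and the function name are hypothetical, and the priority criteria in the last column still need manual review.

```python
# Each rule: (purpose, resolution predicate in angstroms, allowed method labels).
# Method labels are illustrative, not values returned by any database.
RULES = [
    ("Atomic model building", lambda r: r < 3.5, {"single_particle"}),
    ("Drug binding site analysis", lambda r: r < 3.0, {"single_particle"}),
    (
        "Domain architecture",
        lambda r: 4.0 <= r <= 8.0,
        {"single_particle", "subtomogram_averaging"},
    ),
    # The table also asks for multiple classes; that is not checked here.
    ("Conformational states", lambda r: r < 4.5, {"single_particle"}),
    ("Cellular context", lambda r: 15.0 <= r <= 40.0, {"cryo_et"}),
]


def suitable_purposes(
    resolution: float, method: str, has_raw_data: bool = False
) -> list[str]:
    """List the purposes from the decision matrix that a map satisfies."""
    purposes = [
        name
        for name, resolution_ok, methods in RULES
        if method in methods and resolution_ok(resolution)
    ]
    # Reprocessing accepts any resolution and method but needs EMPIAR raw data.
    if has_raw_data:
        purposes.append("Reprocessing")
    return purposes


print(suitable_purposes(2.8, "single_particle"))
print(suitable_purposes(6.0, "subtomogram_averaging", has_raw_data=True))
```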
Quality assessment checklist:
Resolution trend analysis: If multiple maps exist over time, note the resolution trajectory. For example, an improvement from 6 Å (2015) to 2.8 Å (2023) suggests the sample is amenable to high-resolution single particle analysis with modern hardware.
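A resolution trajectory can be computed from (release year, resolution) pairs once they have been extracted from the search results. The following is a minimal sketch with hypothetical function names. The input list is illustrative rather than real database output: only the 2015 and 2023 endpoints come from the example above.

```python
from typing import Iterable, Optional


def resolution_trajectory(
    entries: Iterable[tuple[int, float]],
) -> list[tuple[int, float]]:
    """Best (numerically lowest) resolution per release year, sorted by year."""
    best: dict[int, float] = {}
    for year, resolution in entries:
        if year not in best or resolution < best[year]:
            best[year] = resolution
    return sorted(best.items())


def summarize_trend(entries: Iterable[tuple[int, float]]) -> Optional[dict]:
    """Compare the earliest and latest years; None if fewer than two years."""
    trajectory = resolution_trajectory(entries)
    if len(trajectory) < 2:
        return None
    (first_year, first_res), (last_year, last_res) = trajectory[0], trajectory[-1]
    return {
        "first": (first_year, first_res),
        "latest": (last_year, last_res),
        "improvement_angstrom": round(first_res - last_res, 2),
        "improved": last_res < first_res,
    }


# Illustrative values only; the intermediate years are made up for the sketch.
maps = [(2015, 6.0), (2018, 4.1), (2018, 3.9), (2023, 2.8)]
trajectory = resolution_trajectory(maps)
summary = summarize_trend(maps)
print(trajectory)
print(summary)
```

Taking the best map per year, rather than the mean, reflects what the sample has been shown to achieve and is less sensitive to low-resolution side deposits.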
Assemble findings into an actionable report:
| Pattern | Description | Key Phases |
|---|---|---|
| Structure Discovery | Find all EM data for a protein | 0, 1, 2, 5 |
| Reprocessing Prep | Find raw data for re-analysis | 0, 1, 3 |
| Tomography Survey | Explore in-situ structural data | 0, 4 |
| Resolution Comparison | Track resolution improvements over time | 0, 1, 2 |
| Map-Model Validation | Assess quality of fitted atomic models | 0, 1, 2, 5 |
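The pattern table maps directly onto the phase list, so a plan for a given request can be looked up rather than rebuilt each time. A minimal sketch, with the phase names and pattern-to-phase assignments copied from this document and a hypothetical `plan_phases` helper:

```python
PHASES = {
    0: "Query Parsing",
    1: "Map & Image Search (EMDB)",
    2: "Structure Fitting (EMDB + PDB)",
    3: "Raw Data Access (EMPIAR)",
    4: "Tomography (CryoET Data Portal)",
    5: "Cross-Reference & Context (PDB + AlphaFold)",
    6: "Report Synthesis",
}

PATTERNS = {
    "Structure Discovery": (0, 1, 2, 5),
    "Reprocessing Prep": (0, 1, 3),
    "Tomography Survey": (0, 4),
    "Resolution Comparison": (0, 1, 2),
    "Map-Model Validation": (0, 1, 2, 5),
}


def plan_phases(pattern: str) -> list[str]:
    """Return the ordered phase labels to run for a named pattern."""
    if pattern not in PATTERNS:
        raise KeyError(f"unknown pattern: {pattern!r}")
    return [f"Phase {n}: {PHASES[n]}" for n in PATTERNS[pattern]]


for step in plan_phases("Reprocessing Prep"):
    print(step)
```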
RCSBAdvSearch_search_structures with method filter