From mims-harvard-tooluniverse
Retrieves proteomics datasets from MassIVE and ProteomeXchange by species, keyword, or accession. Returns metadata on instruments, publications, species, modifications, file counts. For mass spectrometry dataset discovery and lookup.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
Find and retrieve metadata for publicly available proteomics datasets from MassIVE and ProteomeXchange repositories. Supports searching by species, keyword, or accession, and returns detailed dataset metadata including instruments, publications, species, and post-translational modifications.
Triggers:
Use Cases:
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Dataset quality depends on instrument, sample preparation, and quantification method. TMT/iTRAQ (isobaric labeling) datasets have ratio compression and co-isolation interference biases that differ from label-free quantification (LFQ). DIA datasets require different analysis pipelines than DDA. Check the original publication for methods before reusing data in a meta-analysis or cross-study comparison. Instrument resolution (Orbitrap > ion trap) and acquisition mode (DIA > DDA for completeness) directly affect how many proteins are quantified and at what confidence.
| Repository | Coverage | Strengths |
|---|---|---|
| MassIVE | 10,000+ datasets | Rich metadata (summaries, keywords, modifications, contacts), species filtering by taxonomy ID |
| ProteomeXchange | Aggregates PRIDE, MassIVE, PeptideAtlas, jPOST, iProX | Broadest coverage, standardized PXD accessions |
Query (keyword / species / accession)
|
+-- PHASE 0: Input Resolution
|   Determine search type: keyword, species, or accession lookup
|
+-- PHASE 1: Repository Search
|   Search MassIVE and/or ProteomeXchange based on query type
|
+-- PHASE 2: Dataset Detail Retrieval
|   Get full metadata for promising hits
|
+-- PHASE 3: Result Synthesis
    Compile datasets with metadata, publications, and relevance assessment
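Phase 0 of the workflow above can be sketched as a small classifier. This is a minimal sketch, not part of ToolUniverse: the function name `resolve_query` and the returned dictionary keys are hypothetical, the accession patterns are inferred only from the example accessions in this document (PXD000001, MSV000079514), and treating an all-digit query as an NCBI taxonomy ID is an assumption.

```python
import re

# Patterns inferred from the example accessions in this document (assumption).
PXD_PATTERN = re.compile(r"^PXD\d+$", re.IGNORECASE)
MSV_PATTERN = re.compile(r"^MSV\d+$", re.IGNORECASE)


def resolve_query(query: str) -> dict:
    """Classify a raw query as an accession, species, or keyword search."""
    q = query.strip()
    if PXD_PATTERN.match(q):
        return {"type": "accession", "repository": "ProteomeXchange", "value": q.upper()}
    if MSV_PATTERN.match(q):
        return {"type": "accession", "repository": "MassIVE", "value": q.upper()}
    if q.isdigit():
        # Assumption: an all-digit query is an NCBI taxonomy ID, kept as a string.
        return {"type": "species", "value": q}
    return {"type": "keyword", "value": q}


print(resolve_query("pxd000001"))
print(resolve_query("9606"))
print(resolve_query("phosphoproteomics"))
```

The resolved type then decides which tool Phase 1 calls first.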
Objective: Determine the query type and prepare appropriate search parameters.
If an accession is provided (e.g., PXD000001, MSV000079514):
- PXD accession: ProteomeXchange_get_dataset and optionally MassIVE_get_dataset
- MSV accession: MassIVE_get_dataset

Otherwise:
- Species: MassIVE_search_datasets with species filter
- Keyword: ProteomeXchange_search_datasets with query parameter

Objective: Find relevant datasets across repositories.
MassIVE_search_datasets:
- page_size: Number of results to return (integer, max 100, default 10)
- species: NCBI taxonomy ID string to filter by species (e.g., "9606" for human)
- Returns: accessions (array), title, summary, species, instruments, keywords

ProteomeXchange_search_datasets:
- query: Optional search filter -- keyword or dataset accession (e.g., "phosphoproteomics", "PXD")
- limit: Max results (1-50, default 10)
- Returns: {data: [{accession, title, species}], metadata: {source, total_returned, query}}

For species-specific search:
- MassIVE_search_datasets(page_size=20, species="9606") for species-filtered results
- ProteomeXchange_search_datasets(limit=20) for broader listing

For keyword search:
- ProteomeXchange_search_datasets(query="keyword", limit=20)

For comprehensive discovery:
- MassIVE: results returned directly (no {data: ...} wrapper)
- ProteomeXchange: {data: [...], metadata: {...}}

Objective: Get full metadata for datasets of interest.
MassIVE_get_dataset:
- accession: Dataset accession -- accepts both MSV and PXD formats (e.g., "MSV000079514", "PXD003971")
- Returns: accessions, title, summary, species, instruments, keywords, contacts, publications, modifications

ProteomeXchange_get_dataset:
- px_id: ProteomeXchange identifier in PXD format (e.g., "PXD000001")
- Returns: {data: {px_id, title, species, identifiers, instruments, publications, file_count}, metadata: {...}}
- Use ProteomeXchange_get_dataset for file count; use MassIVE_get_dataset for richer summary/keywords

Objective: Compile and present dataset results in a structured format.
# Proteomics Dataset Search Results
**Query**: [original query]
**Date**: YYYY-MM-DD
**Repositories searched**: MassIVE, ProteomeXchange
## Summary
Found N datasets matching [criteria].
## Datasets
### 1. [Title]
- **Accession**: PXD/MSV number
- **Species**: [organism]
- **Instruments**: [MS platforms]
- **Publications**: [PubMed IDs / DOIs]
- **Modifications**: [PTMs if available]
- **Files**: [count if available]
- **Summary**: [brief description]
### 2. [Title]
...
## Data Gaps
[Note any limitations in search coverage]
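The report template above could be filled in programmatically along these lines. This is a sketch under stated assumptions: `render_report` is a hypothetical helper, and the dataset dictionary keys (`title`, `accession`, `species`, `instruments`, `publications`, `modifications`, `file_count`, `summary`) are assumed normalized names, not guaranteed to match the raw tool responses.

```python
from datetime import date


def _fmt(value):
    """Render a field value, joining lists and marking missing data."""
    if value is None or value == "" or value == []:
        return "not available"
    if isinstance(value, (list, tuple)):
        return ", ".join(str(v) for v in value)
    return str(value)


def render_report(query, datasets, repositories=("MassIVE", "ProteomeXchange"),
                  gaps="None noted.", today=None):
    """Build the markdown report from a list of dataset dicts (assumed keys)."""
    today = today or date.today().isoformat()
    lines = [
        "# Proteomics Dataset Search Results",
        f"**Query**: {query}",
        f"**Date**: {today}",
        f"**Repositories searched**: {', '.join(repositories)}",
        "## Summary",
        f"Found {len(datasets)} datasets matching {query}.",
        "## Datasets",
    ]
    for i, ds in enumerate(datasets, start=1):
        lines += [
            f"### {i}. {_fmt(ds.get('title'))}",
            f"- **Accession**: {_fmt(ds.get('accession'))}",
            f"- **Species**: {_fmt(ds.get('species'))}",
            f"- **Instruments**: {_fmt(ds.get('instruments'))}",
            f"- **Publications**: {_fmt(ds.get('publications'))}",
            f"- **Modifications**: {_fmt(ds.get('modifications'))}",
            f"- **Files**: {_fmt(ds.get('file_count'))}",
            f"- **Summary**: {_fmt(ds.get('summary'))}",
        ]
    lines += ["## Data Gaps", gaps]
    return "\n".join(lines)
```

Missing fields are written as "not available" rather than omitted, so gaps in repository metadata stay visible in the report.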
| Tool | Parameter | Notes |
|---|---|---|
| MassIVE_search_datasets | page_size | Integer, max 100. Default 10 |
| MassIVE_search_datasets | species | NCBI taxonomy ID as string (e.g., "9606" not 9606) |
| MassIVE_get_dataset | accession | Accepts both MSV and PXD formats |
| ProteomeXchange_search_datasets | query | Optional keyword or accession filter |
| ProteomeXchange_search_datasets | limit | Integer, 1-50 |
| ProteomeXchange_get_dataset | px_id | PXD format only (e.g., "PXD000001") |
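The constraints in the table above can be checked before a tool is called. The helpers below are hypothetical (they are not ToolUniverse functions) and only build argument dictionaries; the lower bound of 1 for `page_size` and the `PXD` + digits pattern are assumptions, since the table states only the maximum and one example.

```python
import re


def massive_search_args(page_size=10, species=None):
    """Arguments for MassIVE_search_datasets; species is coerced to a string."""
    if not isinstance(page_size, int) or not 1 <= page_size <= 100:
        raise ValueError("page_size must be an integer from 1 to 100")
    args = {"page_size": page_size}
    if species is not None:
        args["species"] = str(species)  # "9606", not 9606
    return args


def px_search_args(query=None, limit=10):
    """Arguments for ProteomeXchange_search_datasets; query is optional."""
    if not isinstance(limit, int) or not 1 <= limit <= 50:
        raise ValueError("limit must be an integer from 1 to 50")
    args = {"limit": limit}
    if query:
        args["query"] = query
    return args


def px_get_args(px_id):
    """Arguments for ProteomeXchange_get_dataset; PXD format only."""
    if not re.fullmatch(r"PXD\d+", px_id):
        raise ValueError("px_id must be in PXD format, e.g. 'PXD000001'")
    return {"px_id": px_id}


print(massive_search_args(page_size=20, species=9606))
print(px_search_args(query="phosphoproteomics", limit=20))
```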
Response Format Notes:
- ProteomeXchange_search_datasets: {data: [...], metadata: {...}}
- ProteomeXchange_get_dataset: {data: {...}, metadata: {...}}

| Situation | Fallback |
|---|---|
| MassIVE search returns empty | Use ProteomeXchange search (broader coverage) |
| ProteomeXchange search returns empty | Try broader/simpler query terms |
| MassIVE_get_dataset fails for PXD accession | Use ProteomeXchange_get_dataset instead |
| Species taxonomy ID unknown | Search ProteomeXchange by keyword (organism name) |
| No keyword search results | Try individual terms instead of multi-word queries |
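The fallback order in the table above can be sketched as a single function. This is illustrative only: `massive_search` and `px_search` are caller-supplied stand-ins for the MassIVE_search_datasets and ProteomeXchange_search_datasets tools, and both are assumed to return a plain list of records that each carry an `accession` key (that is, any `{data: ...}` wrapper has already been removed).

```python
def search_with_fallback(keyword, massive_search, px_search, species=None):
    """Try MassIVE first when a species is known, then fall back step by step."""
    tried = []

    # Species known: start with MassIVE, which filters by taxonomy ID.
    if species is not None:
        tried.append("MassIVE_search_datasets")
        hits = massive_search(page_size=20, species=str(species))
        if hits:
            return hits, tried

    # MassIVE empty or species unknown: ProteomeXchange keyword search.
    tried.append("ProteomeXchange_search_datasets")
    hits = px_search(query=keyword, limit=20)
    if hits:
        return hits, tried

    # Multi-word query returned nothing: retry each term on its own
    # and merge the results, keeping one record per accession.
    merged = {}
    terms = keyword.split()
    if len(terms) > 1:
        for term in terms:
            tried.append(f"ProteomeXchange_search_datasets[{term}]")
            for record in px_search(query=term, limit=20):
                merged.setdefault(record["accession"], record)
    return list(merged.values()), tried
```

Returning the list of attempted searches alongside the hits makes it straightforward to document search coverage in the report's Data Gaps section.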
| Species | Taxonomy ID |
|---|---|
| Human | 9606 |
| Mouse | 10090 |
| Rat | 10116 |
| Zebrafish | 7955 |
| Fruit fly | 7227 |
| C. elegans | 6239 |
| S. cerevisiae | 559292 |
| A. thaliana | 3702 |
| E. coli | 562 |
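For convenience, the table above can be kept as a lookup that returns taxonomy IDs as strings, the form the `species` parameter expects. The dictionary and the `taxonomy_id` helper are hypothetical names for illustration; the IDs are copied from the table.

```python
# Taxonomy IDs from the table above, stored as strings for the species parameter.
TAXONOMY_IDS = {
    "human": "9606",
    "mouse": "10090",
    "rat": "10116",
    "zebrafish": "7955",
    "fruit fly": "7227",
    "c. elegans": "6239",
    "s. cerevisiae": "559292",
    "a. thaliana": "3702",
    "e. coli": "562",
}


def taxonomy_id(name):
    """Return the taxonomy ID string for a common name, or None if not listed."""
    return TAXONOMY_IDS.get(name.strip().lower())


print(taxonomy_id("Human"))
print(taxonomy_id("E. coli"))
```

A `None` result corresponds to the "Species taxonomy ID unknown" fallback: search ProteomeXchange by organism name instead.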
| Quality Indicator | Good | Acceptable | Caution |
|---|---|---|---|
| Instrument | Orbitrap Exploris/Eclipse, timsTOF | Q Exactive, TripleTOF 6600 | Older LTQ, ion trap only |
| Publication | Peer-reviewed with PubMed ID | Preprint or DOI only | No associated publication |
| Metadata completeness | Species + instrument + PTMs + summary | Species + instrument only | Title only, no annotations |
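The instrument and publication rows of the table above can be approximated with a simple heuristic. This is a rough sketch, not an established scoring method: the substring keywords are taken from the table's examples, instrument names in real metadata may be spelled differently, and both function names are hypothetical.

```python
# Keywords per tier, taken from the example instruments in the table above.
# Tiers are checked best-first, so a dataset is rated by its best instrument.
TIER_KEYWORDS = [
    ("good", ("exploris", "eclipse", "timstof")),
    ("acceptable", ("q exactive", "tripletof 6600")),
    ("caution", ("ltq", "ion trap")),
]


def instrument_tier(instruments):
    """Rate a list of instrument names; 'unknown' if nothing matches."""
    names = [name.lower() for name in instruments]
    for tier, keywords in TIER_KEYWORDS:
        if any(keyword in name for name in names for keyword in keywords):
            return tier
    return "unknown"


def publication_tier(pubmed_ids=(), dois=()):
    """Rate publication support: PubMed ID > DOI or preprint only > none."""
    if pubmed_ids:
        return "good"
    if dois:
        return "acceptable"
    return "caution"


print(instrument_tier(["Orbitrap Exploris 480"]))
print(instrument_tier(["Q Exactive HF"]))
print(publication_tier(dois=["10.0000/example"]))  # placeholder DOI
```

An "unknown" instrument rating should be treated as a prompt to check the original publication, not as a quality judgment.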
Interpreting dataset search results:
Synthesis questions to address in the report:
- species parameter
- Dataverse_get_dataset
- page_size/limit reasonable

| Skill | Relationship |
|---|---|
| tooluniverse-proteomics-analysis | Use retrieved datasets as input for MS data analysis |
| tooluniverse-protein-modification-analysis | Find PTM-specific datasets to complement iPTMnet annotations |
| tooluniverse-multi-omics-integration | Discover proteomics datasets for cross-omics integration |