Skill

tooluniverse-multiomic-disease-characterization

Characterizes diseases via multi-omics integration of genomics, transcriptomics, proteomics, pathways, and therapeutics. Produces scored reports (0-100), concordance analysis, biomarkers, therapeutic opportunities, and hypotheses. For disease mechanism and target queries.

Python

Bash

ai-ml

data-engineering

Install

npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.

Supporting Assets

integration-scoring.md, report-template.md, response-formats.md, test_multiomic_disease.py, tool-reference.md, use-patterns.md

SKILL.md


Stats

Stars: 1291

Forks: 199

Last Commit: Mar 29, 2026


Multi-Omics Disease Characterization Pipeline

KEY PRINCIPLES:

Report-first approach - Create report file FIRST, then populate progressively
Disease disambiguation FIRST - Resolve all identifiers before omics analysis
Layer-by-layer analysis - Systematically cover all omics layers
Cross-layer integration - Identify genes/targets appearing in multiple layers
Evidence grading - Grade all evidence as T1 (human/clinical) to T4 (computational)
Tissue context - Emphasize disease-relevant tissues/organs
Quantitative scoring - Multi-Omics Confidence Score (0-100)
Druggable focus - Prioritize targets with therapeutic potential
Biomarker identification - Highlight diagnostic/prognostic markers
Mechanistic synthesis - Generate testable hypotheses
Source references - Every statement must cite tool/database
Completeness checklist - Mandatory section showing analysis coverage
English-first queries - Always use English terms in tool calls, and respond in the user's language
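
The report-first and completeness-checklist principles can be sketched as follows. This is a minimal illustration, not the layout from report-template.md: the function names, the markdown headings, and the `<!-- pending -->` marker are all assumptions made for the example. Only the nine phase titles come from this document.

```python
from pathlib import Path

# Phase titles as listed in the Pipeline Overview of this document.
PHASES = [
    "Disease Disambiguation",
    "Genomics Layer",
    "Transcriptomics Layer",
    "Proteomics & Interaction Layer",
    "Pathway & Network Layer",
    "Gene Ontology & Functional Annotation",
    "Therapeutic Landscape",
    "Multi-Omics Integration",
    "Report Finalization",
]


def _marker(phase_index: int) -> str:
    # Hypothetical placeholder that marks a section as not yet filled in.
    return f"<!-- pending: phase {phase_index} -->"


def create_report_skeleton(path: str, disease: str) -> None:
    """Write the report file first, with one empty section per phase."""
    lines = [f"# Multi-Omics Characterization: {disease}", ""]
    for i, title in enumerate(PHASES):
        lines += [f"## Phase {i}: {title}", "", _marker(i), ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def fill_section(path: str, phase_index: int, body: str) -> None:
    """Replace a phase placeholder with findings as soon as they exist."""
    text = Path(path).read_text(encoding="utf-8")
    marker = _marker(phase_index)
    if marker not in text:
        raise ValueError(f"phase {phase_index} is already filled or unknown")
    Path(path).write_text(text.replace(marker, body.strip(), 1), encoding="utf-8")


def pending_phases(path: str) -> list:
    """Completeness check: indices of phases that still hold a placeholder."""
    text = Path(path).read_text(encoding="utf-8")
    return [i for i in range(len(PHASES)) if _marker(i) in text]
```

Calling `pending_phases` before finalization gives a simple coverage list for the mandatory completeness section.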

Multi-omics disease characterization asks which molecular layers are dysregulated and how changes propagate between them: genomic mutations → transcriptomic changes → proteomic effects → metabolomic consequences. Concordance across layers strengthens a finding; discordance points to regulatory complexity.

LOOK UP, DON'T GUESS

When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
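
As one example of computing rather than describing, an over-representation p-value for a gene set can be calculated directly. The sketch below uses only the standard library so it runs anywhere. With scipy installed, `scipy.stats.hypergeom.sf(overlap - 1, background_size, gene_set_size, query_size)` should give the same tail probability. The function name and the numbers in the usage line are illustrative assumptions, not results from any database.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, gene_set_size: int,
                           query_size: int, background_size: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    X is the number of pathway genes seen when `query_size` genes are drawn
    without replacement from `background_size` genes, of which
    `gene_set_size` belong to the pathway.
    """
    max_overlap = min(gene_set_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(gene_set_size, k)
        * comb(background_size - gene_set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only: 8 of 20 query genes fall in a 150-gene pathway.
    p = hypergeom_enrichment_p(overlap=8, gene_set_size=150,
                               query_size=20, background_size=20000)
    print(f"enrichment p-value: {p:.3e}")
```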

When to Use This Skill

Apply when users:

Ask about disease mechanisms across omics layers
Need multi-omics characterization of a disease
Want to understand disease at the systems biology level
Ask "What pathways/genes/proteins are involved in [disease]?"
Need biomarker discovery for a disease
Want to identify druggable targets from disease profiling
Ask for integrated genomics + transcriptomics + proteomics analysis
Need cross-layer concordance analysis
Ask about disease network biology / hub genes

NOT for (use other skills instead):

Single gene/target validation -> Use tooluniverse-drug-target-validation
Drug safety profiling -> Use tooluniverse-adverse-event-detection
General disease overview -> Use tooluniverse-disease-research
Variant interpretation -> Use tooluniverse-variant-interpretation
GWAS-specific analysis -> Use tooluniverse-gwas-* skills
Pathway-only analysis -> Use tooluniverse-systems-biology

Input Parameters

Parameter	Required	Description	Example
disease	Yes	Disease name, OMIM ID, EFO ID, or MONDO ID	`Alzheimer disease`, `MONDO_0004975`
tissue	No	Tissue/organ of interest	`brain`, `liver`, `blood`
focus_layers	No	Specific omics layers to emphasize	`genomics`, `transcriptomics`, `pathways`
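
The three inputs above can be captured in a small container that enforces the one required field. This is a sketch only: the class name and the validation behavior are assumptions, and the skill itself does not prescribe any such structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CharacterizationRequest:
    """Hypothetical holder for the skill's input parameters."""
    disease: str                                            # required: name or ontology ID
    tissue: Optional[str] = None                            # optional: e.g. "brain"
    focus_layers: List[str] = field(default_factory=list)   # optional emphasis

    def __post_init__(self) -> None:
        # `disease` is the only required parameter, so reject empty input early.
        if not self.disease or not self.disease.strip():
            raise ValueError("disease is required")
        self.disease = self.disease.strip()


if __name__ == "__main__":
    request = CharacterizationRequest("Alzheimer disease", tissue="brain",
                                      focus_layers=["genomics", "pathways"])
    print(request)
```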

Pipeline Overview

The pipeline runs 9 phases sequentially. Each phase uses specific tools documented in detail in tool-reference.md.

Phase 0: Disease Disambiguation (ALWAYS FIRST)

Resolve disease to standard identifiers (MONDO/EFO) for all downstream queries.

Primary tool: OpenTargets_get_disease_id_description_by_name
Get description, synonyms, therapeutic areas, disease hierarchy, cross-references
CRITICAL: Disease IDs use underscore format (e.g., MONDO_0004975), NOT colon
If ambiguous, present top 3-5 options and ask user to select
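
The underscore rule is easy to enforce before any downstream call. The helper below is a minimal sketch under one assumption: that user-supplied MONDO and EFO identifiers arrive either in colon form or in underscore form. The function name is hypothetical. Anything that does not look like an ID is left for name-based lookup.

```python
import re
from typing import Optional

# Matches MONDO or EFO identifiers written with either ':' or '_'.
_DISEASE_ID = re.compile(r"^(MONDO|EFO)[:_](\d+)$", re.IGNORECASE)


def normalize_disease_id(raw: str) -> Optional[str]:
    """Return the underscore form (e.g. MONDO_0004975), or None for free text."""
    match = _DISEASE_ID.match(raw.strip())
    if match is None:
        return None  # treat as a disease name and resolve it via the lookup tool
    return f"{match.group(1).upper()}_{match.group(2)}"


if __name__ == "__main__":
    print(normalize_disease_id("MONDO:0004975"))
    print(normalize_disease_id("Alzheimer disease"))
```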

Phase 1: Genomics Layer

Identify genetic variants, GWAS associations, and genetically implicated genes.

Tools: gwas_search_associations (use efo_id for precision, not free-text disease_trait), gwas_get_snps_for_gene, ClinVar, OpenTargets associated targets
gnomad_get_gene_constraints — gene constraint metrics (pLI, oe_lof) to interpret whether LoF variants are tolerated vs. haploinsufficient
Get top 10-15 genes with genetic evidence scores; track Ensembl IDs for downstream phases
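
Selecting the top genes while keeping their Ensembl IDs might look like the sketch below. The record keys (`symbol`, `ensembl_id`, `genetic_score`) are assumptions for illustration, since the real response shapes are described in response-formats.md, and the gene names are placeholders.

```python
from typing import Dict, List


def top_genes_by_genetic_score(associations: List[Dict], limit: int = 15) -> List[Dict]:
    """Keep the best-scoring record per Ensembl ID and return the top `limit`."""
    best: Dict[str, Dict] = {}
    for row in associations:
        key = row["ensembl_id"]
        # A gene may be reported by several sources; keep its highest score.
        if key not in best or row["genetic_score"] > best[key]["genetic_score"]:
            best[key] = row
    ranked = sorted(best.values(), key=lambda r: r["genetic_score"], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Placeholder records, not real evidence scores.
    rows = [
        {"symbol": "GENE_A", "ensembl_id": "ENSG_PLACEHOLDER_A", "genetic_score": 0.91},
        {"symbol": "GENE_B", "ensembl_id": "ENSG_PLACEHOLDER_B", "genetic_score": 0.42},
        {"symbol": "GENE_A", "ensembl_id": "ENSG_PLACEHOLDER_A", "genetic_score": 0.60},
    ]
    for gene in top_genes_by_genetic_score(rows, limit=10):
        print(gene["symbol"], gene["ensembl_id"], gene["genetic_score"])
```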

Phase 2: Transcriptomics Layer

Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.

GTEx_get_expression_summary — baseline expression across 54 tissues (accepts gene_symbol directly)
Tools: Expression Atlas, HPA (tissue expression), EuropePMC scores
Check expression in disease-relevant tissues for top genes from Phase 1

Phase 3: Proteomics & Interaction Layer

Map protein-protein interactions, identify hub genes, and characterize interaction networks.

UniProt_get_function_by_accession — protein function narrative (essential for mechanistic context)
Tools: STRING_get_network (param: identifiers, species=9606), intact_get_interactions, HumanBase
Build PPI network from top 15-20 genes; identify hub genes by degree centrality
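
Hub identification by degree centrality needs nothing beyond an edge list. The sketch below assumes the interaction tools' output has already been reduced to `(gene, gene)` pairs. The function names are hypothetical, and genes with no interactions never appear in the result because they have no edges.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Degree centrality: number of distinct partners divided by (n - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(partners) / (n - 1) for node, partners in neighbors.items()}


def hub_genes(edges: Iterable[Tuple[str, str]], top_n: int = 5) -> List[str]:
    """Genes ranked by centrality, with ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:top_n]


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
    print(hub_genes(toy_edges, top_n=2))
```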

Phase 4: Pathway & Network Layer

Identify enriched biological pathways and cross-pathway connections.

ReactomeAnalysis_pathway_enrichment — identifiers are newline-separated (\n), NOT space-separated
enrichr_gene_enrichment_analysis — param: gene_list (array), libs (array). NOTE: data field is a JSON string that needs parsing
kegg_search_pathway — pathway keyword search
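
The two format notes above translate into two small helpers: one joins gene identifiers with newlines for the Reactome call, the other decodes the Enrichr `data` field when it arrives as a JSON string. Both function names are hypothetical, and the response is assumed to be a dict with a `data` key, as the note implies.

```python
import json
from typing import Any, Dict, Iterable


def reactome_identifiers(genes: Iterable[str]) -> str:
    """Join de-duplicated gene identifiers with newlines, not spaces."""
    cleaned = (g.strip() for g in genes)
    unique = dict.fromkeys(g for g in cleaned if g)  # keeps first-seen order
    return "\n".join(unique)


def parse_enrichr_data(response: Dict[str, Any]) -> Any:
    """Decode the `data` field if it is a JSON string; pass it through otherwise."""
    data = response.get("data")
    if isinstance(data, str):
        return json.loads(data)
    return data


if __name__ == "__main__":
    print(repr(reactome_identifiers(["APOE", "TREM2", "APOE"])))
    print(parse_enrichr_data({"data": '{"KEGG_2021_Human": []}'}))
```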

Phase 5: Gene Ontology & Functional Annotation

Characterize biological processes, molecular functions, and cellular components.

Tools: Enrichr (GO libraries), QuickGO, GO annotations, OpenTargets GO
Run GO enrichment for all 3 aspects (BP, MF, CC)
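
Covering all three aspects amounts to one Enrichr request per GO library. The sketch below only builds the argument dicts, using the `gene_list` and `libs` array parameters named in Phase 4. The specific library names are an assumption and should be checked against the libraries Enrichr currently offers.

```python
from typing import Dict, Iterable

# Assumed Enrichr library names; verify the current versions before use.
GO_LIBRARIES = {
    "BP": "GO_Biological_Process_2023",
    "MF": "GO_Molecular_Function_2023",
    "CC": "GO_Cellular_Component_2023",
}


def build_go_requests(genes: Iterable[str]) -> Dict[str, dict]:
    """Return one argument dict per GO aspect, with both parameters as arrays."""
    gene_list = list(dict.fromkeys(genes))  # de-duplicate, keep order
    return {
        aspect: {"gene_list": gene_list, "libs": [library]}
        for aspect, library in GO_LIBRARIES.items()
    }


if __name__ == "__main__":
    for aspect, args in build_go_requests(["APOE", "TREM2"]).items():
        print(aspect, args)
```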

Phase 6: Therapeutic Landscape

Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.

DGIdb_get_drug_gene_interactions — drug interactions by gene (param: genes as array). Often more comprehensive than OpenTargets for drug-gene data.
OpenTargets drugs/tractability (use EFO IDs like EFO_0000384 for Crohn's, not MONDO — MONDO IDs may return null for drug queries)
search_clinical_trials — query_term is REQUIRED
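
The ID preference and the required `query_term` can both be handled while assembling arguments. This sketch assumes Phase 0 produced a list of cross-referenced IDs in underscore form. The helper names are hypothetical, and only the parameter names `query_term` and `condition` come from this document.

```python
from typing import Dict, Iterable, Optional


def pick_drug_query_id(disease_ids: Iterable[str]) -> Optional[str]:
    """Prefer an EFO ID for drug queries; fall back to MONDO if none exists."""
    ids = list(disease_ids)
    for prefix in ("EFO_", "MONDO_"):
        for disease_id in ids:
            if disease_id.startswith(prefix):
                return disease_id
    return None


def clinical_trial_args(disease_name: str,
                        condition: Optional[str] = None) -> Dict[str, str]:
    """Always set query_term, even when a condition is also supplied."""
    args = {"query_term": disease_name}
    if condition:
        args["condition"] = condition
    return args


if __name__ == "__main__":
    print(pick_drug_query_id(["MONDO_0004975"]))  # no EFO ID, so MONDO is used
    print(clinical_trial_args("Crohn's disease", condition="Crohn's disease"))
```

If only a MONDO ID is available and the drug query returns null, that is a cue to look for an EFO cross-reference rather than to conclude that no drugs exist.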

Phase 7: Multi-Omics Integration

Integrate findings across all layers. See integration-scoring.md for full details.

Cross-layer gene concordance: count layers per gene, score multi-layer hub genes
Direction concordance: genetics + expression agreement
Biomarker identification: diagnostic, prognostic, predictive
Mechanistic hypothesis generation
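
Cross-layer gene concordance reduces to counting, for each gene, the layers in which it appeared. The sketch below assumes each phase has produced a plain list of gene symbols. It does not implement the scoring in integration-scoring.md, and the function name and gene lists are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def layer_concordance(layer_hits: Dict[str, Iterable[str]]) -> List[Tuple[str, List[str]]]:
    """Return (gene, layers) pairs, genes seen in the most layers first."""
    layers_by_gene = defaultdict(set)
    for layer, genes in layer_hits.items():
        for gene in genes:
            layers_by_gene[gene].add(layer)
    pairs = [(gene, sorted(layers)) for gene, layers in layers_by_gene.items()]
    # Sort by descending layer count, then alphabetically for stable output.
    return sorted(pairs, key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    hits = {  # illustrative gene lists, not real pipeline output
        "genomics": ["APOE", "TREM2"],
        "transcriptomics": ["APOE"],
        "pathways": ["APOE", "TREM2", "CLU"],
    }
    for gene, layers in layer_concordance(hits):
        print(f"{gene}: {len(layers)} layers ({', '.join(layers)})")
```

Genes at the top of this list are the multi-layer hub candidates; single-layer genes remain in the output so that discordance stays visible.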

Phase 8: Report Finalization

Write executive summary, calculate confidence score, verify completeness.

See integration-scoring.md for quality checklist and scoring formula

Key Tool Parameter Notes

These are the most common parameter pitfalls:

OpenTargets disease IDs: underscore format (MONDO_0004975), NOT colon
STRING protein_ids: must be array (['APOE']), not string
enrichr libs: must be array (['KEGG_2021_Human'])
HPA_get_rna_expression_by_source: ALL 3 params required (gene_name, source_type, source_name)
humanbase_ppi_analysis: ALL params required (gene_list, tissue, max_node, interaction, string_mode)
expression_atlas_disease_target_score: pageSize is REQUIRED
search_clinical_trials: query_term is REQUIRED even if condition is provided
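
Several of these pitfalls can be caught before a call is made. The pre-flight check below is a sketch. The required-parameter sets and the Enrichr array parameters are copied from the notes in this document, while the function name and message wording are assumptions. It checks only that parameters are present and of the right container type, not that their values are valid.

```python
from typing import Any, Dict, List

# Required parameters, as listed in the notes above.
REQUIRED_PARAMS = {
    "HPA_get_rna_expression_by_source": {"gene_name", "source_type", "source_name"},
    "humanbase_ppi_analysis": {"gene_list", "tissue", "max_node",
                               "interaction", "string_mode"},
    "expression_atlas_disease_target_score": {"pageSize"},
    "search_clinical_trials": {"query_term"},
}

# Parameters that must be arrays rather than bare strings.
ARRAY_PARAMS = {
    "enrichr_gene_enrichment_analysis": {"gene_list", "libs"},
}


def check_call(tool: str, args: Dict[str, Any]) -> List[str]:
    """Return a sorted list of problems; an empty list means the call looks fine."""
    problems = []
    for name in REQUIRED_PARAMS.get(tool, set()) - set(args):
        problems.append(f"missing required parameter: {name}")
    for name in ARRAY_PARAMS.get(tool, set()):
        if name in args and not isinstance(args[name], (list, tuple)):
            problems.append(f"parameter must be an array: {name}")
    return sorted(problems)


if __name__ == "__main__":
    print(check_call("search_clinical_trials", {"condition": "Alzheimer disease"}))
    print(check_call("enrichr_gene_enrichment_analysis",
                     {"gene_list": ["APOE"], "libs": "KEGG_2021_Human"}))
```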

For full tool parameters and per-phase workflows, see tool-reference.md.

Reference Files

All detailed content is in reference files in this directory:

File	Contents
`tool-reference.md`	Full tool parameters, inputs/outputs, per-phase workflows, quick reference table
`report-template.md`	Complete report markdown template with all sections and checklists
`integration-scoring.md`	Confidence score formula (0-100), evidence grading (T1-T4), integration procedures, quality checklist
`response-formats.md`	Verified JSON response structures for key tools
`use-patterns.md`	Common use patterns, edge case handling, fallback strategies