From mims-harvard-tooluniverse
Infers gene regulatory networks prioritizing direct TF evidence from ChIP-seq, JASPAR motifs, ENCODE data, GTEx eQTLs, and Enrichr target enrichment over correlations.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
GRN inference starts with: which TF regulates which gene? Direct evidence (ChIP-seq binding) is stronger than indirect (co-expression correlation). A TF binding near a gene doesn't prove regulation — check if expression changes when the TF is perturbed. JASPAR provides binding motifs but motif presence in a promoter is only computational evidence (T3); ENCODE ChIP-seq data that places the TF at the locus in the relevant cell type is stronger (T1). eQTLs from GTEx show which variants affect expression but don't identify the upstream regulator — combine with TF motif disruption analysis for mechanistic insight.
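The evidence hierarchy above can be made explicit when collecting candidate edges. The sketch below is illustrative only: the `Evidence` record, the source labels, and the `best_tier` helper are hypothetical names, and only the two tiers named in the text are encoded (T1 for ChIP-seq in the relevant cell type, T3 for motif presence). Other evidence types are left untiered rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the tiers named in the text are encoded; anything else stays untiered.
TIER_BY_SOURCE = {
    "chip_seq": "T1",  # TF observed at the locus in the relevant cell type
    "motif": "T3",     # computational motif match only
}


@dataclass(frozen=True)
class Evidence:
    tf: str
    target: str
    source: str  # hypothetical label, e.g. "chip_seq", "motif", "coexpression"


def best_tier(evidence: List[Evidence]) -> Optional[str]:
    """Return the strongest named tier supporting an edge, or None if untiered."""
    tiers = [TIER_BY_SOURCE[e.source] for e in evidence if e.source in TIER_BY_SOURCE]
    # "T1" sorts before "T3", so min() picks the strongest tier.
    return min(tiers) if tiers else None


edge_evidence = [
    Evidence("TP53", "CDKN1A", "motif"),
    Evidence("TP53", "CDKN1A", "chip_seq"),
]
print(best_tier(edge_evidence))
```

Keeping the tier next to each edge makes it easy to report which edges rest on binding data and which rest on motif matches alone.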
LOOK UP, DON'T GUESS: never assume JASPAR matrix IDs, Enrichr library names, or GTEx tissue identifiers. Always search JASPAR by TF name and verify library names before calling Enrichr.
Activate this skill when the user asks about:
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Determine:
Search JASPAR for the TF's position weight matrix (PWM) and binding motif profile.
Tool: jaspar_search_matrices
Parameters:
search string TF name to search (e.g., "TP53")
limit integer Max results (default 10)
collection string JASPAR collection filter (e.g., "CORE")
species string Taxonomy ID filter (e.g., "9606" for human)
Example:
{"search": "TP53", "limit": 5}
Returns {status, data: {count, results: [{matrix_id, name, collection, base_id, version, sequence_logo}]}}.
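Once the search response is in hand, the matrix ID can be taken from the results instead of being assumed. A minimal sketch, assuming the response follows the shape described above and that several versions of a profile may appear in one result set; `latest_matrix_id` is a hypothetical helper and `mock_response` is invented sample data, not real tool output.

```python
def latest_matrix_id(response: dict, tf_name: str):
    """Pick the highest-version matrix whose name matches tf_name exactly."""
    results = response.get("data", {}).get("results", [])
    matches = [r for r in results if r.get("name", "").upper() == tf_name.upper()]
    if not matches:
        return None
    # int() tolerates the version arriving as either a number or a string.
    return max(matches, key=lambda r: int(r["version"]))["matrix_id"]


# Invented sample shaped like the documented return value.
mock_response = {
    "status": "success",
    "data": {
        "count": 2,
        "results": [
            {"matrix_id": "MA0106.2", "name": "TP53", "collection": "CORE",
             "base_id": "MA0106", "version": 2},
            {"matrix_id": "MA0106.3", "name": "TP53", "collection": "CORE",
             "base_id": "MA0106", "version": 3},
        ],
    },
}

print(latest_matrix_id(mock_response, "TP53"))
```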
Tool: jaspar_get_matrix (for detailed motif info)
Parameters:
matrix_id string JASPAR matrix ID (e.g., "MA0106.3")
Returns PFM (position frequency matrix), species, TF class, UniProt IDs.
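A PFM holds raw base counts per position; scanning a sequence usually needs log-odds weights instead. The following is a sketch of the standard PFM-to-PWM conversion with a pseudocount and a uniform background. It assumes the PFM has already been arranged as a dict of per-base count lists (the exact layout returned by the tool is not specified here), and `toy_pfm` is a made-up three-position matrix, not a JASPAR profile.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert per-base counts into log2 odds against a uniform background."""
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            freq = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def score_site(pwm, site):
    """Sum the log-odds weights for a site of the same length as the motif."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


# Made-up counts: the motif strongly prefers A, then C, then G.
toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
pwm = pfm_to_pwm(toy_pfm)
print(score_site(pwm, "ACG"), score_site(pwm, "TTT"))
```

A high score marks only a motif match, which remains computational evidence until binding data supports it.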
Identify target genes from ChIP-seq experiments via Enrichr.
Tool: enrichr_gene_enrichment_analysis
Parameters:
gene_list array List of gene symbols (REQUIRED)
library string Enrichr library name (default "GO_Biological_Process_2023")
top_n integer Top enriched terms to return (default 10)
Key libraries for regulatory network analysis:
"ENCODE_TF_ChIP-seq_2015" -- TF binding from ENCODE ChIP-seq
"ChEA_2022" -- ChIP-seq enrichment analysis (broader coverage)
"TRRUST_Transcription_Factors_2019" -- Literature-curated TF-target relationships
"ARCHS4_TFs_Coexp" -- TF co-expression from RNA-seq
Example (find which TFs bind your gene set):
{
"gene_list": ["CDKN1A", "BAX", "MDM2", "GADD45A", "BBC3"],
"library": "ENCODE_TF_ChIP-seq_2015",
"top_n": 10
}
Returns {status, data: {library, gene_count, enriched_terms: [{rank, term, p_value, combined_score, overlapping_genes, adjusted_p_value}]}}.
IMPORTANT: Enrichr takes a gene list and tells you what TFs are enriched. To find targets OF a TF, use the TRRUST library or look up TF ChIP-seq targets directly.
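Because the result lists TFs enriched for the submitted genes, a typical next step is to keep only terms that pass a multiple-testing threshold. A minimal sketch, assuming the response shape described above and assuming each term begins with the TF symbol followed by whitespace (term formats vary by library, so check real output first). `significant_tfs` is a hypothetical helper and `mock_response` is invented.

```python
def significant_tfs(response: dict, alpha: float = 0.05):
    """Return (tf, adjusted_p, overlapping_genes) for terms passing alpha."""
    terms = response["data"]["enriched_terms"]
    hits = [t for t in terms if t["adjusted_p_value"] < alpha]
    hits.sort(key=lambda t: t["combined_score"], reverse=True)
    # Assumption: the TF symbol is the first whitespace-separated token.
    return [(t["term"].split()[0], t["adjusted_p_value"], t["overlapping_genes"])
            for t in hits]


# Invented sample shaped like the documented return value.
mock_response = {
    "status": "success",
    "data": {
        "library": "ENCODE_TF_ChIP-seq_2015",
        "gene_count": 5,
        "enriched_terms": [
            {"rank": 1, "term": "TP53 HepG2 hg19", "p_value": 1e-6,
             "combined_score": 250.0, "adjusted_p_value": 1e-4,
             "overlapping_genes": ["CDKN1A", "BAX", "MDM2"]},
            {"rank": 2, "term": "MYC K562 hg19", "p_value": 0.2,
             "combined_score": 3.0, "adjusted_p_value": 0.6,
             "overlapping_genes": ["MDM2"]},
        ],
    },
}

for tf, adj_p, genes in significant_tfs(mock_response):
    print(tf, adj_p, genes)
```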
Tool: ENCODE_search_histone_experiments
Parameters:
target string Histone mark (e.g., "H3K27ac", "H3K4me3", "H3K27me3")
tissue string Tissue/cell type (e.g., "liver", "brain")
limit integer Max results (default 10)
Common histone marks and their meaning:
H3K27ac -- Active enhancers and promoters
H3K4me3 -- Active promoters
H3K4me1 -- Poised/active enhancers
H3K27me3 -- Polycomb-repressed regions
H3K9me3 -- Heterochromatin
Example:
{"target": "H3K27ac", "tissue": "liver", "limit": 5}
Returns {status, data: {total, experiments: [{accession, histone_mark, biosample_summary, status, lab}]}}.
Tool: GTEx_query_eqtl
Parameters:
gene_symbol string Gene symbol (e.g., "TP53"). REQUIRED.
Returns eQTL SNPs across tissues, showing genetic variants that affect gene expression.
Example:
{"gene_symbol": "TP53"}
Returns {status, data: {singleTissueEqtl: [{snpId, variantId, geneSymbol, pValue, tissueSiteDetailId, nes}]}}. nes = normalized effect size; negative = lower expression with alt allele.
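The eQTL list is easier to read once reduced to the strongest association per tissue, with the sign of `nes` translated into a direction. This is a sketch under the response shape described above; `top_eqtl_per_tissue` is a hypothetical helper, and the SNP IDs, tissue IDs, and values in `mock_response` are placeholders rather than real GTEx results.

```python
def top_eqtl_per_tissue(response: dict) -> dict:
    """Keep the lowest-p-value eQTL in each tissue and label its direction."""
    best = {}
    for hit in response["data"]["singleTissueEqtl"]:
        tissue = hit["tissueSiteDetailId"]
        if tissue not in best or hit["pValue"] < best[tissue]["pValue"]:
            best[tissue] = hit

    def direction(nes: float) -> str:
        # Negative nes means lower expression with the alt allele.
        if nes < 0:
            return "lower"
        return "higher" if nes > 0 else "none"

    return {
        tissue: {"snpId": h["snpId"], "pValue": h["pValue"],
                 "direction": direction(h["nes"])}
        for tissue, h in best.items()
    }


# Placeholder records shaped like the documented return value.
mock_response = {
    "status": "success",
    "data": {"singleTissueEqtl": [
        {"snpId": "rs_placeholder_1", "variantId": "var_1", "geneSymbol": "TP53",
         "pValue": 1e-4, "tissueSiteDetailId": "Liver", "nes": 0.20},
        {"snpId": "rs_placeholder_2", "variantId": "var_2", "geneSymbol": "TP53",
         "pValue": 1e-8, "tissueSiteDetailId": "Liver", "nes": -0.35},
        {"snpId": "rs_placeholder_3", "variantId": "var_3", "geneSymbol": "TP53",
         "pValue": 1e-5, "tissueSiteDetailId": "Whole_Blood", "nes": 0.15},
    ]},
}

summary = top_eqtl_per_tissue(mock_response)
print(summary)
```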
Tool: RegulomeDB_query_variant
Parameters:
rsid string dbSNP rsID (e.g., "rs7412")
Returns regulatory score (1a-7), tissue-specific scores, and overlapping regulatory features.
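When several variants are annotated, their ranks can be sorted so that the strongest regulatory evidence comes first (lower ranks such as 1a indicate more supporting evidence than 7). A minimal sketch that works only on rank strings; the field layout of the tool's response is not assumed, and the variant labels below are placeholders.

```python
import re


def rank_key(rank: str):
    """Turn a RegulomeDB-style rank such as '1a' or '4' into a sortable tuple."""
    match = re.fullmatch(r"([1-7])([a-f]?)", rank.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized rank: {rank!r}")
    return int(match.group(1)), match.group(2)


# Placeholder variant labels mapped to ranks collected from separate queries.
ranks = {"variant_a": "4", "variant_b": "1b", "variant_c": "2a", "variant_d": "1a"}
ordered = sorted(ranks, key=lambda v: rank_key(ranks[v]))
print(ordered)
```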
Tool: STRING_get_interaction_partners
Parameters:
identifiers string Protein/gene name (REQUIRED, e.g., "TP53")
species integer NCBI taxonomy ID (default 9606 for human)
limit integer Max partners to return
required_score integer Min combined score 0-1000 (400=medium, 700=high, 900=highest)
Example:
{"identifiers": "TP53", "species": 9606, "limit": 10}
Returns array of {preferredName_A, preferredName_B, score, escore, dscore, tscore, ascore}. Score components: escore (experimental), dscore (database), tscore (text-mining), ascore (coexpression).
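Since the combined score mixes text-mining and co-expression with experimental support, it can help to filter partners on `escore` alone when direct evidence is the priority. A sketch under the record shape described above; it assumes the component scores are fractions between 0 and 1 (if the tool reports them on the 0-1000 scale, adjust the threshold accordingly). `experimental_partners` is a hypothetical helper and the scores in `mock_partners` are invented.

```python
def experimental_partners(partners, query: str, min_escore: float = 0.4):
    """Return (partner, escore) pairs with experimental support, strongest first."""
    kept = []
    for p in partners:
        if p["escore"] >= min_escore:
            # The query can appear on either side of the pair.
            other = (p["preferredName_B"] if p["preferredName_A"] == query
                     else p["preferredName_A"])
            kept.append((other, p["escore"]))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented scores for illustration only.
mock_partners = [
    {"preferredName_A": "TP53", "preferredName_B": "MDM2", "score": 0.99,
     "escore": 0.90, "dscore": 0.80, "tscore": 0.95, "ascore": 0.10},
    {"preferredName_A": "TP53", "preferredName_B": "GENE_X", "score": 0.75,
     "escore": 0.05, "dscore": 0.00, "tscore": 0.74, "ascore": 0.00},
    {"preferredName_A": "EP300", "preferredName_B": "TP53", "score": 0.98,
     "escore": 0.60, "dscore": 0.70, "tscore": 0.90, "ascore": 0.05},
]

print(experimental_partners(mock_partners, "TP53"))
```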
Tool: intact_get_interaction_network
Parameters:
gene_symbol string Gene symbol (REQUIRED)
limit integer Max results
Returns experimentally validated molecular interactions from IntAct.
Tool: BioGRID_get_interactions
Parameters:
gene_symbol string Gene symbol (REQUIRED)
limit integer Max results
Returns physical and genetic interactions with experimental system details.
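The interactions come back as a dict keyed by interaction ID, with each value carrying `OFFICIAL_SYMBOL_A` and `OFFICIAL_SYMBOL_B`, so a small flattening step is usually needed before counting partners. A minimal sketch of that step; `partner_counts` is a hypothetical helper and the interaction IDs and pairs in `mock_interactions` are invented.

```python
from collections import Counter


def partner_counts(interactions: dict, query: str) -> Counter:
    """Count how many interaction records link the query gene to each partner."""
    counts = Counter()
    q = query.upper()
    for record in interactions.values():
        a = record["OFFICIAL_SYMBOL_A"].upper()
        b = record["OFFICIAL_SYMBOL_B"].upper()
        # The query may sit on either side; self-interactions are skipped.
        if a == q and b != q:
            counts[b] += 1
        elif b == q and a != q:
            counts[a] += 1
    return counts


# Invented records keyed by made-up interaction IDs.
mock_interactions = {
    "1001": {"OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "MDM2"},
    "1002": {"OFFICIAL_SYMBOL_A": "MDM2", "OFFICIAL_SYMBOL_B": "TP53"},
    "1003": {"OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "EP300"},
    "1004": {"OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "TP53"},
}

print(partner_counts(mock_interactions, "TP53").most_common())
```

Partners supported by several independent records are generally better candidates for follow-up than those seen once.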
Tool: EuropePMC_search_articles
Parameters:
query string Search query (REQUIRED)
limit integer Max results (default 10)
Example:
{"query": "TP53 transcription factor regulatory network", "limit": 5}
Tool: PubMed_search_articles
Parameters:
query string Search query (REQUIRED)
limit integer Max results (default 10)
Tool: ols_search_terms
Parameters:
query string Search term (REQUIRED)
ontology string Ontology ID (e.g., "so" for Sequence Ontology, "go" for Gene Ontology)
limit integer Max results
Example for regulatory element types:
{"query": "transcription factor binding site", "ontology": "so", "limit": 5}
Tool: STRING_functional_enrichment
Parameters:
identifiers string Comma-separated gene names (REQUIRED)
species integer NCBI taxonomy ID (default 9606)
Performs GO, KEGG, Reactome enrichment on a gene set from the network.
JASPAR tool name: Use jaspar_search_matrices (lowercase, plural) to search by TF name. jaspar_get_matrix is not a search tool; it takes a matrix ID you have already looked up.
JASPAR search param: The parameter is search (NOT query or name).
STRING identifiers param: Use identifiers as a string (NOT an array). For multiple proteins, use STRING_get_network with array identifiers.
Enrichr direction: enrichr_gene_enrichment_analysis takes a gene SET and finds enriched TFs/pathways. To find targets of a TF, use "TRRUST_Transcription_Factors_2019" library with known target genes, or consult ENCODE ChIP-seq data directly.
Enrichr gene_list is required: Must be a JSON array of strings, not a single string.
GTEx uses gene_symbol: NOT Ensembl ID. The tool resolves it internally.
ENCODE tissue names: Use lowercase tissue names like "liver", "brain", "heart". Complex queries may fail -- keep tissue names simple.
BioGRID returns interactions as dict: Keys are interaction IDs, values contain OFFICIAL_SYMBOL_A and OFFICIAL_SYMBOL_B.
RegulomeDB rsID format: Must include the "rs" prefix (e.g., "rs7412" not "7412").
No TRRUST direct tool: TRRUST data is accessed via Enrichr library "TRRUST_Transcription_Factors_2019", not a standalone tool.
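Several of the pitfalls above are input-shape problems that can be caught before a tool is called. The helpers below are a sketch, not part of ToolUniverse: the function names are hypothetical, and they only build the JSON payloads using the parameter names listed in this document. Library names are passed in by the caller so that they are looked up rather than hard-coded.

```python
import re


def enrichr_payload(genes, library: str, top_n: int = 10) -> dict:
    """gene_list must be an array of strings, so split a single string if given."""
    if isinstance(genes, str):
        genes = [g for g in re.split(r"[,\s]+", genes) if g]
    return {"gene_list": [str(g) for g in genes], "library": library, "top_n": top_n}


def string_partners_payload(identifier, species: int = 9606, limit: int = 10) -> dict:
    """identifiers is a single string here, not an array."""
    if not isinstance(identifier, str):
        raise TypeError("identifiers must be a single string")
    return {"identifiers": identifier, "species": species, "limit": limit}


def regulomedb_payload(rsid) -> dict:
    """Ensure the rsID carries the 'rs' prefix."""
    rsid = str(rsid).strip().lower()
    if rsid.isdigit():
        rsid = "rs" + rsid
    if not re.fullmatch(r"rs\d+", rsid):
        raise ValueError(f"not an rsID: {rsid!r}")
    return {"rsid": rsid}


def encode_histone_payload(target: str, tissue: str, limit: int = 10) -> dict:
    """Keep tissue names simple and lowercase."""
    return {"target": target, "tissue": tissue.strip().lower(), "limit": limit}


print(enrichr_payload("CDKN1A, BAX MDM2", "ENCODE_TF_ChIP-seq_2015"))
print(regulomedb_payload("7412"))
```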
jaspar_search_matrices -- Get motif info for TF X
enrichr_gene_enrichment_analysis with TRRUST_Transcription_Factors_2019 library -- Use known targets
STRING_get_interaction_partners -- Find interacting proteins
EuropePMC_search_articles -- Literature on TF X targets

enrichr_gene_enrichment_analysis with gene Y's co-regulated genes + ENCODE_TF_ChIP-seq_2015 library
GTEx_query_eqtl -- Find eQTLs affecting gene Y expression
ENCODE_search_histone_experiments -- Chromatin context at gene Y locus
RegulomeDB_query_variant -- Annotate regulatory variants near gene Y

enrichr_gene_enrichment_analysis with gene set Z + multiple TF libraries
STRING_get_interaction_partners for hub genes
STRING_functional_enrichment -- Pathway context
BioGRID_get_interactions -- Experimental validation
EuropePMC_search_articles -- Supporting literature

GTEx_query_eqtl -- Tissue-specific eQTLs for gene X
ENCODE_search_histone_experiments with specific tissue -- Active regulatory marks
RegulomeDB_query_variant -- Tissue-specific regulatory scores for eQTL SNPs
enrichr_gene_enrichment_analysis -- Identify TFs active in that tissue

RegulomeDB_query_variant -- Regulatory score and overlapping features
GTEx_query_eqtl -- Is this variant an eQTL?
ENCODE_search_histone_experiments -- Chromatin context at variant locus
EuropePMC_search_articles -- Literature on the variant