From mims-harvard-tooluniverse
Supports epigenomics research on chromatin accessibility using ENCODE ChIP-seq histone data, CTCF binding, eQTL analysis, cCRE regulatory elements, JASPAR/ReMap TF motifs, and RegulomeDB variant scores. For histone marks, chromatin states, and gene regulation queries.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverseThis skill uses the workspace's default tool permissions.
- Methylation array data processing (CpG beta values, differential methylation) -> Use `tooluniverse-epigenomics`
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Dynamically discovers and combines enabled skills into cohesive, unexpected delightful experiences like interactive HTML or themed artifacts. Activates on 'surprise me', inspiration, or boredom cues.
Generates images from structured JSON prompts via Python script execution. Supports reference images and aspect ratios for characters, scenes, products, visuals.
tooluniverse-epigenomicstooluniverse-rnaseq-deseq2tooluniverse-gwas-snp-interpretationtooluniverse-variant-analysisBefore calling any tool, identify which question type you're answering. Each maps to a different tool set.
(a) Which regulatory elements exist at a locus? Use UCSC_get_encode_cCREs (region-based) or SCREEN_get_regulatory_elements (gene-based). Then check ENCODE_get_chromatin_state for ChromHMM annotation and ENCODE_search_chromatin_accessibility for ATAC-seq evidence.
(b) Which TFs bind there? Use ReMap_get_transcription_factor_binding for ChIP-seq experiments. Use jaspar_search_matrices to retrieve binding motifs and check whether the sequence disrupts a known motif.
(c) How does a variant affect regulation? Use RegulomeDB_query_variant for a scored summary. Then build multi-layer evidence: UCSC_get_encode_cCREs (is the variant in a cCRE?), GTEx_get_single_tissue_eqtls (is it an eQTL?), jaspar_search_matrices (does it disrupt a TF motif?). No single layer is sufficient — see the variant reasoning section below.
(d) What genes are regulated by an element? Use GTEx_get_single_tissue_eqtls or GTEx_query_eqtl to find genes whose expression is associated with variants in the element. Use SCREEN_get_regulatory_elements with element_type="PLS"/"pELS"/"dELS" to classify element-to-promoter relationships.
Use histone mark identity to guide tool queries and interpret results before fetching data.
Bivalent promoter logic: If you observe H3K4me3 + H3K27me3 together at the same locus, the promoter is bivalent — poised but not active. This is common in stem cells and developmentally regulated genes. Do not report such genes as "actively transcribed." Use GTEx_get_expression_summary to check if the gene is actually expressed in the tissue of interest.
Inference rule: If a user asks about a mark you haven't queried yet, ask: does the mark you have found already answer the question? H3K4me3 in a region predicts active transcription; you may not need to also query H3K36me3 unless confirming elongation specifically.
An eQTL means variant X is statistically associated with expression of gene Y in tissue T. Before reporting eQTL results, apply this chain of reasoning:
To assess a non-coding variant's regulatory impact, build evidence from multiple independent layers. No single layer is sufficient.
Layer 1 — RegulomeDB score: High probability (score 1a–2b) means convergent evidence from eQTL + TF binding + DNase. Score 4–7 means weak support. Use as a triage filter.
Layer 2 — Regulatory element overlap: Query UCSC_get_encode_cCREs at the variant's coordinates. If the variant falls in a cCRE (especially PLS or pELS), it is in a functional context.
Layer 3 — eQTL evidence: Query GTEx_get_single_tissue_eqtls for nearby genes. If the variant is a significant eQTL, the association supports regulatory function.
Layer 4 — TFBS disruption: Query jaspar_search_matrices for TFs with motifs at the locus. If the variant changes a high-information-content position in a motif, it is a strong functional candidate.
Synthesis rule: Report each layer separately. Convergence across 3+ layers = high-confidence regulatory variant. A single layer (e.g., eQTL alone) warrants caution.
MyGene_query_genes: query (string). Converts gene symbols to Ensembl IDs and coordinates. Filter results by symbol == '<GENE>' — first hit may not match.
ensembl_lookup_gene: gene_id (Ensembl ID), species (REQUIRED, "homo_sapiens"). Returns chr/start/end.
Key format notes:
ENSG00000012048.20rs4994chr17_43705621_T_C_b38chrom="chr17", start=7668421, end=7687490ENCODE_search_histone_experiments: target (histone mark), cell_type (or tissue alias), biosample_term_name (most explicit ENCODE ontology name), limit.
ENCODE anatomy term notes: "breast" → try "breast epithelium" or "mammary epithelial cell"; "brain" → "brain" works; if 0 results, append "tissue", "epithelium", or "cell".
result = tu.tools.ENCODE_search_histone_experiments(target="H3K27ac", cell_type="GM12878", limit=5)
# result["data"]["experiments"][0]["accession"] -> "ENCSR000AKC"
GEO_search_chipseq_datasets: Fallback for older or non-ENCODE ChIP-seq datasets.
ENCODE_search_chromatin_accessibility: cell_type, limit. Returns ATAC-seq experiments.
ENCODE_get_chromatin_state: cell_type, limit. Returns ChromHMM 15-state annotations (TssA, Enh, TssBiv, ReprPC, etc.). Use to confirm bivalent promoter state or enhancer classification.
ENCODE_search_rnaseq_experiments: assay_type (default "total RNA-seq"), biosample, limit. If 0 results, retry with assay_type="polyA plus RNA-seq".
GEO_search_rnaseq_datasets / GEO_search_atacseq_datasets: query, organism, limit (also max_results). GEO adds "ATAC-seq" automatically for the ATAC tool.
ReMap_get_transcription_factor_binding (CTCF): gene_name="CTCF", cell_type, limit. Returns ENCODE TF ChIP-seq experiments.
SCREEN_get_regulatory_elements: gene_name, element_type (PLS/pELS/dELS/CTCF-only/DNase-H3K4me3), limit.
UCSC_get_encode_cCREs: chrom (REQUIRED), start (REQUIRED), end (REQUIRED), genome (default "hg38"). Returns cCREs with Z-scores for DNase, H3K4me3, H3K27ac, CTCF signals.
# cCREs near TP53
result = tu.tools.UCSC_get_encode_cCREs(chrom="chr17", start=7668421, end=7687490, genome="hg38")
ENCODE_search_annotations: annotation_type ("candidate Cis-Regulatory Elements" or "chromatin state"), biosample_term_name, organism, assembly, limit.
GTEx_get_single_tissue_eqtls: gene_symbol. Returns all significant eQTLs across tissues with snpId, pValue, tissueSiteDetailId, nes (normalized effect size).
result = tu.tools.GTEx_get_single_tissue_eqtls(gene_symbol="BRCA1")
from collections import Counter
tissue_counts = Counter(e["tissueSiteDetailId"] for e in result["data"])
GTEx_query_eqtl: gene_symbol, tissue (tissueSiteDetailId), page (1-indexed), size. Use for a specific tissue.
GTEx_get_multi_tissue_eqtls: operation="get_multi_tissue_eqtls", gencode_id (versioned, REQUIRED). Returns per-variant m-values showing tissue-sharing. m-value near 1.0 = effect present; near 0.0 = absent.
result = tu.tools.GTEx_get_multi_tissue_eqtls(
operation="get_multi_tissue_eqtls",
gencode_id="ENSG00000012048.20"
)
GTEx_calculate_eqtl: operation="calculate_eqtl", gencode_id, variant_id (chr_pos_ref_alt_b38), tissue_site_detail_id. Works for non-significant pairs.
eQTL_list_datasets / eQTL_get_associations: EBI eQTL Catalogue. Use dataset_id (from list call), gene_id (Ensembl), variant. Complementary to GTEx.
GTEx_get_expression_summary: gene_symbol. Recommended — auto-resolves GENCODE versions. Returns median TPM per tissue.
result = tu.tools.GTEx_get_expression_summary(gene_symbol="BRCA1")
top_tissues = sorted(result["data"], key=lambda x: x["median"], reverse=True)[:5]
GTEx_get_median_gene_expression: Requires operation="get_median_gene_expression" + exact versioned gencode_id. Use only when version precision is needed.
GTEx_get_tissue_sites: No params. Returns all tissueSiteDetailId values.
jaspar_search_matrices: name (TF name), collection ("CORE"), tax_group ("vertebrates"), species ("9606"), page_size.
result = tu.tools.jaspar_search_matrices(name="CTCF", collection="CORE", page_size=5)
jaspar_get_matrix: Returns position frequency matrix for a JASPAR matrix ID. Use to check if a variant allele disrupts a high-information-content position.
ReMap_get_transcription_factor_binding: gene_name (TF), cell_type, limit. Same tool used for CTCF in Phase 2 — applies to any TF.
STRING_get_functional_annotations: identifiers (gene name), species (9606), category ("Process"/"Function"/"KEGG"). Returns GO/KEGG/Reactome annotations for regulatory context.
RegulomeDB_query_variant: rsid (e.g., "rs4994"). Returns probability, ranking (1a = strongest, 7 = weakest), and tissue-specific scores.
result = tu.tools.RegulomeDB_query_variant(rsid="rs4994")
score = result["data"]["regulome_score"]
# score["ranking"]: "1a" (eQTL + TF + motif + DNase) ... "7" (no evidence)
# score["probability"]: 0.0–1.0
top_tissues = sorted(score["tissue_specific_scores"].items(), key=lambda x: float(x[1]), reverse=True)[:5]
Rankings 1a–1f all have eQTL evidence. Rankings 2a–3b have TF binding without eQTL. Rankings 4–7 have decreasing evidence. Use ranking <= 2b as a threshold for "strong regulatory support."
Combine evidence tiers before reporting:
Convergence of T1+T2 evidence from independent sources (e.g., ENCODE ChIP-seq overlapping a RegulomeDB 1a variant with GTEx eQTL) constitutes strong evidence for regulatory function. Contradictions between layers (e.g., high RegulomeDB score but no eQTL) should be explicitly noted.
| Phase | Primary Tool | Fallback |
|---|---|---|
| Histone ChIP-seq | ENCODE_search_histone_experiments | GEO_search_chipseq_datasets |
| RNA-seq | ENCODE_search_rnaseq_experiments (total RNA-seq) | retry with polyA plus RNA-seq |
| ATAC-seq | ENCODE_search_chromatin_accessibility | GEO_search_atacseq_datasets |
| cCREs | UCSC_get_encode_cCREs | SCREEN_get_regulatory_elements |
| eQTLs | GTEx_get_single_tissue_eqtls | eQTL_get_associations (EBI) |
| Expression | GTEx_get_expression_summary | GTEx_get_median_gene_expression |
| TF motifs | jaspar_search_matrices | ReMap_get_transcription_factor_binding |
| Variant scoring | RegulomeDB_query_variant | combine eQTL + TF binding manually |