From tooluniverse
Analyzes chromatin state, histone modifications, ATAC-seq accessibility, and TF binding from ENCODE, Roadmap Epigenomics, and ChIP-Atlas. Use for regulatory landscape mapping and cCRE annotations.
Install: `npx claudepluginhub mims-harvard/tooluniverse --plugin tooluniverse`

Slash command: `/tooluniverse:tooluniverse-epigenomics-chromatin`
- Methylation array data processing (CpG beta values, differential methylation) -> Use `tooluniverse-epigenomics`
Annotates genetic variants (GWAS hits, eQTLs, rare variants) with ENCODE data for regulatory impact, causal identification, enrichment testing, and gene linking.
Annotates regulatory elements, predicts TF binding sites, and scores variant regulatory impact using JASPAR motifs, ENCODE ChIP-seq/cCREs, RegulomeDB, and UCSC.
Scores genetic variants for regulatory potential via RegulomeDB v2 REST API. Queries by rsID, position, or region and returns ranking (1a–7) with TF binding, histone marks, DNase, motifs, eQTLs, and chromatin state evidence. Use for GWAS hit prioritization and regulatory annotation.
Related skills: tooluniverse-epigenomics, tooluniverse-rnaseq-deseq2, tooluniverse-gwas-snp-interpretation, tooluniverse-variant-analysis

Before calling any tool, identify which question type you're answering. Each maps to a different tool set.
(a) Which regulatory elements exist at a locus? Use UCSC_get_encode_cCREs (region-based) or SCREEN_get_regulatory_elements (gene-based). Then check ENCODE_get_chromatin_state for ChromHMM annotation and ENCODE_search_chromatin_accessibility for ATAC-seq evidence.
(b) Which TFs bind there? Use ReMap_get_transcription_factor_binding for ChIP-seq experiments. Use jaspar_search_matrices to retrieve binding motifs and check whether the sequence disrupts a known motif.
(c) How does a variant affect regulation? Use RegulomeDB_query_variant for a scored summary. Then build multi-layer evidence: UCSC_get_encode_cCREs (is the variant in a cCRE?), GTEx_get_single_tissue_eqtls (is it an eQTL?), jaspar_search_matrices (does it disrupt a TF motif?). No single layer is sufficient — see the variant reasoning section below.
(d) What genes are regulated by an element? Use GTEx_get_single_tissue_eqtls or GTEx_query_eqtl to find genes whose expression is associated with variants in the element. Use SCREEN_get_regulatory_elements with element_type="PLS"/"pELS"/"dELS" to classify element-to-promoter relationships.
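The routing in (a) to (d) can be written down as a lookup table. This is a minimal sketch: the tool names come from the text above, while the dictionary keys and the `tools_for` helper are illustrative names, not part of ToolUniverse.

```python
# Question-type routing, transcribed from (a)-(d) above.
# Keys and helper name are hypothetical; only the tool names come from the text.
TOOLS_BY_QUESTION = {
    "elements_at_locus": [
        "UCSC_get_encode_cCREs",
        "SCREEN_get_regulatory_elements",
        "ENCODE_get_chromatin_state",
        "ENCODE_search_chromatin_accessibility",
    ],
    "tf_binding": [
        "ReMap_get_transcription_factor_binding",
        "jaspar_search_matrices",
    ],
    "variant_effect": [
        "RegulomeDB_query_variant",
        "UCSC_get_encode_cCREs",
        "GTEx_get_single_tissue_eqtls",
        "jaspar_search_matrices",
    ],
    "element_target_genes": [
        "GTEx_get_single_tissue_eqtls",
        "GTEx_query_eqtl",
        "SCREEN_get_regulatory_elements",
    ],
}


def tools_for(question_type: str) -> list[str]:
    """Return the ordered tool names for a question type; raise on unknown types."""
    try:
        return TOOLS_BY_QUESTION[question_type]
    except KeyError:
        raise ValueError(f"unknown question type: {question_type!r}") from None


print(tools_for("tf_binding"))
```

Keeping the order of each list aligned with the text means the first entry is always the tool to call first.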
Use histone mark identity to guide tool queries and interpret results before fetching data.
Bivalent promoter logic: If you observe H3K4me3 + H3K27me3 together at the same locus, the promoter is bivalent — poised but not active. This is common in stem cells and developmentally regulated genes. Do not report such genes as "actively transcribed." Use GTEx_get_expression_summary to check if the gene is actually expressed in the tissue of interest.
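The bivalent check above amounts to a small decision on the set of marks observed at a promoter. The sketch below assumes you have already reduced your query results to a set of mark names; the function name and the labels for the non-bivalent cases are illustrative.

```python
def promoter_state(marks: set[str]) -> str:
    """Classify a promoter from the histone marks observed at the same locus."""
    has_h3k4me3 = "H3K4me3" in marks
    has_h3k27me3 = "H3K27me3" in marks
    if has_h3k4me3 and has_h3k27me3:
        # Both marks together: poised, so do not report as actively transcribed.
        return "bivalent (poised, not active)"
    if has_h3k4me3:
        return "active-like"
    if has_h3k27me3:
        return "Polycomb-repressed"
    return "undetermined"


print(promoter_state({"H3K4me3", "H3K27me3"}))
```

A "bivalent" or "active-like" call is still a prediction from marks alone; the expression check with GTEx_get_expression_summary described above remains the confirmation step.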
Inference rule: If a user asks about a mark you haven't queried yet, first ask whether the marks you already have answer the question. H3K4me3 at a promoter without H3K27me3 predicts active transcription, so you may not need to query H3K36me3 unless you are specifically confirming elongation.
An eQTL means variant X is statistically associated with expression of gene Y in tissue T. Before reporting eQTL results, apply this chain of reasoning:
To assess a non-coding variant's regulatory impact, build evidence from multiple independent layers. No single layer is sufficient.
Layer 1 — RegulomeDB score: Rankings 1a–2b indicate strong support: 1a–1f combine eQTL evidence with TF binding or DNase evidence, and 2a–2b have TF binding, motif, and DNase evidence without an eQTL. Rankings 4–7 indicate weak support. Use as a triage filter.
Layer 2 — Regulatory element overlap: Query UCSC_get_encode_cCREs at the variant's coordinates. If the variant falls in a cCRE (especially PLS or pELS), it is in a functional context.
Layer 3 — eQTL evidence: Query GTEx_get_single_tissue_eqtls for nearby genes. If the variant is a significant eQTL, the association supports regulatory function.
Layer 4 — TFBS disruption: Query jaspar_search_matrices for TFs with motifs at the locus. If the variant changes a high-information-content position in a motif, it is a strong functional candidate.
Synthesis rule: Report each layer separately. Convergence across 3+ layers = high-confidence regulatory variant. A single layer (e.g., eQTL alone) warrants caution.
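The synthesis rule can be sketched as a count over per-layer booleans. The layer names, the function name, and the label used for exactly two supporting layers are assumptions; the text only defines the "3+ layers" and "single layer" cases.

```python
def synthesize_layers(layers: dict[str, bool]) -> tuple[str, list[str]]:
    """Return a verdict plus the names of the layers that support the variant."""
    supporting = [name for name, supported in layers.items() if supported]
    if len(supporting) >= 3:
        verdict = "high-confidence regulatory variant"
    elif len(supporting) == 2:
        # Not defined in the text; labeled here as an assumption.
        verdict = "moderate support (assumed label)"
    elif len(supporting) == 1:
        verdict = "single layer only: interpret with caution"
    else:
        verdict = "no supporting layer"
    return verdict, supporting


# Made-up layer outcomes for illustration only.
verdict, supporting = synthesize_layers(
    {"regulomedb": True, "ccre_overlap": True, "eqtl": True, "tfbs_disruption": False}
)
print(verdict, supporting)
```

Returning the supporting layer names alongside the verdict keeps each layer reportable separately, as the rule requires.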
MyGene_query_genes: query (string). Converts gene symbols to Ensembl IDs and coordinates. Filter results by symbol == '<GENE>' — first hit may not match.
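Because the first hit may not match, it helps to filter explicitly on the symbol. The sketch below assumes the hits arrive as a list of dictionaries with a "symbol" key; that shape and the helper name are assumptions, and the sample hits are made up.

```python
from typing import Optional


def pick_exact_symbol(hits: list[dict], symbol: str) -> Optional[dict]:
    """Return the first hit whose symbol matches exactly, or None if there is none."""
    for hit in hits:
        if hit.get("symbol") == symbol:
            return hit
    return None


# Made-up hits: the top-ranked entry is a different gene with a similar name.
hits = [{"symbol": "TP53BP1"}, {"symbol": "TP53"}]
print(pick_exact_symbol(hits, "TP53"))
```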
ensembl_lookup_gene: gene_id (Ensembl ID), species (REQUIRED, "homo_sapiens"). Returns chr/start/end.
Key format notes:
- Versioned GENCODE ID: ENSG00000012048.20
- rsID: rs4994
- GTEx variant ID: chr17_43705621_T_C_b38
- Genomic coordinates: chrom="chr17", start=7668421, end=7687490

ENCODE_search_histone_experiments: target (histone mark), cell_type (or tissue alias), biosample_term_name (most explicit ENCODE ontology name), limit.
ENCODE anatomy term notes: "breast" → try "breast epithelium" or "mammary epithelial cell"; "brain" → "brain" works; if 0 results, append "tissue", "epithelium", or "cell".
```python
result = tu.tools.ENCODE_search_histone_experiments(target="H3K27ac", cell_type="GM12878", limit=5)
# result["data"]["experiments"][0]["accession"] -> "ENCSR000AKC"
```
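The anatomy-term advice above (if 0 results, append "tissue", "epithelium", or "cell") can be sketched as a retry loop. The `search` argument is a hypothetical callable that takes a biosample term and returns a list of results; wiring it to a real tool call is left to the caller.

```python
from collections.abc import Callable
from typing import Optional


def biosample_candidates(term: str) -> list[str]:
    """Return the bare term followed by the suffixed variants suggested above."""
    suffixes = ("tissue", "epithelium", "cell")
    return [term] + [f"{term} {suffix}" for suffix in suffixes]


def first_term_with_results(
    term: str, search: Callable[[str], list]
) -> tuple[Optional[str], list]:
    """Try each candidate term in order and stop at the first non-empty result."""
    for candidate in biosample_candidates(term):
        results = search(candidate)
        if results:
            return candidate, results
    return None, []


print(biosample_candidates("breast"))
```

Trying the bare term first keeps cases like "brain", which work as-is, to a single call.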
GEO_search_chipseq_datasets: Fallback for older or non-ENCODE ChIP-seq datasets.
ENCODE_search_chromatin_accessibility: cell_type, limit. Returns ATAC-seq experiments.
ENCODE_get_chromatin_state: cell_type, limit. Returns ChromHMM 15-state annotations (TssA, Enh, TssBiv, ReprPC, etc.). Use to confirm bivalent promoter state or enhancer classification.
ENCODE_search_rnaseq_experiments: assay_type (default "total RNA-seq"), biosample, limit. If 0 results, retry with assay_type="polyA plus RNA-seq".
GEO_search_rnaseq_datasets / GEO_search_atacseq_datasets: query, organism, limit (also max_results). GEO adds "ATAC-seq" automatically for the ATAC tool.
ReMap_get_transcription_factor_binding (CTCF): gene_name="CTCF", cell_type, limit. Returns ENCODE TF ChIP-seq experiments.
SCREEN_get_regulatory_elements: gene_name, element_type (PLS/pELS/dELS/CTCF-only/DNase-H3K4me3), limit.
UCSC_get_encode_cCREs: chrom (REQUIRED), start (REQUIRED), end (REQUIRED), genome (default "hg38"). Returns cCREs with Z-scores for DNase, H3K4me3, H3K27ac, CTCF signals.
```python
# cCREs near TP53
result = tu.tools.UCSC_get_encode_cCREs(chrom="chr17", start=7668421, end=7687490, genome="hg38")
```
ENCODE_search_annotations: annotation_type ("candidate Cis-Regulatory Elements" or "chromatin state"), biosample_term_name, organism, assembly, limit.
GTEx_get_single_tissue_eqtls: gene_symbol. Returns all significant eQTLs across tissues with snpId, pValue, tissueSiteDetailId, nes (normalized effect size).
```python
from collections import Counter

result = tu.tools.GTEx_get_single_tissue_eqtls(gene_symbol="BRCA1")
# Count significant eQTLs per tissue
tissue_counts = Counter(e["tissueSiteDetailId"] for e in result["data"])
```
GTEx_query_eqtl: gene_symbol, tissue (tissueSiteDetailId), page (1-indexed), size. Use for a specific tissue.
GTEx_get_multi_tissue_eqtls: operation="get_multi_tissue_eqtls", gencode_id (versioned, REQUIRED). Returns per-variant m-values showing tissue-sharing. m-value near 1.0 = effect present; near 0.0 = absent.
```python
result = tu.tools.GTEx_get_multi_tissue_eqtls(
    operation="get_multi_tissue_eqtls",
    gencode_id="ENSG00000012048.20",
)
```
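To turn m-values into per-tissue calls, a simple threshold pass is enough. The sketch below assumes the m-values have already been extracted into a tissue-to-value mapping (the response layout is not shown in this document). The 0.9 and 0.1 cutoffs are conventional choices and an assumption here, since the text only says "near 1.0" and "near 0.0". The sample values are made up.

```python
def classify_m_values(
    m_values: dict[str, float], present: float = 0.9, absent: float = 0.1
) -> dict[str, str]:
    """Label each tissue as effect present, effect absent, or ambiguous."""
    calls = {}
    for tissue, m in m_values.items():
        if m >= present:
            calls[tissue] = "effect present"
        elif m <= absent:
            calls[tissue] = "effect absent"
        else:
            calls[tissue] = "ambiguous"
    return calls


# Made-up m-values for illustration only.
example = {"Breast_Mammary_Tissue": 0.98, "Whole_Blood": 0.03, "Liver": 0.5}
print(classify_m_values(example))
```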
GTEx_calculate_eqtl: operation="calculate_eqtl", gencode_id, variant_id (chr_pos_ref_alt_b38), tissue_site_detail_id. Works for non-significant pairs.
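The `chr_pos_ref_alt_b38` format is easy to get wrong by hand, so a pair of small helpers can build and parse it. Both function names are illustrative and not part of any library.

```python
def gtex_variant_id(chrom: str, pos: int, ref: str, alt: str, build: str = "b38") -> str:
    """Build a GTEx-style variant ID, adding the 'chr' prefix if it is missing."""
    chrom = chrom if chrom.startswith("chr") else f"chr{chrom}"
    return f"{chrom}_{pos}_{ref}_{alt}_{build}"


def parse_gtex_variant_id(variant_id: str) -> dict:
    """Split a GTEx-style variant ID into its five underscore-separated fields."""
    chrom, pos, ref, alt, build = variant_id.split("_")
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt, "build": build}


print(gtex_variant_id("17", 43705621, "T", "C"))  # chr17_43705621_T_C_b38
```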
eQTL_list_datasets / eQTL_get_associations: EBI eQTL Catalogue. Use dataset_id (from list call), gene_id (Ensembl), variant. Complementary to GTEx.
GTEx_get_expression_summary: gene_symbol. Recommended — auto-resolves GENCODE versions. Returns median TPM per tissue.
```python
result = tu.tools.GTEx_get_expression_summary(gene_symbol="BRCA1")
# Five tissues with the highest median TPM
top_tissues = sorted(result["data"], key=lambda x: x["median"], reverse=True)[:5]
```
GTEx_get_median_gene_expression: Requires operation="get_median_gene_expression" + exact versioned gencode_id. Use only when version precision is needed.
GTEx_get_tissue_sites: No params. Returns all tissueSiteDetailId values.
jaspar_search_matrices: name (TF name), collection ("CORE"), tax_group ("vertebrates"), species ("9606"), page_size.
```python
result = tu.tools.jaspar_search_matrices(name="CTCF", collection="CORE", page_size=5)
```
jaspar_get_matrix: Returns position frequency matrix for a JASPAR matrix ID. Use to check if a variant allele disrupts a high-information-content position.
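"High-information-content position" can be made concrete by computing per-column information content from the position frequency matrix. The sketch below assumes the matrix is available as a mapping from base to a list of counts and uses a uniform background, so each column scores between 0 bits (no preference) and 2 bits (one base only). The input shape and function name are assumptions, and the toy matrix is made up.

```python
import math


def column_information_content(pfm: dict[str, list[float]]) -> list[float]:
    """Return information content in bits for each motif column (uniform background)."""
    length = len(pfm["A"])
    scores = []
    for i in range(length):
        counts = [pfm[base][i] for base in "ACGT"]
        total = sum(counts)
        ic = 2.0  # maximum for a 4-letter alphabet
        for count in counts:
            if count > 0:
                p = count / total
                ic += p * math.log2(p)  # subtracts the column entropy
        scores.append(ic)
    return scores


# Toy two-column matrix: column 0 is fixed to A, column 1 is uniform.
toy_pfm = {"A": [100, 25], "C": [0, 25], "G": [0, 25], "T": [0, 25]}
print(column_information_content(toy_pfm))  # [2.0, 0.0]
```

A variant that changes the base at a column scoring near 2 bits is more likely to disrupt binding than one at a column near 0 bits.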
ReMap_get_transcription_factor_binding: gene_name (TF), cell_type, limit. Same tool used for CTCF in Phase 2 — applies to any TF.
STRING_get_functional_annotations: identifiers (gene name), species (9606), category ("Process"/"Function"/"KEGG"). Returns GO/KEGG/Reactome annotations for regulatory context.
RegulomeDB_query_variant: rsid (e.g., "rs4994"). Returns probability, ranking (1a = strongest, 7 = weakest), and tissue-specific scores.
```python
result = tu.tools.RegulomeDB_query_variant(rsid="rs4994")
score = result["data"]["regulome_score"]
# score["ranking"]: "1a" (eQTL + TF + motif + DNase) ... "7" (no evidence)
# score["probability"]: 0.0–1.0
top_tissues = sorted(score["tissue_specific_scores"].items(), key=lambda x: float(x[1]), reverse=True)[:5]
```
Rankings 1a–1f all have eQTL evidence. Rankings 2a–3b have TF binding without eQTL. Rankings 4–7 have decreasing evidence. Use ranking <= 2b as a threshold for "strong regulatory support."
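Since rankings are strings rather than numbers, the "ranking <= 2b" threshold is safest when checked against an explicit ordering. The sketch below lists the RegulomeDB ranking categories from strongest to weakest; the helper names are illustrative.

```python
# RegulomeDB ranking categories, strongest evidence first.
RANK_ORDER = ["1a", "1b", "1c", "1d", "1e", "1f",
              "2a", "2b", "2c", "3a", "3b", "4", "5", "6", "7"]


def has_strong_support(ranking: str, threshold: str = "2b") -> bool:
    """True if the ranking is at least as strong as the threshold."""
    return RANK_ORDER.index(ranking) <= RANK_ORDER.index(threshold)


def has_eqtl_evidence(ranking: str) -> bool:
    """Rankings 1a to 1f all include eQTL evidence."""
    return ranking.startswith("1")


print(has_strong_support("2a"), has_strong_support("3a"))  # True False
```

An unknown ranking string raises `ValueError` from `list.index`, which surfaces unexpected API values instead of silently misclassifying them.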
Combine evidence tiers before reporting:
Convergence of T1+T2 evidence from independent sources (e.g., ENCODE ChIP-seq overlapping a RegulomeDB 1a variant with GTEx eQTL) constitutes strong evidence for regulatory function. Contradictions between layers (e.g., high RegulomeDB score but no eQTL) should be explicitly noted.
| Phase | Primary Tool | Fallback |
|---|---|---|
| Histone ChIP-seq | ENCODE_search_histone_experiments | GEO_search_chipseq_datasets |
| RNA-seq | ENCODE_search_rnaseq_experiments (total RNA-seq) | retry with polyA plus RNA-seq |
| ATAC-seq | ENCODE_search_chromatin_accessibility | GEO_search_atacseq_datasets |
| cCREs | UCSC_get_encode_cCREs | SCREEN_get_regulatory_elements |
| eQTLs | GTEx_get_single_tissue_eqtls | eQTL_get_associations (EBI) |
| Expression | GTEx_get_expression_summary | GTEx_get_median_gene_expression |
| TF motifs | jaspar_search_matrices | ReMap_get_transcription_factor_binding |
| Variant scoring | RegulomeDB_query_variant | combine eQTL + TF binding manually |
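
The rows of the table whose fallback is itself a tool can be driven by one helper. This is a sketch under assumptions: `call_tool` is a hypothetical callable that takes a tool name and returns a possibly empty result, and the two rows with non-tool fallbacks (RNA-seq retry and manual variant scoring) are left out.

```python
from collections.abc import Callable
from typing import Any, Optional

# (primary, fallback) pairs copied from the table above.
FALLBACKS = {
    "Histone ChIP-seq": ("ENCODE_search_histone_experiments", "GEO_search_chipseq_datasets"),
    "ATAC-seq": ("ENCODE_search_chromatin_accessibility", "GEO_search_atacseq_datasets"),
    "cCREs": ("UCSC_get_encode_cCREs", "SCREEN_get_regulatory_elements"),
    "eQTLs": ("GTEx_get_single_tissue_eqtls", "eQTL_get_associations"),
    "Expression": ("GTEx_get_expression_summary", "GTEx_get_median_gene_expression"),
    "TF motifs": ("jaspar_search_matrices", "ReMap_get_transcription_factor_binding"),
}


def call_with_fallback(
    phase: str, call_tool: Callable[[str], Any]
) -> tuple[Optional[str], Any]:
    """Try the primary tool, then the fallback; return (tool name used, result)."""
    for tool_name in FALLBACKS[phase]:
        result = call_tool(tool_name)
        if result:  # treat empty or None results as a miss
            return tool_name, result
    return None, None


# Stand-in for real tool calls: every ENCODE tool returns nothing.
def fake_call(tool_name: str) -> list:
    return [] if tool_name.startswith("ENCODE") else ["hit"]


print(call_with_fallback("ATAC-seq", fake_call))
```

Returning the tool name with the result lets the final report state which source the evidence came from.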