From encode-toolkit
Discovers and characterizes regulatory elements (enhancers, promoters, silencers, super-enhancers) using ENCODE cCRE catalog, ChromHMM, and ROSE. For genomics tasks like chromatin state classification and functional validation.
npx claudepluginhub ammawla/encode-toolkit
How this skill is triggered (by the user, by Claude, or both)
Slash command
/encode-toolkit:regulatory-elements
The summary Claude sees in its skill listing (used to decide when to auto-load this skill)
- User wants to find enhancers, promoters, silencers, or insulators in a specific tissue using ENCODE
Identify, classify, and functionally characterize regulatory elements using ENCODE's catalog of 926,535 human candidate cis-regulatory elements (cCREs) and layered functional genomics data.
The question: "What regulatory elements are active in my tissue of interest, and what are they doing?"
The human genome contains an estimated 1–2 million regulatory elements — far outnumbering the ~20,000 protein-coding genes. These elements (enhancers, promoters, silencers, insulators) control when, where, and how much each gene is expressed. No single biochemical assay can definitively identify a regulatory element; instead, combinatorial patterns of chromatin marks, accessibility, and TF binding are used to classify candidate elements.
The ENCODE Phase 3 project (ENCODE Project Consortium 2020) established a registry of 926,535 human and 339,815 mouse cCREs covering 7.9% and 3.4% of their respective genomes. These are classified using combinations of DNase-seq, H3K4me3, H3K27ac, and CTCF ChIP-seq signals across hundreds of biosamples. The registry is accessible via the SCREEN web server and represents the most comprehensive catalog of candidate regulatory elements in any organism.
An expanded registry (Moore et al. 2024, bioRxiv preprint) extends this to 2.35 million human cCREs with functional characterization from STARR-seq, MPRA, and CRISPR perturbation covering >90% of human cCREs.
ENCODE cCREs are candidate regulatory elements identified by biochemical signatures. Biochemical activity (histone marks, accessibility) is necessary but not sufficient for function. A region marked by H3K27ac is likely regulatory, but functional validation (perturbation, reporter assays) is required to confirm that it actually regulates a target gene. The gap between biochemical annotation and validated function is the central challenge.
| cCRE Class | Abbreviation | Biochemical Signature | Genomic Context | Example |
|---|---|---|---|---|
| Promoter-like | PLS | DNase+ H3K4me3+ (±H3K27ac) | Within 200bp of annotated TSS | Gene promoter |
| Proximal enhancer-like | pELS | DNase+ H3K27ac+ (H3K4me3-) | Within 2kb of TSS | Proximal enhancer |
| Distal enhancer-like | dELS | DNase+ H3K27ac+ (H3K4me3-) | >2kb from TSS | Distal enhancer |
| CTCF-only | CTCF-only | DNase+ CTCF+ (no H3K4me3/H3K27ac) | Any | Insulator/boundary |
| DNase-H3K4me3 | DNase-H3K4me3 | DNase+ H3K4me3+ | >200bp from TSS | Unannotated promoter-like |
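The class rules in the table above can be sketched as a small decision function. This is a simplified illustration, not the SCREEN implementation: the `CcreSignals` type and `classify_ccre` function are hypothetical names, the inputs are assumed to be already thresholded into high/low calls, and the order in which the rules are checked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CcreSignals:
    """High/low calls for one candidate element (hypothetical container)."""
    dnase: bool
    h3k4me3: bool
    h3k27ac: bool
    ctcf: bool
    tss_distance_bp: int  # distance to the nearest annotated TSS


def classify_ccre(s: CcreSignals) -> str:
    """Assign a cCRE class following the table above (simplified sketch)."""
    if not s.dnase:
        return "unclassified"  # every class in the table requires DNase+
    if s.h3k4me3 and s.tss_distance_bp <= 200:
        return "PLS"
    if s.h3k27ac and not s.h3k4me3:
        return "pELS" if s.tss_distance_bp <= 2000 else "dELS"
    if s.h3k4me3 and s.tss_distance_bp > 200:
        return "DNase-H3K4me3"
    if s.ctcf and not s.h3k4me3 and not s.h3k27ac:
        return "CTCF-only"
    return "unclassified"


print(classify_ccre(CcreSignals(True, False, True, False, 50_000)))
```

The example call describes an accessible, H3K27ac-high element 50 kb from the nearest TSS, which the rules place in the distal enhancer-like class.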
| Element Type | Key Signatures | ENCODE Assays | Notes |
|---|---|---|---|
| Active promoter | H3K4me3+ H3K27ac+ accessible | Histone ChIP-seq, ATAC/DNase | Corresponds to PLS cCREs |
| Active enhancer | H3K4me1+ H3K27ac+ H3K4me3- accessible | Histone ChIP-seq, ATAC/DNase | Corresponds to pELS/dELS |
| Poised enhancer | H3K4me1+ H3K27me3+ H3K27ac- | Histone ChIP-seq | Bivalent; may activate upon differentiation |
| Primed enhancer | H3K4me1+ only (no H3K27ac, no H3K27me3) | Histone ChIP-seq | Ready for activation but not currently active |
| Super-enhancer | Broad H3K27ac, multiple TFs, high Mediator | ChIP-seq, TF ChIP-seq | ROSE algorithm (Whyte 2013) |
| Silencer | H3K27me3+ (Polycomb) or H3K9me3+ (heterochromatin) | Histone ChIP-seq | Two distinct repressive mechanisms |
| Insulator | CTCF binding at TAD boundary | TF ChIP-seq (CTCF), Hi-C | Blocks enhancer-promoter communication |
| Stretch enhancer | ≥3kb contiguous enhancer chromatin state (not classified as super-enhancer) | Histone ChIP-seq, ChromHMM | Parker et al. 2013; enriched for disease variants |
Check data availability for the target tissue:
encode_get_facets(organ="...", biosample_type="tissue")
For basic regulatory element identification, you need at minimum:
For full chromatin state classification (ChromHMM), collect all available marks:
# Core marks (essential)
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27ac", organ="...", biosample_type="...")
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K4me3", organ="...", biosample_type="...")
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K4me1", organ="...", biosample_type="...")
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27me3", organ="...", biosample_type="...")
# Extended marks (important for ChromHMM)
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K36me3", organ="...", biosample_type="...")
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K9me3", organ="...", biosample_type="...")
# Accessibility
encode_search_experiments(assay_title="ATAC-seq", organ="...", biosample_type="...")
encode_search_experiments(assay_title="DNase-seq", organ="...", biosample_type="...")
# TF binding (for super-enhancers and insulator identification)
encode_search_experiments(assay_title="TF ChIP-seq", target="CTCF", organ="...", biosample_type="...")
encode_search_experiments(assay_title="TF ChIP-seq", target="EP300", organ="...", biosample_type="...")
For each passing experiment:
encode_list_files(
experiment_accession="ENCSR...",
file_format="bed",
output_type="IDR thresholded peaks",
assembly="GRCh38",
preferred_default=True
)
Note: H3K27me3, H3K9me3, and H3K36me3 produce broad peaks (broadPeak format), not narrow peaks. Use output_type="replicated peaks" for these marks.
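The choice of output_type per mark can be captured in a small helper. This is a sketch under stated assumptions: `peak_output_type` and `list_files_kwargs` are hypothetical helpers, not part of the toolkit, and they only build the keyword arguments shown in the encode_list_files call above. The accession in the example is a placeholder.

```python
# Marks that the note above describes as producing broad peaks.
BROAD_MARKS = {"H3K27me3", "H3K9me3", "H3K36me3"}


def peak_output_type(target: str) -> str:
    """Pick the output_type recommended above for a given histone mark."""
    return "replicated peaks" if target in BROAD_MARKS else "IDR thresholded peaks"


def list_files_kwargs(experiment_accession: str, target: str,
                      assembly: str = "GRCh38") -> dict:
    """Build keyword arguments mirroring the encode_list_files call above."""
    return {
        "experiment_accession": experiment_accession,
        "file_format": "bed",
        "output_type": peak_output_type(target),
        "assembly": assembly,
        "preferred_default": True,
    }


# Placeholder accession, for illustration only.
print(list_files_kwargs("ENCSR000AAA", "H3K27me3"))
```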
ENCODE Blacklist filtering (required): Before using any peak or signal files, remove regions overlapping the ENCODE Blacklist (Amemiya et al. 2019, Scientific Reports, 1,372 citations). Blacklisted regions produce artifactual signal in ChIP-seq, ATAC-seq, and DNase-seq assays — they will appear as regulatory elements if not removed. This step is essential before ChromHMM, cCRE overlap analysis, or any regulatory element classification.
- Human: hg38-blacklist.v2.bed.gz from Boyle-Lab/Blacklist
- Mouse: mm10-blacklist.v2.bed.gz
- Filter: bedtools intersect -v -a peaks.bed -b blacklist.bed > peaks.filtered.bed
Track all experiments:
encode_track_experiment(accession="ENCSR...", notes="regulatory element discovery - [tissue]")
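For small peak sets, the blacklist filtering step described above can also be reproduced without bedtools. The sketch below assumes intervals are already parsed into (chrom, start, end) tuples with BED-style half-open coordinates; `filter_blacklisted` is a hypothetical helper that mirrors `bedtools intersect -v` by dropping any peak that overlaps a blacklisted region by at least 1 bp.

```python
from collections import defaultdict

Interval = tuple[str, int, int]  # (chrom, start, end), half-open like BED


def filter_blacklisted(peaks: list[Interval],
                       blacklist: list[Interval]) -> list[Interval]:
    """Keep only peaks with no overlap against any blacklisted region."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        overlaps = any(start < b_end and b_start < end
                       for b_start, b_end in by_chrom.get(chrom, ()))
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 150, 300), ("chr2", 200, 400)]
print(filter_blacklisted(peaks, blacklist))
```

The linear scan per peak is fine for illustration; for genome-wide peak files, bedtools or a sorted-interval approach is the more practical choice.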
ChromHMM (Ernst & Kellis 2012) uses a multivariate hidden Markov model to segment the genome into chromatin states based on combinatorial histone modification patterns. This is the standard approach for genome-wide regulatory element classification.
5-mark model (most common, used by Roadmap Epigenomics): Uses H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3
Produces 15 states (the Roadmap 18-state model additionally requires H3K27ac):
| State | Marks Present | Interpretation |
|---|---|---|
| TssA | H3K4me3 | Active TSS |
| TssAFlnk | H3K4me3 + H3K4me1 | Flanking active TSS |
| TxFlnk | H3K4me3 + H3K4me1 + H3K36me3 | Transcription at gene 5' and 3' |
| Tx | H3K36me3 | Strong transcription |
| TxWk | (weak H3K36me3) | Weak transcription |
| EnhG | H3K4me1 + H3K36me3 | Genic enhancers |
| Enh | H3K4me1 | Enhancers |
| ZNF/Rpts | H3K9me3 + H3K36me3 | ZNF genes & repeats |
| Het | H3K9me3 | Heterochromatin |
| TssBiv | H3K4me3 + H3K27me3 | Bivalent/poised TSS |
| BivFlnk | H3K4me1 + H3K27me3 | Flanking bivalent TSS/enhancer |
| EnhBiv | H3K4me1 + H3K27me3 | Bivalent enhancer |
| ReprPC | H3K27me3 | Repressed Polycomb |
| ReprPCWk | (weak H3K27me3) | Weak repressed Polycomb |
| Quies | (no marks) | Quiescent/low |
Extended model (6+ marks including H3K27ac): Adding H3K27ac allows distinguishing active from poised enhancers and promoters.
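Once a segmentation is available, a common first look is how much of the genome falls into each broad category. The sketch below assumes segments are parsed into (chrom, start, end, state) tuples and that state labels may carry a numeric prefix such as "7_Enh". The grouping in `STATE_GROUPS` is an illustrative assumption, not a ChromHMM or Roadmap definition.

```python
from collections import Counter

# Illustrative grouping of the 15 state mnemonics listed above (an assumption).
STATE_GROUPS = {
    "TssA": "promoter", "TssAFlnk": "promoter",
    "Enh": "enhancer", "EnhG": "enhancer",
    "TssBiv": "bivalent", "BivFlnk": "bivalent", "EnhBiv": "bivalent",
    "Tx": "transcribed", "TxWk": "transcribed", "TxFlnk": "transcribed",
    "ZNF/Rpts": "repressed", "Het": "repressed",
    "ReprPC": "repressed", "ReprPCWk": "repressed",
    "Quies": "quiescent",
}


def coverage_by_group(segments):
    """Sum base pairs per group from (chrom, start, end, state) tuples."""
    totals = Counter()
    for _chrom, start, end, state in segments:
        mnemonic = state.split("_", 1)[-1]  # "7_Enh" -> "Enh"; "Enh" unchanged
        totals[STATE_GROUPS.get(mnemonic, "other")] += end - start
    return dict(totals)


segments = [
    ("chr1", 0, 200, "1_TssA"),
    ("chr1", 200, 1000, "7_Enh"),
    ("chr1", 1000, 1600, "Enh"),
    ("chr1", 1600, 2000, "15_Quies"),
]
print(coverage_by_group(segments))
```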
Super-enhancers are large clusters of enhancers (typically >10kb) with exceptionally high levels of H3K27ac, Mediator binding, and master TF occupancy. They drive expression of cell-identity genes and are enriched for disease-associated variants (Hnisz et al. 2013).
The Rank Ordering of Super-Enhancers (ROSE) algorithm:
# H3K27ac for super-enhancer calling
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27ac", organ="...", biosample_type="...")
# BRD4 or MED1 ChIP-seq (if available) for validation
encode_search_experiments(assay_title="TF ChIP-seq", target="BRD4", organ="...", biosample_type="...")
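The core of ROSE-style calling can be sketched in a few lines: stitch nearby H3K27ac peaks, rank the stitched regions by signal, and keep those above the point where the ranked curve turns sharply upward. This is a minimal sketch, not the ROSE code itself. It assumes peaks are (chrom, start, end, signal) tuples, sums constituent peak signal per stitched region (ROSE itself works from read density with input subtraction), and takes the cutoff at the point where a line whose slope is (max - min) / n is tangent to the ranked curve. The 12.5 kb stitching distance matches the value recorded in this document's provenance example.

```python
def stitch(peaks, max_gap=12_500):
    """Merge same-chromosome peaks separated by at most max_gap bp."""
    stitched = []
    for chrom, start, end, signal in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            _, p_start, p_end, p_signal = stitched[-1]
            stitched[-1] = (chrom, p_start, max(p_end, end), p_signal + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def signal_cutoff(signals):
    """Signal at the tangent point of the ascending ranked-signal curve."""
    ys = sorted(signals)
    n = len(ys)
    slope = (ys[-1] - ys[0]) / n
    tangent_index = min(range(n), key=lambda i: ys[i] - slope * i)
    return ys[tangent_index]


def call_super_enhancers(peaks, max_gap=12_500):
    """Return stitched regions whose signal exceeds the tangent cutoff."""
    regions = stitch(peaks, max_gap)
    if not regions:
        return []
    cutoff = signal_cutoff([r[3] for r in regions])
    return [r for r in regions if r[3] > cutoff]


# Toy input: eight ordinary regions and one outlier, spaced 100 kb apart.
toy = [("chr1", i * 100_000, i * 100_000 + 1_000, float(s))
       for i, s in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 100])]
print(call_super_enhancers(toy))
```

With real data the ranked curve is far smoother than in this toy input, and the regions above the cutoff are usually a small percentage of all stitched regions.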
Not all biochemically-defined regulatory elements are functionally validated. The validation hierarchy, in decreasing order of confidence:
Search for ENCODE perturbation data:
encode_search_experiments(assay_title="CRISPR screen", organ="...", biosample_type="...")
encode_search_experiments(perturbed=True, organ="...")
encode_search_experiments(assay_title="STARR-seq", organ="...", biosample_type="...")
encode_search_experiments(assay_title="MPRA", organ="...", biosample_type="...")
For most analyses, Level 4 (biochemical) is the starting point. Level 1–3 validation data exists for a minority of elements. When available, always check:
Identifying the target gene of an enhancer is critical. Enhancers can regulate genes >1 Mb away, skipping intervening genes.
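Activity-By-Contact (ABC) model (Fulco et al. 2019): ABC score for an enhancer-gene pair = (Activity × Contact) / Σ (Activity × Contact) over all candidate elements within 5 Mb of the gene, where Activity is the geometric mean of the H3K27ac and ATAC/DNase signal at the element and Contact is its Hi-C contact frequency with the gene's promoter.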
encode_search_experiments(assay_title="Hi-C", organ="...", biosample_type="...")
encode_search_experiments(assay_title="ChIA-PET", organ="...", biosample_type="...")
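A minimal sketch of the Activity-By-Contact calculation for one gene follows. It assumes the commonly described formulation, in which each element's activity is the geometric mean of its H3K27ac and accessibility signal, and the activity × contact product is normalized by the sum over all candidate elements near the gene. The function names, input layout, and element names are hypothetical, and the numbers are toy values.

```python
from math import sqrt


def activity(h3k27ac: float, accessibility: float) -> float:
    """Geometric mean of H3K27ac and ATAC/DNase signal at an element."""
    return sqrt(h3k27ac * accessibility)


def abc_scores(elements: list[dict]) -> dict[str, float]:
    """Normalized activity x contact score for each element of one gene.

    Each element is a dict with keys: name, h3k27ac, accessibility, contact.
    """
    raw = {
        e["name"]: activity(e["h3k27ac"], e["accessibility"]) * e["contact"]
        for e in elements
    }
    total = sum(raw.values())
    return {name: (value / total if total else 0.0) for name, value in raw.items()}


# Toy values for two candidate elements near the same gene.
elements = [
    {"name": "enhancer_A", "h3k27ac": 4.0, "accessibility": 9.0, "contact": 0.5},
    {"name": "enhancer_B", "h3k27ac": 1.0, "accessibility": 1.0, "contact": 1.0},
]
print(abc_scores(elements))
```

Because the scores are normalized per gene, they sum to 1 across that gene's candidate elements, so a strong element with weak contact can still outrank a weak element in close contact.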
ENCODE is the richest single resource, but compare with:
encode_log_derived_file(
file_path="/path/to/regulatory_elements.bed",
source_accessions=["ENCSR...", "ENCSR...", ...],
description="Regulatory element catalog for [tissue]: [N] enhancers, [N] promoters, [N] super-enhancers",
file_type="regulatory_elements",
tool_used="ChromHMM / ROSE / bedtools intersect",
parameters="ChromHMM 15-state model, ROSE stitching=12.5kb, GRCh38"
)
For the final regulatory element catalog, report:
For detailed biology of each histone mark (writers, erasers, readers, contradictions, cancer-specific states) and ChromHMM combinatorial state definitions, consult skills/histone-aggregation/references/histone-marks-reference.md (1,442 lines, 21 marks, 37 key papers).
Goal: Use ENCODE histone marks and accessibility data to classify regulatory elements into functional categories (active enhancer, poised enhancer, active promoter, insulator, heterochromatin). Context: ENCODE's candidate cis-regulatory elements (cCREs) are classified by their chromatin signature combinations.
encode_get_facets(facet_field="target.label", organ="liver", assay_title="Histone ChIP-seq", organism="Homo sapiens")
Expected output:
{
"facets": {"target.label": {"H3K27ac": 6, "H3K4me3": 5, "H3K4me1": 4, "H3K27me3": 3, "CTCF": 4}}
}
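A facet response of this shape can be checked against the marks needed for classification before any files are downloaded. The sketch below assumes the response layout shown in the expected output above. `mark_coverage` is a hypothetical helper, and the core and extended mark lists follow the search calls earlier in this document.

```python
CORE_MARKS = ("H3K27ac", "H3K4me3", "H3K4me1", "H3K27me3")
EXTENDED_MARKS = ("H3K36me3", "H3K9me3")  # needed for the ChromHMM 5-mark model


def mark_coverage(facets_response: dict) -> dict[str, list[str]]:
    """Report which core and extended marks have no experiments."""
    counts = facets_response.get("facets", {}).get("target.label", {})
    return {
        "missing_core": [m for m in CORE_MARKS if counts.get(m, 0) == 0],
        "missing_extended": [m for m in EXTENDED_MARKS if counts.get(m, 0) == 0],
    }


# Counts mirror the illustrative response above, not real portal data.
response = {
    "facets": {"target.label": {"H3K27ac": 6, "H3K4me3": 5, "H3K4me1": 4, "H3K27me3": 3}}
}
print(mark_coverage(response))
```

In this illustrative case the core marks are all present, but H3K36me3 and H3K9me3 are missing, so signature-based classification is possible while a 5-mark ChromHMM model is not.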
encode_search_experiments(assay_title="Histone ChIP-seq", organ="liver", target="H3K27ac", organism="Homo sapiens")
| Signature | Classification |
|---|---|
| H3K27ac + H3K4me1 (no H3K4me3) | Active enhancer |
| H3K4me1 + H3K27me3 (no H3K27ac) | Poised enhancer |
| H3K4me1 only (no H3K27ac, no H3K27me3) | Primed enhancer |
| H3K4me3 + H3K27ac | Active promoter |
| H3K4me3 + H3K27me3 | Bivalent promoter |
| CTCF (no H3K4me1/H3K4me3) | Insulator/CTCF-only |
| H3K27me3 | Polycomb-repressed |
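Applying these signatures to a region comes down to two steps: find which marks have a peak overlapping the region, then map that set of marks to a label. The sketch below is one hedged way to do this, following the element definitions used earlier in this document (poised = H3K4me1 with H3K27me3, primed = H3K4me1 alone). `marks_at` and `classify_signature` are hypothetical helpers, peaks are assumed to be (chrom, start, end) tuples in half-open coordinates, and the precedence between overlapping rules is an assumption.

```python
Interval = tuple[str, int, int]  # (chrom, start, end), half-open like BED


def marks_at(region: Interval, peaks_by_mark: dict[str, list[Interval]]) -> set[str]:
    """Return the marks with at least one peak overlapping the region."""
    chrom, start, end = region
    return {
        mark
        for mark, peaks in peaks_by_mark.items()
        if any(c == chrom and s < end and start < e for c, s, e in peaks)
    }


def classify_signature(marks: set[str]) -> str:
    """Map a set of overlapping marks to a label (rule order is an assumption)."""
    k4me3, k4me1 = "H3K4me3" in marks, "H3K4me1" in marks
    k27ac, k27me3 = "H3K27ac" in marks, "H3K27me3" in marks
    if k4me3 and k27me3:
        return "Bivalent promoter"
    if k4me3 and k27ac:
        return "Active promoter"
    if k4me1 and not k4me3:
        if k27ac:
            return "Active enhancer"
        return "Poised enhancer" if k27me3 else "Primed enhancer"
    if "CTCF" in marks and not k4me3:
        return "Insulator/CTCF-only"
    if k27me3:
        return "Polycomb-repressed"
    return "Unclassified"


# Toy peak sets, for illustration only.
peaks_by_mark = {
    "H3K27ac": [("chr1", 100, 500)],
    "H3K4me1": [("chr1", 50, 600)],
    "H3K4me3": [("chr1", 5000, 5400)],
}
print(classify_signature(marks_at(("chr1", 200, 300), peaks_by_mark)))
```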
encode_download_files(accessions=["ENCFF100AC", "ENCFF200K4M1", "ENCFF300K4M3"], download_dir="/data/regulatory")
encode_search_experiments(assay_title="TF ChIP-seq", organ="liver", target="CTCF", organism="Homo sapiens")
Expected output:
{
"total": 4,
"results": [{"accession": "ENCSR500CTF", "target": "CTCF", "biosample_summary": "liver"}]
}
encode_list_files(experiment_accession="ENCSR500CTF", file_format="bed", output_type="IDR thresholded peaks", assembly="GRCh38")
Expected output:
{
"files": [{"accession": "ENCFF600CTF", "output_type": "IDR thresholded peaks", "file_size_mb": 0.5}]
}
encode_track_experiment(accession="ENCSR500CTF", notes="Liver CTCF for insulator classification in regulatory element mapping")
Expected output:
{"status": "tracked", "accession": "ENCSR500CTF"}
- variant-annotation: Annotating genetic variants with regulatory element overlap
- multi-omics-integration: Combining regulatory elements with expression data and TF binding
- histone-aggregation: Aggregating histone ChIP-seq peaks across samples
- accessibility-aggregation: Aggregating ATAC-seq/DNase-seq peaks across samples
- epigenome-profiling: Building comprehensive epigenomic profiles
- quality-assessment: Evaluating ENCODE experiment quality for regulatory element analysis
- disease-research: Regulatory elements are central to disease variant interpretation
- single-cell-encode: Cell type-resolved scATAC-seq provides cell type-specific regulatory element catalogs
- compare-biosamples: Comparing regulatory elements across tissues is a primary use case
- hic-aggregation: Hi-C data enables enhancer-gene linkage for regulatory element annotation
- methylation-aggregation: DNA methylation at regulatory elements (hypomethylation at active enhancers) is a key signature
- data-provenance: Document all regulatory element discovery parameters for reproducibility
- ucsc-browser: Retrieve ENCODE cCRE tracks and TF binding clusters from UCSC for regulatory annotation
- ensembl-annotation: Ensembl Regulatory Build provides independent classification of regulatory features
- gnomad-variants: Gene constraint scores help prioritize regulatory elements near constrained genes
- motif-analysis: Discover TF motifs enriched in regulatory peaks using HOMER and MEME
- peak-annotation: Annotate peaks with genomic features (promoter, enhancer, intergenic)
- jaspar-motifs: Validate TF binding in regulatory elements using JASPAR matrix profiles
- publication-trust: Verify literature claims backing analytical decisions