From encode-toolkit
Queries Ensembl REST API for VEP variant annotations (consequences, CADD, REVEL, SpliceAI), regulatory build overlaps, coordinate liftover (GRCh37/GRCh38), gene ID resolution, and cross-references.
Install: npx claudepluginhub ammawla/encode-toolkit
Slash command: /encode-toolkit:ensembl-annotation
- User wants to annotate variants with Ensembl VEP (Variant Effect Predictor) consequences
Annotate variants, look up regulatory features, convert coordinates, and resolve gene identifiers using the Ensembl REST API.
The question: "What does the Ensembl Regulatory Build say about this region, and what is the predicted effect of this variant?"
The Ensembl Regulatory Build integrates ENCODE, Roadmap Epigenomics, and Blueprint data into a unified annotation of regulatory features across human cell types. The Variant Effect Predictor (VEP) is the standard tool for variant consequence prediction, integrating 50+ annotation sources including CADD, REVEL, SpliceAI, and AlphaMissense.
Ensembl's Regulatory Build incorporates ENCODE ChIP-seq, DNase-seq, and CTCF data to define regulatory features. Querying Ensembl after an ENCODE analysis provides an independent, aggregated view of regulatory annotations — often including data from non-ENCODE sources (Blueprint, Roadmap) that may cover biosamples not in ENCODE.
Base URL: https://rest.ensembl.org
Authentication: None required
Rate limit: Reasonable use expected; max 5Mb region queries
Formats: JSON (default), XML, GFF3, BED
Current version: Ensembl 114
Add content-type: application/json header to all requests.
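The settings above can be wrapped in a small helper. This is a minimal Python sketch, not part of the skill itself: the function names are illustrative, and the "5Mb" limit is assumed to mean 5,000,000 bp.

```python
import json
import urllib.request

BASE_URL = "https://rest.ensembl.org"
MAX_REGION_BP = 5_000_000  # assumption: "5Mb" read as 5,000,000 bp


def region_length(region: str) -> int:
    """Length in bp of a 'chrom:start-end' region string (inclusive)."""
    _, span = region.split(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return end - start + 1


def check_region(region: str) -> str:
    """Return the region unchanged, or raise if it exceeds the size limit."""
    if region_length(region) > MAX_REGION_BP:
        raise ValueError(f"{region} is longer than {MAX_REGION_BP} bp")
    return region


def ensembl_get(path: str):
    """GET a REST path with the JSON content-type header (network call)."""
    req = urllib.request.Request(
        BASE_URL + path, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling check_region before ensembl_get rejects oversized regions locally instead of spending a request on them.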
Query what regulatory features the Ensembl Regulatory Build assigns to a region:
# Get regulatory features in a region
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/overlap/region/human/7:140424943-140624564?feature=regulatory"
# Also get TF binding motifs
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/overlap/region/human/7:140424943-140624564?feature=regulatory;feature=motif"
| Type | Description | ENCODE Equivalent |
|---|---|---|
| Promoter | Active promoter region | cCRE PLS |
| Enhancer | Active enhancer region | cCRE pELS/dELS |
| Open chromatin | Accessible region without H3K27ac | DNase-only sites |
| CTCF binding site | CTCF-occupied region | cCRE CTCF-only |
| TF binding site | Other TF binding | TF ChIP-seq peaks |
| Promoter flanking | Region flanking a promoter | ChromHMM TssAFlnk state |
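To compare an overlap response with the table above, the returned features can be tallied by type. A minimal Python sketch, assuming each feature is a dict whose "description" field holds a type name matching the table's first column; check the field name and the exact strings against a real response before relying on this.

```python
from collections import Counter

# ENCODE equivalents taken from the table above, keyed by lowercase type name.
ENCODE_EQUIVALENT = {
    "promoter": "cCRE PLS",
    "enhancer": "cCRE pELS/dELS",
    "open chromatin": "DNase-only sites",
    "ctcf binding site": "cCRE CTCF-only",
    "tf binding site": "TF ChIP-seq peaks",
    "promoter flanking": "TssAFlnk",
}


def summarize_features(features, key="description"):
    """Map each feature type to (count, ENCODE equivalent from the table)."""
    counts = Counter(str(f.get(key, "unknown")).lower() for f in features)
    return {
        ftype: (n, ENCODE_EQUIVALENT.get(ftype, "no listed equivalent"))
        for ftype, n in counts.items()
    }


# Illustrative input only; not real API output.
sample = [
    {"description": "Promoter"},
    {"description": "Enhancer"},
    {"description": "Promoter"},
]
summary = summarize_features(sample)
```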
VEP provides consequence predictions for variants:
# VEP annotation for a variant by region (a start greater than the end denotes an insertion)
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/region/9:22125503-22125502:1/C"
# By rs ID
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/id/rs699"
# Batch of variants via POST
curl -X POST -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/region" \
-d '{"variants": ["1 230710048 . A G . . .", "2 241533886 . T C . . ."]}'
| Parameter | Description | Default |
|---|---|---|
| CADD=1 | Include CADD scores | Off |
| Enformer=1 | Include Enformer predictions | Off |
| AlphaMissense=1 | Include AlphaMissense pathogenicity | Off |
| REVEL=1 | Include REVEL scores | Off |
| SpliceAI=1 | Include SpliceAI splicing predictions | Off |
| regulatory=1 | Include regulatory feature overlap | Off |
| cell_type= | Cell type for regulatory annotations | All |
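The batch POST and the optional parameters can be combined in one request. A minimal Python sketch that only builds the request object; it assumes the options may be passed in the query string with ";" separators, as the GET examples in this document do. The function name is illustrative.

```python
import json
import urllib.request
from urllib.parse import quote

VEP_REGION_URL = "https://rest.ensembl.org/vep/human/region"


def vep_region_request(variants, **options):
    """Build (but do not send) a VEP batch POST request.

    variants: VCF-style strings such as "1 230710048 . A G . . .".
    options:  optional parameters from the table, e.g. CADD=1.
    """
    url = VEP_REGION_URL
    if options:
        # Assumption: options go in the query string, joined with ";".
        url += "?" + ";".join(
            f"{quote(str(k))}={quote(str(v))}" for k, v in options.items()
        )
    body = json.dumps({"variants": list(variants)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


req = vep_region_request(["1 230710048 . A G . . ."], CADD=1, regulatory=1)
# Send with: urllib.request.urlopen(req)  (network call, not run here)
```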
| Consequence | Impact | Description |
|---|---|---|
| transcript_ablation | HIGH | Deletion of entire transcript |
| splice_donor_variant | HIGH | Essential splice donor site |
| stop_gained | HIGH | Premature stop codon |
| frameshift_variant | HIGH | Reading frame change |
| missense_variant | MODERATE | Amino acid change |
| splice_region_variant | LOW | Near splice site |
| synonymous_variant | LOW | No amino acid change |
| regulatory_region_variant | MODIFIER | In regulatory element |
| intergenic_variant | MODIFIER | Between genes |
For ENCODE regulatory variants: Most will be classified as regulatory_region_variant (MODIFIER impact). The VEP consequence alone does not capture regulatory impact — combine with ENCODE cCRE class, tissue activity, and TF disruption data.
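The impact levels in the table can be used to rank a variant's consequence terms. A minimal Python sketch covering only the terms listed above; VEP defines more terms than these, and the function name is illustrative.

```python
# Impact levels copied from the consequence table above.
CONSEQUENCE_IMPACT = {
    "transcript_ablation": "HIGH",
    "splice_donor_variant": "HIGH",
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "missense_variant": "MODERATE",
    "splice_region_variant": "LOW",
    "synonymous_variant": "LOW",
    "regulatory_region_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}

# Least to most severe.
IMPACT_ORDER = ["MODIFIER", "LOW", "MODERATE", "HIGH"]


def highest_impact(terms):
    """Return the most severe impact among known terms, or None if none are known."""
    impacts = [CONSEQUENCE_IMPACT[t] for t in terms if t in CONSEQUENCE_IMPACT]
    if not impacts:
        return None
    return max(impacts, key=IMPACT_ORDER.index)
```

A MODIFIER result here says nothing about regulatory effect, so it should be a cue to look at cCRE class and tissue activity, not a reason to discard the variant.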
Convert between GRCh37 (hg19) and GRCh38 (hg38):
# GRCh37 → GRCh38
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/map/human/GRCh37/17:1000000..1000100:1/GRCh38"
# GRCh38 → GRCh37
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/map/human/GRCh38/17:1000000..1000100:1/GRCh37"
When needed: Older GWAS studies report variants on GRCh37. ENCODE data uses GRCh38. Always liftOver before intersecting.
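A liftover call can be split into building the path and reading the mapped coordinates. A minimal Python sketch, assuming the response is a JSON object with a "mappings" list whose entries carry a "mapped" block with seq_region_name, start, and end; the function names and the sample coordinates are illustrative, not real output.

```python
def map_path(chrom, start, end, source="GRCh37", target="GRCh38", strand=1):
    """Build the /map path in the same form as the curl examples above."""
    return f"/map/human/{source}/{chrom}:{start}..{end}:{strand}/{target}"


def mapped_regions(response):
    """Extract (chrom, start, end) tuples from a /map response dict."""
    return [
        (m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
        for m in response.get("mappings", [])
    ]


# Invented response with placeholder coordinates, for shape only.
sample_response = {
    "mappings": [
        {
            "original": {"seq_region_name": "17", "start": 1000000, "end": 1000100},
            "mapped": {"seq_region_name": "17", "start": 2000000, "end": 2000100},
        }
    ]
}
regions = mapped_regions(sample_response)
```

An empty list means the region did not map to the target assembly, and more than one tuple means it was split; both cases need handling before intersecting with ENCODE peaks.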
# By Ensembl ID
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/lookup/id/ENSG00000157764?expand=1"
# By symbol
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRAF"
# Get all external references for a gene
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/xrefs/id/ENSG00000157764"
# Filter by external DB
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/xrefs/id/ENSG00000157764?external_db=HGNC"
ENCODE integration: ENCODE target names are typically HGNC symbols or Ensembl IDs. Use this endpoint to resolve between identifier systems.
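Resolving between identifier systems usually means grouping the xrefs response by source database. A minimal Python sketch, assuming each record is a dict with "dbname" and "primary_id" fields; the sample records use placeholder identifiers, not real ones.

```python
from collections import defaultdict


def xrefs_by_db(xrefs):
    """Group external IDs by source database name."""
    grouped = defaultdict(list)
    for record in xrefs:
        grouped[record.get("dbname", "unknown")].append(record.get("primary_id"))
    return dict(grouped)


# Placeholder records for illustration only.
sample_xrefs = [
    {"dbname": "HGNC", "primary_id": "HGNC:XXXX", "display_id": "BRAF"},
    {"dbname": "EntrezGene", "primary_id": "NNNN", "display_id": "BRAF"},
    {"dbname": "HGNC", "primary_id": "HGNC:YYYY", "display_id": "BRAF"},
]
grouped = xrefs_by_db(sample_xrefs)
```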
# Get phenotype associations for a gene
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/phenotype/gene/homo_sapiens/BRCA2"
# Get phenotype associations for a region
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/phenotype/region/homo_sapiens/9:22125500-22136000"
1. Find ENCODE regulatory variants:
→ Intersect GWAS variants with ENCODE cCREs
2. Annotate with VEP:
curl -H "Content-type: application/json" "https://rest.ensembl.org/vep/human/id/rs699?CADD=1;regulatory=1;REVEL=1"
→ Get consequence, CADD score, REVEL score, regulatory overlap
3. Check Ensembl Regulatory Build for independent confirmation:
curl -H "Content-type: application/json" "https://rest.ensembl.org/overlap/region/human/CHR:START-END?feature=regulatory"
→ Compare with ENCODE cCRE classification
4. If working with GRCh37 data, liftOver:
curl -H "Content-type: application/json" "https://rest.ensembl.org/map/human/GRCh37/CHR:POS..POS:1/GRCh38"
5. Resolve gene identifiers for ENCODE targets:
curl -H "Content-type: application/json" "https://rest.ensembl.org/lookup/symbol/homo_sapiens/GENE_SYMBOL"
6. Check disease associations:
curl -H "Content-type: application/json" "https://rest.ensembl.org/phenotype/gene/homo_sapiens/GENE"
Goal: Use Ensembl VEP to predict functional consequences of variants located within ENCODE-defined regulatory elements, combining variant annotation with regulatory context.
Context: Ensembl provides the Variant Effect Predictor (VEP) and the Regulatory Build. Combined with ENCODE peaks, these let a variant be interpreted against both predicted consequence and experimentally observed regulatory activity.
encode_search_experiments(assay_title="ATAC-seq", organ="brain", organism="Homo sapiens")
Expected output:
{
"total": 32,
"results": [
{"accession": "ENCSR800BRN", "assay_title": "ATAC-seq", "biosample_summary": "brain", "status": "released"},
{"accession": "ENCSR801CTX", "assay_title": "ATAC-seq", "biosample_summary": "cerebral cortex", "status": "released"}
]
}
encode_download_files(accessions=["ENCFF900ATK"], download_dir="/data/brain_atac")
Using Ensembl REST API (via skill guidance):
POST https://rest.ensembl.org/vep/human/region
Content-Type: application/json
{"variants": ["1 230710048 . A G", "7 87160618 . T C"]}
Expected VEP output:
[
{
"input": "1 230710048 . A G",
"most_severe_consequence": "regulatory_region_variant",
"regulatory_feature_consequences": [
{
"regulatory_feature_id": "ENSR00000123456",
"biotype": "promoter",
"consequence_terms": ["regulatory_region_variant"]
}
]
}
]
Interpretation: VEP classifies this as a regulatory_region_variant in an Ensembl Regulatory Build promoter. Cross-referencing with the ENCODE ATAC-seq peak confirms the variant is in open chromatin in brain tissue.
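Pulling the regulatory hits out of a VEP response is a short loop over the fields shown in the expected output above. A minimal Python sketch; the function name is illustrative, and the input reuses the example response from this document, not live API output.

```python
def regulatory_hits(vep_results):
    """Return (input, regulatory_feature_id, biotype) for each regulatory consequence."""
    hits = []
    for record in vep_results:
        for rfc in record.get("regulatory_feature_consequences", []):
            hits.append(
                (
                    record.get("input"),
                    rfc.get("regulatory_feature_id"),
                    rfc.get("biotype"),
                )
            )
    return hits


# The example VEP output shown above.
example = [
    {
        "input": "1 230710048 . A G",
        "most_severe_consequence": "regulatory_region_variant",
        "regulatory_feature_consequences": [
            {
                "regulatory_feature_id": "ENSR00000123456",
                "biotype": "promoter",
                "consequence_terms": ["regulatory_region_variant"],
            }
        ],
    }
]
hits = regulatory_hits(example)
# hits == [("1 230710048 . A G", "ENSR00000123456", "promoter")]
```

Records without a regulatory_feature_consequences key contribute nothing, so variants outside Regulatory Build features drop out of the result.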
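If ENCODE peaks are in GRCh38 but variants are in hg19, lift the variants over to GRCh38 with the /map endpoint before intersecting.
To inspect the regulatory feature reported by VEP, look it up by ID:
GET https://rest.ensembl.org/regulatory/species/homo_sapiens/id/ENSR00000123456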
Interpretation: Compare Ensembl Regulatory Build activity with ENCODE experimental data. If the Regulatory Build calls a region a "promoter" and ENCODE H3K4me3 ChIP-seq confirms this, the annotation is high confidence.
# Step 1: Identify the gene's regulatory neighborhood
# (via Ensembl REST API — see skill instructions)
# Step 2: Search ENCODE for ChIP-seq at the gene locus
encode_search_experiments(
assay_title="Histone ChIP-seq",
organ="liver",
target="H3K27ac"
)
Expected output:
{
"total": 8,
"experiments": [
{
"accession": "ENCSR123ABC",
"assay_title": "Histone ChIP-seq",
"target": "H3K27ac-human",
"biosample_summary": "liver tissue male adult (54 years)"
}
]
}
encode_track_experiment(
accession="ENCSR123ABC",
notes="Liver H3K27ac for VEP-annotated enhancer variants"
)
Expected output:
{
"status": "tracked",
"accession": "ENCSR123ABC",
"publications": 1,
"files": 12
}
| This skill produces... | Feed into... | Using tool/skill |
|---|---|---|
| VEP-annotated variant list | Regulatory variant filtering | variant-annotation skill |
| Gene annotations with coordinates | Peak-to-gene assignment | peak-annotation skill |
| Lifted-over coordinates (GRCh37→38) | ENCODE data integration | liftover-coordinates skill |
| Regulatory Build overlap results | Enhancer/promoter classification | regulatory-elements skill |
| Gene constraint scores | Target prioritization | cross-reference → Open Targets |
| Skill | When to Use Instead/Additionally |
|---|---|
| variant-annotation | Full ENCODE-based post-GWAS workflow |
| gnomad-variants | Population frequency and constraint data for variants |
| regulatory-elements | ENCODE cCRE classification and chromatin state analysis |
| ucsc-browser | UCSC-hosted ENCODE tracks and sequence retrieval |
| disease-research | Connecting variants to disease mechanisms |
| cross-reference | General external database cross-referencing |
| publication-trust | Verify literature claims backing analytical decisions |