From encode-toolkit
Annotates ENCODE regulatory variants with ClinVar clinical significance. Identifies pathogenic variants in peaks, assesses clinical impact in enhancers and promoters.
npx claudepluginhub ammawla/encode-toolkit
How this skill is triggered (by the user, by Claude, or both):
Slash command
/encode-toolkit:clinvar-annotation
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
- User wants to check if variants in ENCODE regulatory peaks have clinical significance in ClinVar
Cross-reference ENCODE functional genomic elements with ClinVar clinical variant classifications to identify pathogenic variants in regulatory regions and understand non-coding disease mechanisms.
The question: "Do any clinically significant variants fall within my ENCODE regulatory elements, and can ENCODE data explain their pathogenic mechanism?"
ClinVar is NCBI's public archive of variant-disease associations, aggregating submissions from clinical laboratories, research groups, and expert panels. Most ClinVar annotations focus on coding variants, but a growing number of non-coding variants are being classified. ENCODE provides the functional context to explain WHY a non-coding variant is pathogenic — by showing that it disrupts an active enhancer, promoter, or insulator in disease-relevant tissue.
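This bidirectional integration serves two use cases:
- ENCODE to ClinVar: starting from ENCODE regulatory elements, find clinically significant variants within them.
- ClinVar to ENCODE: starting from ClinVar pathogenic variants, determine whether they overlap ENCODE regulatory elements.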
| Classification | Meaning | ENCODE Relevance |
|---|---|---|
| Pathogenic | Causes disease | If in regulatory region, ENCODE explains mechanism |
| Likely pathogenic | Strong evidence for disease causation | ENCODE data may upgrade to pathogenic |
| Uncertain significance (VUS) | Not enough evidence to classify | ENCODE functional data may help resolve |
| Likely benign | Strong evidence against pathogenicity | — |
| Benign | Does not cause disease | — |
| Conflicting interpretations | Labs disagree on classification | ENCODE data may resolve conflict |
| Risk factor | Increases disease risk | May overlap ENCODE regulatory elements |
| Stars | Review Status | Confidence |
|---|---|---|
| 0 | No assertion criteria | Very low — treat with caution |
| 1 | Single submitter with criteria | Low-moderate |
| 2 | Multiple submitters, no conflict | Moderate |
| 3 | Expert panel reviewed | High |
| 4 | Practice guideline | Highest |
Always check star ratings. A 0-star "pathogenic" classification has very different reliability than a 3-star classification.
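The star table above can be applied programmatically. The following is a minimal sketch, assuming the review-status strings match the wording ClinVar uses in its records (VCF files use underscores instead of spaces). The names `REVIEW_STARS`, `review_stars`, and `meets_min_stars` are hypothetical helpers, not part of the toolkit.

```python
# Assumed review-status strings; unknown strings fall back to 0 stars.
REVIEW_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, conflicting interpretations": 1,
    "no assertion criteria provided": 0,
    "no classification provided": 0,
}


def review_stars(status):
    """Map a review-status string (API or VCF style) to a star count."""
    key = status.replace("_", " ").strip().lower()
    return REVIEW_STARS.get(key, 0)


def meets_min_stars(status, min_stars=2):
    """Return True when the review status reaches the requested star level."""
    return review_stars(status) >= min_stars
```

A threshold of two stars is only an illustrative default; the right cutoff depends on how the annotations will be used.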
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
No authentication required for low-volume use. Rate limit: 3 requests/second without API key, 10/second with NCBI API key.
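One way to respect these limits is a small client-side throttle. This is a sketch under the stated limits (3 requests/second without a key, 10 with one); `Throttle`, `min_interval`, and `eutils_params` are hypothetical names, and `api_key` is the standard E-utilities query parameter.

```python
import time


def min_interval(api_key=None):
    """Minimum seconds between requests for the stated rate limits."""
    return 1.0 / (10 if api_key else 3)


class Throttle:
    """Sleep just long enough between calls to stay under the limit."""

    def __init__(self, api_key=None, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(api_key)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def eutils_params(base_params, api_key=None):
    """Copy the request parameters and attach the API key when one is given."""
    params = dict(base_params)
    if api_key:
        params["api_key"] = api_key
    return params
```

Call `throttle.wait()` immediately before each `requests.get(...)`. Injecting `clock` and `sleep` keeps the class testable without real delays.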
| Endpoint | Purpose | Typical use |
|---|---|---|
| `esearch.fcgi?db=clinvar&term=...` | Search ClinVar | Search by gene, variant, condition |
| `efetch.fcgi?db=clinvar&id=...` | Fetch full record | Get complete variant details |
| `esummary.fcgi?db=clinvar&id=...` | Summary record | Get classification, review status |
| `elink.fcgi?db=clinvar&dbfrom=...` | Cross-database links | Link to PubMed, Gene, etc. |
For bulk intersection with ENCODE peaks, download the ClinVar VCF:
- GRCh38: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
- GRCh37: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz

Updated monthly on the first Thursday.
Determine which direction the analysis runs:
Starting from ENCODE regulatory elements, find clinically significant variants within them.
```python
# Get ENCODE peaks for target tissue
encode_search_experiments(
    assay_title="Histone ChIP-seq",
    target="H3K27ac",
    organ="pancreas",
    biosample_type="tissue"
)

encode_list_files(
    experiment_accession="ENCSR...",
    file_format="bed",
    output_type="IDR thresholded peaks",
    assembly="GRCh38"
)
```
Starting from ClinVar pathogenic variants, determine if they overlap ENCODE regulatory elements.
```python
import requests

# Search ClinVar for pathogenic variants in a gene
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "clinvar",
    "term": "INS[gene] AND pathogenic[clinical significance]",
    "retmax": 50,
    "retmode": "json"
}
response = requests.get(url, params=params)
result = response.json()
variant_ids = result["esearchresult"]["idlist"]
```
```python
import requests
import time


def search_clinvar(gene_symbol, significance="pathogenic"):
    """Search ClinVar for variants in a gene with given clinical significance."""
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    term = f"{gene_symbol}[gene] AND {significance}[clinical significance]"
    params = {
        "db": "clinvar",
        "term": term,
        "retmax": 100,
        "retmode": "json"
    }
    response = requests.get(url, params=params)
    time.sleep(0.34)  # Rate limit: 3/sec
    return response.json()["esearchresult"]["idlist"]


def get_clinvar_summary(variant_ids):
    """Get summary for ClinVar variant IDs (only the first 20 IDs are sent)."""
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
    params = {
        "db": "clinvar",
        "id": ",".join(variant_ids[:20]),  # Max 20 per request
        "retmode": "json"
    }
    response = requests.get(url, params=params)
    time.sleep(0.34)
    return response.json()["result"]
```
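Because the summary helper sends at most 20 IDs per request, longer ID lists need batching. A minimal sketch follows, assuming the summary `result` object holds one entry per ID plus a `uids` index list; `chunked` and `summarize_all` are hypothetical helpers, and `fetch_batch` stands in for a function such as `get_clinvar_summary`.

```python
def chunked(ids, size=20):
    """Yield consecutive slices of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def summarize_all(variant_ids, fetch_batch, size=20):
    """Fetch summaries batch by batch and merge them into one dict.

    `fetch_batch` takes a list of IDs and returns the esummary "result"
    mapping for that batch.
    """
    merged = {}
    for batch in chunked(list(variant_ids), size):
        result = fetch_batch(batch)
        for key, value in result.items():
            if key != "uids":  # skip the per-batch index list
                merged[key] = value
    return merged
```

Usage would look like `summaries = summarize_all(variant_ids, get_clinvar_summary)`.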
```python
# Search for ClinVar variants in a specific genomic region (GRCh38)
term = "11[chromosome] AND 2159000:2162000[chrpos38] AND pathogenic[clinical significance]"
```
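When the region comes from an ENCODE BED file, the search term has to be built from `chr`-prefixed, 0-based coordinates. The sketch below reuses the field tags from the term above as-is (they are not re-verified here); `region_term_from_bed` is a hypothetical helper.

```python
def region_term_from_bed(chrom, start, end, significance="pathogenic"):
    """Build a ClinVar region search term from a BED interval.

    BED intervals are 0-based and half-open, so the 1-based inclusive
    range used in the search term is start + 1 through end.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # the search term uses bare chromosome names
    return (
        f"{chrom}[chromosome] AND {start + 1}:{end}[chrpos38] "
        f"AND {significance}[clinical significance]"
    )
```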
```bash
# Download ClinVar VCF (GRCh38)
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

# Filter to pathogenic/likely pathogenic only.
# ClinVar names contigs "1", "2", ... while ENCODE BED files use "chr1", "chr2", ...,
# so the literal "chr" in the format string aligns the two naming schemes
# (the mitochondrial contig becomes "chrMT" rather than "chrM").
bcftools view -i 'INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic"' \
  clinvar.vcf.gz | \
  bcftools query -f 'chr%CHROM\t%POS0\t%END\t%ID\t%INFO/CLNSIG\t%INFO/CLNDN\n' \
  > clinvar_pathogenic.bed

# Intersect with ENCODE peaks
# NOTE: VCF positions are 1-based and BED starts are 0-based; %POS0 emits the 0-based start
bedtools intersect \
  -a clinvar_pathogenic.bed \
  -b encode_h3k27ac_peaks.bed \
  -wa -wb \
  > clinvar_in_encode_enhancers.bed
```
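The coordinate handling behind this intersection can be sketched in plain Python. This is an illustration of the 1-based to 0-based conversion and the half-open overlap test, not a replacement for bedtools; `vcf_to_bed` and `overlaps` are hypothetical helpers.

```python
def vcf_to_bed(chrom, pos, ref):
    """Convert a VCF record (1-based POS) to a 0-based, half-open BED interval."""
    if chrom == "MT":
        chrom = "chrM"
    elif not chrom.startswith("chr"):
        chrom = "chr" + chrom
    start = pos - 1
    return chrom, start, start + len(ref)


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# A single-base variant at VCF position 2160994 inside a peak at 2159000-2162000
chrom, start, end = vcf_to_bed("11", 2160994, "A")
print(chrom, start, end, overlaps(start, end, 2159000, 2162000))
```

A variant at VCF position 2162001 maps to the BED interval 2162000-2162001 and therefore does not overlap a peak whose BED end is 2162000.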
```python
import pysam

# Open ClinVar VCF (the .tbi index must sit next to it)
vcf = pysam.VariantFile("clinvar.vcf.gz")

# Define ENCODE peak region (0-based, half-open)
chrom, start, end = "chr11", 2159000, 2162000

# ClinVar contigs carry no "chr" prefix, so strip it before fetching
contig = chrom[3:] if chrom.startswith("chr") else chrom

# Find ClinVar variants in region
for record in vcf.fetch(contig, start, end):
    clnsig = record.info.get("CLNSIG", [])
    clndn = record.info.get("CLNDN", [])
    print(f"{record.chrom}:{record.pos} {record.ref}>{record.alts} "
          f"Significance: {clnsig} Condition: {clndn}")
```
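The `CLNSIG` values returned for each record can combine several terms (for example with `/` or `|`), so a small classifier helps when filtering in Python. This is a sketch that assumes the term spellings `Pathogenic` and `Likely_pathogenic`; `is_pathogenic` is a hypothetical helper.

```python
import re

PATHOGENIC_TERMS = {"pathogenic", "likely_pathogenic"}


def is_pathogenic(clnsig_values):
    """True when any CLNSIG term is exactly Pathogenic or Likely_pathogenic.

    Accepts a single string or an iterable of strings. Terms such as
    "Conflicting_classifications_of_pathogenicity" do not match because
    whole terms are compared, not substrings.
    """
    if isinstance(clnsig_values, str):
        clnsig_values = [clnsig_values]
    for value in clnsig_values:
        for term in re.split(r"[/|,]", value):
            if term.strip().lower() in PATHOGENIC_TERMS:
                return True
    return False
```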
For each ClinVar variant overlapping an ENCODE element, assess the regulatory impact:
| ClinVar Variant in... | ENCODE Context | Interpretation |
|---|---|---|
| Active enhancer (H3K27ac+) | Tissue-specific, near disease gene | High impact — variant may disrupt enhancer |
| Active promoter (H3K4me3+) | At TSS of disease gene | High impact — variant may affect transcription initiation |
| CTCF binding site | TAD boundary | High impact — may disrupt chromatin insulation |
| Open chromatin only (ATAC+) | No histone marks | Moderate — accessible but function unclear |
| TF binding site | Specific TF known for disease gene | High impact — may disrupt TF binding |
| No ENCODE overlap | Not in regulatory element | Mechanism may be coding, splicing, or untested tissue |
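The table above can be reduced to a first-pass lookup. The sketch below is a deliberate simplification: it looks only at which annotation types overlap the variant and ignores the tissue and gene context from the middle column, which still needs manual review. The labels in `HIGH_IMPACT` and the function `assess_impact` are hypothetical.

```python
# Annotation labels are illustrative; use whatever names your pipeline emits.
HIGH_IMPACT = {"H3K27ac", "H3K4me3", "CTCF", "TF"}


def assess_impact(annotations):
    """Return a coarse impact tier from the set of overlapping annotations."""
    annotations = set(annotations)
    if annotations & HIGH_IMPACT:
        return "High"
    if "ATAC" in annotations:
        return "Moderate"
    return "No ENCODE overlap"
```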
ENCODE functional data can support ACMG criteria for variant classification:
| ACMG Criterion | How ENCODE Data Contributes |
|---|---|
| PS3 (Functional studies) | ENCODE shows variant disrupts active regulatory element |
| PM1 (Critical domain) | Variant in a regulatory element active in disease tissue |
| PP3 (Computational evidence) | Multiple ENCODE annotations converge on regulatory disruption |
| BS3 (No functional impact) | ENCODE shows region is inactive in all relevant tissues |
| Variant | ClinVar ID | Classification | Stars | Condition | ENCODE Overlap | Tissue Active | Impact |
|---|---|---|---|---|---|---|---|
| chr11:2160994 A>G | VCV000012345 | Pathogenic | 3 | Neonatal diabetes | H3K27ac enhancer | Pancreas | High |
| chr7:87654321 C>T | VCV000067890 | VUS | 1 | Cystic fibrosis | ATAC-seq peak | Lung | Moderate |
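A table with these columns can be written as a tab-separated file using only the standard library. This is a minimal sketch; `COLUMNS` mirrors the header above, `write_report` is a hypothetical helper, and the example row reuses the placeholder values from the table.

```python
import csv
import io

COLUMNS = [
    "Variant", "ClinVar ID", "Classification", "Stars",
    "Condition", "ENCODE Overlap", "Tissue Active", "Impact",
]


def write_report(rows, handle):
    """Write annotation rows (dicts keyed by COLUMNS) as TSV to an open handle."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


example_row = {
    "Variant": "chr11:2160994 A>G",
    "ClinVar ID": "VCV000012345",  # placeholder ID from the table above
    "Classification": "Pathogenic",
    "Stars": 3,
    "Condition": "Neonatal diabetes",
    "ENCODE Overlap": "H3K27ac enhancer",
    "Tissue Active": "Pancreas",
    "Impact": "High",
}

buffer = io.StringIO()
write_report([example_row], buffer)
print(buffer.getvalue())
```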
Report:
```python
encode_log_derived_file(
    file_path="/path/to/clinvar_encode_intersection.tsv",
    source_accessions=["ENCSR...", "ENCSR..."],
    description="Intersection of ClinVar pathogenic variants with ENCODE H3K27ac and ATAC-seq peaks in pancreas",
    file_type="variant_annotation",
    tool_used="bedtools intersect + ClinVar VCF (2024-01 release)",
    parameters="GRCh38, pathogenic+likely_pathogenic, IDR thresholded peaks"
)

encode_link_reference(
    experiment_accession="ENCSR...",
    reference_type="other",
    reference_id="ClinVar:VCV000012345",
    description="Pathogenic variant for neonatal diabetes overlapping pancreas enhancer"
)
```
Goal: Cross-reference variants in ENCODE-defined regulatory elements with ClinVar clinical significance to identify non-coding variants with known disease associations.

Context: Most GWAS hits fall in non-coding regions. ENCODE maps the regulatory landscape; ClinVar provides clinical interpretation.
```python
encode_search_experiments(assay_title="ATAC-seq", organ="heart", organism="Homo sapiens")
```
Expected output:
```json
{
  "total": 18,
  "results": [
    {"accession": "ENCSR789HRT", "assay_title": "ATAC-seq", "biosample_summary": "heart left ventricle", "status": "released"}
  ]
}
```
```python
encode_list_files(accession="ENCSR789HRT", file_format="bed", output_type="IDR thresholded peaks", assembly="GRCh38")
```
Expected output:
```json
{
  "files": [
    {"accession": "ENCFF101ATK", "output_type": "IDR thresholded peaks", "file_format": "bed narrowPeak", "file_size_mb": 0.8}
  ]
}
```
Using ClinVar E-utilities (via skill guidance). ClinVar chromosome values carry no `chr` prefix, so the query uses `1`, not `chr1`:
```
GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=1[chr]+AND+10000:20000[chrpos]+AND+pathogenic[clnsig]&retmode=json
```
Expected response:
```json
{
  "esearchresult": {
    "count": "3",
    "idlist": ["12345", "67890", "11111"]
  }
}
```
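A response of this shape can be unpacked with a few lines of Python. The sketch below assumes the JSON layout shown above, where `count` arrives as a string; `parse_esearch` is a hypothetical helper.

```python
def parse_esearch(payload):
    """Return (total_hits, ids, truncated) from an esearch JSON payload.

    `truncated` is True when the server reports more hits than IDs returned,
    which means the query needs a larger retmax or paging with retstart.
    """
    result = payload.get("esearchresult", {})
    ids = list(result.get("idlist", []))
    total = int(result.get("count", len(ids)))
    return total, ids, total > len(ids)


example = {"esearchresult": {"count": "3", "idlist": ["12345", "67890", "11111"]}}
total, ids, truncated = parse_esearch(example)
print(total, ids, truncated)
```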
For each ClinVar variant in an ENCODE peak:
Interpretation: Non-coding pathogenic variants in heart-specific open chromatin suggest regulatory disruption of cardiac gene expression. These are candidates for CRISPR validation.
```python
encode_get_facets(facet_field="organ", assay_title="ATAC-seq", organism="Homo sapiens")
```
Expected output:
```json
{
  "facets": {
    "organ": {"brain": 32, "heart": 18, "liver": 14, "lung": 10, "kidney": 8}
  }
}
```
```python
encode_get_experiment(accession="ENCSR789HRT")
```
Expected output:
```json
{
  "accession": "ENCSR789HRT",
  "assay_title": "ATAC-seq",
  "biosample_summary": "heart left ventricle",
  "replicates": 2,
  "status": "released",
  "audit": {"WARNING": 0, "ERROR": 0}
}
```
```python
encode_track_experiment(accession="ENCSR789HRT", notes="Heart ATAC-seq for ClinVar regulatory variant annotation")
```
Expected output:
```json
{
  "status": "tracked",
  "accession": "ENCSR789HRT",
  "notes": "Heart ATAC-seq for ClinVar regulatory variant annotation"
}
```
| This skill produces... | Feed into... | Purpose |
|---|---|---|
| Clinical variant annotations | variant-annotation | Comprehensive variant annotation with clinical significance |
| Pathogenic regulatory variants | disease-research | Connect non-coding variants to disease mechanisms |
| ClinVar gene-disease associations | peak-annotation | Prioritize peaks near clinically relevant genes |
| Variant pathogenicity scores | gwas-catalog | Overlay GWAS hits with ClinVar clinical evidence |
| Regulatory variant coordinates | gnomad-variants | Add population frequency context to clinical variants |
| Tissue-specific clinical variants | gtex-expression | Check expression of genes near pathogenic regulatory variants |
| Clinical regulatory elements | regulatory-elements | Classify ClinVar-annotated elements by regulatory function |
When reporting ClinVar annotation results:
- gnomad-variants for population frequency context, or variant-annotation for a full ENCODE-based regulatory variant prioritization workflow

- variant-annotation: Full ENCODE variant annotation workflow with prioritization scoring
- gwas-catalog: GWAS variants in ENCODE peaks (population-level associations)
- gnomad-variants: Population frequency context for ClinVar variants
- disease-research: Disease-focused ENCODE analysis workflows
- cross-reference: Linking ENCODE experiments to ClinVar and other databases
- regulatory-elements: Characterizing the regulatory elements disrupted by variants
- publication-trust: Verify literature claims backing analytical decisions