From encode-toolkit
Queries gnomAD for population allele frequencies, gene constraint scores (pLI, LOEUF), and variant annotations to interpret ENCODE regulatory variants. Useful for filtering rare variants in cCREs, GWAS overlaps, or CRISPR/MPRA results.
npx claudepluginhub ammawla/encode-toolkit --plugin encode-toolkit
How this skill is triggered (by the user, by Claude, or both)
Slash command
/encode-toolkit:gnomad-variants
The summary Claude sees in its skill listing (used to decide when to auto-load this skill)
- User wants to check population allele frequencies for variants in ENCODE regulatory regions
Annotate ENCODE-identified regulatory variants with population allele frequencies and gene constraint scores from the Genome Aggregation Database.
The question: "How common is this variant in the population, and how constrained is the gene it regulates?"
ENCODE identifies regulatory elements and the variants within them, but does not provide population frequency data. gnomAD (v4.1: 807,162 individuals, 730,947 exomes + 76,215 genomes) fills this gap — enabling researchers to distinguish common regulatory variants (likely benign or with modest effect) from rare variants (potentially pathogenic or high-impact).
| ENCODE Provides | gnomAD Provides | Combined Insight |
|---|---|---|
| Variant overlaps cCRE (dELS) | AF = 0.0001 (rare) | Rare variant disrupting an enhancer → high priority |
| Variant in TF binding site | AF = 0.15 (common) | Common regulatory variant → likely modest effect or GWAS candidate |
| Target gene identified | LOEUF = 0.12 (highly constrained) | Constrained gene + rare enhancer variant → strong candidate |
| Variant in CRISPR-validated enhancer | Not in gnomAD (absent) | Ultra-rare/de novo → possible pathogenic regulatory variant |
| User Has | Query Strategy |
|---|---|
| Specific variant (rs ID or chr-pos-ref-alt) | Single variant lookup |
| List of GWAS/eQTL variants | Batch variant query |
| Gene of interest (ENCODE target) | Gene constraint lookup |
| Genomic region with ENCODE peaks | Region variant query |
Endpoint: https://gnomad.broadinstitute.org/api
Method: POST with GraphQL query in JSON body
Authentication: None required
Rate limit: IP-level throttling; throttle to ~1 request/second for batch queries
curl -X POST https://gnomad.broadinstitute.org/api \
-H "Content-Type: application/json" \
-d '{
"query": "query { variant(variantId: \"1-55517991-C-CAT\", dataset: gnomad_r4) { exome { ac an af } genome { ac an af } joint { ac an af } } }"
}'
Variant ID format: CHR-POS-REF-ALT (1-based position, no "chr" prefix)
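The ID format above can be produced from VCF-style fields with a small helper. This is a minimal sketch: the function names `to_gnomad_variant_id` and `bed_start_to_vcf_pos` are hypothetical and not part of gnomAD or the toolkit.

```python
def to_gnomad_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a CHR-POS-REF-ALT identifier from VCF-style fields (1-based POS)."""
    # Drop a leading "chr" so "chr1" becomes "1".
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    if pos < 1:
        raise ValueError(f"positions are 1-based; got {pos}")
    return f"{chrom}-{pos}-{ref.upper()}-{alt.upper()}"


def bed_start_to_vcf_pos(bed_start: int) -> int:
    """Convert a 0-based BED start to the 1-based position of the same base."""
    return bed_start + 1


print(to_gnomad_variant_id("chr1", 55517991, "C", "CAT"))  # 1-55517991-C-CAT
```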
curl -X POST https://gnomad.broadinstitute.org/api \
-H "Content-Type: application/json" \
-d '{
"query": "query { gene(gene_symbol: \"BRCA2\", reference_genome: GRCh38) { symbol gene_id gnomad_constraint { pLI oe_lof oe_lof_lower oe_lof_upper oe_mis oe_mis_lower oe_mis_upper } } }"
}'
curl -X POST https://gnomad.broadinstitute.org/api \
-H "Content-Type: application/json" \
-d '{
"query": "query { region(chrom: \"1\", start: 55505222, stop: 55530526, reference_genome: GRCh38) { variants(dataset: gnomad_r4) { variant_id pos ref alt exome { ac af } genome { ac af } } } }"
}'
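For batch lookups, the single-variant curl call can be wrapped in a loop that pauses between requests. The sketch below uses only the Python standard library and reuses the endpoint and query text shown above. The helper names (`variant_query_body`, `post_json`, `query_variants`) and the injectable `send` and `sleep` parameters are assumptions made for illustration, not part of any gnomAD client.

```python
import json
import time
import urllib.request

GNOMAD_API = "https://gnomad.broadinstitute.org/api"  # endpoint given above


def variant_query_body(variant_id: str, dataset: str = "gnomad_r4") -> bytes:
    """Return the JSON request body for a single-variant frequency lookup."""
    query = (
        "query { variant(variantId: %s, dataset: %s) "
        "{ exome { ac an af } genome { ac an af } joint { ac an af } } }"
        % (json.dumps(variant_id), dataset)  # json.dumps quotes the ID string
    )
    return json.dumps({"query": query}).encode("utf-8")


def post_json(body: bytes) -> dict:
    """POST a request body to the gnomAD API and decode the JSON response."""
    request = urllib.request.Request(
        GNOMAD_API, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def query_variants(variant_ids, send=post_json, delay_s=1.0, sleep=time.sleep):
    """Look up variants one at a time, pausing between consecutive requests."""
    results = {}
    for i, variant_id in enumerate(variant_ids):
        if i > 0:
            sleep(delay_s)  # stay near ~1 request/second
        results[variant_id] = send(variant_query_body(variant_id))
    return results
```

Passing `send` and `sleep` as parameters keeps the loop testable without network access; in normal use the defaults apply and `query_variants(["1-55517991-C-CAT"])` issues a real request.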
| Metric | Definition | Interpretation |
|---|---|---|
| LOEUF | Loss-of-function observed/expected upper bound 90% CI | <0.35 = highly constrained (v2); <0.6 = constrained (v4) |
| pLI | Probability of being loss-of-function intolerant | >0.9 = LoF-intolerant (legacy metric from ExAC) |
| oe_lof | Observed/expected loss-of-function ratio | <0.2 = highly constrained |
| oe_mis | Observed/expected missense ratio | <0.6 = missense constrained |
| Z_syn | Synonymous Z-score | Near 0 expected; deviation suggests selection |
LOEUF is preferred over pLI for gnomAD v4+. LOEUF is continuous and better calibrated.
For genes identified as targets of ENCODE regulatory elements:
| LOEUF | Interpretation | Implication for Regulatory Variants |
|---|---|---|
| <0.35 | Highly constrained (haploinsufficient) | Regulatory variants likely pathogenic; even modest expression changes may be deleterious |
| 0.35-0.6 | Moderately constrained | Regulatory variants worth investigating |
| >0.6 | Tolerant of LoF | Expression changes likely tolerated; regulatory variants less likely pathogenic |
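The LOEUF bins in the table above translate directly into a lookup function. This is a sketch only: the name `loeuf_tier` and the returned labels are illustrative, and the boundary value 0.6 is assigned to the middle bin to match the "0.35-0.6" row.

```python
def loeuf_tier(loeuf: float) -> str:
    """Map a LOEUF value onto the three bins used in the table above."""
    if loeuf < 0.35:
        return "highly constrained"
    if loeuf <= 0.6:
        return "moderately constrained"
    return "tolerant of LoF"


print(loeuf_tier(0.12))  # highly constrained
```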
| Category | Allele Frequency | Use Case |
|---|---|---|
| Ultra-rare | AF < 0.0001 (1 in 10,000) | Mendelian disease candidates |
| Rare | AF < 0.01 (1%) | Rare disease, high-penetrance |
| Low-frequency | 0.01-0.05 | eQTL fine-mapping |
| Common | AF > 0.05 (5%) | GWAS, population-level effects |
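The frequency categories above can be applied programmatically. In this sketch the function name `af_category` is hypothetical; it returns the most specific category that applies (so an ultra-rare variant is not also reported as rare) and treats a missing frequency as "absent", meaning the variant was not found in gnomAD.

```python
from typing import Optional


def af_category(af: Optional[float]) -> str:
    """Return the most specific frequency category for an allele frequency."""
    if af is None:
        return "absent"  # variant not present in gnomAD
    if af < 0.0001:
        return "ultra-rare"
    if af < 0.01:
        return "rare"
    if af <= 0.05:
        return "low-frequency"
    return "common"


print(af_category(0.15))  # common
```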
gnomAD provides frequencies for genetic ancestry groups:
Critical: A variant "rare" globally may be common in one population. Always check population-specific frequencies when interpreting regulatory variants in disease context.
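One way to act on this is to compute the highest per-population frequency from the `populations { id ac an }` entries returned for a variant and filter on that value instead of the global AF. The sketch below assumes each entry is a dict with `id`, `ac`, and `an` keys; the function name `max_population_af` and the example counts are illustrative, not real gnomAD data.

```python
def max_population_af(populations):
    """Return (population id, AF) for the group with the highest allele frequency."""
    best_id, best_af = None, 0.0
    for pop in populations:
        if pop["an"] == 0:
            continue  # no called alleles in this group; AF is undefined
        af = pop["ac"] / pop["an"]
        if af > best_af:
            best_id, best_af = pop["id"], af
    return best_id, best_af


# Illustrative counts only: rare overall, but 5% in one group.
example = [
    {"id": "afr", "ac": 50, "an": 1000},
    {"id": "nfe", "ac": 1, "an": 10000},
    {"id": "eas", "ac": 0, "an": 0},
]
print(max_population_af(example))  # ('afr', 0.05)
```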
1. Identify regulatory variants from ENCODE:
encode_search_files(output_type="IDR thresholded peaks", organ="pancreas", file_format="bed")
→ Intersect peaks with GWAS/eQTL variants using bedtools
2. Get allele frequencies from gnomAD:
→ GraphQL query for each variant
→ Filter by desired AF threshold
3. Check constraint of target genes:
→ For each variant-to-gene link (from ABC model, ENCODE enhancer-gene maps)
→ Query gnomAD gene constraint (LOEUF)
4. Prioritize:
→ Rare variant (AF < 0.01) + active enhancer + constrained gene (LOEUF < 0.35) = HIGH PRIORITY
→ Common variant (AF > 0.05) + active enhancer = potential GWAS mechanism
→ Absent from gnomAD + CRISPR-validated enhancer = potential de novo pathogenic
5. Track provenance:
encode_log_derived_file(
file_path="/path/to/prioritized_variants.tsv",
source_accessions=["ENCSR...", "gnomAD_v4.1"],
description="ENCODE regulatory variants filtered by gnomAD AF and constraint",
tool_used="bedtools intersect + gnomAD GraphQL"
)
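The prioritization rules in step 4 can be sketched as a single function. The name `prioritize`, its parameters, and the "unclassified" fallback are assumptions for illustration; the thresholds are the ones stated in the workflow above.

```python
from typing import Optional


def prioritize(
    af: Optional[float],
    in_active_enhancer: bool,
    crispr_validated: bool = False,
    loeuf: Optional[float] = None,
) -> str:
    """Apply the step 4 rules; af=None means the variant is absent from gnomAD."""
    if af is None:
        if crispr_validated:
            return "potential de novo pathogenic"
        return "unclassified"
    if not in_active_enhancer:
        return "unclassified"
    if af < 0.01 and loeuf is not None and loeuf < 0.35:
        return "high priority"
    if af > 0.05:
        return "potential GWAS mechanism"
    return "unclassified"


print(prioritize(af=0.0001, in_active_enhancer=True, loeuf=0.12))  # high priority
```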
For genome-wide analysis, downloading gnomAD VCFs is more efficient than API queries:
# gnomAD v4 genome VCFs (GRCh38)
# Available at: https://gnomad.broadinstitute.org/downloads
# Files hosted on Google Cloud Storage and AWS
# Download a single chromosome
gsutil cp gs://gcp-public-data--gnomad/release/4.1/vcf/genomes/gnomad.genomes.v4.1.sites.chr1.vcf.bgz .
# Or use Hail for cloud-native analysis
File sizes are large: Full genome VCF is ~1TB. Download per-chromosome files for targeted analysis.
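Once a per-chromosome file is downloaded, it can be streamed line by line without loading it into memory. The sketch below assumes the sites VCF carries an overall `AF` key in the INFO column and that the `.bgz` file can be read with Python's `gzip` module (bgzip output is a series of gzip members). The function names `info_af` and `rare_sites` are hypothetical.

```python
import gzip


def info_af(vcf_line: str):
    """Return the first AF value from a VCF data line's INFO column, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    for entry in fields[7].split(";"):  # INFO is the 8th VCF column
        if entry.startswith("AF="):
            value = entry[3:].split(",")[0]
            return None if value == "." else float(value)
    return None


def rare_sites(path: str, max_af: float = 0.01):
    """Yield (chrom, pos, ref, alt, af) for sites whose AF is below max_af."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            af = info_af(line)
            if af is not None and af < max_af:
                chrom, pos, _id, ref, alt = line.split("\t")[:5]
                yield chrom, int(pos), ref, alt, af
```

For example, `for site in rare_sites("gnomad.genomes.v4.1.sites.chr1.vcf.bgz"): ...` would iterate over rare sites in the file downloaded above.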
Variant IDs use CHR-POS-REF-ALT with 1-based positions and no "chr" prefix. Convert from VCF/BED coordinates carefully: VCF positions are already 1-based, whereas BED start coordinates are 0-based.
Goal: Use gnomAD population allele frequencies to contextualize variants found in ENCODE regulatory elements, distinguishing rare disease-causing variants from common regulatory polymorphisms.
Context: gnomAD provides allele frequencies across 76,000+ genomes. Variants in ENCODE regulatory regions with low population frequency are candidates for disease-causing regulatory mutations.
encode_search_experiments(assay_title="ATAC-seq", organ="kidney", organism="Homo sapiens")
Expected output:
{
"total": 8,
"results": [
{"accession": "ENCSR900KID", "assay_title": "ATAC-seq", "biosample_summary": "kidney", "status": "released"}
]
}
encode_list_files(accession="ENCSR900KID", file_format="bed", output_type="IDR thresholded peaks", assembly="GRCh38")
Expected output:
{
"files": [
{"accession": "ENCFF950KID", "output_type": "IDR thresholded peaks", "file_format": "bed narrowPeak", "file_size_mb": 0.7}
]
}
Using gnomAD GraphQL API (via skill guidance):
{
region(chrom: "16", start: 68771195, stop: 68771395, reference_genome: GRCh38) {
variants(dataset: gnomad_r4) {
variant_id
pos
ref
alt
genome {
ac
an
af
populations { id ac an af }
}
}
}
}
Expected response:
{
"data": {
"region": {
"variants": [
{"variant_id": "16-68771250-C-T", "genome": {"af": 0.00012, "ac": 18, "an": 152312}},
{"variant_id": "16-68771300-G-A", "genome": {"af": 0.35, "ac": 53321, "an": 152312}}
]
}
}
}
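A response of this shape can be filtered in a few lines. The sketch below copies the example response shown above into a Python dict and keeps variants whose genome AF falls under a threshold; the function name `rare_variants` and the default 1% cutoff are illustrative choices.

```python
# The example response above, as a Python dict.
RESPONSE = {
    "data": {
        "region": {
            "variants": [
                {"variant_id": "16-68771250-C-T",
                 "genome": {"af": 0.00012, "ac": 18, "an": 152312}},
                {"variant_id": "16-68771300-G-A",
                 "genome": {"af": 0.35, "ac": 53321, "an": 152312}},
            ]
        }
    }
}


def rare_variants(response: dict, max_af: float = 0.01):
    """Return (variant_id, af) pairs whose genome AF is below max_af."""
    rare = []
    for variant in response["data"]["region"]["variants"]:
        genome = variant.get("genome")  # may be None if only exome data exists
        if genome and genome.get("af") is not None and genome["af"] < max_af:
            rare.append((variant["variant_id"], genome["af"]))
    return rare


print(rare_variants(RESPONSE))  # [('16-68771250-C-T', 0.00012)]
```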
Interpretation:
Apply gnomAD frequency filters:
{
gene(gene_symbol: "PKD1", reference_genome: GRCh38) {
gnomad_constraint {
pLI
oe_lof
oe_lof_upper
}
}
}
Interpretation: pLI near 1.0, or a low LOEUF (oe_lof_upper), indicates that the gene is loss-of-function intolerant. Rare regulatory variants near constrained genes are higher priority.
encode_get_facets(facet_field="organ", assay_title="ATAC-seq", organism="Homo sapiens")
Expected output:
{
"facets": {
"organ": {"brain": 32, "heart": 18, "liver": 14, "kidney": 8, "lung": 10}
}
}
encode_get_experiment(accession="ENCSR900KID")
Expected output:
{
"accession": "ENCSR900KID",
"assay_title": "ATAC-seq",
"biosample_summary": "kidney",
"replicates": 2,
"status": "released",
"audit": {"WARNING": 0, "ERROR": 0}
}
encode_track_experiment(accession="ENCSR900KID", notes="Kidney ATAC-seq for gnomAD regulatory variant filtering")
Expected output:
{
"status": "tracked",
"accession": "ENCSR900KID",
"notes": "Kidney ATAC-seq for gnomAD regulatory variant filtering"
}
| This skill produces... | Feed into... | Purpose |
|---|---|---|
| Population allele frequencies | variant-annotation | Frequency-based variant prioritization |
| Gene constraint scores (pLI, LOEUF) | disease-research | Identify intolerant genes for disease gene prioritization |
| Rare variant coordinates | clinvar-annotation | Cross-reference rare variants with clinical significance |
| Population-specific frequencies | gwas-catalog | Compare GWAS risk allele frequencies across populations |
| Variant frequency filters | regulatory-elements | Identify regulatory regions under selection pressure |
| Frequency-annotated variant BED | peak-annotation | Prioritize rare variants near gene promoters |
| Gene constraint metrics | gtex-expression | Check expression of constrained genes with regulatory variants |
| Skill | When to Use Instead/Additionally |
|---|---|
| variant-annotation | Full post-GWAS annotation workflow with ENCODE cCREs, RegulomeDB, CADD |
| regulatory-elements | Identifying what regulatory elements a variant falls in |
| disease-research | Connecting ENCODE data to disease mechanisms |
| ucsc-browser | Getting cCRE annotations and TF binding at variant positions |
| ensembl-annotation | VEP annotation and regulatory feature overlap for variants |
| data-provenance | Logging gnomAD + ENCODE combined analysis provenance |
| clinvar-annotation | Clinical significance and pathogenicity for gnomAD variants |
| gwas-catalog | GWAS associations for variants with population frequency context |
| publication-trust | Verify literature claims backing analytical decisions |