Skill

gnomad-database

Queries gnomAD v4 variant frequencies via GraphQL API, stratified by 9 ancestry groups, plus gene constraint metrics (pLI, LOEUF) and coverage.

GraphQL

Python

data-engineering

Popularity

Stars

200

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sciagent-skills:gnomad-database

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at `https://gnomad.broadinstitute.org/api` — no authentication required, no official ...

SKILL.md

758 lines · ~7.2k tokens(exceeds 5k compaction limit)

Stats

LanguagePython

Stars200

Forks21

MaintenanceExcellent

Last CommitJun 15, 2026

Actions

View Source View Plugin View on GitHub View README

gnomAD Database

Overview

The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at https://gnomad.broadinstitute.org/api — no authentication required, no official SDK.

When to Use

Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
Filtering a VCF by population frequency — query gnomAD AF to discard common variants before clinical interpretation
For clinical pathogenicity classifications use clinvar-database; gnomAD provides frequency evidence but does not classify pathogenicity
For GWAS associations at the study level use gwas-database; gnomAD is for population frequency lookups

Prerequisites

Python packages: requests, pandas, matplotlib
Data requirements: gene symbols (e.g., BRCA1), variant IDs (1-69511-A-G format, or rsIDs)
Environment: internet connection; no API key required
Rate limits: no official published limits; use time.sleep(0.5) between requests for polite access; avoid bursts over 10 requests/second

pip install requests pandas matplotlib

Quick Start

import requests
import time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query: str, variables: dict = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

# Quick check: get pLI / LOEUF for BRCA1
# GnomadConstraint fields are FLAT (no nested `lof { oe_ci { upper } }` type).
# `pli` is the current field; `pLI` is preserved as a deprecated alias.
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint { pli oe_lof_upper }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI:   {constraint['pli']:.3e}")        # ~5.5e-38 (very high LoF-intolerant)
print(f"BRCA1 LOEUF: {constraint['oe_lof_upper']:.3f}") # 0.928

Core API

Query 1: Gene Variant Query

Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.

import requests, time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    symbol
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""

data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": "PCSK9",
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['symbol']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001)
rare = [v for v in variants if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f"  {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")

Query 2: Variant Lookup

Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.

VARIANT_QUERY = """
query VariantDetails($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    transcript_consequences {
      gene_symbol
      transcript_id
      is_canonical
      major_consequence
      lof
      lof_filter
      lof_flags
    }
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations { id ac an homozygote_count }
    }
  }
}
"""

# Query.variant() arg is `variantId` (camelCase). The top-level deprecated
# `consequence`/`lof`/`lof_filter`/`lof_flags` fields on VariantDetails were
# removed — read them from `transcript_consequences` (plural list; pick the
# canonical transcript with is_canonical=True).
data = gnomad_query(VARIANT_QUERY, {
    "variantId": "1-55039974-G-T",    # PCSK9 p.Tyr142Ter (LoF)
    "dataset": "gnomad_r4"
})
v = data["variant"]
canon = next((t for t in (v.get("transcript_consequences") or []) if t.get("is_canonical")),
             (v.get("transcript_consequences") or [{}])[0])
print(f"Variant     : {v['variant_id']}")
print(f"rsIDs       : {v['rsids']}")
print(f"Gene        : {canon.get('gene_symbol')}")
print(f"Consequence : {canon.get('major_consequence')}  |  LoF: {canon.get('lof')}")
g = v["genome"]
print(f"Genome AF   : {g['af']:.2e}  (AC={g['ac']}, AN={g['an']})")
print(f"FAF95 popmax: {g['faf95']['popmax']:.2e} in {g['faf95']['popmax_population']}")

Query 3: Population Frequencies

Retrieve allele frequency broken down by ancestry group for a specific variant.

import pandas as pd

POPULATION_FREQ_QUERY = """
query PopFreqs($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    genome {
      populations {
        id
        ac
        an
        homozygote_count
      }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "African/African American",
    "amr": "Admixed American",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
    "asj": "Ashkenazi Jewish",
    "mid": "Middle Eastern",
    "oth": "Other",
}

data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variantId": "1-55039974-G-T",
    "dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]

# VariantPopulation no longer exposes `af` directly — compute from ac/an.
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["af"] = df["ac"] / df["an"]
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))

Query 4: Coverage Query

Retrieve per-base read depth coverage for a gene region to assess data completeness.

COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!,
               $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  region(chrom: $chrom, start: $start, stop: $stop, reference_genome: $reference_genome) {
    coverage(dataset: $dataset) {
      exome  { mean median over_1 over_10 over_20 over_30 over_100 }
      genome { mean median over_1 over_10 over_20 over_30 over_100 }
    }
  }
}
"""

# Coverage is no longer a top-level Query field — it lives under Region, and the
# `RegionCoverage` shape returns parallel `exome` / `genome` arrays (one entry per
# position; there is no per-row `pos` — the array index is implicit position).
data = gnomad_query(COVERAGE_QUERY, {
    "chrom": "1",
    "start": 55039700,
    "stop": 55040200,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4",
})
cov = data["region"]["coverage"]
exome  = cov.get("exome")  or []
genome = cov.get("genome") or []
print(f"Coverage positions: exome={len(exome)}, genome={len(genome)}")
if exome:
    avg_mean = sum(c["mean"] for c in exome) / len(exome)
    pct_20x = sum(1 for c in exome if (c.get("over_20") or 0) > 0.9) / len(exome) * 100
    print(f"Exome  — average mean depth: {avg_mean:.1f}x; {pct_20x:.1f}% positions >=90% at >=20x")
    c = exome[0]
    print(f"  Sample position: mean={c['mean']:.1f}x, median={c['median']}x, "
          f">=10x:{c['over_10']:.3f} >=20x:{c['over_20']:.3f} >=30x:{c['over_30']:.3f}")

Query 5: Gene Constraint

Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    symbol
    name
    gnomad_constraint {
      pli
      mis_z
      lof_z
      obs_lof
      exp_lof
      oe_lof
      oe_lof_lower
      oe_lof_upper
    }
  }
}
"""

# GnomadConstraint fields are flat (no nested `lof { obs exp oe oe_ci }`).
# `gene_name` was removed from Gene — use `name` or `symbol`. `pNull`/`pRec` are gone.
genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>10} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 38)
for gene in genes:
    data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
    c = data["gene"]["gnomad_constraint"]
    print(f"{gene:<10} {c['pli']:>10.3e} {c['oe_lof_upper']:>7.3f} {c['mis_z']:>7.2f}")
    time.sleep(0.5)
# Gene             pLI   LOEUF   mis_z
# PCSK9      4.27e-04   0.456    1.41
# BRCA1      5.52e-38   0.928    1.73
# TP53       8.86e-22   1.020    1.93
# TTN        1.00e+00   0.871    1.30

Query 6: Variant Search by Region

Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.

REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
                     $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop,
         reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      rsids
      pos
      consequence
      lof
      genome {
        af
        ac
        an
        faf95 { popmax }
      }
    }
  }
}
"""

data = gnomad_query(REGION_VARIANTS_QUERY, {
    "chrom": "1",
    "start": 55039974,
    "stop": 55064852,   # PCSK9 coding region
    "dataset": "gnomad_r4",
    "reference_genome": "GRCh38"
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")

# Summarize by consequence
from collections import Counter
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
    print(f"  {c}: {n}")

# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
    af = v["genome"]["af"] if v["genome"] else None
    print(f"  {v['variant_id']} | AF={af:.2e}" if af else f"  {v['variant_id']} | AF=NA")

Key Concepts

gnomAD Data Model

gnomAD v4 has two datasets: gnomad_r4 (exomes + genomes, GRCh38, 730K+ individuals) and gnomad_r2_1 (GRCh37, 141K individuals). The API uses a GraphQL schema where variants are accessed either through gene(), region(), or direct variant() lookups. Each variant has separate exome and genome frequency objects; the genome object is preferred for population frequency comparisons.

Ancestry Groups

gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):

Code	Population	Dataset size (approx)
`afr`	African/African American	76,000+
`amr`	Admixed American	45,000+
`eas`	East Asian	50,000+
`fin`	Finnish	24,000+
`nfe`	Non-Finnish European	400,000+
`sas`	South Asian	80,000+
`asj`	Ashkenazi Jewish	10,000+
`mid`	Middle Eastern	5,000+
`oth`	Other/Unknown	varies

Filtering Allele Frequency (FAF95)

The faf95 field provides a one-sided 95% confidence interval lower bound on the allele frequency in the population where the variant is most common. Use this for conservative variant filtering in clinical pipelines — a variant with faf95.popmax < 0.001 is likely rare enough to warrant clinical investigation.

Constraint Scores

Score	Interpretation
`pLI > 0.9`	Gene is intolerant to LoF — likely essential
`LOEUF < 0.35`	Strong LoF constraint (upper CI of oe ratio)
`mis_z > 3.09`	Gene shows significant missense constraint
`pLI < 0.1`	Gene tolerates LoF — homozygous LoF variants exist

Common Workflows

Workflow 1: Rare Variant Frequency Report for a Gene

Goal: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id symbol name
    variants(dataset: $dataset) {
      variant_id rsids chrom pos ref alt consequence lof lof_filter
      genome {
        an ac af
        faf95 { popmax popmax_population }
        populations { id ac an homozygote_count }
      }
    }
  }
}
"""

gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": gene,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})

variants = data["gene"]["variants"]
rows = []
for v in variants:
    g = v.get("genome") or {}
    af = g.get("af")
    if af is None or af >= 0.01:   # keep only rare variants
        continue
    rows.append({
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v.get("consequence"),
        "lof": v.get("lof"),
        "af_genome": af,
        "ac": g.get("ac"),
        "an": g.get("an"),
        "faf95_popmax": g.get("faf95", {}).get("popmax"),
        "faf95_pop": g.get("faf95", {}).get("popmax_population"),
    })

df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)
print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant              1423
# synonymous_variant             512
# splice_region_variant          231
# stop_gained                    198

Workflow 2: Ancestry-Stratified Frequency Visualization

Goal: Query a list of variants and produce a barplot of allele frequencies by ancestry group.

import requests, time
import pandas as pd
import matplotlib.pyplot as plt

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

POPULATION_FREQ_QUERY = """
query PopFreqs($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an homozygote_count }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
    "nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}

variant_id = "1-55039974-G-T"   # PCSK9 p.Tyr142Ter
data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variantId": variant_id,
    "dataset": "gnomad_r4"
})

pops = data["variant"]["genome"]["populations"]
rows = [{"code": p["id"], "af": p["ac"] / p["an"], "ac": p["ac"], "an": p["an"]}
        for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png  (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))

Workflow 3: Constraint-Guided Gene Prioritization

Goal: Score a gene list by constraint metrics and flag LoF-intolerant genes.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id symbol
    gnomad_constraint {
      pli mis_z lof_z
      obs_lof exp_lof oe_lof oe_lof_lower oe_lof_upper
    }
  }
}
"""

# GnomadConstraint is flat (no nested `lof` type); `pNull` / `pRec` are gone;
# `gene_name` is gone — use `symbol`. Filter LoF-intolerant by LOEUF < 0.35
# (the gnomAD recommendation), which is more robust than pLI > 0.9.
gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
    try:
        data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
        c = data["gene"]["gnomad_constraint"]
        records.append({
            "gene": gene,
            "pli": c["pli"],
            "LOEUF": c["oe_lof_upper"],
            "mis_z": c["mis_z"],
            "lof_obs": c["obs_lof"],
            "lof_exp": c["exp_lof"],
            "lof_oe":  c["oe_lof"],
        })
    except Exception as e:
        print(f"Warning: {gene} failed — {e}")
    time.sleep(0.5)

df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["LOEUF"] < 0.35
print(df[["gene", "pli", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes (LOEUF < 0.35): {df['lof_intolerant'].sum()}/{len(df)}")

Key Parameters

Parameter	Function/Endpoint	Default	Range / Options	Effect
`dataset`	All variant queries	—	`gnomad_r4`, `gnomad_r2_1`, `gnomad_r3`	Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1)
`reference_genome`	gene(), region()	—	`GRCh38`, `GRCh37`	Coordinate system; must match dataset
`variant_id`	variant()	—	`CHROM-POS-REF-ALT` string	Identifies the specific variant to query
`gene_symbol`	gene()	—	HGNC symbol string	Gene to retrieve; case-insensitive
`chrom`, `start`, `stop`	region()	—	valid genomic coordinates	Region boundaries for region queries
`faf95.popmax`	variant() genome	—	float 0–1	Filtering allele frequency (95% CI upper bound); use < 0.001 for rare
`lof` filter field	gene() variants	—	`"HC"` (high-confidence), `"LC"`	LoF confidence level
`populations.id`	genome.populations	—	`afr`, `amr`, `eas`, `fin`, `nfe`, `sas`, `asj`, `mid`, `oth`	Per-ancestry frequency

Best Practices

Use gnomad_r4 for GRCh38 analyses: gnomAD v4 is the most current dataset with 730K+ individuals. Use gnomad_r2_1 only when comparing to GRCh37-based variant calls.
Use faf95.popmax for clinical filtering, not overall AF: The filtering allele frequency accounts for maximum population stratification and provides a more conservative rarity estimate than the global AF.
Add time.sleep(0.5) in batch loops: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays prevent server-side throttling.
Filter lof == "HC" for LoF burden analyses: Low-confidence LoF ("LC") annotations are often in repetitive regions or may be sequencing artifacts. High-confidence ("HC") calls are filtered by LOFTEE.
Check AN before interpreting AF: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.

Common Recipes

Recipe: Check if a Variant Is Common in Any Population

When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.

import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
    query = """
    query($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        genome { faf95 { popmax popmax_population } af }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                                         "variables": {"variantId": variant_id, "dataset": dataset}},
                      timeout=15)
    data = r.json()["data"]["variant"]
    if not data or not data["genome"]:
        return None, "Variant not found in gnomAD"
    af = data["genome"]["af"]
    popmax = data["genome"]["faf95"]["popmax"]
    pop = data["genome"]["faf95"]["popmax_population"]
    is_common = (popmax or 0) >= threshold
    return is_common, f"overall AF={af:.2e}, FAF95 popmax={popmax:.2e} in {pop}"

common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common}  |  {info}")
# Common: False  |  overall AF=3.2e-05, FAF95 popmax=6.4e-05 in nfe

Recipe: Batch Constraint Lookup

When to use: Score multiple genes from a differential expression or GWAS gene list.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_constraint(gene_symbol, reference_genome="GRCh38"):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gnomad_constraint { pli mis_z oe_lof_upper }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
                      timeout=15)
    data = r.json().get("data", {}).get("gene", {})
    if not data or not data.get("gnomad_constraint"):
        return None
    c = data["gnomad_constraint"]
    return {"gene": gene_symbol, "pli": c["pli"], "LOEUF": c["oe_lof_upper"], "mis_z": c["mis_z"]}

genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = [r for g in genes for r in [get_constraint(g)] if r]
time.sleep(0.5)   # polite delay per gene in real loop

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# gene        pli    LOEUF  mis_z
# BRCA1  5.52e-38   0.928   1.73
# BRCA2  6.34e-09   0.521   0.97

Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.

import requests, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        variants(dataset: $dataset) {
          variant_id rsids chrom pos ref alt consequence lof
          genome { af ac an }
        }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": "GRCh38", "dataset": dataset}},
                      timeout=60)
    variants = r.json()["data"]["gene"]["variants"]
    lof = [v for v in variants
           if v.get("lof") == "HC"
           and v.get("genome") and v["genome"].get("af") is not None
           and v["genome"]["af"] < max_af]
    return pd.DataFrame([{
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v["consequence"],
        "af": v["genome"]["af"],
        "ac": v["genome"]["ac"],
    } for v in lof])

df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)

Troubleshooting

Problem	Cause	Solution
`{"errors": [...]}` from GraphQL	Invalid field name, wrong dataset ID, or null gene	Check field names match gnomAD v4 schema; use `gnomad_r4` not `gnomad_v4`
Variant returns `None` genome object	Variant only in exome data, not genome	Try accessing `exome` field instead of `genome`; genome is absent for exome-only variants
Gene query returns empty variants list	Gene symbol not found or mismatch	Verify HGNC symbol (case-sensitive); use `gene_id` (ENSG ID) as fallback
`faf95` returns `null`	Variant is absent or monomorphic in all populations	Check `ac` and `an` — variant may have AC=0 or be filtered
`requests.exceptions.Timeout`	Large gene (e.g., TTN) takes >30s	Increase `timeout=120`; for very large genes use region queries instead
Population AF is `None` for some groups	Variant not observed in that ancestry	Treat `None` AF as 0 for filtering; check `an` to confirm the group was sequenced
`reference_genome` mismatch error	Using GRCh37 coords with `gnomad_r4`	Use `GRCh38` for `gnomad_r4`/`gnomad_r3`; use `GRCh37` only for `gnomad_r2_1`

Related Skills

clinvar-database — ClinVar pathogenicity classifications (complement to gnomAD population frequency data)
gwas-database — GWAS Catalog for SNP-trait associations from published GWAS studies
ensembl-database — Ensembl VEP for variant consequence prediction and gene annotation
dbsnp-database — dbSNP for rsID lookup, variant classes, and cross-database ID mapping

References

gnomAD GraphQL API — Interactive GraphQL explorer and endpoint documentation
Karczewski et al., Nature 2020 — gnomAD v2.1 flagship paper (constraint metrics, LoF analysis)
gnomAD Help & FAQ — Data model, ancestry definitions, FAF95 explanation
gnomAD v4 blog post — gnomAD v4 release notes and dataset composition

gnomad-database

Popularity

Invocation

Context Preview

SKILL.md

gnomad-database

Popularity

Invocation

Context Preview

SKILL.md

gnomAD Database

Overview

When to Use

Prerequisites

Quick Start

Core API

Query 1: Gene Variant Query

Query 2: Variant Lookup

Query 3: Population Frequencies

Query 4: Coverage Query

Query 5: Gene Constraint

Query 6: Variant Search by Region

Key Concepts

gnomAD Data Model

Ancestry Groups

Filtering Allele Frequency (FAF95)

Constraint Scores

Common Workflows

Workflow 1: Rare Variant Frequency Report for a Gene

Workflow 2: Ancestry-Stratified Frequency Visualization

Workflow 3: Constraint-Guided Gene Prioritization

Key Parameters

Best Practices

Common Recipes

Recipe: Check if a Variant Is Common in Any Population

Recipe: Batch Constraint Lookup

Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

Troubleshooting

Related Skills

References

Similar Skills

gnomAD Database

Overview

When to Use

Prerequisites

Quick Start

Core API

Query 1: Gene Variant Query

Query 2: Variant Lookup

Query 3: Population Frequencies

Query 4: Coverage Query

Query 5: Gene Constraint

Query 6: Variant Search by Region

Key Concepts

gnomAD Data Model

Ancestry Groups

Filtering Allele Frequency (FAF95)

Constraint Scores

Common Workflows

Workflow 1: Rare Variant Frequency Report for a Gene

Workflow 2: Ancestry-Stratified Frequency Visualization

Workflow 3: Constraint-Guided Gene Prioritization

Key Parameters

Best Practices

Common Recipes

Recipe: Check if a Variant Is Common in Any Population

Recipe: Batch Constraint Lookup

Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

Troubleshooting

Related Skills

References

Similar Skills