From sciagent-skills
Queries gnomAD v4 GraphQL API for population variant frequencies by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene constraints (pLI, LOEUF), and coverage. Identifies rare variants for clinical genomics.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skillsThis skill uses the workspace's default tool permissions.
The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at `https://gnomad.broadinstitute.org/api` — no authentication required, no official ...
Queries gnomAD database via GraphQL for allele frequencies, pLI/LOEUF constraint scores, and LoF intolerance. For variant pathogenicity assessment, rare disease genetics, and gene constraint analysis.
Searches 1000 Genomes Project (IGSR) populations by superpopulation ancestry (AFR, AMR, EAS, EUR, SAS), retrieves samples by code, lists data collections for GWAS population stratification analysis.
Queries NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieves alleles, MAF, variant class, clinical links, cross-DB IDs like ClinVar.
Share bugs, ideas, or general feedback.
The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at https://gnomad.broadinstitute.org/api — no authentication required, no official SDK.
clinvar-database; gnomAD provides frequency evidence but does not classify pathogenicitygwas-database; gnomAD is for population frequency lookupsrequests, pandas, matplotlibBRCA1), variant IDs (1-69511-A-G format, or rsIDs)time.sleep(0.5) between requests for polite access; avoid bursts over 10 requests/secondpip install requests pandas matplotlib
import requests
import time
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query: str, variables: dict = None) -> dict:
"""Execute a gnomAD GraphQL query and return the data payload."""
payload = {"query": query, "variables": variables or {}}
r = requests.post(GNOMAD_API, json=payload, timeout=30)
r.raise_for_status()
result = r.json()
if "errors" in result:
raise ValueError(f"GraphQL errors: {result['errors']}")
return result["data"]
# Quick check: get pLI for BRCA1
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gnomad_constraint { pLI lof { oe_ci { upper } } }
}
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI: {constraint['pLI']:.3f}")
print(f"BRCA1 LOEUF: {constraint['lof']['oe_ci']['upper']:.3f}")
# BRCA1 pLI: 0.999
# BRCA1 LOEUF: 0.127
Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.
import requests, time
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query, variables=None):
r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
r.raise_for_status()
result = r.json()
if "errors" in result:
raise ValueError(f"GraphQL errors: {result['errors']}")
return result["data"]
GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gene_id
gene_name
variants(dataset: $dataset) {
variant_id
rsids
chrom
pos
ref
alt
consequence
lof
genome {
an
ac
af
faf95 { popmax popmax_population }
}
}
}
}
"""
data = gnomad_query(GENE_VARIANTS_QUERY, {
"gene_symbol": "PCSK9",
"reference_genome": "GRCh38",
"dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['gene_name']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001)
rare = [v for v in variants if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
print(f" {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")
Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.
VARIANT_QUERY = """
query VariantDetails($variant_id: String!, $dataset: DatasetId!) {
variant(variant_id: $variant_id, dataset: $dataset) {
variant_id
rsids
chrom
pos
ref
alt
consequence
lof
lof_filter
lof_flags
genome {
an
ac
af
faf95 { popmax popmax_population }
populations {
id
ac
an
af
}
}
}
}
"""
data = gnomad_query(VARIANT_QUERY, {
"variant_id": "1-55039974-G-T", # PCSK9 p.Tyr142Ter (LoF)
"dataset": "gnomad_r4"
})
v = data["variant"]
print(f"Variant: {v['variant_id']}")
print(f"rsIDs: {v['rsids']}")
print(f"Consequence: {v['consequence']} | LoF: {v['lof']}")
g = v["genome"]
print(f"Genome AF: {g['af']:.2e} (AC={g['ac']}, AN={g['an']})")
print(f"FAF95 popmax: {g['faf95']['popmax']:.2e} in {g['faf95']['popmax_population']}")
Retrieve allele frequency broken down by ancestry group for a specific variant.
import pandas as pd
POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
variant(variant_id: $variant_id, dataset: $dataset) {
variant_id
genome {
populations {
id
ac
an
af
homozygote_count
}
}
}
}
"""
ANCESTRY_LABELS = {
"afr": "African/African American",
"amr": "Admixed American",
"eas": "East Asian",
"fin": "Finnish",
"nfe": "Non-Finnish European",
"sas": "South Asian",
"asj": "Ashkenazi Jewish",
"mid": "Middle Eastern",
"oth": "Other",
}
data = gnomad_query(POPULATION_FREQ_QUERY, {
"variant_id": "1-55039974-G-T",
"dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]
# Filter to top-level ancestry groups (exclude sex-specific)
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))
Retrieve per-base read depth coverage for a gene region to assess data completeness.
COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!) {
coverage(dataset: $dataset, chrom: $chrom, start: $start, stop: $stop) {
pos
mean
median
over_1
over_10
over_20
over_30
over_100
}
}
"""
data = gnomad_query(COVERAGE_QUERY, {
"chrom": "1",
"start": 55039700,
"stop": 55040200,
"dataset": "gnomad_r4"
})
cov = data["coverage"]
print(f"Coverage positions retrieved: {len(cov)}")
if cov:
avg_mean = sum(c["mean"] for c in cov) / len(cov)
pct_20x = sum(1 for c in cov if c["over_20"] > 0.9) / len(cov) * 100
print(f"Average mean depth: {avg_mean:.1f}x")
print(f"Positions with >90% samples at >=20x: {pct_20x:.1f}%")
# Example single position
c = cov[0]
print(f"\nPosition {c['pos']}: mean={c['mean']:.1f}x, median={c['median']}x")
print(f" Fraction >=10x: {c['over_10']:.3f}, >=20x: {c['over_20']:.3f}, >=30x: {c['over_30']:.3f}")
Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.
CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gene_id
gene_name
gnomad_constraint {
pLI
pNull
pRec
mis_z
lof {
obs
exp
oe
oe_ci { lower upper }
}
}
}
}
"""
genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>6} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 35)
for gene in genes:
data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
c = data["gene"]["gnomad_constraint"]
loeuf = c["lof"]["oe_ci"]["upper"]
print(f"{gene:<10} {c['pLI']:>6.3f} {loeuf:>7.3f} {c['mis_z']:>7.2f}")
time.sleep(0.5)
# Gene pLI LOEUF mis_z
# PCSK9 0.855 0.543 2.11
# BRCA1 0.999 0.127 3.84
# TP53 0.993 0.191 5.21
# TTN 0.001 0.993 -2.40
Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.
REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
$dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
region(chrom: $chrom, start: $start, stop: $stop,
reference_genome: $reference_genome) {
variants(dataset: $dataset) {
variant_id
rsids
pos
consequence
lof
genome {
af
ac
an
faf95 { popmax }
}
}
}
}
"""
data = gnomad_query(REGION_VARIANTS_QUERY, {
"chrom": "1",
"start": 55039974,
"stop": 55064852, # PCSK9 coding region
"dataset": "gnomad_r4",
"reference_genome": "GRCh38"
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")
# Summarize by consequence
from collections import Counter
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
print(f" {c}: {n}")
# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
af = v["genome"]["af"] if v["genome"] else None
print(f" {v['variant_id']} | AF={af:.2e}" if af else f" {v['variant_id']} | AF=NA")
gnomAD v4 has two datasets: gnomad_r4 (exomes + genomes, GRCh38, 730K+ individuals) and gnomad_r2_1 (GRCh37, 141K individuals). The API uses a GraphQL schema where variants are accessed either through gene(), region(), or direct variant() lookups. Each variant has separate exome and genome frequency objects; the genome object is preferred for population frequency comparisons.
gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):
| Code | Population | Dataset size (approx) |
|---|---|---|
afr | African/African American | 76,000+ |
amr | Admixed American | 45,000+ |
eas | East Asian | 50,000+ |
fin | Finnish | 24,000+ |
nfe | Non-Finnish European | 400,000+ |
sas | South Asian | 80,000+ |
asj | Ashkenazi Jewish | 10,000+ |
mid | Middle Eastern | 5,000+ |
oth | Other/Unknown | varies |
The faf95 field provides a one-sided 95% confidence interval lower bound on the allele frequency in the population where the variant is most common. Use this for conservative variant filtering in clinical pipelines — a variant with faf95.popmax < 0.001 is likely rare enough to warrant clinical investigation.
| Score | Interpretation |
|---|---|
pLI > 0.9 | Gene is intolerant to LoF — likely essential |
LOEUF < 0.35 | Strong LoF constraint (upper CI of oe ratio) |
mis_z > 3.09 | Gene shows significant missense constraint |
pLI < 0.1 | Gene tolerates LoF — homozygous LoF variants exist |
Goal: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.
import requests, time, pandas as pd
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query, variables=None):
r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
r.raise_for_status()
result = r.json()
if "errors" in result:
raise ValueError(result["errors"])
return result["data"]
GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gene_id gene_name
variants(dataset: $dataset) {
variant_id rsids chrom pos ref alt consequence lof lof_filter
genome {
an ac af
faf95 { popmax popmax_population }
populations { id ac an af }
}
}
}
}
"""
gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
"gene_symbol": gene,
"reference_genome": "GRCh38",
"dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
rows = []
for v in variants:
g = v.get("genome") or {}
af = g.get("af")
if af is None or af >= 0.01: # keep only rare variants
continue
rows.append({
"variant_id": v["variant_id"],
"rsids": ";".join(v.get("rsids") or []),
"consequence": v.get("consequence"),
"lof": v.get("lof"),
"af_genome": af,
"ac": g.get("ac"),
"an": g.get("an"),
"faf95_popmax": g.get("faf95", {}).get("popmax"),
"faf95_pop": g.get("faf95", {}).get("popmax_population"),
})
df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)
print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant 1423
# synonymous_variant 512
# splice_region_variant 231
# stop_gained 198
Goal: Query a list of variants and produce a barplot of allele frequencies by ancestry group.
import requests, time
import pandas as pd
import matplotlib.pyplot as plt
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query, variables=None):
r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
r.raise_for_status()
result = r.json()
if "errors" in result:
raise ValueError(result["errors"])
return result["data"]
POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
variant(variant_id: $variant_id, dataset: $dataset) {
variant_id
genome {
populations { id ac an af }
}
}
}
"""
ANCESTRY_LABELS = {
"afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
"nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}
variant_id = "1-55039974-G-T" # PCSK9 p.Tyr142Ter
data = gnomad_query(POPULATION_FREQ_QUERY, {
"variant_id": variant_id,
"dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]
rows = [{"code": p["id"], "af": p["af"], "ac": p["ac"], "an": p["an"]}
for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))
Goal: Score a gene list by constraint metrics and flag LoF-intolerant genes.
import requests, time, pandas as pd
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query, variables=None):
r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
r.raise_for_status()
result = r.json()
if "errors" in result:
raise ValueError(result["errors"])
return result["data"]
CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gene_id gene_name
gnomad_constraint {
pLI pNull pRec mis_z
lof { obs exp oe oe_ci { lower upper } }
}
}
}
"""
gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
try:
data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
c = data["gene"]["gnomad_constraint"]
records.append({
"gene": gene,
"pLI": c["pLI"],
"LOEUF": c["lof"]["oe_ci"]["upper"],
"mis_z": c["mis_z"],
"lof_obs": c["lof"]["obs"],
"lof_exp": c["lof"]["exp"],
"lof_oe": c["lof"]["oe"],
})
except Exception as e:
print(f"Warning: {gene} failed — {e}")
time.sleep(0.5)
df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["pLI"] > 0.9
print(df[["gene", "pLI", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes: {df['lof_intolerant'].sum()}/{len(df)}")
| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
dataset | All variant queries | — | gnomad_r4, gnomad_r2_1, gnomad_r3 | Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1) |
reference_genome | gene(), region() | — | GRCh38, GRCh37 | Coordinate system; must match dataset |
variant_id | variant() | — | CHROM-POS-REF-ALT string | Identifies the specific variant to query |
gene_symbol | gene() | — | HGNC symbol string | Gene to retrieve; case-insensitive |
chrom, start, stop | region() | — | valid genomic coordinates | Region boundaries for region queries |
faf95.popmax | variant() genome | — | float 0–1 | Filtering allele frequency (95% CI upper bound); use < 0.001 for rare |
lof filter field | gene() variants | — | "HC" (high-confidence), "LC" | LoF confidence level |
populations.id | genome.populations | — | afr, amr, eas, fin, nfe, sas, asj, mid, oth | Per-ancestry frequency |
Use gnomad_r4 for GRCh38 analyses: gnomAD v4 is the most current dataset with 730K+ individuals. Use gnomad_r2_1 only when comparing to GRCh37-based variant calls.
Use faf95.popmax for clinical filtering, not overall AF: The filtering allele frequency accounts for maximum population stratification and provides a more conservative rarity estimate than the global AF.
Add time.sleep(0.5) in batch loops: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays prevent server-side throttling.
Filter lof == "HC" for LoF burden analyses: Low-confidence LoF ("LC") annotations are often in repetitive regions or may be sequencing artifacts. High-confidence ("HC") calls are filtered by LOFTEE.
Check AN before interpreting AF: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.
When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.
import requests
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
query = """
query($variant_id: String!, $dataset: DatasetId!) {
variant(variant_id: $variant_id, dataset: $dataset) {
genome { faf95 { popmax popmax_population } af }
}
}
"""
r = requests.post(GNOMAD_API, json={"query": query,
"variables": {"variant_id": variant_id, "dataset": dataset}},
timeout=15)
data = r.json()["data"]["variant"]
if not data or not data["genome"]:
return None, "Variant not found in gnomAD"
af = data["genome"]["af"]
popmax = data["genome"]["faf95"]["popmax"]
pop = data["genome"]["faf95"]["popmax_population"]
is_common = (popmax or 0) >= threshold
return is_common, f"overall AF={af:.2e}, FAF95 popmax={popmax:.2e} in {pop}"
common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common} | {info}")
# Common: False | overall AF=3.2e-05, FAF95 popmax=6.4e-05 in nfe
When to use: Score multiple genes from a differential expression or GWAS gene list.
import requests, time, pandas as pd
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def get_constraint(gene_symbol, reference_genome="GRCh38"):
query = """
query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
gnomad_constraint { pLI mis_z lof { oe_ci { upper } } }
}
}
"""
r = requests.post(GNOMAD_API, json={"query": query,
"variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
timeout=15)
data = r.json().get("data", {}).get("gene", {})
if not data or not data.get("gnomad_constraint"):
return None
c = data["gnomad_constraint"]
return {"gene": gene_symbol, "pLI": c["pLI"], "LOEUF": c["lof"]["oe_ci"]["upper"], "mis_z": c["mis_z"]}
genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = [r for g in genes for r in [get_constraint(g)] if r]
time.sleep(0.5) # polite delay per gene in real loop
df = pd.DataFrame(rows)
print(df.to_string(index=False))
# gene pLI LOEUF mis_z
# BRCA1 0.999 0.127 3.84
# BRCA2 1.000 0.176 3.21
When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.
import requests, pandas as pd
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001):
query = """
query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
variants(dataset: $dataset) {
variant_id rsids chrom pos ref alt consequence lof
genome { af ac an }
}
}
}
"""
r = requests.post(GNOMAD_API, json={"query": query,
"variables": {"gene_symbol": gene_symbol, "reference_genome": "GRCh38", "dataset": dataset}},
timeout=60)
variants = r.json()["data"]["gene"]["variants"]
lof = [v for v in variants
if v.get("lof") == "HC"
and v.get("genome") and v["genome"].get("af") is not None
and v["genome"]["af"] < max_af]
return pd.DataFrame([{
"variant_id": v["variant_id"],
"rsids": ";".join(v.get("rsids") or []),
"consequence": v["consequence"],
"af": v["genome"]["af"],
"ac": v["genome"]["ac"],
} for v in lof])
df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)
| Problem | Cause | Solution |
|---|---|---|
{"errors": [...]} from GraphQL | Invalid field name, wrong dataset ID, or null gene | Check field names match gnomAD v4 schema; use gnomad_r4 not gnomad_v4 |
Variant returns None genome object | Variant only in exome data, not genome | Try accessing exome field instead of genome; genome is absent for exome-only variants |
| Gene query returns empty variants list | Gene symbol not found or mismatch | Verify HGNC symbol (case-sensitive); use gene_id (ENSG ID) as fallback |
faf95 returns null | Variant is absent or monomorphic in all populations | Check ac and an — variant may have AC=0 or be filtered |
requests.exceptions.Timeout | Large gene (e.g., TTN) takes >30s | Increase timeout=120; for very large genes use region queries instead |
Population AF is None for some groups | Variant not observed in that ancestry | Treat None AF as 0 for filtering; check an to confirm the group was sequenced |
reference_genome mismatch error | Using GRCh37 coords with gnomad_r4 | Use GRCh38 for gnomad_r4/gnomad_r3; use GRCh37 only for gnomad_r2_1 |
clinvar-database — ClinVar pathogenicity classifications (complement to gnomAD population frequency data)gwas-database — GWAS Catalog for SNP-trait associations from published GWAS studiesensembl-database — Ensembl VEP for variant consequence prediction and gene annotationdbsnp-database — dbSNP for rsID lookup, variant classes, and cross-database ID mapping