Fine-maps GWAS loci using SuSiE, SuSiE-inf, and ABF to identify credible sets and PIPs from summary statistics for causal variant discovery.
Files: core/__init__.py, core/abf.py, core/credible_sets.py, core/io.py, core/report.py, core/susie.py, core/susie_inf.py, fine_mapping.py, tests/test_fine_mapping.py
You are SuSiE Fine-Mapper, a specialised ClawBio agent for statistical fine-mapping of GWAS loci. Your role is to identify credible sets of likely causal variants and compute per-variant posterior inclusion probabilities (PIPs) from GWAS summary statistics.
GWAS identifies associated loci, not causal variants. A single GWAS signal can contain dozens of correlated SNPs in high LD; fine-mapping narrows that signal down to a minimal credible set of likely causal variants.
- tests/benchmark/finemapping_benchmark.py evaluates ABF, SuSiE, and SuSiE-inf head-to-head on synthetic loci with known causal variants, using a composite score (recall, precision, PIP concentration, rank)
- Pre-computed LD matrix input (.npy or .tsv)

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| GWAS summary stats | .tsv / .csv / .txt | rsid, chr, pos, beta, se or z | locus_sumstats.tsv |
| Pre-computed LD matrix | .npy / .tsv | Square correlation matrix, row/col = variant order | ld_matrix.npy |
| Demo (built-in) | — | — | --demo |
Optional columns in sumstats: p, maf, n, a1, a2
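
The input requirements above can be checked before any fine-mapping runs. The following is a minimal sketch, not the skill's own loader: the function names `check_sumstats_header` and `check_ld_shape` are hypothetical, and the effect-size rule (either `z` alone, or `beta` together with `se`) is an assumed reading of the "Required Fields" column.

```python
import csv
import io

REQUIRED = {"rsid", "chr", "pos"}  # identifier and position columns from the table above


def check_sumstats_header(handle, delimiter="\t"):
    """Return the lower-cased column names, raising if required ones are missing.

    Assumed effect-size rule: either a `z` column, or both `beta` and `se`.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = [name.strip().lower() for name in next(reader)]
    columns = set(header)

    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if "z" not in columns and not {"beta", "se"} <= columns:
        raise ValueError("need either a 'z' column or both 'beta' and 'se'")
    return header


def check_ld_shape(ld_rows, n_variants):
    """Check that an LD matrix (a list of rows) is square and matches the variant count."""
    if len(ld_rows) != n_variants or any(len(row) != n_variants for row in ld_rows):
        raise ValueError(f"LD matrix must be {n_variants} x {n_variants}")


# Illustrative one-variant file held in memory instead of on disk.
demo = io.StringIO("rsid\tchr\tpos\tbeta\tse\nrs1\t1\t109000100\t0.12\t0.03\n")
print(check_sumstats_header(demo))
```

Failing early on a missing column or a mismatched LD matrix gives a clearer error than a shape mismatch deep inside the fine-mapping step.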
When the user asks for fine-mapping:
- Filter to the locus window if --chr/--start/--end provided
- If an --ld matrix is supplied, load it and validate that its dimensions match the variants; if neither, run ABF (no LD needed)
- Write report.md with credible set tables, PIPs, methodology note, and reproducibility bundle

# ABF single-signal fine-mapping (no LD needed)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --output /tmp/finemapping
# SuSiE multi-signal with pre-computed LD matrix
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --output /tmp/finemapping
# Filter to a specific locus window
python skills/fine-mapping/fine_mapping.py \
--sumstats gwas_full.tsv --chr 1 --start 109000000 --end 110000000 \
--ld ld_matrix.npy --output /tmp/finemapping
# Set maximum number of causal signals (SuSiE L parameter)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --max-signals 5 --output /tmp/finemapping
# Add a gene track below the regional association plot (requires internet)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --gene-track --output /tmp/finemapping
# Demo mode (synthetic 200-variant locus, two causal signals)
python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
Expected output: a report covering a synthetic 200-variant locus with two injected causal signals, SuSiE credible sets of ~3–8 variants each, per-variant PIP plot, and reproducibility bundle.
ABF is used when no LD matrix is available. It assumes a single causal variant in the locus, so each variant is scored on its own and no LD information is needed.
For each variant i with z-score z_i and prior variance W:
V_i = 1 / n_eff (if se available: V_i = se_i^2)
ABF_i = sqrt(V_i / (V_i + W)) * exp(z_i^2 * W / (2 * (V_i + W)))
PIP_i = ABF_i / sum(ABF_j)
Default prior: W = 0.04 (σ = 0.2 on log-OR scale; Wakefield 2009)
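
The ABF formulas above translate directly into a few lines of code. This is a minimal sketch rather than the contents of core/abf.py: the names `abf_pips` and `credible_set` are hypothetical, the 0.95 coverage default is an assumption, and the z-scores are illustrative. The Bayes factors are computed in log space, a numerical-stability choice not stated in the text.

```python
import math


def abf_pips(z_scores, variances, prior_variance=0.04):
    """PIPs from Wakefield approximate Bayes factors, one per variant."""
    log_abf = []
    for z, v in zip(z_scores, variances):
        shrink = prior_variance / (v + prior_variance)
        # log of sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W)))
        log_abf.append(0.5 * math.log(v / (v + prior_variance)) + 0.5 * z * z * shrink)
    # Subtract the maximum before exponentiating so large z-scores cannot overflow.
    peak = max(log_abf)
    weights = [math.exp(value - peak) for value in log_abf]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pips, coverage=0.95):
    """Indices of the smallest set of variants whose PIPs sum to at least `coverage`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen


# Illustrative z-scores and standard errors; V_i = se_i^2 as in the formula above.
z = [5.2, 4.9, 1.1, 0.3]
se = [0.05, 0.05, 0.05, 0.05]
pips = abf_pips(z, [s * s for s in se])
print([round(p, 3) for p in pips], credible_set(pips))
```

Because the PIPs are normalised over the locus, they sum to 1, which is what lets the credible set be read off as a cumulative sum under the single-causal-variant assumption.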
When an LD matrix R is provided:
- α_l ∝ ABF(z_residual | R)
- μ_l² and σ_l²
- PIP_i = 1 - prod_l (1 - α_l_i)

SuSiE-inf extends SuSiE with an infinitesimal variance component τ² that captures diffuse polygenic signal. The residual precision matrix becomes:
Ω = (τ² · D² + σ² · I)⁻¹ in the LD eigenbasis
where D² are eigenvalues of X'X (n × LD eigenvalues). When τ²→0 the model reduces to standard SuSiE.
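
Because Ω is diagonal in the LD eigenbasis, it can be formed from a single eigendecomposition of the LD matrix. The sketch below assumes NumPy and uses a hypothetical helper `eigenbasis_precision`; the sample size, τ² and σ² values are illustrative, and this is not the code in core/susie_inf.py.

```python
import numpy as np


def eigenbasis_precision(ld, n, tau2, sigma2):
    """Diagonal of Omega = (tau2 * D^2 + sigma2 * I)^-1 in the LD eigenbasis.

    D^2 are the eigenvalues of X'X, taken here as n times the LD eigenvalues.
    """
    ld_eigenvalues, eigenvectors = np.linalg.eigh(ld)    # LD is symmetric
    ld_eigenvalues = np.clip(ld_eigenvalues, 0.0, None)  # guard against tiny negative values
    d2 = n * ld_eigenvalues
    omega_diag = 1.0 / (tau2 * d2 + sigma2)
    return omega_diag, eigenvectors


# Two variants with r = 0.8; all numbers are illustrative.
ld = np.array([[1.0, 0.8], [0.8, 1.0]])
omega, V = eigenbasis_precision(ld, n=1000, tau2=1e-4, sigma2=1.0)
omega_no_inf, _ = eigenbasis_precision(ld, n=1000, tau2=0.0, sigma2=1.0)
print(omega, omega_no_inf)
```

With τ² = 0 every diagonal entry equals 1/σ², which is the standard SuSiE limit described above.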
- LD = V diag(d²/n) V'

When to prefer SuSiE-inf over SuSiE:
Key thresholds / parameters:
- Credible set coverage (--coverage)
- Maximum number of causal signals, the SuSiE L parameter (--max-signals)

output_directory/
├── report.md # Primary markdown report
├── fine_mapping.json # Machine-readable PIPs + credible sets
├── figures/
│ ├── pip_locus_plot.png # Per-variant PIP coloured by LD r²
│ ├── regional_association.png # -log10(p) with lead variant highlighted (only if p-values present)
│ └── ld_heatmap.png # LD r² heatmap with credible set annotations (only if LD matrix provided)
├── tables/
│ ├── pips.tsv # rsid, chr, pos, pip, cs_membership
│ └── credible_sets.tsv # cs_id, size, coverage, lead_rsid, variants
└── reproducibility/
  ├── commands.sh # Exact command to reproduce
  └── environment.yml # Package versions
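
Downstream tools can consume tables/pips.tsv directly. The following is a small sketch under stated assumptions: the file is tab-separated with a header row matching the column names listed in the tree above, the helper name `top_variants` is hypothetical, and the example rows (including the encoding of cs_membership) are invented for illustration.

```python
import csv
import io


def top_variants(handle, k=5):
    """Return the k rows with the highest PIP from a pips.tsv-style table."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row["pip"]), reverse=True)
    return rows[:k]


# Stand-in for open("output_directory/tables/pips.tsv"); the rows are made up.
example = io.StringIO(
    "rsid\tchr\tpos\tpip\tcs_membership\n"
    "rs_a\t1\t109000100\t0.02\t\n"
    "rs_b\t1\t109000450\t0.91\t1\n"
    "rs_c\t1\t109000900\t0.07\t1\n"
)
for row in top_variants(example, k=2):
    print(row["rsid"], row["pip"])
```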
Required:
- numpy >= 1.24: array maths, LD matrix operations
- scipy >= 1.10: statistical functions
- pandas >= 1.5: sumstats parsing
- matplotlib >= 3.7: locus plots

- reproducibility/commands.sh logs exact inputs and parameters

Trigger conditions (the orchestrator routes here when):
- Input columns include beta/z + se (looks like GWAS summary stats)

Chaining partners (this skill connects with):
- gwas-lookup: look up the lead variant before fine-mapping to confirm locus context
- gwas-prs: fine-mapped causal variants can be used as a more precise PRS variant set
- vcf-annotator: annotate the credible set variants with functional consequences