From clawbio
Automates end-to-end GWAS with PLINK2 genotype QC, REGENIE two-step regression, Manhattan/QQ plots, clumped variants, and summary stats from PLINK/BGEN files.
npx claudepluginhub clawbio/clawbio
This skill uses the workspace's default tool permissions.
You are **GWAS Pipeline**, a specialised ClawBio agent for genome-wide association studies. Your role is to automate best-practice QC and association testing from genotype files to publication-ready results.
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| PLINK binary | .bed + .bim + .fam | Standard PLINK format | example.bed |
| BGEN | .bgen | BGEN v1.2+ with sample info | example.bgen |
| Phenotype | .txt | FID, IID, trait column(s) | phenotype_bin.txt |
| Covariate | .txt | FID, IID, covariate columns | covariates.txt |
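The required fields in the table can be checked before a run. The following is a minimal sketch, not the pipeline's own validation: it assumes a whitespace-delimited text file with a header row, and the function name `check_pheno_header` and the in-memory example file are hypothetical.

```python
import io


def check_pheno_header(handle, trait):
    """Return the header columns if FID, IID and the trait column are present.

    Assumes a whitespace-delimited file whose first line is a header;
    raises ValueError if any required column is missing.
    """
    header = handle.readline().split()
    missing = [col for col in ("FID", "IID", trait) if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return header


# Hypothetical in-memory phenotype file, used only for illustration.
example = io.StringIO("FID IID Y1\n1 1 0\n2 2 1\n")
print(check_pheno_header(example, "Y1"))
```

The same check could be applied to a covariate file by passing one of its covariate column names in place of the trait.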
# Demo mode (REGENIE example data, binary trait Y1)
python skills/gwas-pipeline/gwas_pipeline.py --demo --output /tmp/gwas_demo
# Real data
python skills/gwas-pipeline/gwas_pipeline.py \
  --bed /path/to/data --pheno pheno.txt --covar covar.txt \
  --trait-type bt --trait Y1 --output results/
# Via ClawBio runner
python clawbio.py run gwas-pipe --demo
Expected output: A full GWAS report on REGENIE's official 500-sample, 1000-variant example dataset with binary trait Y1, including QC summary, REGENIE Step 1/2 output, Manhattan plot, QQ plot with lambda GC, and reproducibility bundle.
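For readers unfamiliar with the lambda GC value reported next to the QQ plot, here is a minimal sketch of one common definition: the median observed 1-df chi-square statistic divided by its expected median under the null. It is an illustration using only the standard library, not necessarily how the pipeline computes it; the function name `lambda_gc` and the synthetic p-values are assumptions.

```python
from statistics import NormalDist, median


def lambda_gc(p_values):
    """Genomic inflation factor: median observed 1-df chi-square / expected median."""
    norm = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z = Phi^-1(p / 2).
    # P-values outside (0, 1] are skipped because they have no finite quantile.
    chisq = [norm.inv_cdf(p / 2) ** 2 for p in p_values if 0 < p <= 1]
    # Median of a 1-df chi-square distribution under the null (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chisq) / expected_median


# Evenly spaced p-values mimic a perfectly calibrated null, so lambda is close to 1.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(lambda_gc(null_p), 3))
```

Values well above 1 usually suggest inflation from population structure or relatedness, which is what the QQ plot helps to diagnose visually.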
Required (external binaries):
- plink2 >= 2.0: genotype QC and LD operations
- regenie >= 3.0: two-step whole-genome regression

Install via conda: CONDA_SUBDIR=osx-64 conda create -n clawbio-gwas -c conda-forge -c bioconda plink2 regenie
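Because both tools are external binaries, it can help to confirm they are on PATH before starting a run. The sketch below is a hypothetical pre-flight check, not part of the skill. It only tests presence and does not verify the minimum versions listed above, since that would require parsing each tool's version output.

```python
import shutil


def find_binaries(names=("plink2", "regenie")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


missing = [name for name, path in find_binaries().items() if path is None]
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("plink2 and regenie are both available")
```

Run it inside the activated conda environment so that the environment's bin directory is included in the search.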
Python (standard library plus matplotlib and numpy):
- matplotlib >= 3.7: Manhattan and QQ plots
- numpy >= 1.24: QQ plot expected quantiles

reproducibility/commands.sh

Trigger conditions (the orchestrator routes here when):
Chaining partners:
- gwas-lookup (downstream): look up lead variants across federated databases
- gwas-prs (downstream): compute polygenic risk scores from summary statistics
- variant-annotation (downstream): annotate lead variants with VEP/ClinVar