From clawbio
Computes a joint PCA of VCF study cohorts against the SGDP reference panel (164 populations), generating multi-panel figures of population structure and markdown reports with summary statistics.
npx claudepluginhub clawbio/clawbio --plugin clawbio
How this skill is triggered: by the user, by Claude, or both
Slash command
/clawbio:claw-ancestry-pca
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP): 345 samples from 164 populations spanning every inhabited continent.
If you ask ChatGPT to "run a PCA against a global reference panel," it will:
This skill encodes the correct methodological decisions:
The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):
python ancestry_pca.py \
  --vcf your_cohort.vcf.gz \
  --pop-map your_populations.tsv \
  --output ancestry_report
python ancestry_pca.py --demo --output demo_report
The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.
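For readers who want to see what a joint PCA involves, here is a minimal NumPy sketch. It is not the code in ancestry_pca.py. It assumes the cohort and the reference panel have already been reduced to the same set of biallelic SNPs and encoded as alternate-allele counts (0, 1, 2). The function name `joint_pca` and the toy data are hypothetical.

```python
import numpy as np


def joint_pca(cohort, reference, n_components=3):
    """Joint PCA of two genotype matrices that share the same SNP columns.

    cohort, reference: arrays of shape (n_samples, n_snps) holding
    alternate-allele counts (0, 1, 2). Hypothetical helper for illustration.
    """
    # Stack both datasets so the axes are fitted on all samples together.
    genotypes = np.vstack([cohort, reference]).astype(float)

    # Drop monomorphic SNPs: a column with zero variance carries no signal.
    genotypes = genotypes[:, genotypes.std(axis=0) > 0]

    # Centre each SNP column on its mean across all stacked samples.
    centred = genotypes - genotypes.mean(axis=0)

    # SVD of the centred matrix; squared singular values are proportional
    # to the variance captured by each principal component.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)

    # Sample coordinates on the leading components.
    scores = u[:, :n_components] * s[:n_components]
    n_cohort = cohort.shape[0]
    return scores[:n_cohort], scores[n_cohort:], explained[:n_components]


# Toy data only: random genotypes, not real cohort or SGDP samples.
rng = np.random.default_rng(0)
cohort = rng.integers(0, 3, size=(6, 50))
reference = rng.integers(0, 3, size=(10, 50))

cohort_pcs, ref_pcs, explained = joint_pca(cohort, reference)
print(cohort_pcs.shape, ref_pcs.shape)
print(" ".join(f"PC{i + 1}: {v:.2%}" for i, v in enumerate(explained)))
```

Fitting the components on the stacked matrix puts cohort and reference samples in the same coordinate system, which is what allows them to be plotted together. Each explained-variance value is that component's share of the total variance across all components.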
Ancestry Decomposition PCA
==========================
Cohort: 736 samples, 28 populations
Reference: SGDP (345 samples, 164 populations)
Common variants: 42,831 biallelic SNPs
Variance explained:
PC1: 51.44% PC2: 21.70% PC3: 6.70%
Panel D - Global Context:
Cohort samples cluster between European and East Asian
reference populations, with Amazonian groups showing
distinct positioning from Highland and Coastal groups.
Figures saved to: ancestry_report/
Figure3_PCA_composite.png (300 dpi)
Figure3_PCA_composite.pdf (vector)
Reproducibility:
commands.sh | environment.yml | checksums.sha256
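The checksums.sha256 file listed under Reproducibility can be used to confirm that report files are unchanged. The sketch below assumes the manifest follows the common `sha256sum` layout (one `<hex digest>  <filename>` pair per line, with paths relative to the manifest). The source does not state the format, so treat that as an assumption. The helper names `sha256_of` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path):
    """Return {filename: bool} for each entry in a sha256sum-style manifest.

    Assumed line format: "<hex digest>  <filename>", with filenames
    resolved relative to the manifest's own directory.
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        target = manifest.parent / name
        # A missing file counts as a failed check rather than an error.
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

If the manifest sits in the output directory (also an assumption), calling `verify_checksums("ancestry_report/checksums.sha256")` would return a dictionary mapping each listed file to `True` or `False`.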
If you use this skill in a publication, please cite: