From bio-research
Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.
Slash command: `/bio-research:nextflow-development`

Skill listing summary (used to decide when to auto-load this skill): Run nf-core bioinformatics pipelines on local or public sequencing data.

Files:
- `LICENSE.txt`
- `references/geo-sra-acquisition.md`
- `references/installation.md`
- `references/pipelines/atacseq.md`
- `references/pipelines/rnaseq.md`
- `references/pipelines/sarek.md`
- `references/troubleshooting.md`
- `scripts/check_environment.py`
- `scripts/config/genomes.yaml`
- `scripts/config/pipelines/atacseq.yaml`
- `scripts/config/pipelines/rnaseq.yaml`
- `scripts/config/pipelines/sarek.yaml`
- `scripts/detect_data_type.py`
- `scripts/generate_samplesheet.py`
- `scripts/manage_genomes.py`
- `scripts/sra_geo_fetch.py`
- `scripts/utils/__init__.py`
- `scripts/utils/file_discovery.py`
- `scripts/utils/ncbi_utils.py`
- `scripts/utils/sample_inference.py`
Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.
Lark-native execution (depth core: LARK-PATTERNS, LARK-RECIPES, LARK-FUSION). Nextflow/Docker runs stay on the local machine or HPC cluster; Lark does not execute pipelines. Lark owns the collaboration around a multi-hour run. At the DECISION POINTS (Steps 0, 2 and 5), confirm choices via an interactive card (P4). When the run finishes (Step 6), upload the MultiQC report and key result tables to Drive (P8), log the run to the lab Base as the system of record (P5), and notify the requester (P1). See "Step 7: Share results to Lark" below. Runs are long, so always close the loop with a notification rather than expecting the user to watch the terminal.
- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs
- [ ] Step 7: Share results to Lark (Drive + Base run-log + notify card)
Step 0: Acquire data (if from GEO/SRA)
Skip this step if the user has local FASTQ files.
For public datasets, fetch the data from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.
Quick start:
```bash
# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004
# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i
# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv
```
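Users may supply a series, a sample, or individual runs. The sketch below uses a hypothetical `classify_accession` helper, which is not part of `sra_geo_fetch.py`. It assumes only the standard GEO/SRA/ENA prefix conventions and labels an ID before it is fetched:

```python
import re

# Hypothetical helper (not part of sra_geo_fetch.py): label an accession by its
# prefix so the right fetch path can be chosen.
ACCESSION_PATTERNS = [
    (re.compile(r"^GSE\d+$"), "GEO series"),
    (re.compile(r"^GSM\d+$"), "GEO sample"),
    (re.compile(r"^(SRR|ERR|DRR)\d+$"), "sequencing run"),
    (re.compile(r"^(SRX|ERX|DRX)\d+$"), "sequencing experiment"),
    (re.compile(r"^(SRP|ERP|DRP)\d+$"), "study"),
    (re.compile(r"^PRJ(NA|EB|DB)\d+$"), "BioProject"),
]


def classify_accession(accession: str) -> str:
    """Return a label for a GEO/SRA accession or raise ValueError."""
    acc = accession.strip().upper()
    for pattern, label in ACCESSION_PATTERNS:
        if pattern.match(acc):
            return label
    raise ValueError(f"unrecognised accession: {accession!r}")


if __name__ == "__main__":
    # Illustrative IDs; only GSE110004 comes from the example above.
    for acc in ["GSE110004", "gsm1234567", "SRR1234567"]:
        print(f"{acc}: {classify_accession(acc)}")
```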
DECISION POINT: After fetching study info, confirm with user:
Then continue to Step 1.
Step 1: Environment check
Run this first. The pipeline will fail unless the environment check passes.
```bash
python scripts/check_environment.py
```
All critical checks must pass. If any fail, provide fix instructions:
Docker:

| Problem | Fix |
|---|---|
| Not installed | Install from https://docs.docker.com/get-docker/ |
| Permission denied | `sudo usermod -aG docker $USER`, then log out and back in |
| Daemon not running | `sudo systemctl start docker` |

Nextflow:

| Problem | Fix |
|---|---|
| Not installed | `curl -s https://get.nextflow.io \| bash && mv nextflow ~/bin/` |
| Version < 23.04 | `nextflow self-update` |

Java:

| Problem | Fix |
|---|---|
| Not installed or version < 11 | `sudo apt install openjdk-11-jdk` |
Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
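The version rules in these tables can also be checked programmatically. This is a minimal sketch, not the logic of `check_environment.py`, and the helper names are hypothetical. It checks only that the executables are on `PATH` and compares dotted version strings against the minimums listed above:

```python
import re
import shutil

# Minimums taken from the tables above; hypothetical helpers, not check_environment.py.
MIN_NEXTFLOW = "23.04"
MIN_JAVA = "11"
REQUIRED_TOOLS = ("docker", "nextflow", "java")


def parse_version(text: str) -> tuple:
    """Extract the leading dotted number, e.g. '24.10.0-edge' -> (24, 10, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def version_at_least(found: str, minimum: str) -> bool:
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("missing:", missing_tools() or "none")
    print("Nextflow 24.04.2 ok:", version_at_least("24.04.2", MIN_NEXTFLOW))
```

Comparing integer tuples avoids a string-comparison trap: "9.0" sorts after "23.04" as text, even though version 9.0 is older.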
Step 2: Select pipeline
DECISION POINT: Confirm the pipeline choice with the user before proceeding.
| Data Type | Pipeline | Version | Goal |
|---|---|---|---|
| RNA-seq | rnaseq | 3.22.2 | Gene expression |
| WGS/WES | sarek | 3.7.1 | Variant calling |
| ATAC-seq | atacseq | 2.1.2 | Chromatin accessibility |
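Keeping the data type → pipeline → version mapping in one place ensures the test run and the real run use the same pinned release. The sketch below uses a hypothetical `select_pipeline` helper, with versions copied from the table:

```python
import re

# Pinned releases copied from the table above (hypothetical helper).
PIPELINES = {
    "rnaseq": ("rnaseq", "3.22.2"),
    "wgs": ("sarek", "3.7.1"),
    "wes": ("sarek", "3.7.1"),
    "atacseq": ("atacseq", "2.1.2"),
}


def select_pipeline(data_type: str) -> tuple:
    """Map a data type such as 'RNA-seq' or 'WES' to (pipeline, version)."""
    # Normalise 'RNA-seq', 'rna_seq', 'RNA seq' -> 'rnaseq'.
    key = re.sub(r"[\s_-]", "", data_type.lower())
    if key not in PIPELINES:
        raise ValueError(f"no pipeline configured for {data_type!r}")
    return PIPELINES[key]
```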
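Auto-detect from data:
```bash
python scripts/detect_data_type.py /path/to/data
```
For pipeline-specific details, see references/pipelines/rnaseq.md, references/pipelines/sarek.md, and references/pipelines/atacseq.md.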
Step 3: Run test profile
The test profile validates the environment on a small test dataset. It MUST pass before you run real data.
```bash
nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output
```
| Pipeline | Command |
|---|---|
| rnaseq | nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq |
| sarek | nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek |
| atacseq | nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq |
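The test commands in this table differ only in pipeline name and version, so they can be generated from one mapping. The sketch below uses a hypothetical `build_test_argv` helper that builds an argument list rather than a shell string:

```python
import shlex

# Releases from the pipeline table (hypothetical helper).
RELEASES = {"rnaseq": "3.22.2", "sarek": "3.7.1", "atacseq": "2.1.2"}


def build_test_argv(pipeline: str, container: str = "docker") -> list:
    """Argument list for the nf-core test profile of one pipeline."""
    return [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", RELEASES[pipeline],
        "-profile", f"test,{container}",   # e.g. "test,singularity" on HPC
        "--outdir", f"test_{pipeline}",
    ]


if __name__ == "__main__":
    for name in RELEASES:
        print(shlex.join(build_test_argv(name)))
```

Passing the list to `subprocess.run(argv, check=True)` launches the run without shell quoting. `shlex.join` is used here only to print a command that can be copied into a terminal.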
Verify:
```bash
ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
```
If test fails, see references/troubleshooting.md.
Step 4: Create samplesheet
```bash
python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv
```
The script:
For sarek, the script prompts for tumor/normal status if it cannot be auto-detected.
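Pairing R1/R2 files is the core of samplesheet generation. The sketch below is not `generate_samplesheet.py`. It assumes Illumina-style `<sample>_R1[_001].fastq.gz` (or `.fq.gz`) names and writes an rnaseq-format sheet with absolute paths and `strandedness` set to `auto`:

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1[_001].fastq.gz / .fq.gz, and _R2 for mates.
READ_NAME = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$")


def pair_fastqs(directory) -> dict:
    """Group FASTQ files by sample name: {sample: {"1": path, "2": path}}."""
    pairs = {}
    for path in sorted(Path(directory).iterdir()):
        match = READ_NAME.match(path.name)
        if match:
            reads = pairs.setdefault(match["sample"], {})
            reads[match["read"]] = str(path.resolve())  # absolute path
    return pairs


def write_rnaseq_samplesheet(directory, out_csv) -> int:
    """Write an rnaseq-style samplesheet and return the number of rows written."""
    rows = 0
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for sample, reads in sorted(pair_fastqs(directory).items()):
            if "1" not in reads:  # an R2 without its R1 cannot be used
                continue
            # Single-end samples leave fastq_2 empty.
            writer.writerow([sample, reads["1"], reads.get("2", ""), "auto"])
            rows += 1
    return rows
```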
Validate the samplesheet:
```bash
python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>
```
rnaseq:
```csv
sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto
```
sarek (`status`: 1 = tumor, 0 = normal):
```csv
patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0
```
atacseq:
```csv
sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
```
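A quick pre-flight check against these formats can catch missing columns before Nextflow starts. The hypothetical checker below is not the `--validate` mode of `generate_samplesheet.py`. It enforces only three things: the column sets shown, the absolute-path convention used in the examples, and sarek's 0/1 `status` values. nf-core's own schema validation remains the authority; for instance, it may also accept remote URLs for FASTQ files.

```python
import csv
import io
import os

# Column sets copied from the example samplesheets above (hypothetical checker).
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def check_samplesheet(text: str, pipeline: str) -> list:
    """Return a list of problems; an empty list means these basic checks passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS[pipeline] if col not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if not os.path.isabs(row["fastq_1"] or ""):
            problems.append(f"line {line_no}: fastq_1 is not an absolute path")
        if pipeline == "sarek" and (row["status"] or "") not in {"0", "1"}:
            problems.append(f"line {line_no}: status must be 0 (normal) or 1 (tumor)")
    return problems
```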
Step 5: Configure & run
```bash
python scripts/manage_genomes.py check <genome>
# If not installed:
python scripts/manage_genomes.py download <genome>
```
Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)
DECISION POINT: Confirm the reference genome with the user, then launch the run:
```bash
nextflow run nf-core/<pipeline> \
  -r <version> \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --genome <genome> \
  -resume
```
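Key flags:
- `-r`: pin the pipeline version
- `-profile docker`: use Docker (or `singularity` on HPC)
- `--genome`: iGenomes key
- `-resume`: continue from the last checkpoint

Resource limits (if needed):
```bash
--max_cpus 8 --max_memory '32.GB' --max_time '24.h'
```
These `--max_*` parameters come from older nf-core releases. Releases built on the newer nf-core template may ignore them and expect Nextflow's `process.resourceLimits` setting in a custom config instead, so check the parameter documentation of the pinned version.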
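`-resume` only reuses cached work when the run is relaunched with its original options, so it helps to build the command once and keep it. The sketch below uses a hypothetical `build_run_argv` helper that mirrors the run command above:

```python
import shlex


def build_run_argv(pipeline, version, samplesheet, genome, outdir="results",
                   profile="docker", resume=True, extra=()):
    """Argument list mirroring the run command above (hypothetical helper)."""
    argv = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
    ]
    argv.extend(extra)  # e.g. resource flags, for releases that still accept them
    if resume:
        argv.append("-resume")
    return argv


if __name__ == "__main__":
    print(shlex.join(build_run_argv("rnaseq", "3.22.2", "samplesheet.csv", "GRCh38")))
```

Saving the returned list (for example, next to the results) means a failed run can be relaunched with exactly the same options.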
Step 6: Verify outputs
```bash
ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
```
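The two verification commands can be combined into one check that reports why a run failed. This minimal sketch uses a hypothetical `run_succeeded` helper. It assumes the report lives at `<outdir>/multiqc/multiqc_report.html`, as in the `ls` command above; adjust the path if a pipeline writes its MultiQC output into a subfolder.

```python
from pathlib import Path

MARKER = "Pipeline completed successfully"


def run_succeeded(outdir="results", log_path=".nextflow.log"):
    """Return (ok, message) using the same two checks as the commands above."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    if not report.is_file():
        return False, f"missing {report}"
    log = Path(log_path)
    if not log.is_file():
        return False, f"missing {log}"
    if MARKER not in log.read_text(errors="replace"):
        return False, f"{MARKER!r} not found in {log}"
    return True, "ok"
```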
rnaseq:
- `results/star_salmon/salmon.merged.gene_counts.tsv` - Gene counts
- `results/star_salmon/salmon.merged.gene_tpm.tsv` - TPM values

sarek:
- `results/variant_calling/*/` - VCF files
- `results/preprocessing/recalibrated/` - BAM files

atacseq:
- `results/macs2/narrowPeak/` - Peak calls
- `results/bwa/mergedLibrary/bigwig/` - Coverage tracks

Step 7: Share results to Lark
Pipelines run for hours, so close the loop through the lark MCP (apply P1/P2/P4/P5/P8):
1. lark_drive_upload `results/multiqc/multiqc_report.html` plus the headline result file for the pipeline (rnaseq: salmon.merged.gene_counts.tsv; sarek: the key VCF; atacseq: the narrowPeak set). Keep the links.
2. lark_base_record_upsert into the Runs/Analyses table: {skill:"nextflow", pipeline, version, genome, n_samples, status:"success", multiqc_link:<Drive>, results_link:<Drive>, owner:<open_id>, started, finished}. Use dry_run: true first (P2). Read prior runs with lark_base_search. It does NOT support jq and REQUIRES search_fields (the field or fields to match), so narrow results with select_fields/limit instead. If field names are unknown, discover them via lark_api GET /open-apis/bitable/v1/apps/{base}/tables/{table}/fields (P3). No Base yet → delegate to base-deploy; record/field operations → delegate to lark-base.
3. lark_contact_search to find the requester (P1, or user_ids:"me"), then lark_im_card_send. Color the header by outcome (green for success, red for failure), add div rows for pipeline, genome, sample count and runtime, and add an actions button linking to the MultiQC report on Drive. Sequence: print_json: true → dry_run: true → send. Card grammar → delegate to lark-im. (For a one-line "pipeline done" ping, use plain lark_im_send.)

.nextflow.log in the card and link the troubleshooting reference instead of the report.

For common exit codes and fixes, see references/troubleshooting.md.
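To resume a failed run, re-run the original command from the same launch directory with `-resume`, so completed tasks are reused from the cache:
```bash
nextflow run nf-core/<pipeline> -resume  # plus the original -r/-profile/--input/--outdir/--genome options
```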
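The Base run-log row and the notification card from Step 7 can be assembled before any lark MCP call. This sketch uses the field names listed in Step 7. Several details are assumptions: the `"failed"` status value, ISO 8601 timestamps, placeholder links and owner ID, and a card layout (`header.template`, `div` elements with `lark_md`, an `action` button) that follows the classic Lark message-card JSON. The exact card grammar belongs to lark-im. No Lark API is called here.

```python
from datetime import datetime


def build_run_record(pipeline, version, genome, n_samples, succeeded,
                     multiqc_link, results_link, owner, started, finished):
    """Row for the Runs/Analyses table, using the field names listed in Step 7."""
    return {
        "skill": "nextflow",
        "pipeline": pipeline,
        "version": version,
        "genome": genome,
        "n_samples": n_samples,
        "status": "success" if succeeded else "failed",  # "failed" is an assumed value
        "multiqc_link": multiqc_link,
        "results_link": results_link,
        "owner": owner,
        "started": started,    # assumed ISO 8601 strings
        "finished": finished,
    }


def runtime_text(started, finished):
    """Format the elapsed time between two ISO 8601 timestamps as 'Xh Ym'."""
    elapsed = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    hours, seconds = divmod(int(elapsed.total_seconds()), 3600)
    return f"{hours}h {seconds // 60}m"


def build_card(record):
    """Card sketch: header colored by outcome, div rows, report button on success."""
    ok = record["status"] == "success"
    rows = [
        f"**Pipeline:** nf-core/{record['pipeline']} {record['version']}",
        f"**Genome:** {record['genome']}",
        f"**Samples:** {record['n_samples']}",
        f"**Runtime:** {runtime_text(record['started'], record['finished'])}",
    ]
    elements = [{"tag": "div", "text": {"tag": "lark_md", "content": row}} for row in rows]
    if ok:  # on failure the skill links the troubleshooting reference instead
        elements.append({
            "tag": "action",
            "actions": [{
                "tag": "button",
                "text": {"tag": "plain_text", "content": "Open MultiQC report"},
                "type": "primary",
                "url": record["multiqc_link"],
            }],
        })
    return {
        "header": {
            "template": "green" if ok else "red",
            "title": {"tag": "plain_text",
                      "content": f"{record['pipeline']} run {'succeeded' if ok else 'failed'}"},
        },
        "elements": elements,
    }
```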
This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.
It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.
Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.
When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).
Install the plugin:
```bash
npx claudepluginhub larkcowork/lark-cowork-plugins --plugin bio-research
```