Skill

nextflow-development

Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/bio-research:nextflow-development

User invocable

Model invocable


Supporting Files

SKILL.md

325 lines · ~2.8k tokens

Stats

Language: Python

Parent stars: 1

Parent forks: 9

Maintenance: Good

Last commit: Jun 7, 2026


nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.

Lark-native execution (depth core: LARK-PATTERNS, LARK-RECIPES, LARK-FUSION). Nextflow/Docker runs stay local/HPC — Lark doesn't execute pipelines. Lark owns the collaboration around a multi-hour run: at the DECISION POINTS (Step 0/2/5) confirm choices via an interactive card (P4); when the run finishes (Step 6) upload the MultiQC report + key result tables to Drive (P8), log the run to the lab Base as system-of-record (P5), and notify the requester (P1). See "Step 7: Share results to Lark" below — runs are long, so always close the loop with a notification rather than expecting the user to watch the terminal.

Workflow Checklist

- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs
- [ ] Step 7: Share results to Lark (Drive + Base run-log + notify card)

Step 0: Acquire Data (GEO/SRA Only)

Skip this step if user has local FASTQ files.

For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

DECISION POINT: After fetching study info, confirm with user:

- Which sample subset to download (if multiple data types)
- Suggested genome and pipeline

Then continue to Step 1.
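
Before calling the fetch script, it can help to check what kind of accession the user supplied. The sketch below is illustrative only and is not part of scripts/sra_geo_fetch.py; the function name `classify_accession` and the set of supported prefixes (GSE, GSM, SRR, as listed in the skill description) are assumptions.

```python
import re

# Accession prefixes named in the skill description.
ACCESSION_KINDS = {
    "GSE": "GEO series",
    "GSM": "GEO sample",
    "SRR": "SRA run",
}

_ACCESSION_RE = re.compile(r"^(GSE|GSM|SRR)\d+$")


def classify_accession(accession):
    """Return a label for a GEO/SRA accession, or None if it is not recognized.

    Hypothetical helper: only the three prefixes above are handled.
    """
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ACCESSION_KINDS[match.group(1)]
```

A `None` result is a reasonable cue to ask the user for a valid accession rather than guessing.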

Step 1: Environment Check

Run this check first. The pipeline will fail if the environment does not pass.

python scripts/check_environment.py

All critical checks must pass. If any fail, provide fix instructions:

Docker issues

Problem	Fix
Not installed	Install from https://docs.docker.com/get-docker/
Permission denied	`sudo usermod -aG docker $USER` then re-login
Daemon not running	`sudo systemctl start docker`

Nextflow issues

Problem	Fix
Not installed	`curl -s https://get.nextflow.io | bash && mv nextflow ~/bin/`
Version < 23.04	`nextflow self-update`

Java issues

Problem	Fix
Not installed / < 11	`sudo apt install openjdk-11-jdk`

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
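
For orientation, the kind of check described above could be sketched as follows. This is not the contents of scripts/check_environment.py; the helper names are hypothetical, and the minimum versions (Nextflow 23.04, Java 11) are taken from the tables above.

```python
import re
import shutil


def parse_version(text):
    """Extract the first dotted numeric version in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(found, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    found_t = parse_version(found)
    min_t = parse_version(minimum)
    width = max(len(found_t), len(min_t))
    found_t += (0,) * (width - len(found_t))
    min_t += (0,) * (width - len(min_t))
    return found_t >= min_t


def missing_tools(tools=("docker", "nextflow", "java")):
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Comparing numeric tuples instead of raw strings avoids mistakes such as treating "9.0" as newer than "11.0".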

Step 2: Select Pipeline

DECISION POINT: Confirm with user before proceeding.

Data Type	Pipeline	Version	Goal
RNA-seq	`rnaseq`	3.22.2	Gene expression
WGS/WES	`sarek`	3.7.1	Variant calling
ATAC-seq	`atacseq`	2.1.2	Chromatin accessibility

Auto-detect from data:

python scripts/detect_data_type.py /path/to/data

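For pipeline-specific details, see references/pipelines/rnaseq.md, references/pipelines/sarek.md, and references/pipelines/atacseq.md.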

Step 3: Run Test Profile

Validates the environment with a small bundled dataset. This MUST pass before running real data.

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

Pipeline	Command
rnaseq	`nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq`
sarek	`nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek`
atacseq	`nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq`
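
The table above can also be expressed as a small lookup that builds the pinned test command. This is a minimal sketch; the dictionary keys and the function name `build_test_command` are hypothetical, while the pipeline names, versions, and flags are copied from the tables in Steps 2 and 3.

```python
# Data type -> (pipeline, pinned version), as listed in Step 2.
PIPELINES = {
    "rna-seq": ("rnaseq", "3.22.2"),
    "wgs/wes": ("sarek", "3.7.1"),
    "atac-seq": ("atacseq", "2.1.2"),
}


def build_test_command(data_type):
    """Return the test-profile command as an argument list for subprocess.run."""
    pipeline, version = PIPELINES[data_type.strip().lower()]
    return [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,
        "-profile", "test,docker",
        "--outdir", f"test_{pipeline}",
    ]
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.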

Verify:

ls test_output/multiqc/multiqc_report.html   # use the --outdir you passed, e.g. test_rnaseq
grep "Pipeline completed successfully" .nextflow.log

If test fails, see references/troubleshooting.md.
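
The two verification commands can be combined into one programmatic check. The sketch below assumes the report path and log message shown above; `run_succeeded` is a hypothetical helper, not something shipped with the skill.

```python
from pathlib import Path

SUCCESS_MARKER = "Pipeline completed successfully"


def run_succeeded(outdir, log_path=".nextflow.log"):
    """Return True if the MultiQC report exists and the log reports success."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    # errors="replace" keeps the check robust to odd bytes in the log.
    return SUCCESS_MARKER in log.read_text(errors="replace")
```

The same function applies to a real run in Step 6 by passing `results` as the output directory.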

Step 4: Create Samplesheet

Generate automatically

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

The script:

- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing

For sarek: Script prompts for tumor/normal status if not auto-detected.
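
To illustrate the R1/R2 pairing step, here is a minimal sketch. It is not the logic of scripts/generate_samplesheet.py; it assumes file names of the form `<sample>_R1.fastq.gz` or `<sample>_R2.fq.gz`, optionally followed by a numeric suffix such as `_001`, and the names `READ_PATTERN` and `pair_reads` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1[_001].fastq.gz or .fq.gz
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def pair_reads(filenames):
    """Group FASTQ file names by sample into fastq_1 / fastq_2 entries.

    Files that do not match the pattern are ignored; samples without an
    R1 file are dropped. Single-end samples keep only fastq_1.
    """
    pairs = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_PATTERN.match(name)
        if match:
            pairs[match["sample"]][f"fastq_{match['read']}"] = name
    return {s: reads for s, reads in pairs.items() if "fastq_1" in reads}
```

Real data often uses other conventions (for example `_1`/`_2` suffixes), so the pattern would need adjusting per dataset.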

Validate existing samplesheet

python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>

Samplesheet formats

rnaseq:

sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto

sarek:

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0

atacseq:

sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
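
The formats above lend themselves to a simple header check. The following is a sketch under two assumptions: the required columns are exactly those shown in the examples, and FASTQ paths are local absolute paths (as in `/abs/path/...`). The function `samplesheet_problems` is hypothetical and does not reproduce the `--validate` mode of the script.

```python
import csv
import io

# Column sets copied from the example samplesheets above.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def samplesheet_problems(csv_text, pipeline):
    """Return a list of human-readable problems; an empty list means none found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS[pipeline] if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in ("fastq_1", "fastq_2"):
            value = (row[col] or "").strip()
            # Assumption: local files only, so paths should start with "/".
            if value and not value.startswith("/"):
                problems.append(f"line {line_no}: {col} is not an absolute path")
    return problems
```

An empty `fastq_2` is allowed here so that single-end samples are not flagged.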

Step 5: Configure & Run

5a. Check genome availability

python scripts/manage_genomes.py check <genome>
# If not installed:
python scripts/manage_genomes.py download <genome>

Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)

5b. Decision points

DECISION POINT: Confirm with user:

- Genome: Which reference to use
- Pipeline-specific options:
  - rnaseq: aligner (star_salmon recommended, hisat2 for low memory)
  - sarek: tools (haplotypecaller for germline, mutect2 for somatic)
  - atacseq: read_length (50, 75, 100, or 150)

5c. Run pipeline

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume

Key flags:

- -r: Pin version
- -profile docker: Use Docker (or singularity for HPC)
- --genome: iGenomes key
- -resume: Continue from checkpoint

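Resource limits (if needed; check the pipeline's parameter documentation first, since newer nf-core releases may set limits through process.resourceLimits in a custom config instead of these flags):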

--max_cpus 8 --max_memory '32.GB' --max_time '24.h'
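
When the run is launched from a script rather than typed by hand, the command in 5c can be assembled as an argument list. This is a sketch only; `build_run_command` and its defaults are hypothetical, and the flags are exactly those shown in 5c.

```python
import shlex


def build_run_command(pipeline, version, genome,
                      samplesheet="samplesheet.csv", outdir="results",
                      profile="docker", resume=True):
    """Return the nextflow command from Step 5c as a list of arguments."""
    cmd = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
    ]
    if resume:
        cmd.append("-resume")
    return cmd


# shlex.join gives a copy-pasteable string for logging or confirmation.
EXAMPLE = shlex.join(build_run_command("rnaseq", "3.22.2", "GRCh38"))
```

Showing the joined string to the user before launching fits the DECISION POINT in 5b, since the genome and options are visible in one line.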

Step 6: Verify Outputs

Check completion

ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

Key outputs by pipeline

rnaseq:

results/star_salmon/salmon.merged.gene_counts.tsv - Gene counts
results/star_salmon/salmon.merged.gene_tpm.tsv - TPM values

sarek:

results/variant_calling/*/ - VCF files
results/preprocessing/recalibrated/ - BAM files

atacseq:

results/macs2/narrowPeak/ - Peak calls
results/bwa/mergedLibrary/bigwig/ - Coverage tracks
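
A quick way to confirm these outputs is to check each path under the results directory. The sketch below copies the relative paths listed above; the rnaseq entries assume the star_salmon aligner was used, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Relative paths copied from the list above (rnaseq assumes star_salmon).
KEY_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling", "preprocessing/recalibrated"],
    "atacseq": ["macs2/narrowPeak", "bwa/mergedLibrary/bigwig"],
}


def missing_outputs(results_dir, pipeline):
    """Return the expected outputs that do not exist under results_dir."""
    root = Path(results_dir)
    return [rel for rel in KEY_OUTPUTS[pipeline] if not (root / rel).exists()]
```

An empty list means every listed file or directory is present; it does not confirm that the contents are valid.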

Step 7: Share Results to Lark

Pipelines run for hours — close the loop through the lark MCP (apply P1/P2/P4/P5/P8):

Upload artifacts to Drive (P8) — lark_drive_upload for results/multiqc/multiqc_report.html plus the headline result file for the pipeline (rnaseq: salmon.merged.gene_counts.tsv; sarek: the key VCF; atacseq: the narrowPeak set). Keep the links.
Log the run to the lab Base (P5, system-of-record) — lark_base_record_upsert into the Runs/Analyses table: {skill:"nextflow", pipeline, version, genome, n_samples, status:"success", multiqc_link:<Drive>, results_link:<Drive>, owner:<open_id>, started, finished}. dry_run: true first (P2); read prior runs with lark_base_search — it does NOT support jq and REQUIRES search_fields (which field(s) to match); narrow with select_fields/limit instead, and discover field names via lark_api GET /open-apis/bitable/v1/apps/{base}/tables/{table}/fields if unknown (P3). No Base yet → delegate to base-deploy; record/field ops → delegate to lark-base.
Notify the requester with a completion card (P4) — resolve them via lark_contact_search (P1, or user_ids:"me"), then lark_im_card_send: header colored by outcome (green success / red fail), div rows for pipeline + genome + sample count + runtime, an actions button to the MultiQC report on Drive. print_json: true → dry_run: true → send. Card grammar → delegate to lark-im. (For a one-line "pipeline done" ping, plain lark_im_send.)
On failure, post the failing process + exit code from .nextflow.log in the card and link the troubleshooting reference instead of the report.
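
The run-log record described above can be assembled before it is passed to the Base upsert. This sketch only builds the payload and makes no Lark calls. The field names come from the record shown above; the `"failed"` status value and the ISO 8601 timestamp format are assumptions, since the source shows only `status:"success"` and does not specify a date format.

```python
from datetime import datetime


def build_run_record(pipeline: str, version: str, genome: str, n_samples: int,
                     succeeded: bool, multiqc_link: str, results_link: str,
                     owner: str, started: datetime, finished: datetime) -> dict:
    """Build the Runs/Analyses record as a plain dict (hypothetical helper)."""
    return {
        "skill": "nextflow",
        "pipeline": pipeline,
        "version": version,
        "genome": genome,
        "n_samples": n_samples,
        # Assumption: "failed" is the counterpart of the documented "success".
        "status": "success" if succeeded else "failed",
        "multiqc_link": multiqc_link,
        "results_link": results_link,
        "owner": owner,
        # Assumption: timestamps are stored as ISO 8601 strings.
        "started": started.isoformat(),
        "finished": finished.isoformat(),
    }
```

Building the record as a plain dict makes it easy to show the user during the dry run before anything is written.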

Quick Reference

For common exit codes and fixes, see references/troubleshooting.md.

Resume failed run

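# Re-run the original command from Step 5c with the same parameters and -resume
nextflow run nf-core/<pipeline> -r <version> -profile docker --input samplesheet.csv --outdir results --genome <genome> -resume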

References

references/geo-sra-acquisition.md - Downloading public GEO/SRA data
references/troubleshooting.md - Common issues and fixes
references/installation.md - Environment setup
references/pipelines/rnaseq.md - RNA-seq pipeline details
references/pipelines/sarek.md - Variant calling details
references/pipelines/atacseq.md - ATAC-seq details

Disclaimer

This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.

It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.

Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.

Attribution

When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).

Licenses

nf-core pipelines: MIT License (https://nf-co.re/about)
Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)
