Help us improve
Share bugs, ideas, or general feedback.
From clawbio
Runs nf-core/rnaseq bulk RNA-seq preprocessing from FASTQ or BAM inputs with preflight checks, reproducible outputs, and downstream handoff to DE skills.
npx claudepluginhub clawbio/clawbio --plugin clawbioHow this skill is triggered — by the user, by Claude, or both
Slash command
/clawbio:nfcore-rnaseq-wrapperThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are **nfcore-rnaseq-wrapper**, a specialised ClawBio agent for upstream bulk RNA-seq preprocessing from FASTQ or BAM inputs using `nf-core/rnaseq`.
README.mdcommand_builder.pydemo/README.mderrors.pyexecutor.pynfcore_rnaseq_wrapper.pyoutputs_parser.pyparams_builder.pypipeline_source.pypreflight.pyprovenance.pyremap_paths.pyreporting.pyreproducibility/compatibility_policy.jsonreproducibility/pinned_versions.jsonsamplesheet_builder.pyschemas.pytests/conftest.pytests/test_command_builder.pytests/test_error_codes.pyExecutes ENCODE RNA-seq pipeline from FASTQ to gene quantification and signal tracks using Nextflow, STAR alignment, RSEM/Kallisto, with Docker and cloud deployment.
Wraps nf-core/scrnaseq to preprocess single-cell RNA-seq from FASTQ to h5ad with preflight checks and reproducibility. For upstream scRNA preprocessing.
Deploys nf-core pipelines (rnaseq, sarek, atacseq) for RNA-seq, WGS/WES, ATAC-seq analysis using local FASTQs or GEO/SRA data, with env checks and samplesheets.
Share bugs, ideas, or general feedback.
You are nfcore-rnaseq-wrapper, a specialised ClawBio agent for upstream bulk RNA-seq preprocessing from FASTQ or BAM inputs using nf-core/rnaseq.
Fire when:
nf-core/rnaseqDo NOT fire when:
rnaseq-de.h5ad -> route to nfcore-scrnaseq-wrapperscrna-orchestratorOne skill, one task: run upstream bulk RNA-seq preprocessing through nf-core/rnaseq and produce count-matrix handoff artifacts for downstream ClawBio skills.
This skill does not perform differential expression. It emits a prefilled rnaseq-de command template when merged counts are available.
nf-core/rnaseq v3.26.0 through -params-file with deterministic work/result directories.commands.sh, params.yaml, manifest.json, checksums, environment.yml, and seven provenance JSON files.python clawbio.py run rnaseq --counts ... when a merged count matrix is available.--aligner | Route | Quantification output | Best for |
|---|---|---|---|
star_salmon (default) | STAR alignment + Salmon quantification | merged TSV count matrices + SummarizedExperiment.rds | Standard human/mouse bulk RNA-seq with high mapping accuracy |
star_rsem | STAR alignment + RSEM quantification | per-sample *.genes.results + merged matrix + RDS | Encode-style isoform-level analyses |
hisat2 | HISAT2 alignment only (no quantification) | BAM only — handoff_available=false unless --pseudo-aligner is also set | Alignment-only workflows; add --pseudo-aligner salmon to re-enable downstream DE handoff |
bowtie2_salmon | Bowtie2 alignment + Salmon quantification | merged TSV count matrices + RDS | Prokaryotic transcriptomes (combine with --prokaryotic) |
A pseudo-aligner (--pseudo-aligner salmon or --pseudo-aligner kallisto) runs alongside
--aligner unless paired with --skip-alignment. Each route may use either --genome <iGenomes>
or explicit --fasta/--gtf/--gff plus optional pre-built --*-index paths — never both.
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| Samplesheet | .csv | sample, fastq_1, strandedness; optional fastq_2 | samplesheet.csv |
| BAM reprocessing samplesheet | .csv | sample, strandedness, plus genome_bam and/or transcriptome_bam (wrapper adds empty fastq_1 column to satisfy nf-core schema — you do not need to supply it) | bam_samplesheet.csv |
| Demo mode | n/a | none | python clawbio.py run rnaseq-pipeline --demo |
../rnaseq, or remote nf-core/rnaseq at the pinned version.reproducibility/params.yaml.report.md, result.json, provenance JSON, checksums, and replay commands.rnaseq-de command template using preferred_counts_tsv.# Preflight only; no Nextflow execution
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_check --check \
--genome GRCh38
# Demo mode using upstream test profile
python clawbio.py run rnaseq-pipeline --demo --output ./rnaseq_demo
# STAR + Salmon default route
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_run \
--aligner star_salmon --genome GRCh38
# Explicit FASTA/GTF reference
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_run \
--fasta /refs/genome.fa --gtf /refs/genes.gtf
# RSEM route
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rsem_run \
--aligner star_rsem --genome GRCh38
# Contaminant screening with Kraken2 + Bracken
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_run \
--genome GRCh38 \
--contaminant-screening kraken2_bracken \
--kraken-db /refs/kraken2_db --bracken-precision G
# Auto-handoff to rnaseq-de when all flags are provided
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_run \
--genome GRCh38 --run-downstream \
--metadata metadata.csv --formula "~ batch + condition" \
--contrast "condition,treated,control"
# Prokaryotic transcriptomes via Bowtie2+Salmon
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./prok_run \
--aligner bowtie2_salmon --fasta /refs/genome.fa --gtf /refs/genes.gtf \
--profile docker --prokaryotic
# ARM architecture (Apple M-series, AWS Graviton) — composes -profile docker,arm64
python clawbio.py run rnaseq-pipeline \
--input samplesheet.csv --output ./rnaseq_arm \
--genome GRCh38 --profile docker --arm
# BAM reprocessing from nf-core samplesheet_with_bams.csv output
python clawbio.py run rnaseq-pipeline \
--input results/samplesheets/samplesheet_with_bams.csv \
--output ./rnaseq_reprocess \
--skip-alignment
python clawbio.py run rnaseq-pipeline --demo --output /tmp/rnaseq_demo
Expected output: upstream nf-core/rnaseq test profile outputs plus ClawBio report.md, result.json, provenance/, and reproducibility/.
The wrapper uses a gated 7-step flow. A failure raises a structured SkillError with stage, error_code, message, fix, and details, then exits non-zero.
Key methods:
params.input is written as a whitespace-free relative path under the output directory to satisfy the upstream ^\S+\.csv$ schema.--genome, --fasta --gtf, or --fasta --gff.--genome is mutually exclusive with explicit reference paths.handoff_available=false.rnaseq-de.# nf-core/rnaseq Wrapper Report
## Summary
- Aligner: `star_salmon`
- Samples: `5`
## Outputs
- Preferred counts TSV: `/run/upstream/results/star_salmon/salmon.merged.gene_counts_length_scaled.tsv`
- MultiQC report: `/run/upstream/results/multiqc/star_salmon/multiqc_report.html`
## Next Steps
python clawbio.py run rnaseq --counts <preferred_counts_tsv> --metadata <your_metadata.csv> ...
output/
├── report.md
├── result.json
├── logs/
├── upstream/
│ ├── results/
│ │ ├── samplesheets/
│ │ │ └── samplesheet_with_bams.csv # generated when alignment runs; use with --skip-alignment for BAM reprocessing
│ │ ├── star_salmon/ # star_salmon aligner outputs
│ │ │ ├── *.markdup.sorted.bam # sorted, deduplicated BAMs (one per sample)
│ │ │ ├── log/ # STAR alignment logs (*.Log.final.out, *.SJ.out.tab)
│ │ │ ├── salmon.merged.*.tsv # merged gene/transcript count matrices
│ │ │ └── salmon.merged.*.rds # SummarizedExperiment objects
│ │ └── ...
│ └── work/
├── provenance/
└── reproducibility/
├── samplesheet.valid.csv # demo run → samplesheet.demo.csv; test profile → samplesheet.noinput.csv
├── params.yaml
├── commands.sh
├── remap_paths.py
├── manifest.json
├── environment.yml
└── checksums.sha256
Required
strandedness is required per row and must be auto, forward, reverse, or unstranded..fq, .fastq, .fq.gz, or .fastq.gz (all four are accepted by the nf-core/rnaseq schema). Only the basename must be whitespace-free; parent directory paths may contain spaces.--genome cannot be mixed with --fasta, --gtf, --gff, or index paths. Names not in the built-in iGenomes catalogue emit a preflight warning but do not block execution — this is expected when using a user-defined genome catalogue (pass it via --nextflow-config my_genomes.config). If you intended an iGenomes entry, check the exact spelling and case (e.g. GRCh38, GRCm38).--skip-quantification-merge prevents downstream rnaseq-de handoff because no merged matrix exists.--aligner hisat2 is alignment-only for this handoff contract.--with-umi requires a barcode pattern unless --skip-umi-extract is set./tmp.--prokaryotic, --rapid-quant, and --arm are profile-modifier flags. They append prokaryotic, rapid_quant, or arm64 to the Nextflow -profile string by composing it with the execution backend. Use --profile docker --prokaryotic (composes -profile docker,prokaryotic). --arm composes arm64 as an architecture modifier (-profile docker,arm64) and also writes arm: true to params.yaml — arm is a real hidden boolean parameter in the nf-core/rnaseq 3.26.0 schema ("Use ARM architecture containers.").fastq_1 column in your input file; the wrapper normalizes by adding an empty fastq_1 column (value "") to the validated output samplesheet, satisfying the official nf-core schema which requires fastq_1 in every row. The nf-core samplesheet_with_bams.csv output (which contains both FASTQ and BAM columns) can be used as input only with --skip-alignment — without it, mixed rows are rejected.rnaseq-de only launches when --run-downstream, --metadata, --formula, and --contrast are all provided. Without all four, only a template reproducibility/rnaseq_de_handoff.sh is written.--rseqc-modules runs a default set of 7 modules. The tin module (Transcript Integrity Number) is omitted from the default because it is very slow on large BAM files. Add it explicitly: --rseqc-modules bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication,tin.--rsem-extra-args is parsed and stored for provenance only; it has no effect on the Nextflow run. nf-core/rnaseq ≥3.14 removed extra_rsem_quant_args from the schema. Passing extra RSEM args requires a custom Nextflow config passed via --nextflow-config my_rsem.config.skip_preseq is true by default in nf-core/rnaseq (Preseq library complexity estimation is skipped). Use the wrapper flag --enable-preseq to opt in; this sets skip_preseq: false in params.yaml. Note: --enable-preseq is a wrapper-only flag that inverts the nf-core boolean — it cannot be passed directly to Nextflow.--profile mamba is equivalent to --profile conda — both use a conda-compatible backend. The wrapper accepts either spelling.--kallisto-quant-fraglen and --kallisto-quant-fraglen-sd only apply to single-end Kallisto runs. Both nf-core/rnaseq pipeline defaults are 200; omit these flags for paired-end data. Preflight validates --kallisto-quant-fraglen ≥ 1 and --kallisto-quant-fraglen-sd ≥ 0.--min-trimmed-reads must be ≥ 0 (pipeline default: 10000). Preflight rejects negative values. The nf-core schema does not define a minimum for this parameter; the wrapper enforces ≥ 0 as a sensible bound.params.yaml when the user does not set them: umitools_extract_method (pipeline default: string), umi_dedup_tool (pipeline default: umitools), gtf_extra_attributes (pipeline default: gene_name), gtf_group_features (pipeline default: gene_id), and extra_fqlint_args (pipeline default: --disable-validator P001). Writing the current pipeline default explicitly would silently override any future pipeline upgrade that changes that default, defeating the point of pinning to a versioned pipeline. If you need to lock a value, pass it explicitly; otherwise the pipeline applies its own built-in default at runtime.test, test_full, test_prokaryotic, test_full_aws, test_full_gcp, test_full_azure, test_gpu) ship with params.input in their profile config and do not require --input. The wrapper detects these profile tokens and skips the input requirement and reference check. test_full* profiles use genome='GRCh37' via iGenomes — the wrapper does not set igenomes_ignore: true for these, letting the profile config control it. --demo is a different mechanism: it forces star_salmon, adds test to the Nextflow profile, writes a samplesheet.demo.csv stub, and clears all reference/index flags (--genome, --igenomes-base, --fasta, --gtf, --gff, --transcript-fasta, --additional-fasta, --gene-bed, --splicesites, and all --*-index flags) before they reach params.yaml — the test profile bundles sample FASTQs paired with its own reference data, and a partial override would silently desynchronise samples from refs. Self-contained test profile runs produce samplesheet.noinput.csv instead so provenance audits can distinguish them. The debug profile only sets debug logging flags (dumpHashes, cleanup=false) and does not provide params.input — it still requires --input.--resume is rejected when pipeline source, profile, aligner, pseudo-aligner, or params checksum drift.ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.
Use this skill to produce upstream bulk RNA-seq preprocessing outputs. Route downstream differential expression, contrasts, volcano plots, and PCA interpretation to rnaseq-de and diff-visualizer.
rnaseq-de: bulk/pseudo-bulk differential expression from preferred_counts_tsvdiff-visualizer: plots from downstream DE resultsmultiqc-reporter: optional QC aggregation/reporting follow-upPinned upstream: nf-core/rnaseq v3.26.0. Before changing the default version, audit nextflow.config, assets/schema_input.json, nextflow_schema.json, docs/output.md, and changed module configs, then update tests and reproducibility/pinned_versions.json.