From sciagent-skills
Aggregates QC metrics from 150+ bioinformatics tools like FastQC, samtools, STAR, HISAT2 into interactive HTML reports for multi-sample NGS pipelines.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skills
This skill uses the workspace's default tool permissions.
MultiQC automatically searches directories for QC log files from 150+ bioinformatics tools and aggregates statistics across all samples into a single interactive HTML report. It parses outputs from FastQC, samtools flagstat, STAR, HISAT2, Trim Galore, Salmon, Kallisto, featureCounts, Picard, GATK, and many more — eliminating the need to manually review per-sample QC files. Reports include interactive bar plots, scatter plots, heatmaps, and tables with configurable warnings and pass/fail thresholds.
multiqc.zip, samtools .flagstat, STAR Log.final.out, etc.); MultiQC finds them automatically
Check before installing: The tool may already be available in the current environment (e.g., inside a pixi/conda env). Run `command -v multiqc` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run multiqc` rather than bare `multiqc`.
pip install multiqc
# Verify
multiqc --version
# Prints the installed version (for example, 1.25)
# With conda (recommended for bioinformatics)
conda install -c bioconda multiqc
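The availability check described above can also be scripted. The following is a minimal Python sketch, not part of MultiQC or pixi: the helper name `multiqc_invocation` is hypothetical, and it assumes a pixi project can be recognised by a `pixi.toml` file in the project directory.

```python
import shutil
from pathlib import Path


def multiqc_invocation(project_dir=".", which=shutil.which):
    """Return the argv prefix to use for MultiQC, or None if an install is needed."""
    # Assumption: a pixi project is recognised by a pixi.toml file in project_dir.
    if (Path(project_dir) / "pixi.toml").exists() and which("pixi"):
        return ["pixi", "run", "multiqc"]
    # Equivalent of `command -v multiqc`: look the executable up on PATH.
    if which("multiqc"):
        return ["multiqc"]
    # Nothing found: fall back to the pip or conda install commands.
    return None
```

A return value of `None` signals that one of the install commands should be run; otherwise the returned list can be used as the start of a `subprocess.run` argument list.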
MultiQC aggregates existing output — first run your QC tools.
# FastQC on all FASTQ files
mkdir -p qc/fastqc
fastqc data/*.fastq.gz -o qc/fastqc/ -t 8
# samtools flagstat on all BAM files
for bam in results/*.bam; do
    # Quote paths so file names containing spaces do not break the loop
    samtools flagstat "$bam" > "qc/$(basename "$bam" .bam).flagstat"
done
echo "QC files generated: $(ls qc/ | wc -l)"
MultiQC recursively scans for recognized QC files.
# Basic run: scan current directory recursively
multiqc .
# Specify output directory and report name
multiqc . -o reports/ -n project_qc_report
# Scan specific subdirectories only
multiqc qc/fastqc/ results/star/ logs/trimming/ -o reports/
# Output: reports/project_qc_report.html
echo "Report: reports/project_qc_report.html"
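When MultiQC is launched from a Python driver script rather than a shell, the same invocations can be assembled as an argument list. This is a rough sketch using only the `-o` and `-n` options shown above; the function names `build_multiqc_cmd` and `run_multiqc` are hypothetical helpers, not part of MultiQC.

```python
import subprocess


def build_multiqc_cmd(search_dirs, outdir=None, name=None):
    """Assemble a MultiQC command line as an argv list."""
    cmd = ["multiqc", *search_dirs]
    if outdir:
        cmd += ["-o", outdir]  # output directory for the report and data
    if name:
        cmd += ["-n", name]  # report filename (without extension)
    return cmd


def run_multiqc(search_dirs, **options):
    """Run MultiQC and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_multiqc_cmd(search_dirs, **options), check=True)
```

For example, `run_multiqc(["qc/fastqc/", "results/star/"], outdir="reports/", name="project_qc_report")` corresponds to the explicit-subdirectory call above. Passing a list instead of a shell string avoids quoting problems with unusual directory names.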
Use multiqc_config.yaml to set custom thresholds, sample naming, and module order.
# multiqc_config.yaml — place in working directory
title: "RNA-seq QC Report — Project X"
subtitle: "Analysis date: 2026-02"
intro_text: "Quality control summary for all 48 samples."
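# Sample name cleaning: cut sample names at these suffixes
fn_clean_exts:
  - ".fastq.gz"
  - "_R1"
  - ".sorted"
# Thresholds for pass/warn/fail coloring in report tables.
# The column ID below is an assumption; confirm it against the column IDs in your own report.
table_cond_formatting_rules:
  percent_duplicates:
    pass:
      - lt: 30
    warn:
      - gt: 30
    fail:
      - gt: 40
# Module run order (Trim Galore logs are parsed by the cutadapt module)
module_order:
  - fastqc
  - cutadapt
  - star
  - featurecounts
  - samtools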
# Run with config file
multiqc . --config multiqc_config.yaml -o reports/
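To see what a `fn_clean_exts` list does to file names before running a full report, the cleaning can be approximated in a few lines of Python. This sketch assumes the default truncate-style behaviour (the sample name is cut at the matching string); it is an illustration with a hypothetical helper name, not MultiQC's own code.

```python
def clean_sample_name(filename, clean_exts):
    """Approximate truncate-style cleaning: cut the name at each listed suffix."""
    name = filename
    for ext in clean_exts:
        # Keep only the text before the first occurrence of the suffix.
        name = name.split(ext, 1)[0]
    return name


exts = [".fastq.gz", "_R1", ".sorted"]
print(clean_sample_name("ctrl_rep1_R1.fastq.gz", exts))  # ctrl_rep1
print(clean_sample_name("ctrl_rep1.sorted.bam", exts))   # ctrl_rep1
```

Both example files reduce to the same sample name, which is usually what is wanted when outputs from different tools should line up in one table row, but it is also how unintended duplicate names arise when two files from the same tool collapse together.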
Control which tools and samples are included.
# Run only specific modules
multiqc . --module fastqc --module samtools
# Exclude specific modules
multiqc . --exclude fastqc
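# Include only specific files: pass an explicit list of paths (one per line).
# --filename sets the report name, so it cannot be used as a file filter.
find . -name "*.flagstat" -o -name "*_fastqc.zip" > qc_files.txt
multiqc --file-list qc_files.txt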
# Ignore specific directories or files
multiqc . --ignore "tmp/" --ignore "*.bam"
# Rename samples using a two-column TSV file (old name, new name);
# add --replace-regex to treat the first column as regular expressions
multiqc . --replace-names rename.tsv
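One way to restrict a run to particular file types is to build a path list and hand it to MultiQC's `--file-list` option. The sketch below collects matching files with `pathlib`; the helper name `write_file_list` and the output file name are illustrative assumptions.

```python
from pathlib import Path


def write_file_list(root, patterns, list_path):
    """Write every file under root matching any glob pattern to list_path, one per line."""
    matches = sorted(
        {p for pattern in patterns for p in Path(root).rglob(pattern) if p.is_file()}
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in matches))
    return matches
```

After calling `write_file_list(".", ["*.flagstat", "*_fastqc.zip"], "qc_files.txt")`, the report can be generated with `multiqc --file-list qc_files.txt`. An empty return value is a quick sign that the patterns or the root directory are wrong.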
Extract machine-readable statistics from the MultiQC report.
# Export data tables (CSV, JSON, YAML, TSV)
multiqc . -o reports/ --data-format json
# Generates: reports/multiqc_data/multiqc_data.json
# Per-tool data tables are written to multiqc_data/ on every run;
# --export additionally saves the plots as static image files
multiqc . -o reports/ --export
ls reports/multiqc_data/
# multiqc_fastqc.txt, multiqc_samtools_flagstat.txt, ...
# Extract general stats as pandas DataFrame (requires a run with --data-format json)
python3 - << 'EOF'
import json
import pandas as pd

with open("reports/multiqc_data/multiqc_general_stats.json") as f:
    data = json.load(f)
# Top-level keys are sample names, so transpose to get one row per sample
df = pd.DataFrame(data).T
print(df.head())
print(f"Shape: {df.shape}")
EOF
Integrate MultiQC as the final step of any QC pipeline.
#!/bin/bash
# Complete RNA-seq QC pipeline → MultiQC summary
SAMPLES=(ctrl_rep1 ctrl_rep2 treat_rep1 treat_rep2)
OUTDIR="pipeline_output"
mkdir -p "$OUTDIR"/{fastqc,star,featurecounts,flagstat}
for sample in "${SAMPLES[@]}"; do
    # FastQC
    fastqc "data/${sample}.fastq.gz" -o "$OUTDIR/fastqc/" -t 4
    # STAR alignment (the output directory must exist; gzipped input needs zcat)
    mkdir -p "$OUTDIR/star/${sample}"
    STAR --runThreadN 8 --genomeDir refs/star_index \
        --readFilesIn "data/${sample}.fastq.gz" \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$OUTDIR/star/${sample}/"
    # samtools flagstat
    samtools flagstat "$OUTDIR/star/${sample}/Aligned.sortedByCoord.out.bam" \
        > "$OUTDIR/flagstat/${sample}.flagstat"
done
# Final MultiQC report
multiqc $OUTDIR/ -o $OUTDIR/qc_report/ -n "full_pipeline_qc"
echo "Report ready: $OUTDIR/qc_report/full_pipeline_qc.html"
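Before the final MultiQC step it can help to confirm that every sample actually produced its QC files, since a failed sample would otherwise just be absent from the report. A minimal sketch, assuming the directory layout of the pipeline above and the usual output names (`<sample>_fastqc.zip` from FastQC, `Log.final.out` from STAR); the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path


def missing_outputs(outdir, samples):
    """Return the expected per-sample QC files that do not exist under outdir."""
    outdir = Path(outdir)
    missing = []
    for sample in samples:
        expected = [
            outdir / "fastqc" / f"{sample}_fastqc.zip",
            outdir / "star" / sample / "Log.final.out",
            outdir / "flagstat" / f"{sample}.flagstat",
        ]
        missing.extend(p for p in expected if not p.exists())
    return missing
```

Calling `missing_outputs("pipeline_output", ["ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2"])` returns an empty list when the pipeline completed for every sample.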
| Parameter | Default | Range/Options | Effect |
|---|---|---|---|
| `-o`, `--outdir` | . | directory path | Output directory for report and data |
| `-n`, `--filename` | multiqc_report | any string | Report filename (without extension) |
| `-m`, `--module` | all | tool name | Run only specified module(s) |
| `--ignore` | none | glob pattern | Ignore matching files or directories |
| `--export` | False | flag | Export plots as static image files alongside the report |
| `--data-format` | tsv | tsv, csv, json, yaml | Format for exported data files |
| `--config` | auto-detected | YAML file path | Custom config file with thresholds and naming |
| `--replace-names` | none | TSV file path | Rename samples using old-name/new-name pairs |
| `fn_clean_exts` (config key) | (built-in) | list in config | File extensions to strip from sample names |
| `--profile-runtime` | False | flag | Show per-module runtime profiling |
# In Snakefile: collect all QC outputs, then run MultiQC
rule multiqc:
    input:
        expand("qc/fastqc/{sample}_fastqc.zip", sample=SAMPLES),
        expand("qc/flagstat/{sample}.flagstat", sample=SAMPLES)
    output:
        html="reports/multiqc_report.html",
        data=directory("reports/multiqc_data")
    shell:
        # Default report name keeps the data directory at reports/multiqc_data;
        # --force overwrites results from a previous run
        "multiqc qc/ -o reports/ --force"
import json
import pandas as pd

# Load general stats from JSON export
with open("reports/multiqc_data/multiqc_general_stats.json") as f:
    stats = json.load(f)
df = pd.DataFrame(stats).T
print(f"Samples: {len(df)}")
print(f"Metrics: {list(df.columns[:5])}")
# Flag samples with low mapping rate (column names can differ between MultiQC versions)
col = "STAR_mqc-generalstats-star-uniquely_mapped_percent"
if col in df.columns:
    low_mapping = df[df[col] < 70]
    print(f"Samples with <70% mapping: {list(low_mapping.index)}")
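Because the prefix of general-stats column names may differ between MultiQC versions, matching on the end of the column name can be more robust than a hard-coded key. The sketch below uses only the standard library and assumes the loaded JSON maps each sample name to a dictionary of metric values; `flag_samples` is a hypothetical helper.

```python
def flag_samples(stats, metric_suffix, minimum):
    """Return sorted sample names whose metric ending in metric_suffix is below minimum."""
    flagged = []
    for sample, metrics in stats.items():
        for column, value in metrics.items():
            # Match by suffix so a changed column prefix does not hide the metric.
            if column.endswith(metric_suffix) and value is not None and value < minimum:
                flagged.append(sample)
                break
    return sorted(flagged)
```

With `stats` loaded as in the snippet above, `flag_samples(stats, "uniquely_mapped_percent", 70)` lists the samples below the 70% threshold. Samples that lack the metric entirely are not flagged, so an empty result should be checked against the available column names.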
# Run FastQC on raw and trimmed reads, then combine in one report
mkdir -p qc/{raw,trimmed}
fastqc data/*.fastq.gz -o qc/raw/ -t 8
trim_galore --paired -o trimmed/ data/*.fastq.gz
# Paired-end Trim Galore output is named *_val_1.fq.gz and *_val_2.fq.gz
fastqc trimmed/*_val_*.fq.gz -o qc/trimmed/ -t 8
multiqc qc/raw/ qc/trimmed/ \
    -o reports/ -n raw_vs_trimmed \
    --dirs --dirs-depth 1  # use directory names in sample labels
| Output | Format | Description |
|---|---|---|
| multiqc_report.html | HTML | Interactive report with all plots and tables |
| multiqc_data/multiqc_general_stats.txt | TSV | Per-sample summary statistics (all tools) |
| multiqc_data/multiqc_*.txt | TSV | Per-tool detailed statistics tables |
| multiqc_data/multiqc_data.json | JSON | Full data (if --data-format json) |
| multiqc_data/multiqc_sources.txt | TSV | Mapping of source files to samples |
| Problem | Cause | Solution |
|---|---|---|
| Empty report (no modules found) | QC files not in scanned directories | Specify directories explicitly: multiqc qc/ logs/ results/ |
| Wrong sample names in report | File extensions or paths not cleaned | Add fn_clean_exts to config or use --replace-names |
| Module missing from report | Log file format changed in tool version | Update MultiQC: pip install --upgrade multiqc; check GitHub issues |
| Duplicate sample names | Multiple files map to same sample name | Use --dirs to prepend directory names, or fix fn_clean_exts in config |
| Report very slow to open | Too many samples (>500) in one report | Split by project or condition; use --flat for simpler rendering |
| FastQC data not parsed | FastQC ZIP not in expected location | Run MultiQC from root of project; ensure *_fastqc.zip files exist |
| ModuleNotFoundError | Missing optional module dependencies | pip install multiqc[all] for all extras |