Performs QC on single-cell RNA-seq data (.h5ad/.h5) using scanpy/scverse: MAD-based filtering, metrics, and visualizations for low-quality cell removal.
From bio-researchnpx claudepluginhub 8gg-git/knowledge-work-plugins --plugin bio-researchThis skill uses the workspace's default tool permissions.
LICENSE.txtreferences/scverse_qc_guidelines.mdscripts/qc_analysis.pyscripts/qc_core.pyscripts/qc_plotting.pyDesigns and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
Automated QC workflow for single-cell RNA-seq data following scverse best practices.
Use when users:
Supported input formats:
.h5ad files (AnnData format from scanpy/Python workflows).h5 files (10X Genomics Cell Ranger output)Default recommendation: Use Approach 1 (complete pipeline) unless the user has specific custom requirements or explicitly requests non-standard filtering logic.
For standard QC following scverse best practices, use the convenience script scripts/qc_analysis.py:
python3 scripts/qc_analysis.py input.h5ad
# or for 10X Genomics .h5 files:
python3 scripts/qc_analysis.py raw_feature_bc_matrix.h5
The script automatically detects the file format and loads it appropriately.
When to use this approach:
Requirements: anndata, scanpy, scipy, matplotlib, seaborn, numpy
Parameters:
Customize filtering thresholds and gene patterns using command-line parameters:
--output-dir - Output directory--mad-counts, --mad-genes, --mad-mt - MAD thresholds for counts/genes/MT%--mt-threshold - Hard mitochondrial % cutoff--min-cells - Gene filtering threshold--mt-pattern, --ribo-pattern, --hb-pattern - Gene name patterns for different speciesUse --help to see current default values.
Outputs:
All files are saved to <input_basename>_qc_results/ directory by default (or to the directory specified by --output-dir):
qc_metrics_before_filtering.png - Pre-filtering visualizationsqc_filtering_thresholds.png - MAD-based threshold overlaysqc_metrics_after_filtering.png - Post-filtering quality metrics<input_basename>_filtered.h5ad - Clean, filtered dataset ready for downstream analysis<input_basename>_with_qc.h5ad - Original data with QC annotations preservedIf copying outputs for user access, copy individual files (not the entire directory) so users can preview them directly.
The script performs the following steps:
For custom analysis workflows or non-standard requirements, use the modular utility functions from scripts/qc_core.py and scripts/qc_plotting.py:
# Run from scripts/ directory, or add scripts/ to sys.path if needed
import anndata as ad
from qc_core import calculate_qc_metrics, detect_outliers_mad, filter_cells
from qc_plotting import plot_qc_distributions # Only if visualization needed
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# ... custom analysis logic here
When to use this approach:
Available utility functions:
From qc_core.py (core QC operations):
calculate_qc_metrics(adata, mt_pattern, ribo_pattern, hb_pattern, inplace=True) - Calculate QC metrics and annotate adatadetect_outliers_mad(adata, metric, n_mads, verbose=True) - MAD-based outlier detection, returns boolean maskapply_hard_threshold(adata, metric, threshold, operator='>', verbose=True) - Apply hard cutoffs, returns boolean maskfilter_cells(adata, mask, inplace=False) - Apply boolean mask to filter cellsfilter_genes(adata, min_cells=20, min_counts=None, inplace=True) - Filter genes by detectionprint_qc_summary(adata, label='') - Print summary statisticsFrom qc_plotting.py (visualization):
plot_qc_distributions(adata, output_path, title) - Generate comprehensive QC plotsplot_filtering_thresholds(adata, outlier_masks, thresholds, output_path) - Visualize filtering thresholdsplot_qc_after_filtering(adata, output_path) - Generate post-filtering plotsExample custom workflows:
Example 1: Only calculate metrics and visualize, don't filter yet
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
plot_qc_distributions(adata, 'qc_before.png', title='Initial QC')
print_qc_summary(adata, label='Before filtering')
Example 2: Apply only MT% filtering, keep other metrics permissive
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Only filter high MT% cells
high_mt = apply_hard_threshold(adata, 'pct_counts_mt', 10, operator='>')
adata_filtered = filter_cells(adata, ~high_mt)
adata_filtered.write('filtered.h5ad')
Example 3: Different thresholds for different subsets
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Apply type-specific QC (assumes cell_type metadata exists)
neurons = adata.obs['cell_type'] == 'neuron'
other_cells = ~neurons
# Neurons tolerate higher MT%, other cells use stricter threshold
neuron_qc = apply_hard_threshold(adata[neurons], 'pct_counts_mt', 15, operator='>')
other_qc = apply_hard_threshold(adata[other_cells], 'pct_counts_mt', 8, operator='>')
For detailed QC methodology, parameter rationale, and troubleshooting guidance, see references/scverse_qc_guidelines.md. This reference provides:
Load this reference when users need deeper understanding of the methodology or when troubleshooting QC issues.
Typical downstream analysis steps: