From bio-research
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.
Slash command: /bio-research:single-cell-rna-qc
Automated QC workflow for single-cell RNA-seq data following scverse best practices.
Lark-native execution (depth core: LARK-PATTERNS, LARK-RECIPES, LARK-FUSION). The QC computation stays local (scanpy/anndata via the scripts below), and Lark does not run the analysis. Only the delivery is Lark-native: share the QC report PNGs and the filtered .h5ad to Drive, log the run to the lab Base as the system of record, and announce the result as an interactive card. See "Share results to Lark" at the end.
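The run summary that gets logged to Base and announced in the card can be derived from the before/after cell counts. The sketch below is minimal and assumes the field names from the example record under "Share results to Lark". The helpers build_run_record and card_header are hypothetical and are not part of the QC scripts.

```python
# Hypothetical helpers: field names mirror the example Base record in
# "Share results to Lark"; adapt them to your table's actual fields.
def build_run_record(input_basename: str, cells_before: int, cells_after: int,
                     mt_threshold: float, output_link: str, owner: str) -> dict:
    """Assemble the QC run summary logged to Base."""
    if cells_before <= 0:
        raise ValueError("cells_before must be positive")
    # Percentage of cells removed by QC, rounded to one decimal place
    pct_filtered = round(100 * (cells_before - cells_after) / cells_before, 1)
    return {
        "skill": "single-cell-rna-qc",
        "input": input_basename,
        "cells_before": cells_before,
        "cells_after": cells_after,
        "pct_filtered": pct_filtered,
        "mt_threshold": mt_threshold,
        "output_link": output_link,
        "owner": owner,
        "status": "done",
    }


def card_header(record: dict) -> str:
    """Short card title with thousands separators for the cell counts."""
    return (f"scRNA QC: {record['cells_before']:,} → {record['cells_after']:,} cells "
            f"({record['pct_filtered']}% filtered)")
```

For example, 8,412 cells before and 7,930 after yields the header "scRNA QC: 8,412 → 7,930 cells (5.7% filtered)".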
Use when users:
Supported input formats:
- .h5ad files (AnnData format from scanpy/Python workflows)
- .h5 files (10X Genomics Cell Ranger output)
Default recommendation: Use Approach 1 (complete pipeline) unless the user has specific custom requirements or explicitly requests non-standard filtering logic.
Approach 1 (complete pipeline): For standard QC following scverse best practices, use the convenience script scripts/qc_analysis.py:
python3 scripts/qc_analysis.py input.h5ad
# or for 10X Genomics .h5 files:
python3 scripts/qc_analysis.py raw_feature_bc_matrix.h5
The script automatically detects the file format and loads it appropriately.
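The format detection described above can be approximated by dispatching on the file extension. This is only a sketch. The real script may inspect file contents instead, and detect_format and load_input are hypothetical names. The two readers it calls, anndata.read_h5ad and scanpy.read_10x_h5, are the standard loaders for these formats.

```python
from pathlib import Path


def detect_format(path: str) -> str:
    """Classify an input file by extension (a simplification of real detection)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".h5ad":
        return "h5ad"
    if suffix == ".h5":
        return "10x_h5"
    raise ValueError(f"Unsupported input format: {path!r} (expected .h5ad or .h5)")


def load_input(path: str):
    """Load an AnnData object with the reader matching the detected format."""
    fmt = detect_format(path)
    if fmt == "h5ad":
        import anndata as ad  # imported lazily so detect_format works without it
        return ad.read_h5ad(path)
    import scanpy as sc
    adata = sc.read_10x_h5(path)
    adata.var_names_make_unique()  # 10X gene symbols can repeat across features
    return adata
```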
When to use this approach:
Requirements: anndata, scanpy, scipy, matplotlib, seaborn, numpy
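Before running the script, you can confirm that these requirements are present with a stdlib-only check. The sketch below uses a hypothetical helper name. It assumes each package's import name matches its listed name, which holds for all six. Note that find_spec confirms the package can be located, not that it imports cleanly.

```python
import importlib.util

# Packages listed under Requirements; each import name matches the listed name.
REQUIRED = ("anndata", "scanpy", "scipy", "matplotlib", "seaborn", "numpy")


def missing_requirements(packages=REQUIRED):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All QC requirements were found.")
```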
Parameters:
Customize filtering thresholds and gene patterns using command-line parameters:
- --output-dir - Output directory
- --mad-counts, --mad-genes, --mad-mt - MAD thresholds for counts/genes/MT%
- --mt-threshold - Hard mitochondrial % cutoff
- --min-cells - Gene filtering threshold
- --mt-pattern, --ribo-pattern, --hb-pattern - Gene name patterns for different species
Use --help to see current default values.
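The --mad-* flags together with --mt-threshold suggest combining MAD-based and hard cutoffs. The numpy sketch below illustrates that idea and follows the approach in the scverse single-cell best-practices QC chapter: MAD outliers on log1p-transformed counts and genes, plus a MAD-or-hard cutoff on MT%. The names mad_outliers and qc_fail_mask are hypothetical. The script's actual transforms and defaults may differ, so check --help.

```python
import numpy as np


def mad_outliers(values, n_mads: float) -> np.ndarray:
    """True where a value lies more than n_mads MADs from the median.

    MAD is the unscaled median(|x - median(x)|), i.e. scipy.stats.median_abs_deviation
    with its default scale=1.0 (no 1.4826 normal-consistency factor).
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x < med - n_mads * mad) | (x > med + n_mads * mad)


def qc_fail_mask(total_counts, n_genes, pct_mt, mad_counts, mad_genes, mad_mt, mt_threshold):
    """Cells failing QC: MAD outliers on log1p counts or genes, or high MT%.

    MT% fails on either a (two-sided) MAD outlier or the hard percentage cutoff.
    """
    count_out = mad_outliers(np.log1p(total_counts), mad_counts)
    gene_out = mad_outliers(np.log1p(n_genes), mad_genes)
    pct_mt = np.asarray(pct_mt, dtype=float)
    mt_out = mad_outliers(pct_mt, mad_mt) | (pct_mt > mt_threshold)
    return count_out | gene_out | mt_out
```

A zero MAD (for example, a metric that is constant for most cells) flags every cell that differs from the median. Check for this case before filtering on that metric.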
Outputs:
All files are saved to the <input_basename>_qc_results/ directory by default (or to the directory specified by --output-dir):
- qc_metrics_before_filtering.png - Pre-filtering visualizations
- qc_filtering_thresholds.png - MAD-based threshold overlays
- qc_metrics_after_filtering.png - Post-filtering quality metrics
- <input_basename>_filtered.h5ad - Clean, filtered dataset ready for downstream analysis
- <input_basename>_with_qc.h5ad - Original data with QC annotations preserved
If copying outputs for user access, copy individual files (not the entire directory) so users can preview them directly.
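The sketch below shows where the outputs land and how to copy them one file at a time. It assumes the default directory is created relative to the current working directory, since the text only says <input_basename>_qc_results/. It also assumes <input_basename> is the file name without its extension. Both qc_output_paths and copy_outputs_individually are hypothetical helpers.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def qc_output_paths(input_path: str, output_dir: Optional[str] = None) -> Dict[str, Path]:
    """Map each documented output to its expected path.

    Assumption: the default directory is relative to the current working directory.
    """
    base = Path(input_path).stem  # 'input' for 'data/input.h5ad'
    out = Path(output_dir) if output_dir else Path(f"{base}_qc_results")
    return {
        "before_png": out / "qc_metrics_before_filtering.png",
        "thresholds_png": out / "qc_filtering_thresholds.png",
        "after_png": out / "qc_metrics_after_filtering.png",
        "filtered_h5ad": out / f"{base}_filtered.h5ad",
        "with_qc_h5ad": out / f"{base}_with_qc.h5ad",
    }


def copy_outputs_individually(paths: Dict[str, Path], dest_dir: str) -> List[Path]:
    """Copy each existing output file (not the directory) so it can be previewed directly."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in paths.values():
        if src.is_file():  # skip outputs that were not produced
            copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```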
The script performs the following steps:
Approach 2 (modular building blocks): For custom analysis workflows or non-standard requirements, use the modular utility functions from scripts/qc_core.py and scripts/qc_plotting.py:
# Run from scripts/ directory, or add scripts/ to sys.path if needed
import anndata as ad
from qc_core import (calculate_qc_metrics, detect_outliers_mad, apply_hard_threshold,
                     filter_cells, filter_genes, print_qc_summary)
from qc_plotting import plot_qc_distributions # Only if visualization needed
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# ... custom analysis logic here
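The numpy sketch below shows what calculate_qc_metrics adds to adata.obs. It computes the three most-used per-cell metrics, plus the per-gene detection count behind filter_genes(min_cells=...). Column names follow scanpy's sc.pp.calculate_qc_metrics. Several things here are assumptions: that qc_core wraps that scanpy function, and the helper names basic_qc_metrics and genes_passing_min_cells. The sketch uses a dense matrix, whereas real count matrices are usually sparse. The "MT-" prefix matches human gene symbols; mouse symbols use "mt-".

```python
import numpy as np


def basic_qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix."""
    X = np.asarray(counts, dtype=float)
    is_mt = np.array([g.startswith(mt_prefix) for g in gene_names])
    total = X.sum(axis=1)                 # total_counts per cell
    n_genes = (X > 0).sum(axis=1)         # genes with at least one count
    mt_counts = X[:, is_mt].sum(axis=1)   # counts from mitochondrial genes
    # Guard against empty barcodes (total == 0) to avoid division by zero
    pct_mt = np.divide(100 * mt_counts, total, out=np.zeros_like(total), where=total > 0)
    return {"total_counts": total, "n_genes_by_counts": n_genes, "pct_counts_mt": pct_mt}


def genes_passing_min_cells(counts, min_cells=20):
    """Boolean mask of genes detected (count > 0) in at least min_cells cells."""
    return (np.asarray(counts) > 0).sum(axis=0) >= min_cells
```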
When to use this approach:
Available utility functions:
From qc_core.py (core QC operations):
- calculate_qc_metrics(adata, mt_pattern, ribo_pattern, hb_pattern, inplace=True) - Calculate QC metrics and annotate adata
- detect_outliers_mad(adata, metric, n_mads, verbose=True) - MAD-based outlier detection; returns a boolean mask
- apply_hard_threshold(adata, metric, threshold, operator='>', verbose=True) - Apply hard cutoffs; returns a boolean mask
- filter_cells(adata, mask, inplace=False) - Apply a boolean mask to filter cells
- filter_genes(adata, min_cells=20, min_counts=None, inplace=True) - Filter genes by detection
- print_qc_summary(adata, label='') - Print summary statistics
From qc_plotting.py (visualization):
- plot_qc_distributions(adata, output_path, title) - Generate comprehensive QC plots
- plot_filtering_thresholds(adata, outlier_masks, thresholds, output_path) - Visualize filtering thresholds
- plot_qc_after_filtering(adata, output_path) - Generate post-filtering plots
Example custom workflows:
Example 1: Only calculate metrics and visualize, don't filter yet
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
plot_qc_distributions(adata, 'qc_before.png', title='Initial QC')
print_qc_summary(adata, label='Before filtering')
Example 2: Apply only MT% filtering, keep other metrics permissive
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Only filter high MT% cells
high_mt = apply_hard_threshold(adata, 'pct_counts_mt', 10, operator='>')
adata_filtered = filter_cells(adata, ~high_mt)
adata_filtered.write('filtered.h5ad')
Example 3: Different thresholds for different subsets
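import numpy as np
adata = ad.read_h5ad('input.h5ad')
calculate_qc_metrics(adata, inplace=True)
# Apply type-specific QC (assumes cell_type metadata exists)
neurons = adata.obs['cell_type'] == 'neuron'
other_cells = ~neurons
# Neurons tolerate higher MT%, other cells use stricter threshold
neuron_qc = apply_hard_threshold(adata[neurons], 'pct_counts_mt', 15, operator='>')
other_qc = apply_hard_threshold(adata[other_cells], 'pct_counts_mt', 8, operator='>')
# Scatter the per-subset masks (in subset cell order) back into one full-length mask
high_mt = np.zeros(adata.n_obs, dtype=bool)
high_mt[neurons.to_numpy()] = np.asarray(neuron_qc)
high_mt[other_cells.to_numpy()] = np.asarray(other_qc)
adata_filtered = filter_cells(adata, ~high_mt)
adata_filtered.write('filtered.h5ad')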
For detailed QC methodology, parameter rationale, and troubleshooting guidance, see references/scverse_qc_guidelines.md. This reference provides:
Load this reference when users need deeper understanding of the methodology or when troubleshooting QC issues.
Share results to Lark: after the QC run completes locally, deliver the outputs through the lark MCP (apply P1/P2/P3/P4/P5/P8):
- Drive: lark_drive_upload for the three QC PNGs (qc_metrics_before_filtering.png, qc_filtering_thresholds.png, qc_metrics_after_filtering.png) and the *_filtered.h5ad. Upload individual files (not the whole directory) so they preview inline. Keep the returned file tokens/links for the next two steps.
- Base: lark_base_record_upsert (base_token, table_id of the Runs/Analyses table) with fields like {skill:"single-cell-rna-qc", input:<basename>, cells_before, cells_after, pct_filtered, mt_threshold, output_link:<Drive link>, owner:<open_id>, status:"done"}. Run with dry_run: true first (P2). Read prior runs with lark_base_search. It does NOT support jq and REQUIRES search_fields (the field(s) to match, per the Bitable API). Narrow results with select_fields/limit instead. If the field names are unknown, discover them first via lark_api GET /open-apis/bitable/v1/apps/{base}/tables/{table}/fields (P3). If no lab Base exists yet, delegate to base-deploy; for record/field details, delegate to lark-base.
- Card: lark_im_card_send with a short header (e.g. "scRNA QC: 8,412 → 7,930 cells (5.7% filtered)"), div rows for key metrics, an image/link to the before/after plot, and an actions button linking to the Drive report. Validate with print_json: true, then dry_run: true, then send. For card grammar, delegate to lark-im.
- Recipients: lark_contact_search(name) → open_id; use user_ids:"me" for your own thread. Use plain lark_im_send only for a one-line ping.
Typical downstream analysis steps:
Install: npx claudepluginhub larkcowork/lark-cowork-plugins --plugin bio-research