From mims-harvard-tooluniverse
Analyzes pooled/arrayed CRISPR screens for essential genes, synthetic lethals, and drug targets via sgRNA processing, MAGeCK/BAGEL scoring, QC, normalization, and enrichment.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
Comprehensive skill for analyzing CRISPR-Cas9 genetic screens to identify essential genes, synthetic lethal interactions, and therapeutic targets through robust statistical analysis and pathway enrichment.
CRISPR screens enable genome-wide functional genomics by systematically perturbing genes and measuring fitness effects. This skill provides an 8-phase workflow for identifying essential genes, synthetic lethal interactions, and drug targets.
Load sgRNA count matrix (MAGeCK format or generic TSV). Expected columns: sgRNA, Gene, plus sample columns. Create experimental design table linking samples to conditions (baseline/treatment) with replicate assignments.
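A minimal sketch of this loading step using only the Python standard library. `read_count_table` and `make_design` are hypothetical stand-ins, not the skill's own `load_sgrna_counts` or `create_design_matrix`. The tab-separated layout (sgRNA, Gene, then one column per sample) follows the description above, and the counts are made up.

```python
import csv
import io


def read_count_table(handle):
    """Parse a count table: sgRNA, Gene, then one column per sample (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # every column after sgRNA and Gene is a sample
    counts, sgrna_to_gene = {}, {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sgrna, gene, values = row[0], row[1], row[2:]
        counts[sgrna] = [float(v) for v in values]
        sgrna_to_gene[sgrna] = gene
    return samples, counts, sgrna_to_gene


def make_design(samples, conditions):
    """Link each sample to a condition and number replicates within each condition."""
    if len(samples) != len(conditions):
        raise ValueError("samples and conditions must have the same length")
    seen, design = {}, []
    for sample, condition in zip(samples, conditions):
        seen[condition] = seen.get(condition, 0) + 1
        design.append({"sample": sample, "condition": condition, "replicate": seen[condition]})
    return design


# Made-up example table
example = io.StringIO(
    "sgRNA\tGene\tT0_1\tT0_2\tT14_1\tT14_2\n"
    "s1\tRPL3\t500\t520\t40\t35\n"
    "s2\tRPL3\t480\t470\t60\t55\n"
    "s3\tAAVS1\t300\t310\t320\t305\n"
)
samples, counts, sgrna_to_gene = read_count_table(example)
design = make_design(samples, ["baseline", "baseline", "treatment", "treatment"])
```

Numbering replicates inside the design table keeps the replicate structure explicit for later steps that average or compare replicates.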
Assess sgRNA distribution quality:
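One way to quantify distribution quality, sketched under the assumption that counts are held as a dict mapping each sgRNA to its per-sample counts: per-sample total reads, the fraction of zero-count sgRNAs, and the Gini index of the count distribution (metrics of the kind MAGeCK's count summary reports). The function names are hypothetical.

```python
def gini_index(values):
    """Gini index of a count vector: 0 = perfectly even; values near 1 = a few sgRNAs dominate."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))  # 1-based ranks on ascending counts
    return (2 * weighted) / (n * total) - (n + 1) / n


def qc_summary(samples, counts):
    """Per-sample total reads, zero-count fraction, and Gini index."""
    summary = {}
    for j, sample in enumerate(samples):
        column = [row[j] for row in counts.values()]
        summary[sample] = {
            "total_reads": sum(column),
            "zero_fraction": sum(1 for c in column if c == 0) / len(column),
            "gini": gini_index(column),
        }
    return summary
```

A rising Gini index or zero fraction at late time points is typically expected as guides drop out; the same signal in the plasmid or T0 sample instead points to a skewed library.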
Normalize sgRNA counts to account for library size differences:
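A sketch of median-of-ratios normalization, assumed here as the library-size method (it is the approach popularized by DESeq; total-count scaling is a simpler alternative). Each sample's size factor is the median, across sgRNAs counted in every sample, of that sample's count relative to the sgRNA's geometric mean. Function names are hypothetical.

```python
import math
import statistics


def median_ratio_size_factors(counts, n_samples):
    """Size factors from the median log-ratio to each sgRNA's geometric mean."""
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # geometric mean would be zero; skip this sgRNA
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no sgRNA has non-zero counts in every sample")
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts, size_factors):
    """Divide each sample's counts by its size factor."""
    return {sg: [c / sf for c, sf in zip(row, size_factors)] for sg, row in counts.items()}
```

Because the median ignores outliers, a handful of strongly depleted or enriched guides does not distort the size factors the way total-count scaling can.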
Calculate log2 fold changes (LFC) between treatment and control conditions with pseudocount.
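A sketch of the per-sgRNA calculation, LFC = log2((mean treatment + pc) / (mean baseline + pc)). It assumes the condition labels `baseline` and `treatment` used in the quick-start code, averages replicates on the normalized scale, and uses a hypothetical default pseudocount of 0.5; the skill's own `calculate_lfc` may differ.

```python
import math
import statistics


def log2_fold_changes(norm_counts, samples, design, pseudocount=0.5):
    """Per-sgRNA log2((mean treatment + pc) / (mean baseline + pc))."""
    condition = {d["sample"]: d["condition"] for d in design}
    base_idx = [j for j, s in enumerate(samples) if condition[s] == "baseline"]
    treat_idx = [j for j, s in enumerate(samples) if condition[s] == "treatment"]
    lfc = {}
    for sg, row in norm_counts.items():
        base = statistics.mean([row[j] for j in base_idx])
        treat = statistics.mean([row[j] for j in treat_idx])
        lfc[sg] = math.log2((treat + pseudocount) / (base + pseudocount))
    return lfc
```

The pseudocount keeps guides that drop to zero reads finite; larger values shrink LFCs of low-count guides toward zero.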
Two scoring approaches: MAGeCK robust rank aggregation (RRA) and BAGEL Bayes factors (BF).
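To show the idea behind rank aggregation, here is a simplified sketch of the plain RRA rho score (Kolde et al.), not MAGeCK's implementation: MAGeCK uses a modified alpha-RRA that keeps only sgRNAs in the top alpha fraction and derives p-values by permutation, neither of which is done here. sgRNAs are ranked by LFC (most depleted first), and for each gene rho is the smallest probability that the j-th best of its k normalized ranks would be at least that good by chance.

```python
from math import comb


def order_stat_cdf(x, j, k):
    """P(j-th smallest of k iid Uniform(0,1) <= x) = sum_{l=j..k} C(k,l) x^l (1-x)^(k-l)."""
    return sum(comb(k, l) * x**l * (1 - x) ** (k - l) for l in range(j, k + 1))


def rra_scores(lfc, sgrna_to_gene):
    """Plain RRA rho per gene for depletion; lower = more consistently depleted."""
    ranked = sorted(lfc, key=lfc.get)  # most negative LFC first
    n = len(ranked)
    norm_rank = {sg: (i + 1) / n for i, sg in enumerate(ranked)}
    by_gene = {}
    for sg, gene in sgrna_to_gene.items():
        if sg in norm_rank:
            by_gene.setdefault(gene, []).append(norm_rank[sg])
    scores = {}
    for gene, ranks in by_gene.items():
        ranks.sort()
        k = len(ranks)
        scores[gene] = min(order_stat_cdf(u, j + 1, k) for j, u in enumerate(ranks))
    return scores
```

Rho is not itself a p-value; turning it into one requires comparing against rho values from permuted sgRNA-to-gene assignments.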
Compare essentiality scores between wildtype and mutant cell lines:
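A sketch of a differential-essentiality comparison, assuming gene-level scores on an LFC-like scale (more negative = more depleted) for each line. Synthetic lethal candidates are genes that are depleted in the mutant and clearly more depleted there than in wildtype. The thresholds and gene names are hypothetical illustration values.

```python
def synthetic_lethal_candidates(wt_scores, mut_scores, min_delta=1.0, mut_max=-1.0):
    """Genes with mutant score <= mut_max and at least min_delta more depleted than wildtype."""
    hits = []
    for gene in wt_scores.keys() & mut_scores.keys():
        delta = mut_scores[gene] - wt_scores[gene]  # negative = more depleted in mutant
        if mut_scores[gene] <= mut_max and delta <= -min_delta:
            hits.append((gene, delta))
    return sorted(hits, key=lambda item: item[1])  # most differential first


wt = {"GENE_A": -0.2, "GENE_B": -3.0, "GENE_C": 0.0}
mut = {"GENE_A": -2.5, "GENE_B": -3.1, "GENE_C": -0.3}
candidates = synthetic_lethal_candidates(wt, mut)
```

Requiring both conditions filters out genes like `GENE_B`, which are essential in both lines and therefore not selectively lethal.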
Query DepMap/literature for known dependencies using PubMed search.
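A small sketch for building literature queries per hit. The `[tiab]` (title/abstract) field tag is standard PubMed search syntax; whether `PubMed_search_articles` passes such tags through unchanged is an assumption, and `pubmed_dependency_query` is a hypothetical helper.

```python
def pubmed_dependency_query(gene, context=None):
    """PubMed query for prior essentiality/dependency reports on a gene."""
    terms = [f'"{gene}"[tiab]', '(essential[tiab] OR dependency[tiab] OR "CRISPR screen"[tiab])']
    if context:
        terms.append(f'"{context}"[tiab]')  # optional cell type or cancer context
    return " AND ".join(terms)
```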
Submit top essential genes to Enrichr for pathway enrichment:
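A sketch of the two pieces involved, assuming lower gene scores mean stronger depletion (as with RRA rho): selecting the top genes to submit, and the one-sided hypergeometric (Fisher's exact) over-representation p-value of the kind Enrichr reports, computed locally for a single term. Both function names are hypothetical.

```python
from math import comb


def top_depleted_genes(gene_scores, top_n=100):
    """Genes with the lowest scores (most depleted) first."""
    return sorted(gene_scores, key=gene_scores.get)[:top_n]


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): k overlap, n submitted genes, K genes in the term, N background genes."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
```

The background size N matters: using the screened library rather than the whole genome avoids inflating enrichment for gene sets the library over-represents.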
Composite scoring combining:
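The components of the composite score are not listed here, so this sketch uses hypothetical ones (`essentiality`, `druggability`) with hypothetical weights. It shows one common pattern: min-max scale each component across the genes that have all components (higher = better), then take a weighted average.

```python
def min_max(values):
    """Scale a gene -> value dict to [0, 1]; constant components map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def composite_scores(components, weights):
    """Weighted average of min-max scaled components over genes present in every component."""
    genes = set.intersection(*(set(c) for c in components.values()))
    scaled = {name: min_max({g: comp[g] for g in genes}) for name, comp in components.items()}
    total_w = sum(weights.values())
    return {g: sum(weights[name] * scaled[name][g] for name in components) / total_w for g in genes}
```

Min-max scaling puts components on a common range, but it is sensitive to a single extreme gene; rank-based scaling is a more robust alternative.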
Query DGIdb for each candidate gene to find existing drugs, interaction types, and sources.
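A sketch that prepares the DGIdb queries rather than executing them. The tool name and its `genes` array parameter come from the tool list in this document; the request-dict shape `{"name": ..., "arguments": ...}` and passing it to a ToolUniverse instance via `tu.run(request)` are assumptions, as is the batch size.

```python
def dgidb_requests(genes, batch_size=50):
    """Batch DGIdb interaction queries; de-duplicates genes while keeping their order."""
    unique = list(dict.fromkeys(genes))
    return [
        {"name": "DGIdb_get_drug_gene_interactions", "arguments": {"genes": unique[i:i + batch_size]}}
        for i in range(0, len(unique), batch_size)
    ]


# Assumed usage (not executed here):
# for request in dgidb_requests(candidate_genes):
#     result = tu.run(request)
```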
Generate markdown report with:
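A sketch of one report section, a markdown table of the top hits, assuming lower scores rank higher (as with RRA rho). `hits_table_markdown` and its column label are hypothetical; the skill's `generate_crispr_report` is not reproduced here.

```python
def hits_table_markdown(gene_scores, top_n=10, score_label="RRA score"):
    """Render the lowest-scoring genes as a markdown table."""
    top = sorted(gene_scores.items(), key=lambda item: item[1])[:top_n]
    lines = [f"| Rank | Gene | {score_label} |", "|---|---|---|"]
    lines += [f"| {i} | {gene} | {score:.3g} |" for i, (gene, score) in enumerate(top, start=1)]
    return "\n".join(lines)
```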
Key Tools Used:
- PubMed_search_articles - Literature search for gene essentiality and drug resistance
- ReactomeAnalysis_pathway_enrichment - Pathway enrichment (param: identifiers newline-separated, page_size)
- enrichr_gene_enrichment_analysis - Enrichr enrichment (param: gene_list array, libs array)
- DGIdb_get_drug_gene_interactions - Drug-gene interactions (param: genes as array)
- DGIdb_get_gene_druggability - Druggability categories
- STRING_get_network - Protein interaction networks
- kegg_search_pathway - Pathway search by keyword
- kegg_get_pathway_info - Pathway details by ID

Cancer Context (essential for drug resistance screens):
- civic_search_evidence_items - Clinical evidence for drug resistance/sensitivity
- COSMIC_get_mutations_by_gene - Somatic mutation landscape
- cBioPortal_get_mutations - Mutations in specific cancer cohorts
- ChEMBL_search_targets - Structural druggability assessment

Expression & Variant Integration:
- GEO_search_rnaseq_datasets / geo_search_datasets - Expression datasets
- ClinVar_search_variants - Known pathogenic variants
- gnomad_get_gene_constraints - Gene constraint metrics (pLI, oe_lof)
- UniProt_get_function_by_accession - Protein function for hit validation

```python
import pandas as pd
from tooluniverse import ToolUniverse

# The helpers below (load_sgrna_counts, create_design_matrix, filter_low_count_sgrnas,
# normalize_counts, calculate_lfc, mageck_gene_scoring, enrich_essential_genes,
# prioritize_drug_targets, generate_crispr_report) are not defined in this snippet;
# they are assumed to come from the phase-by-phase code in ANALYSIS_DETAILS.md.

# 1. Load data
counts, meta = load_sgrna_counts("sgrna_counts.txt")
design = create_design_matrix(['T0_1', 'T0_2', 'T14_1', 'T14_2'],
                              ['baseline', 'baseline', 'treatment', 'treatment'])

# 2. Process: drop low-count sgRNAs, normalize for library size, compute LFC
filtered_counts, filtered_mapping = filter_low_count_sgrnas(counts, meta['sgrna_to_gene'])
norm_counts, _ = normalize_counts(filtered_counts)
lfc, _, _ = calculate_lfc(norm_counts, design)

# 3. Score genes
gene_scores = mageck_gene_scoring(lfc, filtered_mapping)

# 4. Enrich pathways
enrichment = enrich_essential_genes(gene_scores, top_n=100)

# 5. Find drug targets
drug_targets = prioritize_drug_targets(gene_scores)

# 6. Generate report
report = generate_crispr_report(gene_scores, enrichment, drug_targets)
```
Screen hits are statistical findings, not direct readouts of biological relevance. A gene scoring as essential might be essential for cell growth in general (housekeeping) or essential specifically for the phenotype you are screening for (interesting). Always compare your screen hits to public essentiality data — use DepMap pan-cancer dependency scores to filter genes that are broadly essential across all cell lines. A gene essential only in your specific context, but not pan-essential in DepMap, is a better candidate for follow-up than one that scores in every screen.
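A sketch of that filter, assuming you have already retrieved DepMap gene-effect scores per cell line for each hit (more negative = stronger dependency). The input layout, the `-0.5` effect cutoff and the 90% fraction are hypothetical choices, not DepMap's own definition of common essential genes. Genes missing from the reference are returned separately so they get looked up rather than assumed context-specific.

```python
def split_pan_essential(screen_hits, gene_effects, effect_cutoff=-0.5, min_fraction=0.9):
    """Partition hits into (pan_essential, context_specific, not_in_reference)."""
    pan, specific, missing = [], [], []
    for gene in screen_hits:
        effects = gene_effects.get(gene)
        if not effects:
            missing.append(gene)  # no reference data: check manually, don't assume
            continue
        frac = sum(1 for e in effects if e < effect_cutoff) / len(effects)
        (pan if frac >= min_fraction else specific).append(gene)
    return pan, specific, missing
```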
LOOK UP, DON'T GUESS: check DepMap dependency scores, known core essential gene sets (Hart et al., Blomen et al.), and DGIdb druggability data for your top hits. Do not assume a hit is context-specific without checking public essentiality databases.
| Evidence Grade | Criteria | Validation Priority |
|---|---|---|
| A -- Strong hit | MAGeCK RRA p < 0.001, BAGEL BF > 5, >=3 sgRNAs with concordant LFC | Immediate validation (individual KO, growth assay) |
| B -- Moderate hit | MAGeCK RRA p < 0.01, BAGEL BF 2-5, >=2 concordant sgRNAs | Secondary validation pool |
| C -- Weak/ambiguous | p > 0.01, BF < 2, or discordant sgRNA effects | Deprioritize; check for copy-number bias or seed effects |
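The grading table can be applied mechanically. In this sketch (hypothetical function name), the B band's "BF 2-5" is read as a lower bound of 2, so a hit that meets A's p-value and BF thresholds but has only two concordant sgRNAs falls to B; anything not meeting B is graded C, which covers discordant sgRNA effects via a low concordant count.

```python
def evidence_grade(rra_p, bagel_bf, concordant_sgrnas):
    """Assign the A/B/C grade from the evidence table."""
    if rra_p < 0.001 and bagel_bf > 5 and concordant_sgrnas >= 3:
        return "A"
    if rra_p < 0.01 and bagel_bf >= 2 and concordant_sgrnas >= 2:
        return "B"
    return "C"
```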
Interpreting screen results:
Synthesis questions to address in the report:
- ANALYSIS_DETAILS.md - Detailed code snippets for all 8 phases
- USE_CASES.md - Complete use cases (essentiality screen, synthetic lethality, drug target discovery, expression integration) and best practices
- EXAMPLES.md - Example usage and quick reference
- QUICK_START.md - Quick start guide
- FALLBACK_PATCH.md - Fallback patterns for API issues