From mims-harvard-tooluniverse
Analyzes alignments (FASTA/PHYLIP/Nexus) and trees (Newick): computes treeness, RCV, parsimony sites, evolutionary rate, DVMC, tree length, GC content using PhyKIT, Biopython, DendroPy. Builds NJ/UPGMA/parsimony trees, Robinson-Foulds distances, Mann-Whitney U tests.
npx claudepluginhub joshuarweaver/cascade-data-analytics --plugin mims-harvard-tooluniverse
This skill uses the workspace's default tool permissions.
PhyKIT, Biopython, and DendroPy for alignment/tree analysis, evolutionary metrics, and comparative genomics.
When uncertain about any scientific fact, SEARCH databases first.
Use for: FASTA/PHYLIP/Nexus/Newick files; treeness, RCV, DVMC, evolutionary rate, parsimony sites, tree length, bootstrap; group comparisons (Mann-Whitney U); tree construction (NJ/UPGMA/parsimony); Robinson-Foulds distance.
BixBench: 33 questions across bix-4, bix-11, bix-12, bix-25, bix-35, bix-38, bix-45, bix-60.
NOT for: MSA generation (MUSCLE/MAFFT), ML trees (IQ-TREE/RAxML), Bayesian (MrBayes/BEAST).
import numpy as np
import pandas as pd
from scipy import stats
from Bio import AlignIO, Phylo, SeqIO
from phykit.services.tree.treeness import Treeness
from phykit.services.tree.total_tree_length import TotalTreeLength
from phykit.services.tree.evolutionary_rate import EvolutionaryRate
from phykit.services.tree.dvmc import DVMC
from phykit.services.tree.treeness_over_rcv import TreenessOverRCV
from phykit.services.alignment.parsimony_informative_sites import ParsimonyInformative
from phykit.services.alignment.rcv import RelativeCompositionVariability
import dendropy
ALIGNMENT ANALYSIS (FASTA/PHYLIP):
Parsimony sites → phykit_parsimony_informative()
RCV → phykit_rcv()
Gap % → alignment_gap_percentage()
TREE ANALYSIS (Newick):
Treeness → phykit_treeness()
Tree length → phykit_tree_length()
Evolutionary rate → phykit_evolutionary_rate()
DVMC → phykit_dvmc()
Bootstrap → extract_bootstrap_support()
COMBINED: Treeness/RCV → phykit_treeness_over_rcv(tree, aln)
TREE CONSTRUCTION: NJ → build_nj_tree(); UPGMA → build_upgma_tree(); Parsimony → build_parsimony_tree()
GROUP COMPARISON: batch metrics → Mann-Whitney U → summary stats
TREE COMPARISON: Robinson-Foulds → robinson_foulds_distance()
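
The Robinson-Foulds comparison named in the decision tree above counts the bipartitions (splits) found in one tree but not the other. The following is a minimal, standard-library-only sketch of that idea, not the skill's `robinson_foulds_distance()` implementation. The function names `clade_leaf_sets`, `bipartitions`, and `rf_distance` are hypothetical, and the sketch assumes both trees carry the same unquoted leaf names and are compared as unrooted topologies.

```python
def clade_leaf_sets(newick):
    """Return (all leaf names, leaf set of every parenthesised clade)."""
    s = "".join(newick.split()).rstrip(";")
    pos = 0
    clades = []

    def read_token():
        # Consume a label and/or ":length" up to the next "," or ")".
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",)":
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        if s[pos] != "(":
            # Leaf: the name is everything before an optional ":length".
            return {read_token().split(":")[0]}
        leaves = set()
        pos += 1  # skip "("
        while True:
            leaves |= parse_node()
            closing = s[pos] == ")"
            pos += 1  # skip "," or ")"
            if closing:
                break
        read_token()  # discard support value / branch length of this clade
        clades.append(frozenset(leaves))
        return leaves

    return frozenset(parse_node()), clades


def bipartitions(newick):
    """Non-trivial splits, each stored as the side that lacks a reference leaf."""
    all_leaves, clades = clade_leaf_sets(newick)
    reference = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
    return splits


def rf_distance(newick_a, newick_b):
    """Unweighted Robinson-Foulds distance: size of the symmetric difference."""
    return len(bipartitions(newick_a) ^ bipartitions(newick_b))


same = rf_distance("((A:0.1,B:0.2)90:0.3,(C:0.4,D:0.1):0.2);", "((C,D),(B,A));")
different = rf_distance("((A,B),(C,D));", "((A,C),(B,D));")
```

Storing each split as the side without the reference leaf makes the comparison independent of where either tree happens to be rooted. For real data, a library routine such as DendroPy's tree comparison module is the safer choice because it also handles quoted labels and mismatched taxon sets.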
| Metric | Input | Description |
|---|---|---|
| Treeness | Newick | Sum of internal branch lengths / total tree length |
| RCV | FASTA/PHYLIP | Relative Composition Variability |
| Treeness/RCV | Both | Signal quality ratio |
| Tree Length | Newick | Sum of all branch lengths |
| Evolutionary Rate | Newick | Total tree length / number of terminals |
| DVMC | Newick | Degree of Violation of Molecular Clock |
| Parsimony Sites | FASTA/PHYLIP | Sites with at least 2 character states, each appearing at least 2 times |
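
To make the tree-based definitions in the table concrete, here is a rough sketch that derives tree length, treeness, and evolutionary rate directly from a Newick string using only regular expressions. It is an illustration of the arithmetic, not PhyKIT's code. The name `tree_metrics` is hypothetical, and the sketch assumes every branch has a non-negative length, leaf names are unquoted, and any length attached to the root counts as internal.

```python
import re

_LENGTH = r"([0-9.]+(?:[eE][-+]?[0-9]+)?)"
# A length that follows ")" (and an optional support label) is an internal branch.
INTERNAL = re.compile(r"\)[^,:;()]*:" + _LENGTH)
# A length that follows a leaf name is a terminal branch.
TERMINAL = re.compile(r"[(,][^,:;()]+:" + _LENGTH)


def tree_metrics(newick):
    """Return tree length, treeness, and evolutionary rate for one Newick tree."""
    s = "".join(newick.split())  # drop whitespace so the patterns stay simple
    internal = sum(float(x) for x in INTERNAL.findall(s))
    terminal_lengths = [float(x) for x in TERMINAL.findall(s)]
    total = internal + sum(terminal_lengths)
    return {
        "tree_length": total,
        "treeness": internal / total,
        "evolutionary_rate": total / len(terminal_lengths),
    }


metrics = tree_metrics("((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.5);")
```

In this example the internal branches sum to 0.8 and the terminal branches to 1.2, so treeness is 0.8 / 2.0 and the evolutionary rate is 2.0 / 4 terminals. When answering benchmark questions, the PhyKIT services imported earlier remain the reference; a hand-rolled version like this is mainly useful for sanity checks.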
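fungi_dvmc = batch_dvmc(discover_gene_files("data/fungi"))
animal_dvmc = batch_dvmc(discover_gene_files("data/animals"))
print(f"Fungi median: {np.median(list(fungi_dvmc.values())):.4f}")
print(f"Animal median: {np.median(list(animal_dvmc.values())):.4f}")
# Two-sided Mann-Whitney U test on the per-gene DVMC values of the two groups
u_stat, p_value = stats.mannwhitneyu(list(fungi_dvmc.values()), list(animal_dvmc.values()), alternative='two-sided')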
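
It can help to know what the U statistic from the Mann-Whitney call above actually counts. The sketch below computes it from the pairwise definition in plain Python; `mann_whitney_u` is a hypothetical helper, not part of SciPy or this skill. It assumes a recent SciPy release, where the reported statistic refers to the first sample, and it does not compute a p-value.

```python
def mann_whitney_u(a, b):
    """U for sample a: pairs with a > b count 1, tied pairs count 0.5."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


# Made-up per-gene values, used only to exercise the function.
group_a = [0.5, 0.7, 0.9, 0.4]
group_b = [0.6, 0.3]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
```

The two statistics always sum to `len(group_a) * len(group_b)`, so swapping the argument order changes the reported U even though the two-sided p-value stays the same. Keeping the group order fixed to the order named in the question avoids reporting the complementary value.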
Filter genes to those with a gap percentage below 5%, then compute treeness/RCV on the filtered set only.
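
The gap filter described above could look roughly like the sketch below, which parses FASTA text and keeps only alignments under the cutoff. All three function names are hypothetical and are not the skill's `alignment_gap_percentage()`. The sketch assumes gap percentage means gap characters divided by all alignment cells, and that only `-` counts as a gap unless `gap_chars` is changed.

```python
def read_fasta(text):
    """Parse FASTA text into {record name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def gap_percentage(records, gap_chars="-"):
    """Percentage of alignment cells that are gap characters."""
    total = sum(len(seq) for seq in records.values())
    gaps = sum(seq.count(ch) for seq in records.values() for ch in gap_chars)
    return 100.0 * gaps / total


def filter_by_gaps(alignments, max_gap_pct=5.0):
    """Keep genes whose alignment has strictly less than max_gap_pct gaps."""
    return {
        gene: records
        for gene, records in alignments.items()
        if gap_percentage(records) < max_gap_pct
    }


alignments = {
    "gene1": read_fasta(">a\nACGT\n>b\nACGT\n"),
    "gene2": read_fasta(">a\nAC--\n>b\nA-GT\n"),
}
kept = filter_by_gaps(alignments)
```

Filtering before computing any downstream metric keeps the gene set identical across treeness, RCV, and their ratio, which matters when the question asks for a statistic over the filtered genes.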
gene_files = discover_gene_files("data/") # → [{gene_id, aln_file, tree_file}]
treeness_results = batch_treeness(gene_files) # → {gene_id: value}
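
The discovery and batch helpers above come from the skill's own scripts, and their internals are not shown here. As a loose sketch of the pattern only, the code below pairs alignment and tree files by file stem and maps a metric over them. The names `discover_gene_files_sketch` and `batch_metric`, the extension sets, and the rule that a gene needs both files are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

ALN_EXTS = {".fa", ".fasta", ".phy"}        # assumed alignment extensions
TREE_EXTS = {".nwk", ".tre", ".treefile"}   # assumed tree extensions


def discover_gene_files_sketch(data_dir):
    """Pair alignment and tree files that share a stem (hypothetical rule)."""
    alns, trees = {}, {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.suffix in ALN_EXTS:
            alns[path.stem] = path
        elif path.suffix in TREE_EXTS:
            trees[path.stem] = path
    return [
        {"gene_id": gene, "aln_file": str(alns[gene]), "tree_file": str(trees[gene])}
        for gene in sorted(alns.keys() & trees.keys())
    ]


def batch_metric(gene_files, metric, key="tree_file"):
    """Apply metric(path) to every gene and return {gene_id: value}."""
    return {entry["gene_id"]: metric(entry[key]) for entry in gene_files}


# Tiny demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("geneA.fa", "geneA.nwk", "geneB.fa"):
        (Path(tmp) / name).write_text("")
    found = discover_gene_files_sketch(tmp)
    sizes = batch_metric(found, lambda p: Path(p).stat().st_size)
```

Here `geneB` is skipped because it has no tree file. Whether the real helper drops or keeps such genes is not stated in this document, so check its output length against the number of files before reporting a statistic over "all genes".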
| Pattern | Method |
|---|---|
| "median X" | np.median(values) |
| "maximum X" | np.max(values) |
| "difference in median" | abs(np.median(a) - np.median(b)) |
| "Mann-Whitney U" | stats.mannwhitneyu(a, b)[0] |
| "fold-change" | np.median(a) / np.median(b) |
Rounding: PhyKIT reports 4 decimals by default. Report U statistics as integers. Explicit wording in the question overrides both defaults.
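
The answer patterns and rounding rule above can be combined into one small helper, sketched here with the standard library's `statistics.median` in place of NumPy. The function name `summarize` and the sample values are hypothetical. The sketch assumes rounding should happen once, at the end, on values computed from unrounded medians.

```python
from statistics import median


def summarize(a, b, decimals=4):
    """Common answer statistics for two groups, rounded only at the end."""
    med_a, med_b = median(a), median(b)
    return {
        "median_a": round(med_a, decimals),
        "median_b": round(med_b, decimals),
        "max_a": round(max(a), decimals),
        "median_difference": round(abs(med_a - med_b), decimals),
        "fold_change": round(med_a / med_b, decimals),
    }


# Made-up values, used only to exercise the helper.
summary = summarize([0.2, 0.4, 0.6], [0.1, 0.2, 0.3])
```

Rounding the medians first and then dividing can shift the fourth decimal of a fold-change, so the helper keeps full precision until the final step. If a question asks for a different number of decimals, pass it through `decimals` rather than re-rounding the output.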
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Treeness | >0.8 | 0.5-0.8 | <0.5 |
| RCV | <0.2 | 0.2-0.5 | >0.5 |
| Treeness/RCV | >2.0 | 1.0-2.0 | <1.0 |
| Bootstrap | >95% | 70-95% | <70% |
| Parsimony sites | >30% | 10-30% | <10% |
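
As an illustration of how the thresholds in the table might be applied, the sketch below counts parsimony-informative sites in a small alignment and labels metric values as good, acceptable, or poor. The names `parsimony_informative_fraction`, `THRESHOLDS`, and `quality_label` are hypothetical. The sketch assumes that only `-` and `?` are ignored when counting character states, and that values falling exactly on a cutoff count as acceptable.

```python
from collections import Counter


def parsimony_informative_fraction(seqs, ignore="-?"):
    """Fraction of columns with >= 2 states that each occur >= 2 times."""
    informative = 0
    for column in zip(*seqs):
        counts = Counter(ch.upper() for ch in column if ch not in ignore)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative / len(seqs[0])


# (good cutoff, poor cutoff, higher_is_better), copied from the table above.
THRESHOLDS = {
    "treeness": (0.8, 0.5, True),
    "rcv": (0.2, 0.5, False),
    "treeness_over_rcv": (2.0, 1.0, True),
    "bootstrap_pct": (95.0, 70.0, True),
    "parsimony_sites_pct": (30.0, 10.0, True),
}


def quality_label(metric, value):
    """Map a metric value to 'good', 'acceptable', or 'poor'."""
    good, poor, higher_is_better = THRESHOLDS[metric]
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better metrics.
        value, good, poor = -value, -good, -poor
    if value > good:
        return "good"
    if value < poor:
        return "poor"
    return "acceptable"


alignment = ["AAGT", "AAGT", "ACCT", "ACCA"]
pis_pct = 100.0 * parsimony_informative_fraction(alignment)
label = quality_label("parsimony_sites_pct", pis_pct)
```

In the toy alignment, columns 2 and 3 each contain two states that both appear twice, so half of the sites are informative. Column 4 has a state that appears only once and therefore does not qualify.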
All files identified; group structure detected; correct PhyKIT function; ALL genes processed (not sample); correct test; 4-decimal rounding; specific statistic (median/max/U/p); Mann-Whitney alternative='two-sided'.
references/sequence_alignment.md, references/tree_building.md, references/parsimony_analysis.md, scripts/tree_statistics.py