From model-evaluator
Compares multiple ML models on a shared dataset, computing accuracy, latency, memory, and cost metrics, then generates a ranked recommendation report.
Slash command
/model-evaluator:compare-models
# /compare-models - Compare ML Models

Compare multiple ML models to select the best performer.

## Steps

1. Ask the user for the models to compare and the evaluation dataset
2. Load all models and verify they accept the same input format
3. Run inference with each model on the identical test dataset
4. Calculate the same metrics for all models for fair comparison
5. Create a side-by-side comparison table with all metrics
6. Perform statistical significance testing between model pairs (McNemar, paired t-test)
7. Compare inference performance: latency, throughput, memory footprint
8. Calcul...
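The evaluation steps (shared test set, identical metrics, pairwise McNemar test, latency) can be illustrated with a minimal standard-library sketch. This is not the plugin's implementation: `mcnemar_exact`, `compare_models`, and the toy models are hypothetical names introduced here, and the example assumes each model is a plain callable that returns one label per input. The paired t-test, throughput, and memory measurements are left out.

```python
import itertools
import math
import time


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions (hypothetical helper)."""
    # Only discordant pairs matter: cases where exactly one model is correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare_models(models, inputs, y_true):
    """Run every model on the same inputs; return metric rows and pairwise p-values."""
    rows, preds = {}, {}
    for name, predict in models.items():
        start = time.perf_counter()
        preds[name] = [predict(x) for x in inputs]
        elapsed = time.perf_counter() - start
        correct = sum(p == t for p, t in zip(preds[name], y_true))
        rows[name] = {
            "accuracy": correct / len(y_true),
            "mean_latency_ms": 1000 * elapsed / len(inputs),
        }
    p_values = {
        (a, b): mcnemar_exact(y_true, preds[a], preds[b])
        for a, b in itertools.combinations(models, 2)
    }
    return rows, p_values


# Toy usage: two stand-in "models" on a parity task (illustrative data only).
inputs = list(range(10))
y_true = [x % 2 for x in inputs]
models = {
    "parity": lambda x: x % 2,
    "always_zero": lambda x: 0,
}
rows, p_values = compare_models(models, inputs, y_true)
for name, metrics in rows.items():
    print(f"{name:12s} accuracy={metrics['accuracy']:.2f}")
print(p_values[("parity", "always_zero")])
```

The exact binomial form of McNemar's test is used here because it needs no external dependency and remains valid when there are few discordant pairs; a library routine could be substituted for larger evaluations.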
2 plugins reuse this command
First indexed Mar 30, 2026
npx claudepluginhub rohitg00/awesome-claude-code-toolkit --plugin model-evaluator
/benchmark: Runs a FactorMiner benchmark in a specified mode (table1, suite, ablation, or cost pressure) against a validated dataset and folds results into a research note.
/compare: Compares multiple ML experiment runs side-by-side using experiment records, building comparison tables, analyzing parameter sensitivity, generating visualizations, and identifying the best configuration.
/eval-model: Runs rigorous model evaluation: cross-validated metrics, confusion matrix, feature importance, and subgroup bias audit. Produces a draft report for data scientist review.
/explain-model: Analyzes context to generate AI/ML task code with validation, error handling, performance metrics, insights, artifacts, and documentation.