From model-evaluator
Compares multiple ML models on a shared test dataset, evaluating metrics, statistical significance, inference performance, cost, and robustness, and generates a report with tables, rankings, and recommendations.
npx claudepluginhub rohitg00/awesome-claude-code-toolkit --plugin model-evaluator
How this command is triggered: by the user, by Claude, or both
Slash command
/model-evaluator:compare-models
The summary Claude sees in its command listing, used to decide when to auto-load this command
# /compare-models - Compare ML Models

Compare multiple ML models to select the best performer.

## Steps

1. Ask the user for the models to compare and the evaluation dataset
2. Load all models and verify they accept the same input format
3. Run inference with each model on the identical test dataset
4. Calculate the same metrics for all models for fair comparison
5. Create a side-by-side comparison table with all metrics
6. Perform statistical significance testing between model pairs (McNemar, paired t-test)
7. Compare inference performance: latency, throughput, memory footprint
8. Calcul...
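The significance-testing step above (step 6) can be sketched for two classifiers as an exact McNemar test. This is a minimal illustration using only the standard library, not the plugin's own implementation: the helper name `mcnemar_exact` is hypothetical, and it assumes that "correct" means a prediction equals the true label and that both models were scored on the same examples in the same order.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Hypothetical helper for illustration only.
    Returns (b, c, p_value), where
      b = examples model A got right and model B got wrong,
      c = examples model A got wrong and model B got right.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    # Only discordant pairs carry information about which model is better.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The models never disagree on correctness: no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    # Two-sided p-value: twice the smaller tail, capped at 1.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy data (made up for the example): A is always right, B misses 6 of 10.
    y_true = [1] * 10
    pred_a = [1] * 10
    pred_b = [1] * 4 + [0] * 6
    b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
    print(f"b={b} c={c} p={p:.5f}")
```

The test looks only at examples where the two models disagree on correctness, which is why it suits a comparison on an identical test set. For continuous per-example scores, a paired t-test would be the analogous choice.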
/llm-compare: Compares a prompt across OpenAI Codex, Google Gemini, and Ollama LLMs, selecting models/context, verifying availability, and appending test instructions for review fixes.
/explain-model: Analyzes context to generate AI/ML task code with validation, error handling, performance metrics, insights, artifacts, and documentation.
/compare: Compares multiple ML experiment runs side-by-side from tracking store, analyzes parameter sensitivity, generates visualizations, identifies best configuration, and recommends next experiments.
/data-scientist: Adopts data-scientist persona to develop ML models, perform data analysis, and validate statistics based on the provided request.
Compare multiple ML models to select the best performer.