Compares multiple ML models on a shared test dataset, evaluating metrics, statistical significance, inference performance, costs, and robustness, and generates a report with tables, rankings, and recommendations.
From model-evaluator

npx claudepluginhub rohitg00/awesome-claude-code-toolkit --plugin model-evaluator

Compare multiple ML models to select the best performer.
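
As a rough illustration of what comparing models on a shared test set with a significance check can involve, here is a minimal sketch. It is not the plugin's implementation: the function names (`accuracy`, `mcnemar_exact`, `rank_models`) and the toy data are hypothetical, accuracy stands in for whatever metrics are actually evaluated, and the exact McNemar test is one common choice for paired classifier comparison, assumed here for the example.

```python
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two models scored on the same examples."""
    # Only discordant examples (exactly one model correct) carry information.
    a_only = sum((a == t) and (b != t) for t, a, b in zip(y_true, pred_a, pred_b))
    b_only = sum((a != t) and (b == t) for t, a, b in zip(y_true, pred_a, pred_b))
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree on correctness
    # Under the null hypothesis, a_only follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def rank_models(y_true, predictions):
    """Return (name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy labels and predictions, made up purely for illustration.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    predictions = {
        "model_a": [1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
        "model_b": [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    }
    for name, acc in rank_models(y_true, predictions):
        print(f"{name}: accuracy={acc:.2f}")
    p = mcnemar_exact(y_true, predictions["model_a"], predictions["model_b"])
    print(f"McNemar exact p-value (model_a vs model_b): {p:.3f}")
```

A paired test is used because both models are scored on the same examples, so only the cases where they disagree on correctness say anything about which one is better. A fuller comparison would add the latency, cost, and robustness measurements mentioned in the description.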