By rohitg00
Evaluate a single ML model or compare several models on test datasets across classification, regression, NLP, and generative tasks. The plugin computes metrics, statistical significance, inference performance, cost, robustness, and bias checks, and generates visual reports with confusion matrices, performance profiles, tables, rankings, and recommendations.
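The listing does not show how the plugin computes these numbers. As a rough illustration only (not the plugin's code), the sketch below builds a confusion matrix and accuracy for a classifier, then runs an exact McNemar test to check whether two classifiers evaluated on the same test set have significantly different error rates. The function names confusion_matrix, accuracy and mcnemar_exact, and the toy labels, are hypothetical and chosen for illustration.

```python
from math import comb


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the discordant pairs of two models.

    only_a: cases model A gets right and model B gets wrong.
    only_b: cases model B gets right and model A gets wrong.
    Under H0 (equal error rates), only_a ~ Binomial(only_a + only_b, 0.5).
    """
    only_a = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    only_b = sum(pa != t and pb == t for t, pa, pb in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree on correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy binary test set with predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_a = [1, 0, 1, 0, 0, 1, 0, 1]
pred_b = [0, 0, 1, 0, 1, 1, 0, 1]

print(confusion_matrix(y_true, pred_a, [0, 1]))  # [[3, 1], [1, 3]]
print(accuracy(y_true, pred_a))                  # 0.75
print(accuracy(y_true, pred_b))                  # 0.5
print(mcnemar_exact(y_true, pred_a, pred_b))     # 0.5
```

Here model A looks better on accuracy, but only two test cases separate the models, so the exact test gives p = 0.5. A paired test like this is one way to keep a ranking from resting on a difference that a small test set cannot support.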
npx claudepluginhub rohitg00/awesome-claude-code-toolkit --plugin model-evaluator
Comprehensive model evaluation with multiple metrics
ML experiment tracking with metrics logging and run comparison
ML/perf investigation skills: topic, plan, judge, run, sweep
ML engineering plugin: Give your AI coding agent ML engineering superpowers.
Skills for tracing, evaluating, and improving AI agents with MLflow. Supports the full agent improvement loop: instrument → trace → evaluate → iterate → validate.
LLM observability tooling for agent development and Claude Code
Persistent memory for AI coding agents -- captures tool usage, compresses via LLM, injects context into future sessions. 12 hooks, 41 MCP tools, 4 skills, real-time viewer.
Complete AI coding workflow system. Self-correcting memory + persistent FTS5-indexed research wikis + auto-research loop + multi-LLM council on a single SQLite store. 33 skills, 8 agents, 22 commands, 37 hook scripts across 24 events. Cross-agent via SkillKit.
Complete developer toolkit for Claude Code
Find and remove dead code across the codebase
Generate comprehensive unit tests for any function or module