Evaluates code generation models on 15+ benchmarks, including HumanEval, MBPP, and MultiPL-E, with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. An industry-standard harness from the BigCode Project, used by Hugging Face leaderboards.
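For reference, pass@k scores of this kind are typically computed with the unbiased estimator from the Codex paper, given n generations per problem of which c pass the tests. A minimal Python sketch of that formula (the function name is illustrative, not the harness's own API):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        # Every possible k-subset contains at least one correct sample.
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 20 generations, 3 passing, estimate pass@10.
print(pass_at_k(n=20, c=3, k=10))
```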
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install zechenzhangagi-evaluating-code-models-11-evaluation-bigcode-evaluation-harness@zechenzhangAGI/AI-research-SKILLs

Claude Agent SDK Development Plugin
Implementation of the Ralph Wiggum technique: continuous self-referential AI loops for iterative development. Runs Claude in a while-true loop with the same prompt until the task is complete.
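A minimal sketch of that loop in Python, assuming the Claude Code CLI is on PATH and accepts a non-interactive prompt via `claude -p`; the sentinel file used to detect completion is an illustrative assumption, not part of the plugin:

```python
import subprocess
from pathlib import Path

# The same prompt is fed to Claude on every iteration (the Ralph Wiggum loop).
PROMPT = "Work on the task in TODO.md. When it is fully done, write DONE to status.txt."
STATUS = Path("status.txt")  # hypothetical completion signal for this sketch

while True:
    # Non-interactive run of the Claude Code CLI; ignore per-run failures and retry.
    subprocess.run(["claude", "-p", PROMPT], check=False)
    if STATUS.exists() and STATUS.read_text().strip() == "DONE":
        break
```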
Comprehensive toolkit for developing Claude Code plugins. Includes 7 expert skills covering hooks, MCP integration, commands, agents, and best practices. AI-assisted plugin creation and validation.
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification.