LLM benchmarking and evaluation. Includes lm-evaluation-harness (60+ benchmarks like MMLU, HumanEval, GSM8K), BigCode Evaluation Harness (code models), and NeMo Evaluator (enterprise SDK). Use when benchmarking models or measuring performance on standard tasks.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install evaluation@ai-research-skills

Expert guidance for Next.js Cache Components and Partial Prerendering (PPR). Proactively activates in projects with cacheComponents enabled.
Adds educational insights about implementation choices and codebase patterns (mimicking the deprecated Explanatory output style).
Easily create hooks that prevent unwanted behaviors by analyzing conversation patterns.
Frontend design skill for UI/UX implementation