LLM benchmarking and evaluation. Includes lm-evaluation-harness (60+ benchmarks like MMLU, HumanEval, GSM8K), BigCode Evaluation Harness (code models), and NeMo Evaluator (enterprise SDK). Use when benchmarking models or measuring performance on standard tasks.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install evaluation@ai-research-skills

Expert guidance for Next.js Cache Components and Partial Prerendering (PPR). Proactively activates in projects with cacheComponents enabled.
Adds educational insights about implementation choices and codebase patterns (mimicking the deprecated Explanatory output style).
Easily create hooks that prevent unwanted behaviors by analyzing conversation patterns.
Frontend design skill for UI/UX implementation