Community Plugin

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks, including MMLU, HumanEval, GSM8K, TruthfulQA, and HellaSwag. Use it when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. The underlying harness is an industry standard used by EleutherAI, HuggingFace, and major labs. Supported model backends include HuggingFace, vLLM, and hosted APIs.
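As a rough illustration of what a benchmark run might look like, the sketch below assembles a command line for EleutherAI's lm-evaluation-harness CLI (lm_eval). It assumes the plugin drives that harness, which the listing implies but does not state outright. The helper name build_lm_eval_command, the example model EleutherAI/pythia-160m, and the chosen tasks are illustrative assumptions, not part of the plugin. The command is only built and printed, never executed.

```python
import shlex


def build_lm_eval_command(model_name, tasks, backend="hf",
                          batch_size=8, num_fewshot=None, output_path=None):
    """Assemble an lm_eval invocation as an argument list.

    Hypothetical helper for illustration only. `backend` is the harness
    model type, e.g. "hf" for HuggingFace models or "vllm" for vLLM.
    """
    if not tasks:
        raise ValueError("at least one task is required")

    args = [
        "lm_eval",
        "--model", backend,
        "--model_args", f"pretrained={model_name}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]
    # Optional flags are appended only when the caller sets them.
    if num_fewshot is not None:
        args += ["--num_fewshot", str(num_fewshot)]
    if output_path is not None:
        args += ["--output_path", output_path]
    return args


if __name__ == "__main__":
    # Example model and tasks are assumptions chosen for illustration.
    cmd = build_lm_eval_command(
        "EleutherAI/pythia-160m",
        ["hellaswag", "gsm8k"],
        num_fewshot=5,
    )
    # Print a shell-safe string; pass the list to subprocess.run to execute.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without shell quoting issues. Switching backend to "vllm" would target the vLLM backend mentioned above, assuming the harness is installed with vLLM support.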

Version 1.0.0

Updated 25 days ago

Install

Add the repository (one-time)

/plugin marketplace add zechenzhangAGI/AI-research-SKILLs

Install the plugin

/plugin install evaluating-llms-harness@zechenzhangAGI/AI-research-SKILLs

Component Details

No components detected in this plugin's metadata.

Stats

Stars: 0

Maintenance: Good

Last Commit: 25 days ago

Similar Plugins

pr-review-toolkit

Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification

46.0K

feature-dev

Comprehensive feature development workflow with specialized agents for codebase exploration, architecture design, and quality review

data-validation-suite

unit-testing