Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC clusters, or cloud platforms. NVIDIA's enterprise-grade platform, with a container-first architecture for reproducible benchmarking.
Install by adding the marketplace, then installing the plugin:

/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install zechenzhangagi-nemo-evaluator-sdk-11-evaluation-nemo-evaluator@zechenzhangAGI/AI-research-SKILLs
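Once installed, the skill drives NeMo Evaluator's launcher, which is configured declaratively and dispatches each benchmark's harness container to the chosen backend. Below is a minimal sketch of what a run configuration might look like, assuming the launcher's Hydra-style YAML layout; the field names (`execution`, `deployment`, `target`, `evaluation`) and all values are illustrative assumptions, not a verified schema:

```yaml
# Illustrative run config -- field names are assumptions, not verified
# against the launcher's actual schema.
defaults:
  - execution: local      # swap for slurm or a cloud executor to change backend
  - deployment: none      # evaluate an endpoint that is already serving
  - _self_

execution:
  output_dir: ./results   # per-benchmark results and logs land here

target:
  api_endpoint:
    url: http://localhost:8000/v1/chat/completions  # hypothetical endpoint
    model_id: my-model                              # hypothetical served model name

evaluation:
  tasks:
    - name: mmlu          # benchmark names come from the harness catalogs
    - name: gsm8k
```

A run would then be launched with something like `nemo-evaluator-launcher run --config-dir . --config-name my_eval` (command shape assumed from the launcher's CLI conventions). Because each harness is containerized, the same config is meant to reproduce the same benchmark setup whether it executes on Docker, Slurm, or a cloud backend.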