eval-1337

Name: eval-1337
Author: yzavyas

Write rigorous evals for LLM agents, skills, MCP servers, and prompts. Use when: building test suites, measuring effectiveness, choosing frameworks. Covers: DeepEval, Braintrust, RAGAS, precision/recall, F1.

Publisher marketplace: eval-1337@claude-1337 · marketplace and plugin share one repository (yzavyas/claude-1337)

npx claudepluginhub yzavyas/claude-1337 --plugin eval-1337

Popularity

Stars

Top 25%

Med: 0·Avg: 825

Copy clicks

Med: 0·Avg: 2

What's Inside

Skills (1)

build-eval

/skills/build-eval

Write rigorous evals for LLM agents, multi-agent systems, skills, MCP servers, and prompts. Use when: building test suites, measuring agent effectiveness, evaluating coordination, or choosing eval frameworks. Covers: DeepEval, Braintrust, RAGAS, precision/recall, F1, task completion, pass@k, iterative metrics, multi-agent coordination.
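As a rough illustration of three of the metrics named above, here is a minimal Python sketch of precision/recall/F1 and pass@k. It is not taken from the plugin: the function names `precision_recall_f1` and `pass_at_k` are hypothetical, and the pass@k formula is the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), which the build-eval skill may or may not use.

```python
from math import comb


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Hypothetical helper for illustration; not part of eval-1337.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which passed.

    Assumes the common unbiased estimator: 1 - C(n - c, k) / C(n, k),
    i.e. one minus the probability that k attempts drawn without
    replacement from the n samples are all failures.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failures exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print(f"pass@1={pass_at_k(n=10, c=3, k=1):.3f}")
    print(f"pass@5={pass_at_k(n=10, c=3, k=5):.3f}")
```

Reporting precision and recall alongside F1 makes it easier to see which kind of error drives a low score. Comparing pass@k at several values of k shows how much an agent gains from being allowed retries.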

README

claude-1337

A marketplace of cognitive extensions for Claude Code.

📚 Documentation · 🔍 Catalog · 💡 Ethos

Install

/plugin marketplace add yzavyas/claude-1337

/plugin install core-1337@claude-1337

If plugins don't load

Known issues: #14815, #14061, #15369

Workaround:

~/.claude/plugins/marketplaces/claude-1337/plugins/core-1337/scripts/install-workaround.sh

Contributing

Development happens on the dev branch. This main branch is for marketplace distribution only.

git checkout dev

See CONTRIBUTING.md or the contributor guide.

License

MIT

Similar Plugins

agent-eval-harness

34·2·

Agent and skill evaluation harness with MLflow integration

v1.22.0

opendatahub-io

wshobson-llm-evaluation

37.8k·

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

v1.0.0

wshobson

muratcankoylan-evaluation

17.1k·

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines.

v1.0.0

muratcankoylan

skill-optimizer

56·

Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

2mo

v2.0.0

fastxyz

More by yzavyas

visuals-1337

7·

AI image and video generation. Use when: Midjourney prompting, choosing image/video models, troubleshooting AI art, reference types, style transfer, text-in-image.

6mo

v0.1.0

yzavyas

rust-1337

7·

Rust production patterns. Use when: building Rust systems. Covers ownership decisions, async gotchas, crate selection, domain knowledge (networking, embedded, WASM, FFI, proc-macros).

6mo

v0.2.1

yzavyas

jvm-analysis-1337

7·

JVM static and runtime analysis. Use when: finding dead code, optimizing Java/Kotlin apps, profiling, debugging memory leaks. Covers SootUp, Scavenger, async-profiler, JFR, ProGuard.

6mo

v0.2.1

yzavyas

kotlin-1337

7·

Elite Kotlin development patterns. Use when: writing Kotlin for backend (Ktor, Spring Boot), Android, Multiplatform. Covers coroutines, structured concurrency, Flow, scope functions, null safety, Java interop, testing (Kotest, MockK), benchmarking (kotlinx-benchmark).

6mo

v0.2.1

yzavyas

arch-guild

7·

Architectural reasoning with The Guild. 13 specialized agents with orthogonal perspectives for multi-viewpoint architecture review.

6mo

v0.1.0

yzavyas

Stats

Version: 0.2.1

Language: Python

Stars: 7

Forks: 3

Maintenance: Excellent

License: MIT

Last Commit: Jan 18, 2026

Added: Jan 24, 2026

