From agents-meet-rl
Troubleshoots LLM agent RL training: reward stagnation, KL/entropy blow-ups, eval flat, tool-call failures, credit assignment, benchmark contamination. Routes symptoms to cited fixes from a curated corpus.
npx claudepluginhub thinkwee/claude-plugins --plugin agents-meet-rl
How this skill is triggered: by the user, by Claude, or both
Slash command
/agents-meet-rl:agents-meet-rl
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
A corpus-anchored handbook for diagnosis and selection. It supplies
database.json
evals/evals.json
problems/_INDEX.md
problems/evaluation/benchmark-pitfalls.md
problems/evaluation/canary-eval.md
problems/evaluation/data-contamination.md
problems/evaluation/inference-time-scaling.md
problems/evaluation/llm-judge-evaluation.md
problems/evaluation/long-horizon-eval.md
problems/evaluation/multi-agent-eval.md
problems/evaluation/ood-evaluation.md
problems/evaluation/pass-at-k.md
problems/evaluation/reproducibility.md
problems/evaluation/statistical-significance.md
problems/evaluation/train-eval-mismatch.md
problems/research-workflow/ablation-design.md
problems/research-workflow/baseline-selection.md
problems/research-workflow/compute-budget.md
problems/research-workflow/contribution-framing.md
problems/research-workflow/data-curation.md
A corpus-anchored handbook for diagnosis and selection. It supplies knowledge — it does not read or run your training: it can't inspect your logs, wandb, or live metrics. You bring the symptom; it returns likely causes, checks, and cited fixes for you to apply.
problems/_INDEX.md: symptom → file routing, grouped under training/, evaluation/, research-workflow/. Start here.
problems/<cat>/<file>.md: per-symptom files. Most follow Symptoms → Root causes → Diagnosis → Fixes → References; knob / decision / modality / eval-checklist / research-workflow files use task-oriented structures.
references/_INDEX.md + references/<cat>.md: per-category project lists with full metadata. Each entry carries an Idea: line, one sentence on its distinctive contribution, grounded in the paper/repo. Use for "which framework / benchmark" selection, to look up project names not routed via problems/_INDEX.md, and to answer "what's the idea behind X" by quoting its Idea: line.
database.json: machine-readable, 312 entries (each with a takeaway field mirroring the Idea: line) plus 3 paper-only algorithms (DAPO, Dr.GRPO, VAPO) whitelisted in scripts/lint_skill.py.
Name the algorithm or idea, then anchor with whatever canonical URLs exist for that entry: typically github + arxiv + org + date, but paper-only algorithms (in the whitelist) get just the paper URL, and tools / environments without papers get just github + org + date.
Examples:
Project with paper (typical): Adapt Search-R1's outcome-only reward — code · paper · UIUC/Google · 2025.3.
Paper-only algorithm (whitelist): Try DAPO's clip-higher — paper · ByteDance Seed · 2025.3.
Tool / environment without paper: Run rollouts in atropos — code · Nous Research · 2025.4.
Cite at the idea level, not paper sections or file paths inside repos — they rot. If an entry isn't in the corpus, say so; don't fabricate.
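The anchor part of a citation can be assembled mechanically from whichever canonical URLs an entry has. The following is a minimal sketch, not the skill's own tooling: the function name format_anchors, the markdown link syntax, and the example.org placeholder URLs are all assumptions made for illustration. It keeps org and date in every case, following the three examples above.

```python
from typing import Optional


def format_anchors(org: str, date: str,
                   code_url: Optional[str] = None,
                   paper_url: Optional[str] = None) -> str:
    """Join the available anchors as 'code · paper · org · date'.

    Hypothetical helper: only the ordering and the ' · ' separator
    come from the examples; the link syntax is an assumption.
    """
    if not code_url and not paper_url:
        # No canonical URL means there is nothing to anchor on.
        raise ValueError("entry has no canonical URL; do not fabricate one")
    parts = []
    if code_url:
        parts.append(f"[code]({code_url})")
    if paper_url:
        parts.append(f"[paper]({paper_url})")
    parts.extend([org, date])
    return " · ".join(parts)


# Placeholder URLs, not real project links.
typical = format_anchors("UIUC/Google", "2025.3",
                         code_url="https://example.org/repo",
                         paper_url="https://example.org/paper")
paper_only = format_anchors("ByteDance Seed", "2025.3",
                            paper_url="https://example.org/paper")
tool_only = format_anchors("Nous Research", "2025.4",
                           code_url="https://example.org/repo")
```

The caller prepends the algorithm or idea name, so the helper stays at the idea level and never emits paper sections or in-repo file paths.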
If two corpus entries share a name (e.g. ARPO appears as both a
reasoning RL method and a GUI-agent training method), disambiguate by
including the org and paper URL — they are different works.
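A name lookup that handles both the "not in the corpus" case and the shared-name case might look like the sketch below. It assumes entries are dictionaries with name, org, paper, and takeaway keys; only the takeaway field is confirmed above, so the other key names, the sample records, and the helper names are hypothetical placeholders rather than the real database.json schema or contents.

```python
from typing import Dict, List

# Hypothetical records; orgs, URLs, and takeaways are placeholders.
SAMPLE: List[Dict[str, str]] = [
    {"name": "ARPO", "org": "Org A", "paper": "https://example.org/paper-a",
     "takeaway": "placeholder takeaway for the reasoning RL method"},
    {"name": "ARPO", "org": "Org B", "paper": "https://example.org/paper-b",
     "takeaway": "placeholder takeaway for the GUI-agent training method"},
    {"name": "DAPO", "org": "Org C", "paper": "https://example.org/paper-c",
     "takeaway": "placeholder takeaway"},
]


def find_entries(entries: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Return every entry whose name matches, ignoring case."""
    key = name.casefold()
    return [e for e in entries if e.get("name", "").casefold() == key]


def describe(entries: List[Dict[str, str]], name: str) -> str:
    """Quote the takeaway, or disambiguate when the name is shared."""
    matches = find_entries(entries, name)
    if not matches:
        return f"{name}: not in the corpus"
    if len(matches) == 1:
        return f"{name}: {matches[0]['takeaway']}"
    # Shared name: list each work with its org and paper URL.
    lines = [f"{name} matches {len(matches)} different works:"]
    for e in matches:
        lines.append(f"{e['name']} ({e['org']}, {e['paper']}): {e['takeaway']}")
    return "\n".join(lines)
```

Returning all matches instead of the first one is the point of the design: a shared name is surfaced to the reader with org and paper URL instead of being silently resolved.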
Snapshot date: 2026-05-23. If the user mentions a project or paper released after that, flag explicitly that this skill's corpus may not cover it.
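The snapshot check can be sketched as a comparison on year and month. This assumes release dates use the YYYY.M form seen in the citation examples; the function name and the three return labels are hypothetical. A release in the snapshot month is reported separately because the day is unknown.

```python
from datetime import date

SNAPSHOT = date(2026, 5, 23)  # snapshot date stated above


def coverage(release: str) -> str:
    """Classify a 'YYYY.M' release date relative to the snapshot month."""
    year_s, month_s = release.split(".")
    released = (int(year_s), int(month_s))
    snapshot = (SNAPSHOT.year, SNAPSHOT.month)
    if released < snapshot:
        return "before"
    if released == snapshot:
        return "same-month"  # day unknown, so coverage is uncertain
    return "after"
```

A result of "before" only means the corpus could cover the project, not that it does; "same-month" and "after" are the cases to flag explicitly.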