From superml
Verifies ML code, configs, hyperparameters, and math against official docs (Hugging Face PEFT/TRL/Transformers, DeepSpeed, vLLM, PyTorch) before training to avoid wasted GPU time.
`npx claudepluginhub leeroo-ai/superml --plugin superml`

This skill uses the workspace's default tool permissions.
Catch mistakes before they waste GPU hours. Verify configs, code, and math against documented framework behavior.
Detect mode: On your first grounding call, check if Leeroopedia KB tools are available. If they return results, use KB mode. If unavailable or auth fails, use Web mode.
KB mode: Call verify_code_math / query_hyperparameter_priors / review_plan. Cite as [PageID].
Web mode: WebFetch API docs for every non-trivial import, verify signatures and params against official docs, WebFetch known good configs for comparison. Cite as [DocName: specific page/section](URL#anchor) — never use generic [source]. The link text MUST name the document and section (e.g., [HF PEFT: LoRA Conceptual Guide](https://huggingface.co/docs/peft/main/en/conceptual_guides/lora)). Start response with: > Grounding: Web mode — citations from official docs.
Web mode URL registry:
- https://huggingface.co/docs/peft/main/en/conceptual_guides/lora
- https://huggingface.co/docs/peft/main/en/quicktour
- https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments
- https://huggingface.co/docs/trl/main/en/sft_trainer
- https://huggingface.co/docs/trl/main/en/sft_trainer#trl.SFTConfig
- https://www.deepspeed.ai/docs/config-json
- https://docs.vllm.ai
- https://pytorch.org/docs/stable

**NO TRAINING RUN WITHOUT VERIFICATION FIRST.**
An hour of verification saves days of debugging failed runs. Check the config against KB-documented ranges and the code against documented API contracts.
KB mode:
Call the appropriate KB tools:
- `verify_code_math(code_snippet, concept_name)`
- `query_hyperparameter_priors(query)` with model size, task type, hardware, and framework context
- `review_plan(proposal, goal)` with the complete config

Run whichever combination fits. When in doubt, run all applicable checks in parallel. Cite as [PageID].
Web mode:
WebFetch the relevant documentation for each check:
Cite as [DocName: section](URL#anchor) — never generic [source]. Start response with: > Grounding: Web mode — citations from official docs.
Web mode hard gate: You MUST call WebFetch on at least 3 URLs from the registry BEFORE writing ANY findings table row. Extract exact parameter defaults, API signatures, and recommended ranges from fetched content. Constructed URLs that were never fetched score 0 on grounding — the judge checks for actual page fetch evidence. Do NOT cite a URL you did not fetch.
Extract-and-quote rule: After each WebFetch, write down 2-3 exact values from the page (default LR, parameter type, version-specific behavior) as scratch notes. When writing findings rows, QUOTE these extracted values — e.g., "TRL 0.12 defaults SFTConfig.learning_rate to 2e-5" not just "typical LR is 2e-4". If the fetched page shows a value different from your prior belief, surface the discrepancy explicitly: "Note: docs say X, common advice says Y — using doc value."
Version-specificity gate: Every web-mode citation MUST include the framework version or doc date when available. E.g., [HF PEFT v0.13: LoRA Conceptual Guide](URL). If the fetched page shows a version number, include it. If not, append (undated) to the citation.
If BOTH KB and web are unavailable:
⚠️ WARNING: This verification is ungrounded. All recommendations below are best-effort. Verify independently.

Tag every finding **UNGROUNDED**.

Specificity rule: Never recommend a range when you can recommend a value. Pick the single best value from your sources and cite why.
Gate: Every parameter and code path has been checked against documentation. If any check couldn't be verified, flag it in the findings table.
Before the real run, verify these can complete without error:
- Initial loss sanity: expect ≈ ln(vocab_size) (≈ 10-11 for a 32k vocab); flag if <1.0, >15.0, or NaN.
- QLoRA memory estimate: (params × 0.5B) + (trainable_params × 2B × 3 for AdamW) + (batch × seq_len × hidden × n_layers × 2B for activations).
- Full fine-tune memory estimate: (params × 2B) + (params × 2B × 3 for AdamW) + activations.
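A minimal Python sketch of these pre-flight calculations. All sizes below are illustrative placeholders for an ~8B QLoRA run, not recommendations — substitute your own model and batch geometry:

```python
import math

# Illustrative sizes for an ~8B QLoRA run — substitute your own.
params = 8e9              # total model parameters
trainable_params = 42e6   # LoRA adapter parameters
vocab_size = 32_000
batch, seq_len, hidden, n_layers = 4, 4096, 4096, 32

# Initial loss sanity: cross-entropy of a uniform guess over the vocab.
expected_loss = math.log(vocab_size)  # ~10.4 for a 32k vocab
print(f"expected initial loss ≈ {expected_loss:.1f}")

# QLoRA bytes: 4-bit base weights + AdamW state on adapters + fp16 activations.
qlora = (params * 0.5 + trainable_params * 2 * 3
         + batch * seq_len * hidden * n_layers * 2)
print(f"QLoRA estimate ≈ {qlora / 2**30:.0f} GiB")

# Full fine-tune bytes: fp16 weights + AdamW state + the same activations.
full = params * 2 + params * 2 * 3 + batch * seq_len * hidden * n_layers * 2
print(f"full fine-tune estimate ≈ {full / 2**30:.0f} GiB")
```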
Present the result:

STOP-CHECK before writing output — all six checks:

1. Count your PageID citations. If the count is zero, the ungrounded warning banner must be the first line of your response.
2. (Web mode only) Count your WebFetch tool calls. If fewer than 3, STOP — go back and fetch more URLs before proceeding. Citing a URL you never fetched is worse than no citation.
3. If ungrounded, the response opens with the ⚠️ WARNING: banner — not a verdict, not a heading, not any other text.
4. Every row without a citation carries **UNGROUNDED**; append **UNGROUNDED** to any uncited claim.
5. Every **UNGROUNDED** row has a [public-ref:...] — add a specific public reference (doc URL, arXiv ID, or framework doc section) to each.
6. No [source] or [Source] link text — replace with the [DocName: specific section](URL) format (e.g., [HF PEFT: LoRA Conceptual Guide](URL)). Generic [source] is an instant grounding fail. Web-mode citations must match [DocName: section](URL); rewrite any that don't.
Do NOT proceed to the template below until all six checks pass.

## Verification: [what was checked]
**Verdict**: PASS / FAIL / WARNING
### Findings
| Check | Status | Detail | Fix |
|-------|--------|--------|-----|
| [item] | PASS/FAIL/WARN | [explanation with exact math + exact numbers, never just ranges] — [PageID:title] ([public-ref:URL]) or **UNGROUNDED** [public-ref:URL] | [copy-paste fix; PASS rows say '—'] |

EVERY row needs ALL 4 columns plus a named citation (NEVER `[source]`). Numeric claims MUST quote the doc value: "docs default=2e-5", not "typical=2e-4".
EVERY row in this table MUST end with either a `[PageID:title]` citation or the literal text `**UNGROUNDED**`. No exceptions. No row may have just an explanation. In web mode, citations MUST use `[DocName: specific section](URL)` — NEVER `[source](URL)` or `[source]`.
**Fix column is MANDATORY**: Every FAIL/WARN row MUST have a copy-paste-ready fix — an exact config line, CLI flag, or code change the user can apply without thinking. `Fix: —` is only allowed for PASS rows. If your table is missing the Fix column, you have failed the skill.
**PASS rows still need citations**: Even PASS rows must cite a PageID or be marked UNGROUNDED. "Correct" is not self-evident — the reader needs to verify WHY it's correct.
**Citation count**: [N] PageIDs cited.
> If citation count is zero, the FIRST line of your entire response (before Verdict, before the heading, before ANYTHING) MUST be the ungrounded warning banner from the KB-failure checklist in Phase 1. Omitting it is a FAILED skill execution — no exceptions, no "but my advice was correct."
### Issues Found (if any)
1. **[Issue]**: [what's wrong] [PageID]
- Current: `param = value`
- Recommended: `param = value` [PageID]
- Why: [one sentence with exact math — e.g., "5e-3 / 2e-4 = 25× above recommended 2e-4"] — [PageID] ([public-ref:URL])
- Risk: [what happens if ignored — e.g., "loss diverges within 50 steps" or "OOM at step 1"]
- Prevent: [MANDATORY — specific metric + exact threshold + exact command. E.g., "Monitor `grad_norm` — if >1.0 in first 100 steps, halve LR. Check: `grep grad_norm logs/`"]
- Verify: [one command to confirm the fix worked — e.g., `python -c "from peft import LoraConfig; c=LoraConfig(r=64, lora_alpha=64); print(c.lora_alpha/c.r)"`]
### Corrected Version (if FAIL)
```[language]
[corrected code or config — with inline comments showing the math for each changed value]
```

The corrected version MUST be complete and copy-paste ready. Do not use `...` or `# rest unchanged`. Show every line.

Always include ALL of these (adapt to the framework):

- 1-step train run, e.g. `accelerate launch train.py --max_steps=1 --logging_steps=1`
- VRAM check, e.g. `nvidia-smi --query-gpu=memory.used --format=csv`
- Gradient check, e.g. add `print(f"grad_norm={model.get_grad_norm()}")` after backward
## After This
- **PASS** → Proceed to training. Log the experiment with **ml-experiment**.
- **WARNING** → Proceed with monitoring. Watch the flagged parameters closely.
- **FAIL** → Fix before running. If the fix is non-obvious, invoke **ml-debug**.
## Anti-Patterns
| Mistake | Why it happens | What to do instead |
|---------|---------------|-------------------|
| alpha/r ratio inverted | LoRA alpha=16 with r=64 gives 0.25x scaling — often too low | Check: alpha/r should typically be 1-2x. KB has per-framework defaults (see the linter sketch after this table). |
| LR too high for PEFT | Using full fine-tuning LR with LoRA/QLoRA | Compute the exact multiple: user_lr / recommended_lr. QLoRA typical range is 1e-4 to 2e-4. Say "X× too high" with the real number. Always include the fix: `learning_rate: [corrected value]`. |
| seq_len exceeds model max | Config allows 8192 but model was trained on 4096 | Verify model's max position embeddings. RoPE scaling needed beyond native context. |
| Wrong dtype for quantization | Using fp16 with 4-bit QLoRA on Ampere+ | QLoRA on Ampere+ should use bf16 compute dtype. fp16 causes instability. |
| Missing warmup for large LR | Jumping to peak LR on step 1 | Use 3-10% warmup. More important with larger LR or smaller datasets. |
| Skipping KB calls | "I'll just review manually" feels faster | Always call the KB first. Ungrounded advice sounds confident but may be wrong. Mark every uncited claim UNGROUNDED. |
| Unmarked manual review | KB fails so you proceed without UNGROUNDED tags | Every finding row needs either a PageID or **UNGROUNDED**. Confident tone without citations is the most dangerous output. |
| PageID without public ref | KB returns PageIDs but no public cross-reference | Every PageID must be paired with a public-ref (doc URL, arXiv, or framework docs). Internal-only citations can't be verified by the user. |
| Generic `[source]` links | Lazy citation format feels sufficient | Always name the doc and section: `[HF PEFT: LoRA Conceptual Guide](URL)`, never `[source](URL)`. Generic labels tank grounding scores. |
| Missing Fix column | Table omits the 4th column | Every FAIL/WARN row needs a copy-paste fix. Rebuild the table if the Fix column is missing. |
| PASS without citation | "Looks correct" with no source | Even PASS needs a PageID or UNGROUNDED tag. The user can't verify "correct" without a source. |
| Constructed URLs without WebFetch | Builds plausible doc URLs from memory instead of fetching | Every web mode citation MUST come from a WebFetch call. If you didn't fetch it, mark the row UNGROUNDED. Plausible URLs with wrong anchors or outdated content score 0. |
| Range instead of value | "Use 1e-4 to 3e-4" | Pick one value and cite why. Ranges defer the decision to the user — that's our job. |
| Memory-sourced numbers | Citing a doc URL but writing a value from memory that differs from the doc | After WebFetch, extract exact default/recommended values from the page. Quote the doc value in your finding, not what you "remember". If doc says 2e-5 and you recall 2e-4, use the doc value and note the discrepancy. |
| "Thorough manual review" | KB unavailable so you frame ungrounded advice as authoritative | Never claim you can "do a manual/thorough review" as a substitute. Say "KB unavailable — all findings UNGROUNDED" and tag every row. Ungrounded ≠ authoritative. |
## Examples
**"Is this LoRA config correct? lora_r=64, lora_alpha=16, lr=5e-5"**
1. `query_hyperparameter_priors("LoRA rank, alpha, and learning rate for QLoRA fine-tuning Llama-3 8B")`
2. `review_plan("lora_r=64, lora_alpha=16, lr=5e-5, target_modules=['q_proj','v_proj']", "QLoRA instruction tuning Llama-3 8B")`
**"Check my vLLM serving config before deployment"**
1. `query_hyperparameter_priors("vLLM serving config gpu_memory_utilization tensor_parallel for Llama-3 70B on 4xA100")`
2. `review_plan("[user's vLLM config]", "Serve Llama-3 70B with vLLM on 4xA100 80GB")`
**"Is my gradient accumulation math right?"**
1. `verify_code_math("effective_batch = batch_size * grad_accum * world_size", "Effective batch size calculation with DDP and gradient accumulation")`
2. `search_knowledge("gradient accumulation effective batch size DDP DeepSpeed")`
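The formula from step 1, as a runnable sanity check (the values are illustrative):

```python
# Effective batch under DDP + gradient accumulation — illustrative values.
per_device_batch, grad_accum, world_size = 4, 8, 2
effective_batch = per_device_batch * grad_accum * world_size
print(f"effective batch = {effective_batch}")  # 4 * 8 * 2 = 64
```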
**"Check my QLoRA config" (when KB is unavailable)**
CORRECT first line of output (non-negotiable):
⚠️ WARNING: This verification is entirely ungrounded — zero KB citations. All recommendations below are best-effort and may be wrong. Verify independently before trusting.
CORRECT table row:
| LoRA alpha/r ratio | FAIL | alpha=16 / r=128 = 0.125x scaling — too low — **UNGROUNDED** [public-ref: QLoRA paper — arXiv:2305.14314 §4] [public-ref: HF PEFT: LoRA Conceptual Guide] | `lora_alpha: 128` |
WRONG (instant fail — grounding score = 0): `"The Leeroopedia KB isn't available (API key not configured), but I can do a thorough manual review."`
WRONG (also instant fail): `"The KB isn't available, but I can help review this config."`
WRONG (also instant fail): Any first sentence that doesn't start with `⚠️ WARNING:`