By infiniV
ultra-instinct ML engineering intern for Claude Code. Reads papers, audits datasets, ships SFT/DPO/LoRA runs to Hugging Face. Built on the procedural knowledge from huggingface/ml-intern, wired into Claude Code's native agentic harness.
npx claudepluginhub infiniv/ultra-ml-intern --plugin ml-intern
Audit an HF dataset: schema, sample rows, anomalies, recommended training method.
Kick off the full ml-intern workflow on an ML task — research → audit dataset → architect training job → submit. Loads the ml-intern skill and dispatches the right subagents.
Pre-flight a training script before submitting it to HF Jobs — checks for the 8 expensive mistakes.
Deep literature crawl. 6–10 query angles, 2-hop citation graph BFS, 30–50 full-paper reads in parallel subagents, cross-paper synthesis with gap analysis.
Run a literature review for an ML task — finds landmark paper, crawls citation graph, extracts recipe.
Dataset quality auditor for HF datasets. Use before committing to a dataset for fine-tuning. Returns schema, row counts, sample rows, distributions, anomalies (class imbalance, duplicates, missing values, format issues), and a recommended training method based on column shape. Isolates 10k+ tokens of dataset metadata + sample rows from the main thread.
Single-paper deep reader. Reads ONE paper end-to-end (abstract → intro → method → experiments → results → limitations → future work) and returns a structured ~800-word digest where every factual claim is backed by a verbatim quote with §section reference. Designed for parallel fan-out from `/ml-research-ultra` — each invocation isolates 50k+ tokens of paper HTML from the main thread. Use when the orchestrator needs the full content of a paper, not just the recipe.
ML literature crawler. Use when the main task needs a methodology-grounded recipe drawn from multiple papers — e.g., "find the best recipe for math reasoning fine-tuning", "what dataset and method does the GRPO follow-up work use", "literature review for sparse-attention long-context training". Returns a structured ≤800-word report with anchor papers, extracted recipes, citation-graph descendants, and working code-example URLs. Isolates 50k+ tokens of paper text from the main thread.
Designs and reviews ML training submissions for both local execution and HF Jobs. Use after the recipe is chosen and the dataset is audited — produces a complete training script + the exact run command, sized to hardware, with all required fields (push_to_hub, hub_model_id, disable_tqdm, Trackio, timeout, package installs). Detects compute mode automatically and asks the user when both local and Jobs are viable. Catches the "model lost" / "30m timeout" / "missing flash-attn" mistakes before they cost real money.
Use when the user asks to fine-tune, train, evaluate, audit, or ship a machine-learning model on the Hugging Face ecosystem — SFT, DPO, GRPO, RLHF, LoRA/QLoRA, post-training, dataset auditing, paper-driven research, hf jobs submission, Trackio monitoring, push-to-Hub. Triggers include "fine-tune", "train a model", "SFT", "DPO", "GRPO", "RLHF", "post-training", "audit this dataset", "literature review for X task", "submit hf job", "find a dataset for X", "best recipe for X", "hyperparameter sweep", "OOM during training", "push to Hub". Replicates the workflow of huggingface/ml-intern inside Claude Code with zero new dependencies.
Harvest the canonical training/inference code and papers for a specific ML model (e.g. DINOv3, SAM 2, Whisper, Qwen2-VL) and archive everything locally for accurate, grounded coding. Use when the user names a model and wants to find its real/official code, training recipe, or papers; wants to "store the model's code and papers locally", build a local reference archive for a model, or ensure future coding against a model is grounded in its actual source. Verifies which repo is canonical (not a fork/lookalike), clones it, extracts the key train/inference files, downloads paper PDFs with metadata, writes a synthesis report, and saves a persistent memory that mandates reading the archived code before writing code for that model. Triggers include "find the real code for this model", "archive the model's training/inference code and papers", "harvest DINOv3", "set up a local source-of-truth for a model".
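
As a rough illustration of the checks the dataset-auditor description above lists (missing values, duplicates, class imbalance), here is a minimal sketch over rows already loaded into memory. It is not the plugin's implementation; the function `audit`, the `label_key` parameter, and the sample rows are hypothetical.

```python
from collections import Counter

def audit(rows: list[dict], label_key: str = "label") -> dict:
    """Summarize missing values, exact duplicates, and label balance for in-memory rows."""
    columns = sorted({key for row in rows for key in row})
    # A value counts as missing when it is None or an empty string.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # Exact duplicates: rows whose (column, value) pairs match an earlier row.
    seen = Counter(tuple(sorted((k, str(v)) for k, v in row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    # Label distribution, used to spot class imbalance.
    labels = Counter(row[label_key] for row in rows if row.get(label_key) is not None)
    return {
        "num_rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicates": duplicates,
        "label_counts": dict(labels),
    }

# Hypothetical sample rows, for illustration only.
rows = [
    {"text": "2 + 2 = 4", "label": "math"},
    {"text": "2 + 2 = 4", "label": "math"},
    {"text": "", "label": "math"},
    {"text": "hello", "label": "chat"},
]
report = audit(rows)
print(report["duplicates"], report["missing"]["text"], report["label_counts"])
```

A real audit would stream rows from the Hub rather than hold them in a list, and would likely add near-duplicate and format checks on top of these counts.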
External network access
Connects to servers outside your machine
Uses power tools
Uses Bash, Write, or Edit tools

ultra-instinct ML engineering intern for Claude Code. Reads papers, audits datasets, ships SFT/DPO/LoRA runs to Hugging Face.
ultra-ml-intern is a Claude Code plugin that gives Claude the workflow of an ML engineering intern. It researches ML papers, audits Hugging Face datasets, designs fine-tuning recipes (SFT, DPO, GRPO, LoRA, QLoRA, RLHF), and submits training jobs to HF Jobs with Trackio monitoring.
The procedural knowledge comes from huggingface/ml-intern, HF's standalone Python harness around the Claude API. This repo wires the same intelligence into Claude Code, Anthropic's official agentic harness for Claude. Same model, a more capable loop, and you bring your own Claude (Max subscription or API key) instead of paying for a second harness on top.
Works in any Claude Code surface: terminal CLI, IDE extensions, and the web app.
# In any Claude Code session:
/plugin marketplace add infiniV/ultra-ml-intern
/plugin install ml-intern@ultra-ml-intern
Restart Claude Code, then verify with /plugin and /agents. The slash commands (/ml-intern, /ml-research, …) keep their short names; the ultra- prefix is just the package wrapper.
What you get:
- Skills: ml-intern (the workflow) and model-provenance (archive a model's real code + papers locally)
- Slash commands: /ml-intern, /ml-research, /ml-research-ultra, /ml-audit, /ml-preflight, /ml-train
- Subagents: ml-paper-researcher, ml-paper-reader, dataset-auditor, training-job-architect
- … HF_TOKEN is set)

> "fine-tune Qwen3-0.5B for math reasoning"
The skill activates automatically and walks the 6-step research-driven workflow:
hf jobs run with Trackio monitoring

| You ask | It does |
|---|---|
| "fine-tune X for Y" | Full pipeline: literature review → dataset audit → training-job design → smoke test → full run |
| "what's the best recipe for X" | Dispatches the ml-paper-researcher subagent; returns recipe + citations |
| "do a deep literature review on X" | Runs /ml-research-ultra: 6–10 query angles, 2-hop citation BFS, 30–50 papers read in parallel ml-paper-reader subagents, gap-finding synthesis, optional local PDF/HTML archive |
| "audit dataset Y" | Dispatches the dataset-auditor; returns schema, anomalies, GO/NO-GO verdict |
| "preflight train.py" | Catches missing push_to_hub, default 30m timeout, bf16 on T4, missing flash-attn install, before you spend cluster hours |
| "submit hf jobs run" | Walks pre-flight → cost estimate → smoke test → full submission → Trackio dashboard URL |
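
The pre-flight row in the table above names four checks. As a minimal sketch of what such a check could look like (not the plugin's actual implementation), the following Python scans a training script's text with simple pattern tests. The function name `preflight`, its `gpu` and `timeout` parameters, and the matching heuristics are all assumptions for illustration.

```python
import re
from typing import Optional

def preflight(script: str, gpu: str = "", timeout: Optional[str] = None) -> list:
    """Return a list of warnings for a training script (hypothetical heuristics)."""
    warnings = []
    # A run that never pushes to the Hub can lose the model when the job ends.
    if not re.search(r"push_to_hub\s*=\s*True", script):
        warnings.append("push_to_hub is not enabled")
    # The table mentions a default 30m timeout; flag runs that do not override it.
    if timeout is None:
        warnings.append("no timeout set; the default 30m may cut the run short")
    # T4 GPUs have no native bf16 support, so bf16=True is a likely mistake there.
    if gpu.lower().startswith("t4") and re.search(r"bf16\s*=\s*True", script):
        warnings.append("bf16 requested on a T4")
    # Assumption: a script that asks for flash attention should also mention
    # the flash-attn package somewhere (for example in its dependency list).
    if "flash_attention_2" in script and "flash-attn" not in script:
        warnings.append("flash_attention_2 used but flash-attn is never installed")
    return warnings

# Hypothetical script text and hardware name, for illustration only.
bad_script = 'args = SFTConfig(bf16=True)\nmodel = load(attn_implementation="flash_attention_2")'
for warning in preflight(bad_script, gpu="t4-small"):
    print("WARN:", warning)
```

Text matching like this is cheap but shallow; a sturdier version would parse the script with `ast` and inspect the actual keyword arguments.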

| Skill | What it does |
|---|---|
| ml-intern | The end-to-end ML workflow: find landmark papers, crawl the citation graph, extract the recipe, audit the dataset and base model on Hub, write a TRL-grounded training script, pre-flight, smoke-test, and ship a full hf jobs run with Trackio monitoring. Activates whenever you ask to fine-tune, train, evaluate, or audit a model. |
| model-provenance | Given a specific model (DINOv3, SAM 2, Whisper, Qwen2-VL…), finds and verifies the canonical repo over forks and lookalikes, clones it, extracts the real train/model/inference files, downloads the paper PDFs with metadata, writes a synthesis report, and archives everything to research/models/<slug>/. Registers a mandatory-read memory so future coding against that model is grounded in its actual source, not training-time recall. Cloned code is archived, never executed. |
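
The citation-graph crawl in the ml-intern workflow is described elsewhere on this page as a 2-hop BFS. A minimal sketch of that traversal follows, assuming the graph is already available as an in-memory dict; the plugin itself would need to fetch citations over the network. The function `crawl`, the `graph` structure, and the paper IDs are hypothetical.

```python
from collections import deque

def crawl(graph: dict, seed: str, max_hops: int = 2) -> dict:
    """Breadth-first search from seed; maps each reached paper to its hop distance."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if dist[paper] == max_hops:
            continue  # do not expand papers already at the hop limit
        for neighbor in graph.get(paper, []):
            if neighbor not in dist:  # first visit gives the shortest hop count
                dist[neighbor] = dist[paper] + 1
                queue.append(neighbor)
    return dist

# Hypothetical citation graph: each key maps to the papers that cite it.
graph = {"anchor": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["d"]}
print(crawl(graph, "anchor"))
# prints {'anchor': 0, 'a': 1, 'b': 1, 'c': 2}
```

Paper `d` is three hops from the anchor, so the hop limit excludes it. Capping depth this way keeps the reading list bounded even when the citation graph is large.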