Autotune
A Claude Code plugin for fine-tuning LLMs — from dataset inspection through GGUF export. It gives Claude the tools and domain knowledge to be a real copilot for your training loop, not just a chat assistant that guesses shell commands.
Autotune prefers Unsloth when it is available, for faster training and lower VRAM usage, and falls back to Hugging Face + TRL when it is not.
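That preference can be implemented as a simple import probe. A minimal sketch — the function name and return values are illustrative, not taken from Autotune's code:

```python
def pick_backend() -> str:
    """Prefer Unsloth's patched trainers; otherwise fall back to HF + TRL."""
    try:
        import unsloth  # noqa: F401  (patching happens on import)
        return "unsloth"
    except ImportError:
        return "hf+trl"

print(pick_backend())
```

On a machine without Unsloth installed (e.g. macOS or CPU-only boxes), this returns `"hf+trl"` and training proceeds through the standard TRL trainers.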
Setup
```shell
git clone <repo-url> autotune
cd autotune
uv sync
claude .
```
That is it. uv sync creates .venv/ which the bundled .mcp.json uses to launch the MCP server. Claude Code picks up the server, commands, and skills automatically when you open the repo root.
To confirm everything is working, ask Claude in your session:
check GPU availability
It should call the check_gpu tool and report your CUDA status and VRAM.
The Workflow
There are two paths depending on your data.
SFT — instruction, chat, or text data
init-project → dataset-audit → baseline → plan-experiments
→ run-experiment → compare-runs → ship-decision → merge → export-gguf
DPO/ORPO — preference data (prompt / chosen / rejected)
init-project → dataset-audit → baseline → plan-experiments
→ run-dpo → compare-runs → ship-decision → merge → export-gguf
plan-experiments auto-detects which path your dataset belongs to and proposes the right run configs. You approve the plan before any GPU is touched.
For a step-by-step walkthrough, see examples/sft-to-gguf.md.
Slash Commands
| Command | What it does |
|---|---|
| /init-project | Create project context files, reports, and results directories |
| /dataset-audit | Inspect a dataset's schema, infer training format, flag quality risks |
| /baseline | Evaluate the base model before any fine-tuning |
| /plan-experiments | Build an approval-ready run sequence with concrete hyperparameters |
| /run-experiment | Execute one approved SFT run and evaluate it |
| /run-dpo | Execute one approved DPO or ORPO run and evaluate it |
| /compare-runs | Rank completed runs by saved metrics |
| /debug-run | Diagnose a failed or weak run from logs |
| /ship-decision | Decide whether the best run is ready for review |
| /merge | Merge a LoRA adapter into the base model weights |
| /export-gguf | Export a model to GGUF for llama.cpp, Ollama, or LM Studio |
| /train | Legacy shortcut for a single SFT run |
| /eval | Legacy shortcut for running evaluation |
| /experiment | Legacy entrypoint; now behaves as a guided workflow |
Capabilities
Dataset inspection
Autotune inspects format (instruction, chat, completion, preference), row count, column schema, and obvious quality risks before any training starts. Supports Hugging Face datasets by name and local files: .json, .jsonl, .csv, .parquet, .txt.
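The format inference boils down to looking at which columns a dataset has. A sketch of one plausible heuristic — the exact rules Autotune applies may differ:

```python
def infer_format(columns: set[str]) -> str:
    """Map a dataset's column names to a training format.

    Preference data is checked first so prompt/chosen/rejected
    datasets are never misread as plain instruction data.
    """
    if {"prompt", "chosen", "rejected"} <= columns:
        return "preference"      # DPO/ORPO path
    if "messages" in columns or "conversations" in columns:
        return "chat"
    if {"instruction", "output"} <= columns:
        return "instruction"
    if "text" in columns:
        return "completion"      # plain-text continuation data
    return "unknown"

infer_format({"prompt", "chosen", "rejected"})    # → "preference"
infer_format({"instruction", "input", "output"})  # → "instruction"
```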
Experiment planning
The planner is VRAM-aware. It reads your GPU's available memory, estimates peak usage for each proposed config, and caps batch sizes so you do not OOM. It also adapts to dataset size — fewer steps for small datasets, more for large ones — and skips conservative runs when your baseline is already strong.
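The batch-size cap can be pictured as a back-of-the-envelope calculation: quantized weights plus a fixed overhead plus activation memory that grows with batch size. The constants below are rough ballpark figures for illustration, not Autotune's actual cost model:

```python
def estimate_peak_gb(params_b: float, batch_size: int, seq_len: int,
                     bits: int = 4) -> float:
    """Very rough peak-VRAM estimate (GB) for a (Q)LoRA run."""
    weights = params_b * bits / 8              # quantized base weights
    overhead = 1.5                             # CUDA context, adapter, optimizer
    activations = 0.25 * batch_size * seq_len / 1024
    return weights + overhead + activations

def max_batch_size(vram_gb: float, params_b: float, seq_len: int) -> int:
    """Largest power-of-two batch size whose estimate still fits."""
    bs = 1
    while estimate_peak_gb(params_b, bs * 2, seq_len) <= vram_gb:
        bs *= 2
    return bs
```

For a 7B model at sequence length 2048 on a 24 GB card, this kind of estimate lands on a batch size well below the OOM threshold — which is the point: cap first, train second.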
Training
LoRA and QLoRA SFT via SFTTrainer. DPO and ORPO preference optimization via DPOTrainer / ORPOTrainer. All runs save a checkpoint every 50 steps, so you can resume if something interrupts. Pass report_to=wandb or report_to=mlflow to stream metrics to your tracker.
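In TRL terms, a run on the fallback path looks roughly like the config sketch below. The model id, file name, and hyperparameters are placeholders, not Autotune's defaults:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",   # placeholder model id
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                           task_type="CAUSAL_LM"),
    args=SFTConfig(
        output_dir="results/run-01",
        per_device_train_batch_size=4,
        save_steps=50,                 # checkpoint every 50 steps
        report_to="wandb",             # or "mlflow" to stream metrics there
    ),
)
trainer.train()
```

Resuming after an interruption is then a matter of pointing `trainer.train(resume_from_checkpoint=True)` at the same output directory.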
Evaluation
MMLU benchmark out of the box. Pass your own dataset with eval_dataset to compute cross-entropy loss and perplexity on held-out examples.
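The two numbers are directly related: perplexity is the exponential of the mean per-token cross-entropy. A minimal illustration:

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

perplexity([2.0, 2.0, 2.0])   # exp(2.0) ≈ 7.389
perplexity([0.0])             # a perfectly predicted token → 1.0
```

A falling eval loss and a falling perplexity are the same signal in two units; perplexity is just easier to read as "effective branching factor".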
Run management
compare-runs ranks everything by eval metric, then by training loss. debug-run reads the logs and tells you the most likely failure mode — OOM, loss divergence, dataset formatting issues, or chat template mismatch.
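That two-level ranking is a plain tuple sort. A sketch — the key names are illustrative, and it assumes a higher eval metric (e.g. MMLU accuracy) is better:

```python
def rank_runs(runs: list[dict]) -> list[dict]:
    """Sort by eval metric (higher first), break ties on train loss (lower first)."""
    return sorted(runs, key=lambda r: (-r["eval_metric"], r["train_loss"]))

runs = [
    {"name": "r1", "eval_metric": 0.61, "train_loss": 1.20},
    {"name": "r2", "eval_metric": 0.64, "train_loss": 1.35},
    {"name": "r3", "eval_metric": 0.64, "train_loss": 1.10},
]
[r["name"] for r in rank_runs(runs)]   # → ["r3", "r2", "r1"]
```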
Deployment
Merge the LoRA adapter back into the base weights for a standalone model. Then export to GGUF (q4_k_m, q5_k_m, q8_0, or f16) for local inference with llama.cpp, Ollama, or LM Studio. Uses Unsloth's native GGUF path when available; falls back to a Hugging Face merge with llama.cpp conversion instructions.
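Picking a quantization level is mostly a disk/VRAM budget question. A rough size estimator, using commonly cited approximate bits-per-weight figures (not exact values from llama.cpp):

```python
QUANT_BITS = {        # approximate effective bits per weight; rough figures
    "q4_k_m": 4.8,
    "q5_k_m": 5.7,
    "q8_0": 8.5,
    "f16": 16.0,
}

def gguf_size_gb(params_b: float, quant: str) -> float:
    """Rough on-disk size of a GGUF export, in GB."""
    return params_b * QUANT_BITS[quant] / 8

round(gguf_size_gb(7, "q4_k_m"), 1)   # a 7B model at q4_k_m ≈ 4.2 GB
```

In practice q4_k_m is the usual default for local inference; q8_0 and f16 trade disk and memory for fidelity.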
Gradio chat
Ask Claude to launch a chat interface for any model or adapter. Useful for a quick sanity check before you start comparing benchmarks.
Requirements
- Python 3.11+
- Claude Code
- uv (for uv sync; not required after .venv/ exists)
- NVIDIA GPU for training and evaluation
- Enough VRAM for the model you pick; see examples/sft-to-gguf.md for typical requirements