From chrisvoncsefalvay-funsloth
Training manager for local GPU training - validate CUDA, manage GPU selection, monitor progress, handle checkpoints
npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin chrisvoncsefalvay-funsloth
This skill uses the workspace's default tool permissions.
Run Unsloth training on your local GPU.
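import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # Device queries raise an error when no CUDA device is present, so guard them.
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")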
If CUDA is not available:
- nvidia-smi
- nvcc --version
- pip install torch --index-url https://download.pytorch.org/whl/cu121

See references/HARDWARE_GUIDE.md for requirements:
| VRAM | Recommended Setup |
|---|---|
| 8GB | 7B, 4-bit, batch=1, LoRA r=8 |
| 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
| 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
| 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
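The table can be encoded as a small lookup for scripts that need a starting configuration. This is a minimal sketch, not part of the skill: `recommend_setup` and `VRAM_TIERS` are hypothetical names, and it assumes a card that falls between two tiers should use the lower one. The values are copied from the table, with ranges kept as strings.

```python
# Tiers copied from the table above: (minimum VRAM in GB, recommended setup).
VRAM_TIERS = [
    (8, {"model": "7B", "quant": "4-bit", "batch": 1, "lora_r": "8"}),
    (12, {"model": "7B", "quant": "4-bit", "batch": 2, "lora_r": "16"}),
    (16, {"model": "7-13B", "quant": "4-bit", "batch": 2, "lora_r": "16-32"}),
    (24, {"model": "7-14B", "quant": "4-bit", "batch": 4, "lora_r": "32"}),
]


def recommend_setup(vram_gb):
    """Return the setup for the largest tier that fits, or None below 8 GB.

    Assumption: a card between two tiers uses the lower tier.
    """
    best = None
    for min_gb, setup in VRAM_TIERS:
        if vram_gb >= min_gb:
            best = setup
    return best
```

The `vram_gb` argument can come from the `total_memory / 1e9` value printed by the CUDA check.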
pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):
docker run -d \
-e JUPYTER_PASSWORD="unsloth" \
-p 8888:8888 \
-v $(pwd)/work:/workspace/work \
--gpus all \
unsloth/unsloth
Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.
Environment variables:
- JUPYTER_PASSWORD - Jupyter auth (default: unsloth)
- JUPYTER_PORT - Port (default: 8888)
- USER_PASSWORD - User/sudo password (default: unsloth)

jupyter notebook notebooks/sft_template.ipynb
# Edit configuration in script, then run
python scripts/train_sft.py
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Use first GPU
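On a multi-GPU machine, the index does not have to be hard-coded. The sketch below picks the GPU with the most free memory by parsing `nvidia-smi` CSV output. The helper names (`parse_free_memory`, `pick_gpu`, `select_freest_gpu`) are hypothetical and not part of the skill. It also sets `CUDA_DEVICE_ORDER=PCI_BUS_ID`, because `nvidia-smi` lists devices in PCI bus order while CUDA defaults to a fastest-first order.

```python
import os
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, memory.free' CSV rows into {gpu_index: free_MiB}."""
    free = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank rows
        index, mem = (field.strip() for field in line.split(","))
        free[int(index)] = int(mem)
    return free


def pick_gpu(free):
    """Return the index of the GPU with the most free memory."""
    return max(free, key=free.get)


def select_freest_gpu():
    """Query nvidia-smi and pin this process to the least-loaded GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpu = pick_gpu(parse_free_memory(out))
    # Both variables must be set before torch initializes CUDA.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

Call `select_freest_gpu()` at the top of the script, before `import torch` or any Unsloth import, since the variables are read when CUDA is first initialized.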
# Watch GPU usage
watch -n 1 nvidia-smi
# Or use nvitop (more detailed)
pip install nvitop && nvitop
export WANDB_API_KEY="your-key"
# Add report_to="wandb" in TrainingArguments
Try in order:
- torch.cuda.empty_cache()
- packing=True for short sequences

See references/TROUBLESHOOTING.md for more solutions.
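TrainingArguments(
    resume_from_checkpoint=True,  # Auto-find latest
    # Or: resume_from_checkpoint="outputs/checkpoint-500"
)
# Trainer does not consume this field itself; the training script has to pass it
# on, e.g. trainer.train(resume_from_checkpoint=True).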
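Finding the latest checkpoint amounts to picking the `checkpoint-<step>` directory with the highest step number, following the `outputs/checkpoint-500` naming shown above. The stdlib-only sketch below illustrates the idea. `find_latest_checkpoint` is a hypothetical helper, not part of the training script. transformers ships a comparable function, `get_last_checkpoint` in `transformers.trainer_utils`.

```python
import re
from pathlib import Path

# Matches directory names such as "checkpoint-500".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare numerically so checkpoint-1000 ranks above checkpoint-500.
    return max(candidates, key=lambda item: item[0])[1]
```

The numeric comparison matters: a plain string sort would place `checkpoint-500` after `checkpoint-1000`.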
Training script automatically saves:
- outputs/lora_adapter/ - LoRA weights
- outputs/merged_16bit/ - Merged model (optional)

from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)  # Switch to inference mode
messages = [{"role": "user", "content": "Hello!"}]
# add_generation_prompt=True appends the assistant turn marker so the model replies
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
Offer funsloth-upload for Hub upload with model card.