This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
/plugin marketplace add huggingface/skills
/plugin install model-trainer@huggingface-skills
This skill inherits all available tools. When active, it can use any tool Claude has access to.
- references/gguf_conversion.md
- references/hardware_guide.md
- references/hub_saving.md
- references/reliability_principles.md
- references/trackio_guide.md
- references/training_methods.md
- references/training_patterns.md
- references/troubleshooting.md
- scripts/convert_to_gguf.py
- scripts/dataset_inspector.py
- scripts/estimate_cost.py
- scripts/train_dpo_example.py
- scripts/train_grpo_example.py
- scripts/train_sft_example.py
Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup is required: models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods:
For detailed TRL method documentation:
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
# etc.
See also: references/training_methods.md for method overviews and selection guidance
Use this skill when users want to:
When assisting with training jobs:
ALWAYS use hf_jobs() MCP tool - Submit jobs using hf_jobs("uv", {...}), NOT bash trl-jobs commands. The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to hf_jobs(). If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately using hf_jobs().
Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in scripts/ as templates.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Use example scripts as templates - Reference scripts/train_sft_example.py, scripts/train_dpo_example.py, etc. as starting points.
To run scripts locally (like estimate_cost.py), install dependencies:
pip install -r requirements.txt
Before starting any training job, verify:
- hf_whoami()
- secrets={"HF_TOKEN": "$HF_TOKEN"} in job config to make token available (the $HF_TOKEN syntax references your actual token value)
- datasets.load_dataset()
- push_to_hub=True, hub_model_id="username/model-name"; Job: secrets={"HF_TOKEN": "$HF_TOKEN"}
⚠️ IMPORTANT: Training jobs run asynchronously and can take hours
When user requests training:
- Create the training script (use scripts/train_sft_example.py as template)
- Submit it with the hf_jobs() MCP tool with script content inline - don't save to file unless user requests
Provide to user:
Example Response:
✅ Job submitted successfully!
Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz
Expected time: ~2 hours
Estimated cost: ~$10
The job is running in the background. Ask me to check status/logs when ready!
💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory. You'll still see training loss and learning progress.
TRL config classes use max_length (not max_seq_length) to control tokenized sequence length:
# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512) # Truncate sequences to 512 tokens
DPOConfig(max_length=2048) # Longer context (2048 tokens)
# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512) # TypeError!
Default behavior: max_length=1024 (truncates from right). This works well for most training.
When to override:
- max_length=2048
- max_length=512
- max_length=None (prevents cutting image tokens)
Usually you don't need to set this parameter at all - the examples below use the sensible default.
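To make the default concrete, here is a minimal sketch of what right truncation means for a tokenized sequence. The helper `truncate_right` is hypothetical and not part of TRL; it only illustrates the effect of `max_length`, including the `None` case.

```python
from typing import List, Optional


def truncate_right(token_ids: List[int], max_length: Optional[int] = 1024) -> List[int]:
    """Keep the first `max_length` tokens and drop the rest (right truncation).

    Hypothetical helper for illustration only, not a TRL function.
    Passing max_length=None disables truncation entirely.
    """
    if max_length is None:
        return list(token_ids)
    return list(token_ids[:max_length])


tokens = list(range(3000))                 # stand-in for a long tokenized example
print(len(truncate_right(tokens)))         # length under the default limit
print(len(truncate_right(tokens, 512)))    # shorter limit
print(len(truncate_right(tokens, None)))   # no truncation
```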
UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # project name for this training (trackio)
        run_name="meaningful_run_name",  # descriptive name for the specific training run (trackio)
    ),
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
Benefits: Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
When to use: Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring hf_jobs()
⚠️ Important: The script parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.
Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be passed as inline code or be reachable at a URL.
Common mistakes:
# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
Correct approaches:
# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})
# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})
# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})
# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
To use local scripts: Upload to HF Hub first:
huggingface-cli repo create my-training-scripts --type model
huggingface-cli upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
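A pre-flight check can catch the local-path mistake before a job is submitted. The sketch below is a hypothetical helper (`classify_script_arg` is not part of any Hugging Face tool) and its heuristics are assumptions: it treats anything starting with `http://` or `https://` as a URL, anything multi-line or starting with the PEP 723 header as inline code, and everything else as a likely local path. A one-line inline script would be misclassified, so treat the result as a hint rather than a verdict.

```python
def classify_script_arg(script: str) -> str:
    """Return 'url', 'inline', or 'local-path' for a `script` value.

    Hypothetical pre-flight check; the heuristics are assumptions,
    not the logic used by hf_jobs().
    """
    stripped = script.strip()
    if stripped.startswith(("https://", "http://")):
        return "url"
    # Inline code normally spans several lines or opens with the PEP 723 header.
    if "\n" in stripped or stripped.startswith("# /// script"):
        return "inline"
    return "local-path"


for candidate in ["train.py", "./scripts/train.py",
                  "https://huggingface.co/user/repo/resolve/main/train.py"]:
    print(candidate, "->", classify_script_arg(candidate))
```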
TRL provides battle-tested scripts for all methods. Can be run from URLs:
hf_jobs("uv", {
    # Use the raw file URL; a github.com/.../blob/... link returns an HTML page, not the script
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
Benefits: No code to write, maintained by TRL team, production-tested
When to use: Standard TRL training, quick experiments, don't need custom code
Available: Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts
The uv-scripts organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
Popular collections: ocr, classification, synthetic-data, vllm, dataset-creation
When the hf_jobs() MCP tool is unavailable, use the hf jobs CLI directly.
⚠️ CRITICAL: CLI Syntax Rules
# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"
# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large
# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large
# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
Key syntax rules:
- Command order is hf jobs uv run (NOT hf jobs run uv)
- Flags (--flavor, --timeout, --secrets) must come BEFORE the script URL
- Use --secrets (plural), not --secret
Complete CLI example:
hf jobs uv run \
--flavor a10g-large \
--timeout 2h \
--secrets HF_TOKEN \
"https://huggingface.co/user/repo/resolve/main/train.py"
Check job status via CLI:
hf jobs ps # List all jobs
hf jobs logs <job-id> # View logs
hf jobs inspect <job-id> # Job details
hf jobs cancel <job-id> # Cancel a job
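The CLI syntax rules above (the `uv run` subcommand order, flags before the script URL, `--secrets` in the plural) can be encoded in a small command builder so they are hard to get wrong. This is a sketch: `build_uv_run_command` is a hypothetical helper, and it only emits the flags that appear in this document.

```python
from typing import List


def build_uv_run_command(script_url: str, flavor: str, timeout: str,
                         secrets: List[str]) -> List[str]:
    """Build an `hf jobs uv run` argument list with all flags before the URL.

    Hypothetical helper; it covers only --flavor, --timeout and --secrets.
    """
    cmd = ["hf", "jobs", "uv", "run", "--flavor", flavor, "--timeout", timeout]
    for name in secrets:
        cmd += ["--secrets", name]   # plural flag, one per secret name
    cmd.append(script_url)           # the script URL always goes last
    return cmd


command = build_uv_run_command(
    "https://huggingface.co/user/repo/resolve/main/train.py",
    flavor="a10g-large",
    timeout="2h",
    secrets=["HF_TOKEN"],
)
print(" ".join(command))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with the URL.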
The trl-jobs package provides optimized defaults and one-liner training.
# Install
pip install trl-jobs
# Train with SFT (simplest possible)
trl-jobs sft \
--model_name Qwen/Qwen2.5-0.5B \
--dataset_name trl-lib/Capybara
Benefits: Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
When to use: User working in terminal directly (not Claude Code context), quick local experimentation
Repository: https://github.com/huggingface/trl-jobs
⚠️ In Claude Code context, prefer using hf_jobs() MCP tool (Approach 1) when available.
| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|---|---|---|---|
| <1B params | t4-small | ~$0.75 | Demos, quick tests only without eval steps |
| 1-3B params | t4-medium, l4x1 | ~$1.50-2.50 | Development |
| 3-7B params | a10g-small, a10g-large | ~$3.50-5.00 | Production training |
| 7-13B params | a10g-large, a100-large | ~$5-10 | Large models (use LoRA) |
| 13B+ params | a100-large, a10g-largex2 | ~$10-20 | Very large (use LoRA) |
GPU Flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8
Guidelines:
See: references/hardware_guide.md for detailed specifications
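The table above can be turned into a simple lookup. The sketch below is a hypothetical helper (`recommend_flavors` is not part of the skill's scripts); because the table's ranges share endpoints, it assumes each upper bound is exclusive, so a 3B model falls in the 3-7B row.

```python
from typing import List


def recommend_flavors(params_billion: float) -> List[str]:
    """Map a model size (billions of parameters) to the table's hardware row.

    Hypothetical helper; upper bounds are treated as exclusive (an assumption).
    """
    if params_billion < 1:
        return ["t4-small"]
    if params_billion < 3:
        return ["t4-medium", "l4x1"]
    if params_billion < 7:
        return ["a10g-small", "a10g-large"]
    if params_billion < 13:
        return ["a10g-large", "a100-large"]   # use LoRA at this size
    return ["a100-large", "a10g-largex2"]     # use LoRA at this size


for size in (0.5, 1.5, 7, 70):
    print(f"{size}B ->", recommend_flavors(size))
```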
⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB
The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, ALL TRAINING IS LOST.
In training script/config:
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",  # Optional: push checkpoints
)
In job submission:
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Enables authentication
}
Before submitting:
- push_to_hub=True set in config
- hub_model_id includes username/repo-name
- secrets parameter includes HF_TOKEN
See: references/hub_saving.md for detailed troubleshooting
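The pre-submission checklist above lends itself to an automated check. The following is a sketch under stated assumptions: `hub_saving_problems` is a hypothetical helper, and it expects the training arguments and the job config as plain dictionaries rather than as `SFTConfig` objects.

```python
from typing import Dict, List


def hub_saving_problems(training_args: Dict, job_config: Dict) -> List[str]:
    """Return a list of checklist violations; an empty list means all checks pass.

    Hypothetical helper operating on plain dicts (an assumption for illustration).
    """
    problems = []
    if not training_args.get("push_to_hub"):
        problems.append("push_to_hub=True is not set")
    hub_model_id = training_args.get("hub_model_id") or ""
    # Expect exactly one slash with a non-empty name on each side.
    if hub_model_id.count("/") != 1 or not all(hub_model_id.split("/")):
        problems.append("hub_model_id must look like username/repo-name")
    if "HF_TOKEN" not in job_config.get("secrets", {}):
        problems.append("secrets does not include HF_TOKEN")
    return problems


print(hub_saving_problems(
    {"push_to_hub": True, "hub_model_id": "username/model-name"},
    {"secrets": {"HF_TOKEN": "$HF_TOKEN"}},
))
```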
⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING
{
"timeout": "2h" # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}
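To compare timeouts written in different formats, it can help to normalize them to seconds. This sketch handles only the formats listed above ("90m", "2h", "1.5h", or an integer number of seconds); `timeout_to_seconds` is a hypothetical helper, and the parser used by Jobs itself may accept more.

```python
from typing import Union


def timeout_to_seconds(timeout: Union[int, str]) -> int:
    """Convert "90m", "2h", "1.5h" or an integer number of seconds to seconds.

    Hypothetical helper covering only the formats named in this document.
    """
    if isinstance(timeout, int):
        return timeout
    value = timeout.strip().lower()
    units = {"m": 60, "h": 3600}
    if value and value[-1] in units:
        return int(float(value[:-1]) * units[value[-1]])
    return int(value)  # bare digits are treated as seconds


for example in ("90m", "2h", "1.5h", 1800):
    print(example, "->", timeout_to_seconds(example), "seconds")
```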
| Scenario | Recommended | Notes |
|---|---|---|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |
Always add 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
On timeout: Job killed immediately, all unsaved progress lost, must restart from beginning
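The buffer rule is simple arithmetic: multiply the estimated training time by 1.2 to 1.3 and round up. A minimal sketch, assuming a 30% buffer by default and a hypothetical helper name `timeout_with_buffer`:

```python
import math


def timeout_with_buffer(estimated_minutes: int, buffer_percent: int = 30) -> str:
    """Add a percentage buffer to an estimate and return a timeout like "156m".

    Hypothetical helper; integer percentages keep the rounding predictable.
    """
    total = math.ceil(estimated_minutes * (100 + buffer_percent) / 100)
    return f"{total}m"


print(timeout_with_buffer(120))      # 2-hour estimate with a 30% buffer
print(timeout_with_buffer(45, 20))   # 45-minute estimate with a 20% buffer
```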
Offer to estimate cost when planning jobs with known parameters. Use scripts/estimate_cost.py:
python scripts/estimate_cost.py \
--model meta-llama/Llama-2-7b-hf \
--dataset trl-lib/Capybara \
--hardware a10g-large \
--dataset-size 16000 \
--epochs 3
Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.
When to offer: User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
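The underlying arithmetic is runtime multiplied by the hourly rate. The sketch below is not the logic of scripts/estimate_cost.py, which is not shown here; both function names are hypothetical, and the $5.00/hr rate is an assumed value picked from the approximate range in the hardware table. The second function encodes the "when to offer" thresholds from the line above.

```python
def estimate_cost_usd(hours: float, hourly_rate_usd: float) -> float:
    """Cost is runtime multiplied by the hourly rate, rounded to cents."""
    return round(hours * hourly_rate_usd, 2)


def should_offer_estimate(hours: float, cost_usd: float) -> bool:
    """Offer an estimate when the job runs over 1 hour or costs over $5."""
    return hours > 1 or cost_usd > 5


hours = 2.0
rate = 5.0  # assumed approximate hourly rate, for illustration only
cost = estimate_cost_usd(hours, rate)
print(cost, should_offer_estimate(hours, cost))
```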
Production-ready templates with all best practices:
Load these scripts for correct, complete setups:
- scripts/train_sft_example.py - Complete SFT training with Trackio, LoRA, checkpoints
- scripts/train_dpo_example.py - DPO training for preference learning
- scripts/train_grpo_example.py - GRPO training for online RL
These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to hf_jobs() or use as templates for custom scripts.
Trackio provides real-time metrics visualization. See references/trackio_guide.md for complete setup guide.
Key points:
- Add trackio to dependencies
- Set report_to="trackio" and run_name="meaningful_name"
Use sensible defaults unless user specifies otherwise. When generating training scripts with Trackio:
Default Configuration:
- {username}/trackio (use "trackio" as default space name)
User overrides: If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.
This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.
See references/trackio_guide.md for complete documentation including grouping runs for experiments.
# List all jobs
hf_jobs("ps")
# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})
# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
Remember: Wait for user to request status checks. Avoid polling repeatedly.
Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.
- DPO expects: prompt, chosen, rejected
ALWAYS validate for:
Skip validation for known TRL datasets:
- trl-lib/ultrachat_200k, trl-lib/Capybara, HuggingFaceH4/ultrachat_200k, etc.
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
The script is fast, and will usually complete synchronously.
The output shows compatibility for each training method:
- ✓ READY - Dataset is compatible, use directly
- ✗ NEEDS MAPPING - Compatible but needs preprocessing (mapping code provided)
- ✗ INCOMPATIBLE - Cannot be used for this method
When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste ready Python code.
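If the inspector output needs to be acted on programmatically, the three markers can be mapped to a status. This is a sketch: `parse_compatibility` is a hypothetical helper, and the sample lines (for instance `SFT: ✓ READY`) are an assumed layout, since only the markers themselves are documented here.

```python
def parse_compatibility(line: str) -> str:
    """Map an output line to 'ready', 'needs-mapping', 'incompatible' or 'unknown'.

    Hypothetical helper; it looks only for the three documented markers.
    """
    if "✓ READY" in line:
        return "ready"
    if "✗ NEEDS MAPPING" in line:
        return "needs-mapping"
    if "✗ INCOMPATIBLE" in line:
        return "incompatible"
    return "unknown"


# Assumed line layout, for illustration only.
sample_output = ["SFT: ✓ READY", "DPO: ✗ NEEDS MAPPING", "GRPO: ✗ INCOMPATIBLE"]
for line in sample_output:
    print(line, "->", parse_compatibility(line))
```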
# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})
# 2. Check output markers:
# ✓ READY → proceed with training
# ✗ NEEDS MAPPING → apply mapping code below
# ✗ INCOMPATIBLE → choose different method/dataset
# 3. If mapping needed, apply before training:
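def format_for_dpo(example):
    # Rename the dataset's columns to the names DPO expects
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

# `dataset` is the split loaded earlier with datasets.load_dataset()
dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)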
# 4. Launch training job with confidence
Most DPO datasets use non-standard column names. Example:
Dataset has: instruction, chosen_response, rejected_response
DPO expects: prompt, chosen, rejected
The validator detects this and provides exact mapping code to fix it.
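The column check itself is easy to reproduce on plain Python data. The sketch below is not the validator's implementation; `missing_dpo_columns` and `rename_for_dpo` are hypothetical helpers that work on a list of column names and a single row dictionary, using the column names from the example above.

```python
from typing import Dict, Iterable, List

DPO_COLUMNS = ("prompt", "chosen", "rejected")


def missing_dpo_columns(column_names: Iterable[str]) -> List[str]:
    """Return the DPO columns that the dataset does not have yet."""
    present = set(column_names)
    return [name for name in DPO_COLUMNS if name not in present]


def rename_for_dpo(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Build a DPO row from `row`; `mapping` is target column -> source column."""
    return {target: row[source] for target, source in mapping.items()}


columns = ["instruction", "chosen_response", "rejected_response"]
print(missing_dpo_columns(columns))

mapping = {"prompt": "instruction", "chosen": "chosen_response",
           "rejected": "rejected_response"}
row = {"instruction": "2+2?", "chosen_response": "4", "rejected_response": "5"}
print(rename_for_dpo(row, mapping))
```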
After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
What is GGUF:
When to convert:
See: references/gguf_conversion.md for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
Quick conversion:
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
See references/training_patterns.md for detailed examples including:
Fix (try in order):
- Reduce batch size: per_device_train_batch_size=1, increase gradient_accumulation_steps=8. Effective batch size is per_device_train_batch_size x gradient_accumulation_steps. For best performance keep effective batch size close to 128.
- Enable gradient_checkpointing=True
Fix:
uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
--dataset name --split train
Fix:
- Check logs: hf_jobs("logs", {"job_id": "..."})
- Increase timeout: "timeout": "3h" (add 30% to estimated time)
- Reduce num_train_epochs, use smaller dataset, enable max_steps
- Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"
Note: Default 30min is insufficient for real training. Minimum 1-2 hours.
Fix:
- secrets={"HF_TOKEN": "$HF_TOKEN"}
- push_to_hub=True, hub_model_id="username/model-name"
- mcp__huggingface__hf_whoami()
- hub_private_repo=True
Fix: Add to PEP 723 header:
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
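When scripts are assembled as strings for inline submission, the header can be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper `with_pep723_header` that prepends the PEP 723 block shown above to a script body:

```python
import json
from typing import List


def with_pep723_header(code: str, dependencies: List[str]) -> str:
    """Prepend a PEP 723 inline-metadata block listing `dependencies`.

    Hypothetical helper; json.dumps yields a double-quoted list that is
    also valid as a TOML array of strings.
    """
    deps = json.dumps(dependencies)
    header = f"# /// script\n# dependencies = {deps}\n# ///\n"
    return header + "\n" + code


script = with_pep723_header(
    "import trackio\nprint('ready')\n",
    ["trl>=0.12.0", "peft>=0.7.0", "trackio"],
)
print(script)
```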
Common issues:
- mcp__huggingface__hf_whoami(), token permissions, secrets parameter
See: references/troubleshooting.md for complete troubleshooting guide
- references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
- references/training_patterns.md - Common training patterns and examples
- references/gguf_conversion.md - Complete GGUF conversion guide
- references/trackio_guide.md - Trackio monitoring setup
- references/hardware_guide.md - Hardware specs and selection
- references/hub_saving.md - Hub authentication troubleshooting
- references/troubleshooting.md - Common issues and solutions
- scripts/train_sft_example.py - Production SFT template
- scripts/train_dpo_example.py - Production DPO template
- scripts/train_grpo_example.py - Production GRPO template
- scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
- scripts/convert_to_gguf.py - Complete GGUF conversion script
- uv run or hf_jobs
- script parameter accepts Python code directly; no file saving required unless user requests
- scripts/estimate_cost.py
- hf_jobs("uv", {...}) with inline scripts; TRL maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code