Help us improve
Share bugs, ideas, or general feedback.
From distil-cli
Trains task-specific small language models (SLMs) via the Distil Labs CLI, supporting knowledge distillation, data preparation, fine-tuning, evaluation, and deployment.
npx claudepluginhub distil-labs/distil-cli-skill --plugin distil-cliHow this skill is triggered — by the user, by Claude, or both
Slash command
/distil-cli:distil-cliThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
| Environment | What you can do |
references/api-reference.mdreferences/cli-reference.mdreferences/configuration.mdreferences/evaluation-metrics.mdreferences/getting-started.mdreferences/job-description-guide.mdreferences/model-catalog.mdreferences/mutations-guide.mdreferences/platform-overview.mdreferences/task-selection-guide.mdreferences/tasks/analyze-predictions.mdreferences/tasks/analyze-uploads.mdreferences/tasks/deployment-integration.mdreferences/tasks/maintain-run-log.mdreferences/tasks/polling-jobs.mdreferences/tasks/prepare-data/classification.mdreferences/tasks/prepare-data/closed-book-qa.mdreferences/tasks/prepare-data/multi-turn-tool-calling.mdreferences/tasks/prepare-data/open-book-qa.mdreferences/tasks/prepare-data/overview.mdConducts post-training LLM research via the Tinker API: replicate papers, run experiments (SFT, RL, DPO, distillation), monitor metrics, and document findings.
Train or fine-tune TRL language models on Hugging Face Jobs using SFT, DPO, GRPO, Reward Modeling, with GGUF export for local deployment.
Train or fine-tune language and vision models using TRL (SFT, DPO, GRPO, reward modeling) or Unsloth on Hugging Face Jobs. Covers dataset prep, hardware selection, cost estimation, GGUF conversion for local deployment.
Share bugs, ideas, or general feedback.
| Environment | What you can do |
|---|---|
| Claude Code | Full end-to-end workflow: run CLI commands, prepare data, train, deploy |
| Claude.ai (browser) | Data preparation guidance only: help choose task types, create config files, format datasets. User runs CLI commands themselves. |
Windows users: The Distil CLI does not support Windows natively. Without WSL, users won't get the full experience — CLI commands won't run. They can fall back to the REST API via references/api-reference.md, or run the CLI inside WSL. Call this out early if the user is on Windows.
If the user wants to train a model end-to-end (any phrasing — "help me train a model", "build me an SLM", "I want to fine-tune for X"), do NOT answer ad hoc from the reference files. Instead:
model-building-log-<name>.md at the project root and write the opening entry (<name> = a descriptive slug for this project, which we'll later also use for distil model create). See references/tasks/maintain-run-log.md for format and append triggers. This fires once per project regardless of which workflow branch (dataset vs. traces) the user ends up on — individual workflow Step 0s do not duplicate the init.workflows/dataset-to-model.mdworkflows/traces-to-model.mdThe workflows encode the right sequence, decision points, and checkpoints (e.g., confirming config before kicking off a 6+ hour training job, approving a trace-generated test set). Skipping the workflow and answering Q&A-style usually leads to skipped steps and missing analysis reports.
For specific lookup questions ("what's the API endpoint for X", "what does parameter Y mean", "how do I download predictions"), continue to use the routing table below — those don't need a workflow and don't need a run log.
Route the user to the right reference file based on their intent. Read the referenced file BEFORE answering.
If multiple intents apply, read multiple files. For data preparation, always read both the overview and the task-specific file.
| User intent | Read this file |
|---|---|
| "Help me train a model" / "I want to build an SLM" / "Train a model for X" / "Fine-tune for Y" | First ask: dataset or traces? Then load the matching workflow below. |
| "Train from a dataset" / "dataset to model" / end-to-end with CSV data | workflows/dataset-to-model.md |
| "Train from production traces" / "traces to model" / end-to-end with traces | workflows/traces-to-model.md |
| "Model isn't good enough" / "How do I improve?" / "Retune" / iteration / "Teacher eval scored too low" | workflows/improving-a-model.md |
| User intent | Read this file |
|---|---|
| "I'm new" / "How do I get started?" / "Install the CLI" | references/getting-started.md |
| "What is distil labs?" / "What can it do?" / "How does it work?" | references/platform-overview.md |
| "How do I use command X?" / "What CLI commands are available?" | references/cli-reference.md |
| "How do I use the API?" / "REST API" / "API authentication" / "programmatic access" | references/api-reference.md |
| User intent | Read this file |
|---|---|
| "What task type should I pick?" / "Classification or QA?" / describing their use case | references/task-selection-guide.md |
| "Which model should I use?" / "What student/teacher models are available?" / "Can I use model X?" | references/model-catalog.md |
| "How do I write a good task/input description?" / "What goes in job_description.json?" | references/job-description-guide.md |
Always read references/tasks/prepare-data/overview.md first, then the task-specific file.
| User intent | Read this file (after overview.md) |
|---|---|
| General data prep / "What files do I need?" | references/tasks/prepare-data/overview.md (just this one) |
| Question answering data | references/tasks/prepare-data/question-answering.md |
| Classification data | references/tasks/prepare-data/classification.md |
| Tool calling data | references/tasks/prepare-data/tool-calling.md |
| Multi-turn tool calling data | references/tasks/prepare-data/multi-turn-tool-calling.md |
| Open book QA / RAG data | references/tasks/prepare-data/open-book-qa.md |
| Closed book QA data | references/tasks/prepare-data/closed-book-qa.md |
| User intent | Read this file |
|---|---|
| "How do I upload my dataset?" / "upload-data command" | references/tasks/upload-dataset.md |
| "How do I use production traces?" / "upload-traces" / "reprocess traces" | references/tasks/upload-and-process-traces.md |
| "How do I run teacher evaluation?" / "Is my task feasible?" | references/tasks/teacher-evaluation.md |
| "How do I train?" / "Start training" / "Training status" | references/tasks/training.md |
| "How do I deploy?" / "Download model" / "Run inference" | references/tasks/deployment-integration.md |
| "distil login" / "Credit balance is too low" / 401 errors / "/login doesn't work for distil" | references/tasks/verify-auth.md |
| "Download predictions" / "per-example results" / "inspect model outputs" | references/tasks/retrieve-predictions.md |
| "Analyze predictions" / "write analysis report" / "compare teacher and student" | references/tasks/analyze-predictions.md |
| "Log my work" / "track iterations" / "keep a history" / "log progress" | references/tasks/maintain-run-log.md |
| "Approve the test set" / "review test set" / "is my test data good?" | references/tasks/test-set-approval.md |
| "Check my upload" / "is my train/test consistent" / "data distribution" | references/tasks/analyze-uploads.md |
| "How do I poll a long-running job?" / "Wait for training" / "grep status doesn't work" | references/tasks/polling-jobs.md |
| User intent | Read this file |
|---|---|
| "How do I configure training?" / "config.yaml" / "tuning parameters" | references/configuration.md |
| "What are mutations?" / "Seed data doesn't cover all scenarios" / "Improve on specific domains" | references/mutations-guide.md |
| "What do these metrics mean?" / "How do I interpret results?" | references/evaluation-metrics.md |
The core flow is: create model → prepare data → upload → teacher evaluation → train → deploy. Teacher evaluation is a feasibility check — if the teacher can solve the task, the student will learn it. Training takes several hours and produces a downloadable model you can deploy locally (llama-cpp, vLLM) or via the platform.
For the longer write-up of what Distil Labs is and why, read references/platform-overview.md. For the list of supported task types, student models, and teacher models — always read references/model-catalog.md before recommending a specific model; availability changes over time.
Minimum steps to go from zero to a trained model. Read the relevant reference files for details on each step.
# 1. Install / update and authenticate
curl -fsSL https://cli-assets.distillabs.ai/install.sh | sh # first-time install
distil update # if already installed — the platform evolves quickly
distil login
# 2. Create a model
distil model create my-model-name
# Note the model ID from the output
# 3. Prepare data files in a directory:
# - job_description.json (task objectives)
# - config.yaml (task type, student model, teacher model)
# - train.csv (20+ labeled examples)
# - test.csv (held-out evaluation set)
# 4. Upload data
distil model upload-data <model-id> --data ./my-data-dir
# 5. Run teacher evaluation (feasibility check)
distil model run-teacher-evaluation <model-id>
distil model teacher-evaluation <model-id> # Check results
# Status values: JOB_NOT_STARTED, JOB_PENDING, JOB_RUNNING, JOB_SUCCESS, JOB_FAILURE, JOB_STOPPED
# 6. Train (long-running — confirm config with user before starting)
distil model run-training <model-id>
distil model training <model-id> # Check status
# 7. Download and deploy
distil model download <model-id>
distil model deploy local <model-id>
distil model invoke <model-id> # Get the curl command to query your model
Alternative: Train from traces — Instead of steps 3-4, prepare a traces.jsonl file with your production logs, a job_description.json, and a config.yaml, then run distil model upload-traces <model-id> --data ./my-traces-dir. See references/tasks/upload-and-process-traces.md for trace formats and task compatibility (note: question-answering-open-book is not supported via traces).
When helping users, exhaust all mechanical/lookup steps before engaging judgment. Check model compatibility constraints, file format requirements, and parameter defaults from the reference files first. Only then apply judgment for task selection, data quality assessment, or configuration tuning.
references/tasks/prepare-data/overview.md before any task-specific data file. The overview contains shared requirements (directory structure, min examples, file formats) that the task files assume you know.references/task-selection-guide.md and help them decide.references/model-catalog.md. Default recommendation: Llama-3.2-1B-Instruct as student, openai.gpt-oss-120b as teacher.job_description.json, read references/job-description-guide.md for what makes each field good and which fields apply to which task type. Note in particular that input_description is only used by question-answering — including it for other task types has no effect.distil model create, capture the model ID and use it in subsequent commands.upload-status, teacher-evaluation, training) to monitor progress.references/tasks/polling-jobs.md verbatim. Do not write your own grep loop or sleep N && command chain — the former picks the wrong status pattern half the time and the latter is blocked by Claude Code. Use while ...; do ...; sleep 60; done with the sleep inside the loop body.--output json | jq — never grep human-readable output. The default text output also omits some metrics (notably LLM-as-a-Judge), so for analysis always use --output json too.Read workflows/improving-a-model.md. It covers both iteration cases — teacher eval below thresholds, and training results that don't pass the DEPLOY bar — and consolidates the levers (job description, data, synthgen/mutations, student/teacher choice, tuning parameters, retune).
It also owns the iteration-N/ directory convention (one directory per attempt, reports written unsuffixed inside it) and the token-burn awareness rule for iteration #3+ (each iteration costs re-upload + teacher-eval credits + Claude analysis tokens — confirm the plan before racing into another attempt).
distil model = distil models = distil m — all three work identically.