From sagemaker-ai
Generates a Jupyter notebook that fine-tunes a base model using SageMaker serverless training jobs. Supports SFT, DPO, and RLVR trainers including Lambda reward function creation.
npx claudepluginhub awslabs/agent-plugins --plugin sagemaker-ai
Slash command: /sagemaker-ai:finetuning
Before starting this workflow, verify:
- A use_case_spec.md file exists. If not, run the use-case-specification skill first, then resume.
- A fine-tuning technique (SFT, DPO, or RLVR) and base model have already been selected. If not, run the finetuning-setup skill to collect what's missing, then resume.
- A base model name available on SageMaker Hub has been identified. If not, run the finetuning-setup skill to get it. Use the exact name that finetuning-setup retrieves, as it may differ from other commonly used names for the same model.

Create the notebook by writing a .ipynb file with the complete notebook JSON, OR by using notebook MCP tools (e.g., create_notebook, add_cell) if available. Do not use echo/cat piping to generate notebooks.
Notebook path: <project-dir>/notebooks/<project-name>.ipynb
Add a markdown cell containing "## Fine-Tuning" as a section divider before the new cells.
⏸ Wait for user.
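Writing the notebook as complete JSON (rather than piping fragments through echo/cat) can be sketched with Python's standard library alone. This is a minimal sketch assuming the nbformat 4 JSON layout; append_finetuning_section and _new_cell are hypothetical helpers, not part of the skill or of any SageMaker API, and the skill also allows notebook MCP tools instead.

```python
import json
import uuid
from pathlib import Path


def _new_cell(cell_type, source, with_id):
    """Hypothetical helper: build one nbformat-4 cell dict."""
    cell = {
        "cell_type": cell_type,
        "metadata": {},
        "source": source.splitlines(keepends=True),
    }
    if cell_type == "code":
        # Code cells must carry an (empty) output list and execution count.
        cell["execution_count"] = None
        cell["outputs"] = []
    if with_id:
        # nbformat 4.5+ requires a unique id on every cell.
        cell["id"] = uuid.uuid4().hex[:8]
    return cell


def append_finetuning_section(notebook_path, code_sources):
    """Hypothetical helper: append a "## Fine-Tuning" divider plus code cells."""
    path = Path(notebook_path)
    if path.exists():
        notebook = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Minimal empty notebook; minor version 4, so cell ids are not required.
        notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

    with_id = notebook.get("nbformat_minor", 0) >= 5
    notebook["cells"].append(_new_cell("markdown", "## Fine-Tuning", with_id))
    for source in code_sources:
        notebook["cells"].append(_new_cell("code", source, with_id))

    path.parent.mkdir(parents=True, exist_ok=True)
    # Write the complete notebook JSON in a single step.
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return notebook
```

Called with a path such as my-project/notebooks/customer-support-chatbot-v1.ipynb (a hypothetical project), it creates the notebook on first use and appends a new divider and cells on later calls.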
Read the example notebook matching the finetuning strategy:
- SFT: references/sft_example.md
- DPO: references/dpo_example.md
- RLVR: references/rlvr_example.md

If the base model is not a Meta model (its name does not start with meta-):
- Remove the ACCEPT_EULA = False line from the config cell.
- Remove the accept_eula=ACCEPT_EULA, line from the trainer call.

max_epochs and lr_warmup_ratio

In the 'Setup & Credentials' cell, populate:
- BASE_MODEL
- MODEL_PACKAGE_GROUP_NAME (consult use_case_spec.md if needed). The name must match the pattern [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}, for example customer-support-chatbot-v1.

Save the notebook.
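The naming constraint on MODEL_PACKAGE_GROUP_NAME can be checked before the notebook is saved. A minimal sketch: the regex is the one given in the skill, applied with re.fullmatch so it must cover the whole name; suggest_model_package_group_name is a hypothetical convenience for turning a free-text title into a candidate name.

```python
import re

# Pattern from the skill text; fullmatch() applies it to the entire string.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def is_valid_model_package_group_name(name: str) -> bool:
    """True if the whole name matches the MODEL_PACKAGE_GROUP_NAME pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


def suggest_model_package_group_name(title: str) -> str:
    """Hypothetical helper: turn a free-text title into a candidate name."""
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", title).strip("-").lower()
    # The pattern allows at most 63 alphanumerics; cut there, drop a trailing hyphen.
    return slug[:63].rstrip("-")


print(suggest_model_package_group_name("Customer Support Chatbot v1"))
# customer-support-chatbot-v1
```

Note that the pattern forbids leading or trailing hyphens and underscores, so names like my_model or -v1 are rejected.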
For RLVR, follow the section "Helping Users Create Lambda Functions" in references/rlvr_reward_function.md, then set CUSTOM_REWARD_FUNCTION in the notebook to the ARN of the reward function (either given directly by the user, or taken from the function generation code as evaluator.arn).

If the base model is a Meta model (its name starts with meta-), the user must set ACCEPT_EULA to True in the notebook after reviewing the license. NEVER set ACCEPT_EULA to True yourself for Meta/Llama models. For all other models, the ACCEPT_EULA variable and accept_eula parameter should already be omitted from the notebook (see Step 1.3).

Display the following to the user: "I have updated your Jupyter Notebook with the finetuning code. If you run it cell by cell, you should be able to launch your SageMaker Training job. Training takes a while. Please monitor the progress and let me know when it's complete so I can help you get to the next step in your plan."
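The ACCEPT_EULA rule (strip the EULA lines for non-Meta models, keep them at False for Meta models) can be sketched as a filter over a cell's source. This assumes, as the surviving text suggests, that Meta models are identified by a name starting with meta-, and that the lines appear exactly as quoted above; apply_eula_rule is a hypothetical helper.

```python
# Lines quoted in the skill; matched after stripping surrounding whitespace.
EULA_LINES = {"ACCEPT_EULA = False", "accept_eula=ACCEPT_EULA,"}


def apply_eula_rule(cell_source: str, base_model: str) -> str:
    """Hypothetical helper: drop EULA lines unless the base model is a Meta model.

    For Meta models the lines are kept unchanged, with ACCEPT_EULA = False;
    only the user may change it to True after reviewing the license.
    """
    if base_model.startswith("meta-"):
        return cell_source
    kept = [
        line
        for line in cell_source.splitlines(keepends=True)
        if line.strip() not in EULA_LINES
    ]
    return "".join(kept)
```

The function never writes ACCEPT_EULA = True, which mirrors the rule that the agent must not accept the license on the user's behalf.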
Wait for the user to confirm that training has completed. Once they confirm, you can move on to the next step of the plan.
CRITICAL:
If the user wants to finetune a model they have already customized, follow the instructions in references/continuous_customization.md.
References:
- rlvr_reward_function.md - Lambda reward function creation guide (RLVR only)
- templates/rlvr_reward_function_source_template.py - Lambda reward function source template for open-weights models (RLVR only)
- templates/nova_rlvr_reward_function_source_template.py - Lambda reward function source template for Nova 2.0 Lite (RLVR only)
- sft_example.md - Complete notebook template for Supervised Fine-Tuning
- dpo_example.md - Complete notebook template for Direct Preference Optimization
- rlvr_example.md - Complete notebook template for Reinforcement Learning from Verifiable Rewards
- continuous_customization.md - Instructions on fine-tuning an already fine-tuned model