From sagemaker-ai
Generates a Jupyter notebook for deploying LoRA fine-tuned Nova/OSS models from SageMaker Serverless Model Customization to SageMaker endpoints or Bedrock.
```
npx claudepluginhub awslabs/agent-plugins --plugin sagemaker-ai
```

This skill uses the workspace's default tool permissions.
Identifies the correct deployment pathway based on model characteristics and generates deployment code.
This skill supports deploying only Nova and OSS models that were LoRA fine-tuned through SageMaker Serverless Model Customization.

Not supported:
- Models fine-tuned outside SageMaker Serverless Model Customization
- Full fine-tuning (FFT) models
- Model families other than Nova and OSS
You need the training job name or ARN. Check the conversation history first — the user may have already mentioned it, or it may be available from earlier steps in the workflow (e.g., fine-tuning). If not, ask the user.
Once you have the training job name or ARN, use the AWS MCP tool to look it up.

Call `describe-training-job` and extract:
- The model artifacts S3 path (`ModelArtifacts.S3ModelArtifacts` or `OutputDataConfig.S3OutputPath`)
- The execution role (`RoleArn`)

Call `list-tags` on the training job ARN and extract:
- The model ID from the `sagemaker-studio:jumpstart-model-id` tag

Unsupported models: This skill only supports OSS and Nova models that were LoRA fine-tuned through SageMaker Serverless Model Customization. If the model doesn't match, tell the user this skill can't help and suggest the finetuning skill.
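The lookup above can be sketched as a small helper. This is a hypothetical function, not part of the skill itself; it assumes you already have the `DescribeTrainingJob` response and tag list (e.g. from boto3's `describe_training_job` and `list_tags` calls on a SageMaker client):

```python
def extract_job_details(job_desc: dict, tags: list) -> dict:
    """Pull the fields this skill needs from a DescribeTrainingJob
    response dict and the training job's tag list."""
    # Prefer the exact artifact URI; fall back to the output prefix.
    artifacts = (
        job_desc.get("ModelArtifacts", {}).get("S3ModelArtifacts")
        or job_desc.get("OutputDataConfig", {}).get("S3OutputPath")
    )
    # The JumpStart model ID is stored as a tag on the training job.
    model_id = next(
        (t["Value"] for t in tags
         if t["Key"] == "sagemaker-studio:jumpstart-model-id"),
        None,
    )
    return {
        "artifacts_s3_uri": artifacts,
        "role_arn": job_desc.get("RoleArn"),
        "model_id": model_id,
    }
```

A `model_id` of `None` means the JumpStart tag is missing, which is one signal the model falls outside this skill's scope.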
Use the following table:
| Model Type | Eligible Targets |
|---|---|
| OSS | SageMaker, Bedrock |
| Nova | SageMaker, Bedrock |
If only one target is eligible, confirm it with the user. Use details from Step 5.
If multiple targets are eligible, help the user decide. Use details from Step 5.
If no targets are eligible, tell the user and explain why.
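The eligibility table and the three branches above amount to a simple lookup. A minimal sketch (the mapping mirrors the table; any model family not listed gets no eligible targets):

```python
# Eligible deployment targets per model family, per the table above.
ELIGIBLE_TARGETS = {
    "oss": ("SageMaker", "Bedrock"),
    "nova": ("SageMaker", "Bedrock"),
}

def eligible_targets(model_type: str) -> tuple:
    # Unknown model families have no eligible targets: tell the
    # user and explain why. One target: confirm it. Multiple
    # targets: help the user decide.
    return ELIGIBLE_TARGETS.get(model_type.lower(), ())
```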
Present the eligible options to the user. If both SageMaker and Bedrock are available, share these details to help them decide:
SageMaker Endpoint:
Bedrock:
Do NOT make a recommendation. Let the user choose.
Do NOT mention technical details like merged/unmerged weights, reference files, or APIs, unless the user asks.
⏸ Wait for user to select a deployment option.
Before proceeding to deployment, display the model's license or service terms to the user.
Read `references/model-licenses.md` and look up the model by its model ID (determined in Step 1).

⏸ Wait for the user to confirm before proceeding.
Read the reference file for the selected pathway and follow its instructions.
| Model Type | Deployment Target | Reference |
|---|---|---|
| OSS | SageMaker | references/deploy-oss-sagemaker.md |
| OSS | Bedrock | references/deploy-oss-bedrock.md |
| Nova | SageMaker | references/deploy-nova-sagemaker.md |
| Nova | Bedrock | references/deploy-nova-bedrock.md |
After deployment completes, provide the user with a summary. Cover these topics, using details from the pathway reference doc you followed in Step 5:
If deployment fails unexpectedly, the model may have been trained with full fine-tuning (FFT) rather than LoRA. To check, download the training job's hydra config from its S3 output path at `.hydra/config.yaml`:

- `peft_config` populated (`r`, `alpha`, `dropout`, etc.) → LoRA (supported)
- `peft_config: null` → FFT (not supported by this skill)
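The check above is a one-liner once the config is parsed. This hypothetical helper assumes you have already downloaded `.hydra/config.yaml` and parsed it into a dict (e.g. with PyYAML's `yaml.safe_load`, which maps YAML `null` to Python `None`):

```python
def tuning_method(hydra_config: dict) -> str:
    """Classify a training job from its parsed .hydra/config.yaml:
    a populated peft_config block means LoRA; a null or missing
    peft_config means full fine-tuning (FFT)."""
    return "LoRA" if hydra_config.get("peft_config") else "FFT"
```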