From sagemaker-ai
Executes commands on SageMaker HyperPod cluster nodes and transfers files to and from them via AWS SSM scripts. Use it for shell access, diagnostics, and package installs without SSH.
npx claudepluginhub awslabs/agent-plugins --plugin sagemaker-ai
This skill uses the workspace's default tool permissions.
Target: `sagemaker-cluster:<CLUSTER_ID>_<GROUP_NAME>-<INSTANCE_ID>`
- CLUSTER_ID: Last segment of the cluster ARN (NOT the cluster name). Extract via get-cluster-info.sh.
- GROUP_NAME: Instance group name. Retrieve via list-nodes.sh.
- INSTANCE_ID: EC2 instance ID (e.g., i-0123456789abcdef0).

Three scripts live under scripts/. Resolve cluster info and nodes once, then execute per node.
scripts/get-cluster-info.sh CLUSTER_NAME [--region REGION]
# Output: {"cluster_id":"...","cluster_arn":"...","cluster_name":"...","region":"..."}
scripts/list-nodes.sh CLUSTER_NAME [--region REGION] [--instance-group GROUP] [--instance-id ID]
# Output: JSON array of ClusterNodeSummaries (InstanceId, InstanceGroupName, InstanceStatus, etc.)
list-cluster-nodes paginates at 100 nodes. This script handles pagination automatically.
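The target format described above can be assembled with two small helpers. This is a minimal sketch: `cluster_id_from_arn` and `build_target` are hypothetical names that are not part of the skill's scripts, and the ARN and group name in the usage comment are made-up examples.

```shell
# Hypothetical helper: the cluster ID is the last "/"-separated segment of the ARN.
cluster_id_from_arn() {
  printf '%s' "${1##*/}"
}

# Hypothetical helper: join the three parts into the SSM target string.
build_target() {
  local cluster_id="$1" group="$2" instance_id="$3"
  printf 'sagemaker-cluster:%s_%s-%s' "$cluster_id" "$group" "$instance_id"
}

# Example (made-up ARN and group name):
#   id=$(cluster_id_from_arn "arn:aws:sagemaker:us-west-2:111122223333:cluster/abcd1234efgh")
#   build_target "$id" workers i-0123456789abcdef0
```

The result can be passed to ssm-exec.sh via --target, which avoids rebuilding the string by hand for every node.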
# Execute — with pre-built target
scripts/ssm-exec.sh --target "sagemaker-cluster:CLUSTERID_GROUP-INSTANCEID" 'command' [--region REGION]
# Execute — with parts
scripts/ssm-exec.sh --cluster-id ID --group GROUP --instance-id INSTANCE_ID 'command' [--region REGION]
# Upload
scripts/ssm-exec.sh --target TARGET --upload LOCAL_PATH REMOTE_PATH [--region REGION]
# Read remote file
scripts/ssm-exec.sh --target TARGET --read REMOTE_PATH [--region REGION]
SSM start-session rate limit: 3 TPS per account. Plan batch size and delay accordingly.
aws ssm send-command does NOT support sagemaker-cluster: targets — only start-session works.
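Given the 3 TPS limit on start-session, one way to pace a batch is to sleep between calls. The loop below is only a sketch of that idea: `run_throttled` and the runner function in the usage comment are hypothetical, and fractional `sleep` values work with GNU and BSD `sleep` but are not guaranteed by POSIX.

```shell
# Hypothetical helper: call RUNNER once per target, sleeping DELAY seconds between calls.
# Usage: run_throttled RUNNER DELAY TARGET...
run_throttled() {
  local runner="$1" delay="$2" target first=1
  shift 2
  for target in "$@"; do
    # No pause before the first call; pause before every later one.
    if [ "$first" -eq 0 ]; then sleep "$delay"; fi
    first=0
    # Report a failed node on stderr and continue with the rest.
    "$runner" "$target" || echo "failed: $target" >&2
  done
}

# Example runner (hypothetical) wrapping the skill's script:
#   node_uptime() { scripts/ssm-exec.sh --target "$1" 'uptime'; }
#   run_throttled node_uptime 0.5 "$target_a" "$target_b"
```

A 0.5-second delay keeps a single sequential loop at no more than two calls per second, which leaves headroom for other sessions in the same account.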
When the scripts aren't suitable, use aws ssm start-session directly with AWS-StartNonInteractiveCommand:
cat > /tmp/cmd.json << 'EOF'
{"command": ["bash -c 'echo hello && whoami'"]}
EOF
aws ssm start-session \
--target sagemaker-cluster:{CLUSTER_ID}_{GROUP_NAME}-{INSTANCE_ID} \
--region REGION \
--document-name AWS-StartNonInteractiveCommand \
--parameters file:///tmp/cmd.json
Always use a JSON file for --parameters — inline parameters break with special characters.
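The parameters file can also be generated from a command string instead of being written by hand. A minimal sketch, assuming the command fits on one line: `json_escape` and `write_ssm_params` are hypothetical helpers that escape only backslashes and double quotes, so commands containing newlines or other control characters would need a real JSON encoder.

```shell
# Hypothetical helper: escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: write {"command": ["..."]} for COMMAND into FILE.
write_ssm_params() {
  printf '{"command": ["%s"]}\n' "$(json_escape "$1")" > "$2"
}

# Example:
#   write_ssm_params "bash -c 'echo hello && whoami'" /tmp/cmd.json
```

The resulting file is passed as `--parameters file:///tmp/cmd.json`, in the same way as the hand-written example.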
| Task | Command |
|---|---|
| Lifecycle logs | `cat /var/log/provision/provisioning.log` |
| Memory | `free -h` |
| Disk/mounts | `df -h && lsblk` |
| GPU status | `nvidia-smi` |
| GPU memory | `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` |
| EFA/network | `fi_info -p efa` |
| CloudWatch agent | `sudo systemctl status amazon-cloudwatch-agent` |
| Top processes | `ps aux --sort=-%mem \| head -20` |
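Because each start-session call counts against the rate limit, several of the diagnostics above can be bundled into one command string. The helper below is a sketch under a stated assumption: `bundle_commands` is a hypothetical name, and it only works for commands that contain no single quotes, which holds for every command in the table.

```shell
# Hypothetical helper: prefix each command with a header line and join them with ";".
# Assumes no command contains a single quote.
bundle_commands() {
  local out="" cmd
  for cmd in "$@"; do
    out="${out}echo '=== ${cmd} ==='; ${cmd}; "
  done
  # Drop the trailing "; " separator.
  printf '%s' "${out%; }"
}

# Example:
#   scripts/ssm-exec.sh --target TARGET "$(bundle_commands 'free -h' 'df -h && lsblk')"
```

Joining with `;` rather than `&&` means a failing check, such as `nvidia-smi` on a node without GPUs, does not stop the remaining ones.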
- Commands run as root.
- Omit --document-name to get a shell.
- For one-off commands, use AWS-StartNonInteractiveCommand.