From togetherai-skills
Provisions and manages on-demand/reserved GPU clusters (H100, H200, B200) on Together AI with Kubernetes or Slurm orchestration, shared storage, credentials, and scaling for ML/HPC workloads.
Slash command
/togetherai-skills:together-gpu-clusters
Use Together AI GPU clusters when the user needs infrastructure control instead of a managed inference product.
Typical fits: […]

Related skills for other needs:
- together-dedicated-endpoints for managed single-model hosting
- together-dedicated-containers for containerized inference without owning the full cluster
- together-sandboxes for short-lived remote Python execution
- together-fine-tuning for managed training jobs instead of raw cluster operations

- […] (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0"
- Prefer shared_volume over creating a volume separately and attaching it via volume_id. Separately created volumes may land in a different datacenter partition than the cluster, causing a "does not exist in the datacenter" error even when the volume shows as available.
- […] list_regions() first and be prepared to try multiple regions.
- […] cuda_version and nvidia_driver_version as separate fields in addition to the combined driver_version string. Pass them via extra_body in the Python SDK.
- […] slurm.conf) are Slinky v1.0 only. A non-zero exit from a worker prolog or epilog drains the node, and calling Slurm commands (squeue, scontrol, sacctmgr) inside any prolog/epilog can deadlock the scheduler.

npx claudepluginhub togethercomputer/skills --plugin togetherai-skills
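The SDK-version, multi-region, and driver-field notes above can be sketched as a few pre-flight helpers. This is a minimal illustration, not the skill's own code: every function name here (parse_version, sdk_is_new_enough, installed_sdk_version, create_in_first_available_region, driver_extra_body) is hypothetical, and the real cluster-creation call is deliberately left as a caller-supplied function because its signature is not given in this listing. Only the field names cuda_version and nvidia_driver_version and the together>=2.0.0 requirement come from the text.

```python
import re
from importlib import metadata
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Minimum SDK version stated in the notes above (together>=2.0.0).
MIN_SDK: Tuple[int, int, int] = (2, 0, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, padded to 3 fields.

    Simplification: pre-release suffixes such as 'rc1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        return ()
    numbers = tuple(int(part) for part in match.group().split("."))
    return numbers + (0,) * max(0, 3 - len(numbers))


def sdk_is_new_enough(installed: str, minimum: Tuple[int, ...] = MIN_SDK) -> bool:
    """Return True when the installed version meets the minimum."""
    return parse_version(installed) >= minimum


def installed_sdk_version(package: str = "together") -> Optional[str]:
    """Look up the installed package version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def create_in_first_available_region(
    regions: Iterable[str],
    create: Callable[[str], T],
) -> Tuple[str, T]:
    """Try `create(region)` for each region in order; return the first success.

    `create` is a hypothetical wrapper around the real SDK call. The SDK's
    error types are not assumed here, so any exception moves to the next region.
    """
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return region, create(region)
        except Exception as exc:
            errors[region] = exc
    raise RuntimeError(f"cluster creation failed in every region tried: {errors}")


def driver_extra_body(cuda_version: str, nvidia_driver_version: str) -> Dict[str, str]:
    """Build the separate driver fields intended for the SDK's extra_body argument."""
    return {
        "cuda_version": cuda_version,
        "nvidia_driver_version": nvidia_driver_version,
    }
```

In use, the region names would come from list_regions(), and the create callable would wrap whatever cluster-creation call the installed SDK exposes, passing the dictionary from driver_extra_body as extra_body. Keeping the retry loop separate from the SDK call means the fallback logic can be exercised with a stub before any capacity is requested.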