From togetherai-skills
Deploys and manages single-tenant GPU endpoints on Together AI with autoscaling and no rate limits. Handles fine-tuned or uploaded models, hardware sizing, and lifecycle for stable production inference.
Install: npx claudepluginhub togethercomputer/skills

This skill uses the workspace's default tool permissions.
Use dedicated endpoints for managed single-tenant model hosting with predictable performance and no shared serverless pool.
Typical fits:
- together-chat-completions for serverless chat inference
- together-dedicated-containers for custom runtimes or nonstandard inference pipelines
- together-gpu-clusters for raw infrastructure or cluster orchestration

Requires the Together Python SDK v2 (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
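The together>=2.0.0 gate can be checked locally before invoking the skill. A minimal sketch, assuming only the Python standard library; the meets_requirement helper is hypothetical, not part of the Together SDK:

```python
import importlib.metadata

def meets_requirement(version: str, minimum=(2, 0, 0)) -> bool:
    """Compare a dotted version string against a minimum (major, minor, patch).

    Sketch only: pre-release suffixes like "2.0.0rc1" are not handled.
    """
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions like "2.0" so the tuple comparison is well-defined
    parts += (0,) * (3 - len(parts))
    return parts >= minimum

# Read the installed version, if the package is present
try:
    installed = importlib.metadata.version("together")
    print("together", installed, "OK" if meets_requirement(installed) else "needs upgrade")
except importlib.metadata.PackageNotFoundError:
    print("together is not installed")
```

If the check fails, run the upgrade command above and re-check before deploying an endpoint.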