From togetherai-skills
Orchestrates the Together AI Batch API for high-volume asynchronous inference: prepares JSONL inputs, uploads files, creates jobs, polls status, and downloads outputs. For bulk tasks like classification and data generation.
npx claudepluginhub togethercomputer/skills

This skill uses the workspace's default tool permissions.
Use Together AI's Batch API for large offline workloads where latency is not the primary concern.
Related skills:

Processes thousands of documents asynchronously with Google's Gemini Batch API for cost-effective bulk LLM extraction. Enforces reading the examples first and following checklists to avoid production gotchas such as flat metadata and incorrect parameter names.
Provides step-by-step guidance, best practices, and production-ready code/configurations for batch inference pipelines in ML deployment, covering model serving, MLOps, monitoring, and optimization.
Fine-tunes open-source models using Together AI's Python SDK and OpenAI-compatible API. Guides JSONL data prep, file upload, job creation, monitoring, and inference.
Typical fits: bulk tasks such as classification and data generation, where results can be collected later.

Prefer a different skill when the workload is interactive or specialized:
- together-chat-completions for real-time requests or tool-calling apps
- together-evaluations for managed LLM-as-a-judge workflows
- together-embeddings for retrieval-specific vector generation

Workflow and gotchas:
- Each JSONL request line carries a custom_id and a body.
- Upload the input file with purpose="batch-api".
- Create the job with input_file_id=... and the target endpoint.
- Match results back to requests by custom_id.
- Requires the v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
- Pass input_file_id, not legacy file parameters.
- Keep custom_id stable and meaningful so result reconciliation is easy.
- client.batches.create() returns a wrapper; access the batch object via response.job (e.g., response.job.id). client.batches.retrieve() returns the batch object directly.
- For classification, keep max_tokens low (e.g., 4), use temperature: 0, and constrain the system prompt to return only the label. This minimizes output tokens and cost.
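The input-preparation step can be sketched as plain JSONL writing: one line per request with a stable custom_id and a body, using the classification settings noted above (low max_tokens, temperature 0, label-only system prompt). The model name, example documents, labels, and file path here are illustrative placeholders, not part of the skill:

```python
import json

# Illustrative documents to classify; custom_id keys are stable and meaningful
# so outputs can be reconciled with inputs later.
documents = {
    "doc-001": "The refund was never issued.",
    "doc-002": "Love the new dashboard!",
}

with open("batch_input.jsonl", "w") as f:
    for doc_id, text in documents.items():
        line = {
            "custom_id": doc_id,  # used to match results back to requests
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder model
                "messages": [
                    {
                        "role": "system",
                        "content": "Classify the sentiment. Reply with only one "
                                   "label: positive, negative, or neutral.",
                    },
                    {"role": "user", "content": text},
                ],
                "max_tokens": 4,   # label-only output keeps cost down
                "temperature": 0,  # deterministic labels
            },
        }
        f.write(json.dumps(line) + "\n")
```

Keeping the system prompt constrained to a bare label is what makes the tiny max_tokens budget safe.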
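The upload/create/poll/download lifecycle can be sketched with the v2 SDK. The purpose="batch-api" value, the input_file_id parameter, and the create()/retrieve() return shapes come from the notes above; the file-upload and download helper names (client.files.upload, client.files.retrieve_content), the endpoint string, the status values, and the output_file_id attribute are assumptions to verify against the Together documentation:

```python
import time


def run_batch(input_path: str, output_path: str, poll_seconds: int = 30) -> None:
    """Upload a JSONL file, create a batch job, poll until terminal, download results."""
    # Imported inside the function so this sketch loads without the SDK installed;
    # requires together>=2.0.0 and TOGETHER_API_KEY in the environment.
    from together import Together

    client = Together()

    # Upload the input file with the batch purpose (helper name is an assumption).
    uploaded = client.files.upload(file=input_path, purpose="batch-api")

    # create() returns a wrapper; the batch object itself lives on .job.
    response = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",  # assumed endpoint string
    )
    batch_id = response.job.id

    # retrieve() returns the batch object directly; poll until a terminal state.
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("COMPLETED", "FAILED", "EXPIRED"):  # assumed status values
            break
        time.sleep(poll_seconds)

    if batch.status == "COMPLETED":
        # Download the results file (helper name and attribute are assumptions).
        client.files.retrieve_content(batch.output_file_id, output=output_path)


# Usage: run_batch("batch_input.jsonl", "batch_output.jsonl")
```

Note the asymmetry the skill calls out: only create() wraps the batch object, so response.job.id at creation time but batch.status on every retrieve.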