From togetherai-skills
Orchestrates the Together AI Batch API for high-volume asynchronous inference: prepares JSONL inputs, uploads files, creates jobs, polls status, and downloads outputs. For bulk tasks like classification and data generation.
npx claudepluginhub togethercomputer/skills

This skill uses the workspace's default tool permissions.
Use Together AI's Batch API for large offline workloads where latency is not the primary concern.
Related skills:

Processes thousands of documents asynchronously with Google's Gemini Batch API for cost-effective bulk LLM extraction. Enforces reading the examples first and following checklists to avoid production gotchas such as flat metadata and incorrect parameter names.
Provides step-by-step guidance, best practices, and production-ready code/configurations for batch inference pipelines in ML deployment, covering model serving, MLOps, monitoring, and optimization.
Fine-tunes open-source models using Together AI's Python SDK and OpenAI-compatible API. Guides JSONL data prep, file upload, job creation, monitoring, and inference.
Typical fits: bulk tasks such as classification and data generation, where results can be collected later.

Prefer a different skill when the workload is interactive or specialized:
- together-chat-completions for real-time requests or tool-calling apps
- together-evaluations for managed LLM-as-a-judge workflows
- together-embeddings for retrieval-specific vector generation

Workflow and gotchas:
- Each JSONL request line carries a custom_id and a body.
- Upload the input file with purpose="batch-api".
- Create the job with input_file_id=... and the target endpoint.
- Match results back to requests by custom_id.
- Requires the v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
- Pass input_file_id, not legacy file parameters.
- Keep custom_id stable and meaningful so result reconciliation is easy.
- client.batches.create() returns a wrapper; access the batch object via response.job (e.g., response.job.id). client.batches.retrieve() returns the batch object directly.
- For classification, keep max_tokens low (e.g., 4), use temperature: 0, and constrain the system prompt to return only the label. This minimizes output tokens and cost.
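The input-preparation step can be sketched as plain JSONL writing: one line per request with a stable custom_id and a body, using the classification settings noted above (low max_tokens, temperature 0, label-only system prompt). The model name, example documents, labels, and file path here are illustrative placeholders, not part of the skill:

```python
import json

# Illustrative documents to classify; custom_id keys are stable and meaningful
# so outputs can be reconciled with inputs later.
documents = {
    "doc-001": "The refund was never issued.",
    "doc-002": "Love the new dashboard!",
}

with open("batch_input.jsonl", "w") as f:
    for doc_id, text in documents.items():
        line = {
            "custom_id": doc_id,  # used to match results back to requests
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder model
                "messages": [
                    {
                        "role": "system",
                        "content": "Classify the sentiment. Reply with only one "
                                   "label: positive, negative, or neutral.",
                    },
                    {"role": "user", "content": text},
                ],
                "max_tokens": 4,   # label-only output keeps cost down
                "temperature": 0,  # deterministic labels
            },
        }
        f.write(json.dumps(line) + "\n")
```

Keeping the system prompt constrained to a bare label is what makes the tiny max_tokens budget safe.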
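The upload/create/poll/download lifecycle can be sketched with the v2 SDK. The purpose="batch-api" value, the input_file_id parameter, and the create()/retrieve() return shapes come from the notes above; the file-upload and download helper names (client.files.upload, client.files.retrieve_content), the endpoint string, the status values, and the output_file_id attribute are assumptions to verify against the Together documentation:

```python
import time


def run_batch(input_path: str, output_path: str, poll_seconds: int = 30) -> None:
    """Upload a JSONL file, create a batch job, poll until terminal, download results."""
    # Imported inside the function so this sketch loads without the SDK installed;
    # requires together>=2.0.0 and TOGETHER_API_KEY in the environment.
    from together import Together

    client = Together()

    # Upload the input file with the batch purpose (helper name is an assumption).
    uploaded = client.files.upload(file=input_path, purpose="batch-api")

    # create() returns a wrapper; the batch object itself lives on .job.
    response = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",  # assumed endpoint string
    )
    batch_id = response.job.id

    # retrieve() returns the batch object directly; poll until a terminal state.
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("COMPLETED", "FAILED", "EXPIRED"):  # assumed status values
            break
        time.sleep(poll_seconds)

    if batch.status == "COMPLETED":
        # Download the results file (helper name and attribute are assumptions).
        client.files.retrieve_content(batch.output_file_id, output=output_path)


# Usage: run_batch("batch_input.jsonl", "batch_output.jsonl")
```

Note the asymmetry the skill calls out: only create() wraps the batch object, so response.job.id at creation time but batch.status on every retrieve.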