From togetherai-skills
Generates real-time streaming text via Together AI's OpenAI-compatible chat/completions API. Supports multi-turn chats, tool/function calling, structured JSON outputs, and reasoning models for building chatbots or debugging inference.
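The streaming chat flow this skill covers can be sketched as below, assuming Together's OpenAI-compatible endpoint (`https://api.together.xyz/v1`) and the `openai` Python client; the model name is an illustrative choice, not a recommendation from this skill:

```python
# Sketch: real-time streaming chat completion against Together AI's
# OpenAI-compatible endpoint. Model name and base URL are assumptions;
# see references/models.md for the current catalog.
import os


def collect_stream_text(chunks):
    """Concatenate the delta content from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # tool-call and final chunks may carry no text
            parts.append(delta.content)
    return "".join(parts)


def stream_chat(prompt: str) -> str:
    # Imported lazily so the helper above stays usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream_text(stream)
```

Keeping the chunk-accumulation logic in its own function makes it reusable for multi-turn loops, where each assembled reply is appended back onto the `messages` history.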
```
npx claudepluginhub togethercomputer/skills
```

This skill uses the workspace's default tool permissions.
The skill bundles the following files:

- agents/openai.yaml
- references/api-parameters.md
- references/function-calling-patterns.md
- references/models.md
- references/reasoning-models.md
- references/structured-outputs.md
- scripts/async_parallel.py
- scripts/chat_basic.py
- scripts/chat_basic.ts
- scripts/debug_headers.py
- scripts/debug_headers.ts
- scripts/reasoning_models.py
- scripts/reasoning_models.ts
- scripts/structured_outputs.py
- scripts/structured_outputs.ts
- scripts/tool_call_loop.py
- scripts/tool_call_loop.ts
Use Together AI's serverless chat/completions API for interactive inference workloads:
Treat this skill as the default entry point for Together AI text generation unless the task is clearly offline batch processing, vector retrieval, model training, or infrastructure management.
In those cases, prefer the matching sibling skill:

- together-batch-inference for large offline runs, backfills, or lower-cost asynchronous jobs
- together-embeddings for vector search, semantic retrieval, or reranking
- together-fine-tuning when the user wants to train or adapt a model
- together-dedicated-endpoints when the user needs always-on single-tenant hosting
- together-dedicated-containers or together-gpu-clusters for custom infrastructure

Within this skill:

- Requires the current Python SDK (`together>=2.0.0`). If the user is on an older version, they must upgrade first: `uv pip install --upgrade "together>=2.0.0"`.
- Call `client.chat.completions.create()` in both the Python and TypeScript clients.
- Preserve the full `messages` history for multi-turn conversations; do not rebuild context from final text only.
- Prefer `json_schema` over looser JSON modes when the user needs stable machine-readable output.
- To combine tool calling with structured output, split the request into two phases: Phase 1 sends `tools` (no `response_format`); Phase 2 sends `response_format` (no `tools`) after tool results are appended.
- Streaming works with `response_format`; accumulate chunks and parse the final concatenated string as JSON.
- For high request volume, parallelize with `async_parallel.py` or hand off to batch inference.
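The two-phase tools-then-schema pattern above can be sketched as request builders, assuming an OpenAI-compatible client; the model name, weather tool, and report schema here are hypothetical placeholders, not part of this skill:

```python
# Sketch of the two-phase pattern: phase 1 sends tools (no response_format)
# so the model can call functions; after tool results are appended to the
# messages history, phase 2 sends response_format (no tools) to force a
# schema-conforming JSON reply. Tool and schema below are illustrative.

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

REPORT_SCHEMA = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "temp_c": {"type": "number"}},
            "required": ["city", "temp_c"],
        },
    },
}


def phase_request(phase: int, messages: list) -> dict:
    """Build create() kwargs; tools and response_format are never sent together."""
    kwargs = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model
        "messages": messages,
    }
    if phase == 1:
        kwargs["tools"] = [WEATHER_TOOL]  # phase 1: let the model request tool calls
    else:
        kwargs["response_format"] = REPORT_SCHEMA  # phase 2: force structured JSON
    return kwargs
```

Keeping the mutual exclusion in one builder function makes it hard to accidentally send `tools` and `response_format` in the same request, which is the failure mode the two-phase split exists to avoid.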