From togetherai-skills
Generates real-time streaming text via Together AI's OpenAI-compatible chat/completions API. Supports multi-turn chats, tool/function calling, structured JSON outputs, and reasoning models for building chatbots or debugging inference.
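The streaming chat flow this skill covers can be sketched as below, assuming Together's OpenAI-compatible endpoint (`https://api.together.xyz/v1`) and the `openai` Python client; the model name is an illustrative choice, not a recommendation from this skill:

```python
# Sketch: real-time streaming chat completion against Together AI's
# OpenAI-compatible endpoint. Model name and base URL are assumptions;
# see references/models.md for the current catalog.
import os


def collect_stream_text(chunks):
    """Concatenate the delta content from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # tool-call and final chunks may carry no text
            parts.append(delta.content)
    return "".join(parts)


def stream_chat(prompt: str) -> str:
    # Imported lazily so the helper above stays usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream_text(stream)
```

Keeping the chunk-accumulation logic in its own function makes it reusable for multi-turn loops, where each assembled reply is appended back onto the `messages` history.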
```
npx claudepluginhub togethercomputer/skills
```

This skill uses the workspace's default tool permissions.
The skill bundles the following files:

- agents/openai.yaml
- references/api-parameters.md
- references/function-calling-patterns.md
- references/models.md
- references/reasoning-models.md
- references/structured-outputs.md
- scripts/async_parallel.py
- scripts/chat_basic.py
- scripts/chat_basic.ts
- scripts/debug_headers.py
- scripts/debug_headers.ts
- scripts/reasoning_models.py
- scripts/reasoning_models.ts
- scripts/structured_outputs.py
- scripts/structured_outputs.ts
- scripts/tool_call_loop.py
- scripts/tool_call_loop.ts
Use Together AI's serverless chat/completions API for interactive inference workloads:
Treat this skill as the default entry point for Together AI text generation unless the task is clearly offline batch processing, vector retrieval, model training, or infrastructure management.
In those cases, prefer the matching sibling skill:

- together-batch-inference for large offline runs, backfills, or lower-cost asynchronous jobs
- together-embeddings for vector search, semantic retrieval, or reranking
- together-fine-tuning when the user wants to train or adapt a model
- together-dedicated-endpoints when the user needs always-on single-tenant hosting
- together-dedicated-containers or together-gpu-clusters for custom infrastructure

Within this skill:

- Requires the current Python SDK (`together>=2.0.0`). If the user is on an older version, they must upgrade first: `uv pip install --upgrade "together>=2.0.0"`.
- Call `client.chat.completions.create()` in both the Python and TypeScript clients.
- Preserve the full `messages` history for multi-turn conversations; do not rebuild context from final text only.
- Prefer `json_schema` over looser JSON modes when the user needs stable machine-readable output.
- To combine tool calling with structured output, split the request into two phases: Phase 1 sends `tools` (no `response_format`); Phase 2 sends `response_format` (no `tools`) after tool results are appended.
- Streaming works with `response_format`; accumulate chunks and parse the final concatenated string as JSON.
- For high request volume, parallelize with `async_parallel.py` or hand off to batch inference.
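The two-phase tools-then-schema pattern above can be sketched as request builders, assuming an OpenAI-compatible client; the model name, weather tool, and report schema here are hypothetical placeholders, not part of this skill:

```python
# Sketch of the two-phase pattern: phase 1 sends tools (no response_format)
# so the model can call functions; after tool results are appended to the
# messages history, phase 2 sends response_format (no tools) to force a
# schema-conforming JSON reply. Tool and schema below are illustrative.

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

REPORT_SCHEMA = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "temp_c": {"type": "number"}},
            "required": ["city", "temp_c"],
        },
    },
}


def phase_request(phase: int, messages: list) -> dict:
    """Build create() kwargs; tools and response_format are never sent together."""
    kwargs = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model
        "messages": messages,
    }
    if phase == 1:
        kwargs["tools"] = [WEATHER_TOOL]  # phase 1: let the model request tool calls
    else:
        kwargs["response_format"] = REPORT_SCHEMA  # phase 2: force structured JSON
    return kwargs
```

Keeping the mutual exclusion in one builder function makes it hard to accidentally send `tools` and `response_format` in the same request, which is the failure mode the two-phase split exists to avoid.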