Multi-model orchestration for ARIA. Activate when dispatching tasks to different LLM providers (OpenAI, local models, cloud providers) or when optimizing cost/latency tradeoffs.
Routes tasks to multiple LLM providers for cost optimization and specialized capabilities.
npx claudepluginhub flexnetos/ripple-env

This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill enables ARIA to dispatch tasks to multiple LLM providers beyond Claude.
| File | Purpose |
|---|---|
| `.claude/config/models.json` | Model registry and routing rules |
| `.claude/config/env.template` | API key template (copy to secure location) |
| `.claude/config/claude-code-router.template.json` | Router config for hybrid mode |
| `.claude/mcp-servers.json` | MCP server configurations |
- **Full replacement:** All Claude calls redirect to Kimi K2. Simplest setup.
- **Hybrid routing:** Use Claude as the main orchestrator and Kimi K2 for subagents/background tasks. Requires claude-code-router.
- **On request:** Keep Claude for everything; use Kimi K2 only when explicitly requested in Task prompts.
# Built-in - no additional config needed
Task(subagent_type="general-purpose", model="opus", prompt="...")
Task(subagent_type="general-purpose", model="sonnet", prompt="...")
Task(subagent_type="general-purpose", model="haiku", prompt="...")
# Via aichat (configured in pixi.toml)
aichat -m openai:gpt-4-turbo "Your prompt here"
# Via direct API
curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": "..."}]}'
# List available models
ollama list
# Run inference
ollama run llama3.2 "Your prompt here"
# Via API
curl http://localhost:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "..."}'
# Start LocalAI server
local-ai run --models-path ./models
# Query (OpenAI-compatible API)
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [...]}'
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mixtral-8x7B-Instruct-v0.1
# Query (OpenAI-compatible)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [...]}'
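Because LocalAI and vLLM both expose OpenAI-compatible endpoints, a single request builder can target any of these backends by swapping the base URL. A minimal sketch (the URLs and model names are illustrative; local servers typically need no API key):

```python
import json

def build_chat_request(base_url, model, prompt, api_key=None):
    """Build an OpenAI-compatible chat request for any backend (OpenAI, LocalAI, vLLM)."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers usually run without auth
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return url, headers, body

# Same builder, different backend: a local vLLM server
url, headers, body = build_chat_request(
    "http://localhost:8000", "mistralai/Mixtral-8x7B-Instruct-v0.1", "Summarize this diff"
)
```

The returned tuple can be handed to any HTTP client (`curl`, `requests`, `urllib`), which keeps provider selection a one-argument change.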
# Using OpenAI SDK with Moonshot endpoint
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["MOONSHOT_API_KEY"],
base_url="https://api.moonshot.ai/v1",
)
# Kimi K2 Instruct - best for agentic tasks
response = client.chat.completions.create(
model="kimi-k2-instruct",
messages=[{"role": "user", "content": "Analyze this codebase..."}],
temperature=0.6, # Recommended for K2
)
# Kimi K2 Thinking - step-by-step reasoning with tool use
response = client.chat.completions.create(
model="kimi-k2-thinking",
messages=[{"role": "user", "content": "Debug this complex issue..."}],
)
# Via curl (OpenAI-compatible)
curl https://api.moonshot.ai/v1/chat/completions \
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-instruct",
"messages": [{"role": "user", "content": "..."}],
"temperature": 0.6
}'
# Via aichat (if configured)
aichat -m moonshot:kimi-k2-instruct "Your prompt here"
# Access Kimi K2 and 100+ models via single API
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2",
"messages": [{"role": "user", "content": "..."}]
}'
routing_strategy:
# Use cheapest model that can handle the task
simple_tasks:
primary: claude.haiku # Fast, cheap
fallback: local.ollama # Free, offline
analysis_tasks:
primary: claude.sonnet # Balanced
fallback: openai.gpt4 # Alternative
complex_tasks:
primary: claude.opus # Best reasoning
fallback: openai.gpt4 # Alternative
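One way to act on this strategy: attempt the tier's primary model and fall back to the secondary on any failure. A sketch with the routing table mirroring the YAML above; `call_model` is a placeholder for a real provider call:

```python
# Mirrors the routing_strategy YAML: tier -> (primary, fallback)
ROUTES = {
    "simple":   ("claude.haiku",  "local.ollama"),
    "analysis": ("claude.sonnet", "openai.gpt4"),
    "complex":  ("claude.opus",   "openai.gpt4"),
}

def route(task_type, prompt, call_model):
    """Try the tier's primary model; on any error, retry once with the fallback."""
    primary, fallback = ROUTES[task_type]
    try:
        return primary, call_model(primary, prompt)
    except Exception:
        return fallback, call_model(fallback, prompt)
```

In practice `call_model` would dispatch to the Anthropic SDK, aichat, or a local HTTP endpoint depending on the model identifier's prefix.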
# Example: Get multiple perspectives on architecture decision
# Launch tasks to different models in parallel
# Claude perspective
Task(subagent_type="general-purpose", model="sonnet",
prompt="Analyze this architecture from security standpoint...")
# OpenAI perspective (via Bash + aichat)
Bash(command='aichat -m openai:gpt-4 "Analyze this architecture..."')
# Local model (via Bash + ollama)
Bash(command='ollama run llama3.2 "Analyze this architecture..."')
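The same fan-out can be scripted: send one prompt to several providers concurrently and collect all answers. A sketch with stubbed provider callables standing in for the real Claude/OpenAI/Ollama calls:

```python
from concurrent.futures import ThreadPoolExecutor

def gather_perspectives(prompt, providers):
    """providers: dict of name -> callable(prompt) -> str. Runs all calls in parallel threads."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in providers.items()}
        return {name: f.result() for name, f in futures.items()}

# Stubs standing in for real provider calls (illustrative only)
stubs = {
    "claude.sonnet": lambda p: "claude: " + p,
    "openai.gpt4":   lambda p: "gpt4: " + p,
    "local.llama3":  lambda p: "llama: " + p,
}
results = gather_perspectives("Analyze this architecture...", stubs)
```

Threads suit this workload because each call is I/O-bound; the slowest provider determines total latency rather than the sum of all calls.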
# Get agreement from multiple models before proceeding
# (query_model and check_consensus below are placeholder helpers)
models = ["claude.sonnet", "openai.gpt4", "local.llama3"]
responses = []
for model in models:
response = query_model(model, prompt)
responses.append(response)
# Require 2/3 agreement for critical decisions
consensus = check_consensus(responses, threshold=0.66)
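The `check_consensus` helper above is pseudocode; one simple concrete form treats each response as a verdict and requires the majority verdict's share to reach the threshold. A sketch (real responses would first be normalized, e.g. reduced to an approve/reject label):

```python
from collections import Counter

def check_consensus(responses, threshold=0.66):
    """Return (majority_answer, reached) where reached is True if the
    majority answer's share of responses meets the threshold."""
    if not responses:
        return None, False
    answer, count = Counter(responses).most_common(1)[0]
    return answer, count / len(responses) >= threshold

# 2 of 3 models agree -> consensus at threshold 0.66
answer, ok = check_consensus(["approve", "approve", "reject"])  # -> ("approve", True)
```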
Use Claude as the main orchestrator and Kimi K2 for subagent/background tasks.
npm install -g claude-code-router
mkdir -p ~/.claude-code-router
cp .claude/config/claude-code-router.template.json ~/.claude-code-router/config.json
# Edit with your API keys
{
"providers": [
{
"name": "anthropic",
"api_base_url": "https://api.anthropic.com",
"api_key": "$ANTHROPIC_API_KEY",
"models": ["claude-opus-4-5-20251101", "claude-sonnet-4-20250514"]
},
{
"name": "moonshot",
"api_base_url": "https://api.moonshot.ai/anthropic",
"api_key": "$MOONSHOT_API_KEY",
"models": ["kimi-k2-thinking-turbo", "kimi-k2-instruct"]
}
],
"router": {
"default": "anthropic,claude-sonnet-4-20250514",
"think": "anthropic,claude-opus-4-5-20251101",
"background": "moonshot,kimi-k2-thinking-turbo"
}
}
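The template keeps API keys as `$VAR` placeholders; a small loader can expand them from the environment before handing the config to a consumer. A sketch, assuming the JSON layout shown above:

```python
import json
import os

def load_router_config(path):
    """Read a router config file and expand $VAR placeholders from the
    environment. Unset variables are left as-is by os.path.expandvars."""
    with open(path) as f:
        raw = f.read()
    return json.loads(os.path.expandvars(raw))
```

Expanding at load time keeps secrets out of the config file on disk while the in-memory copy is ready to use.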
# Activate router (sets ANTHROPIC_BASE_URL to local proxy)
ccr activate
# Run Claude Code - it will route to different models based on task
claude
In Task prompts, add at the beginning:
Task(subagent_type="general-purpose",
prompt="""<CCR-SUBAGENT-MODEL>moonshot,kimi-k2-thinking</CCR-SUBAGENT-MODEL>
Analyze this codebase for security vulnerabilities...""")
# Option A: Project-local (gitignored)
cp .claude/config/env.template .claude/config/env
# Option B: User-global
cp .claude/config/env.template ~/.claude/env
# Edit with your actual API keys
# Add to ~/.bashrc or ~/.zshrc
ENV_FILE="${HOME}/.claude/env"
[ -f ".claude/config/env" ] && ENV_FILE=".claude/config/env"
if [ -f "$ENV_FILE" ]; then
set -a
source "$ENV_FILE"
set +a
fi
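For tools that cannot `source` a shell file, the same precedence (project-local env file first, then user-global) can be reproduced in Python. A sketch that handles plain `KEY=value` and `export KEY=value` lines but performs no shell expansion:

```python
import os

def load_env_file(path):
    """Parse `export KEY=value` / `KEY=value` lines into os.environ.
    Skips blanks and comments; does not expand ${...} references."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"').strip("'")

# Project-local config wins over the user-global one, mirroring the shell snippet
for candidate in (".claude/config/env", os.path.expanduser("~/.claude/env")):
    if os.path.exists(candidate):
        load_env_file(candidate)
        break
```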
# Add to env file to use Kimi K2 instead of Claude:
export ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
export ANTHROPIC_AUTH_TOKEN=${MOONSHOT_API_KEY}
export ANTHROPIC_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_OPUS_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_SONNET_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_HAIKU_MODEL=kimi-k2-thinking-turbo
export CLAUDE_CODE_SUBAGENT_MODEL=kimi-k2-thinking-turbo
# Then run Claude Code normally - it will use Kimi K2!
claude
# Check Claude
echo "Claude: ${ANTHROPIC_API_KEY:0:10}..."
# Check OpenAI
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
# Check Moonshot
echo "Moonshot: ${MOONSHOT_API_KEY:0:10}..."
# Check local models
ollama list
curl -s http://localhost:8080/v1/models | jq . # LocalAI
| Task Type | Recommended | Fallback | Reason |
|---|---|---|---|
| Orchestration | claude.opus | moonshot.kimi_k2 | Best reasoning |
| Agentic tasks | moonshot.kimi_k2 | claude.sonnet | 1T MoE, tool use optimized |
| Deep reasoning | moonshot.kimi_k2_thinking | claude.opus | Step-by-step with tools |
| Code review | claude.sonnet | moonshot.kimi_k2 | Code understanding |
| Coding tasks | moonshot.kimi_k2 | claude.sonnet | Strong coding benchmark |
| Documentation | claude.haiku | local.llama3 | Cost-effective |
| Vision/Images | openai.gpt4o | claude.sonnet | Multimodal |
| Private data | local.ollama | local.localai | Data stays local |
| High volume | local.vllm | local.localai | Throughput |
| Offline | local.ollama | - | No internet |
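The table above can be encoded directly so an orchestrator can pick a model programmatically. The identifiers are taken from the table and are purely illustrative:

```python
# Task type -> (recommended, fallback), transcribed from the selection guide table
MODEL_ROUTES = {
    "orchestration":  ("claude.opus",               "moonshot.kimi_k2"),
    "agentic":        ("moonshot.kimi_k2",          "claude.sonnet"),
    "deep_reasoning": ("moonshot.kimi_k2_thinking", "claude.opus"),
    "code_review":    ("claude.sonnet",             "moonshot.kimi_k2"),
    "coding":         ("moonshot.kimi_k2",          "claude.sonnet"),
    "documentation":  ("claude.haiku",              "local.llama3"),
    "vision":         ("openai.gpt4o",              "claude.sonnet"),
    "private_data":   ("local.ollama",              "local.localai"),
    "high_volume":    ("local.vllm",                "local.localai"),
    "offline":        ("local.ollama",              None),
}

def pick_model(task_type, prefer_fallback=False):
    """Return the recommended model, or the fallback when requested and available."""
    recommended, fallback = MODEL_ROUTES[task_type]
    return fallback if prefer_fallback and fallback else recommended
```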
# Check if service is running
curl -s http://localhost:11434/api/tags # Ollama
curl -s http://localhost:8080/v1/models # LocalAI
# Check API key
curl -s https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY" | jq .error
# Use quantized models locally
ollama pull llama3.2:3b-instruct-q4_0 # Smaller, faster
# Or use vLLM for batching
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--tensor-parallel-size 2 # Multi-GPU