Multi-model orchestration for ARIA. Activate when dispatching tasks to different LLM providers (OpenAI, local models, cloud providers) or when optimizing cost/latency tradeoffs.
Routes tasks to multiple LLM providers for cost optimization and specialized capabilities.
npx claudepluginhub flexnetos/ripple-env

This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill enables ARIA to dispatch tasks to multiple LLM providers beyond Claude.
| File | Purpose |
|---|---|
| `.claude/config/models.json` | Model registry and routing rules |
| `.claude/config/env.template` | API key template (copy to secure location) |
| `.claude/config/claude-code-router.template.json` | Router config for hybrid mode |
| `.claude/mcp-servers.json` | MCP server configurations |
- **Full replacement:** All Claude calls redirect to Kimi K2. Simplest setup.
- **Hybrid routing:** Use Claude as the main orchestrator and Kimi K2 for subagents/background tasks. Requires claude-code-router.
- **On request:** Keep Claude for everything; use Kimi K2 only when explicitly requested in Task prompts.
# Built-in - no additional config needed
Task(subagent_type="general-purpose", model="opus", prompt="...")
Task(subagent_type="general-purpose", model="sonnet", prompt="...")
Task(subagent_type="general-purpose", model="haiku", prompt="...")
# Via aichat (configured in pixi.toml)
aichat -m openai:gpt-4-turbo "Your prompt here"
# Via direct API
curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": "..."}]}'
# List available models
ollama list
# Run inference
ollama run llama3.2 "Your prompt here"
# Via API
curl http://localhost:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "..."}'
# Start LocalAI server
local-ai run --models-path ./models
# Query (OpenAI-compatible API)
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [...]}'
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mixtral-8x7B-Instruct-v0.1
# Query (OpenAI-compatible)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [...]}'
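Because LocalAI and vLLM both expose OpenAI-compatible endpoints, a single request builder can target any of these backends by swapping the base URL. A minimal sketch (the URLs and model names are illustrative; local servers typically need no API key):

```python
import json

def build_chat_request(base_url, model, prompt, api_key=None):
    """Build an OpenAI-compatible chat request for any backend (OpenAI, LocalAI, vLLM)."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers usually run without auth
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return url, headers, body

# Same builder, different backend: a local vLLM server
url, headers, body = build_chat_request(
    "http://localhost:8000", "mistralai/Mixtral-8x7B-Instruct-v0.1", "Summarize this diff"
)
```

The returned tuple can be handed to any HTTP client (`curl`, `requests`, `urllib`), which keeps provider selection a one-argument change.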
# Using OpenAI SDK with Moonshot endpoint
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["MOONSHOT_API_KEY"],
base_url="https://api.moonshot.ai/v1",
)
# Kimi K2 Instruct - best for agentic tasks
response = client.chat.completions.create(
model="kimi-k2-instruct",
messages=[{"role": "user", "content": "Analyze this codebase..."}],
temperature=0.6, # Recommended for K2
)
# Kimi K2 Thinking - step-by-step reasoning with tool use
response = client.chat.completions.create(
model="kimi-k2-thinking",
messages=[{"role": "user", "content": "Debug this complex issue..."}],
)
# Via curl (OpenAI-compatible)
curl https://api.moonshot.ai/v1/chat/completions \
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-instruct",
"messages": [{"role": "user", "content": "..."}],
"temperature": 0.6
}'
# Via aichat (if configured)
aichat -m moonshot:kimi-k2-instruct "Your prompt here"
# Access Kimi K2 and 100+ models via single API
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2",
"messages": [{"role": "user", "content": "..."}]
}'
routing_strategy:
# Use cheapest model that can handle the task
simple_tasks:
primary: claude.haiku # Fast, cheap
fallback: local.ollama # Free, offline
analysis_tasks:
primary: claude.sonnet # Balanced
fallback: openai.gpt4 # Alternative
complex_tasks:
primary: claude.opus # Best reasoning
fallback: openai.gpt4 # Alternative
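One way to act on this strategy: attempt the tier's primary model and fall back to the secondary on any failure. A sketch with the routing table mirroring the YAML above; `call_model` is a placeholder for a real provider call:

```python
# Mirrors the routing_strategy YAML: tier -> (primary, fallback)
ROUTES = {
    "simple":   ("claude.haiku",  "local.ollama"),
    "analysis": ("claude.sonnet", "openai.gpt4"),
    "complex":  ("claude.opus",   "openai.gpt4"),
}

def route(task_type, prompt, call_model):
    """Try the tier's primary model; on any error, retry once with the fallback."""
    primary, fallback = ROUTES[task_type]
    try:
        return primary, call_model(primary, prompt)
    except Exception:
        return fallback, call_model(fallback, prompt)
```

In practice `call_model` would dispatch to the Anthropic SDK, aichat, or a local HTTP endpoint depending on the model identifier's prefix.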
# Example: Get multiple perspectives on architecture decision
# Launch tasks to different models in parallel
# Claude perspective
Task(subagent_type="general-purpose", model="sonnet",
prompt="Analyze this architecture from security standpoint...")
# OpenAI perspective (via Bash + aichat)
Bash(command='aichat -m openai:gpt-4 "Analyze this architecture..."')
# Local model (via Bash + ollama)
Bash(command='ollama run llama3.2 "Analyze this architecture..."')
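The same fan-out can be scripted: send one prompt to several providers concurrently and collect all answers. A sketch with stubbed provider callables standing in for the real Claude/OpenAI/Ollama calls:

```python
from concurrent.futures import ThreadPoolExecutor

def gather_perspectives(prompt, providers):
    """providers: dict of name -> callable(prompt) -> str. Runs all calls in parallel threads."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in providers.items()}
        return {name: f.result() for name, f in futures.items()}

# Stubs standing in for real provider calls (illustrative only)
stubs = {
    "claude.sonnet": lambda p: "claude: " + p,
    "openai.gpt4":   lambda p: "gpt4: " + p,
    "local.llama3":  lambda p: "llama: " + p,
}
results = gather_perspectives("Analyze this architecture...", stubs)
```

Threads suit this workload because each call is I/O-bound; the slowest provider determines total latency rather than the sum of all calls.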
# Get agreement from multiple models before proceeding
# (query_model and check_consensus below are placeholder helpers)
models = ["claude.sonnet", "openai.gpt4", "local.llama3"]
responses = []
for model in models:
response = query_model(model, prompt)
responses.append(response)
# Require 2/3 agreement for critical decisions
consensus = check_consensus(responses, threshold=0.66)
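The `check_consensus` helper above is pseudocode; one simple concrete form treats each response as a verdict and requires the majority verdict's share to reach the threshold. A sketch (real responses would first be normalized, e.g. reduced to an approve/reject label):

```python
from collections import Counter

def check_consensus(responses, threshold=0.66):
    """Return (majority_answer, reached) where reached is True if the
    majority answer's share of responses meets the threshold."""
    if not responses:
        return None, False
    answer, count = Counter(responses).most_common(1)[0]
    return answer, count / len(responses) >= threshold

# 2 of 3 models agree -> consensus at threshold 0.66
answer, ok = check_consensus(["approve", "approve", "reject"])  # -> ("approve", True)
```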
Use Claude as the main orchestrator and Kimi K2 for subagent/background tasks.
npm install -g claude-code-router
mkdir -p ~/.claude-code-router
cp .claude/config/claude-code-router.template.json ~/.claude-code-router/config.json
# Edit with your API keys
{
"providers": [
{
"name": "anthropic",
"api_base_url": "https://api.anthropic.com",
"api_key": "$ANTHROPIC_API_KEY",
"models": ["claude-opus-4-5-20251101", "claude-sonnet-4-20250514"]
},
{
"name": "moonshot",
"api_base_url": "https://api.moonshot.ai/anthropic",
"api_key": "$MOONSHOT_API_KEY",
"models": ["kimi-k2-thinking-turbo", "kimi-k2-instruct"]
}
],
"router": {
"default": "anthropic,claude-sonnet-4-20250514",
"think": "anthropic,claude-opus-4-5-20251101",
"background": "moonshot,kimi-k2-thinking-turbo"
}
}
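The template keeps API keys as `$VAR` placeholders; a small loader can expand them from the environment before handing the config to a consumer. A sketch, assuming the JSON layout shown above:

```python
import json
import os

def load_router_config(path):
    """Read a router config file and expand $VAR placeholders from the
    environment. Unset variables are left as-is by os.path.expandvars."""
    with open(path) as f:
        raw = f.read()
    return json.loads(os.path.expandvars(raw))
```

Expanding at load time keeps secrets out of the config file on disk while the in-memory copy is ready to use.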
# Activate router (sets ANTHROPIC_BASE_URL to local proxy)
ccr activate
# Run Claude Code - it will route to different models based on task
claude
In Task prompts, add at the beginning:
Task(subagent_type="general-purpose",
prompt="""<CCR-SUBAGENT-MODEL>moonshot,kimi-k2-thinking</CCR-SUBAGENT-MODEL>
Analyze this codebase for security vulnerabilities...""")
# Option A: Project-local (gitignored)
cp .claude/config/env.template .claude/config/env
# Option B: User-global
cp .claude/config/env.template ~/.claude/env
# Edit with your actual API keys
# Add to ~/.bashrc or ~/.zshrc
ENV_FILE="${HOME}/.claude/env"
[ -f ".claude/config/env" ] && ENV_FILE=".claude/config/env"
if [ -f "$ENV_FILE" ]; then
set -a
source "$ENV_FILE"
set +a
fi
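For tools that cannot `source` a shell file, the same precedence (project-local env file first, then user-global) can be reproduced in Python. A sketch that handles plain `KEY=value` and `export KEY=value` lines but performs no shell expansion:

```python
import os

def load_env_file(path):
    """Parse `export KEY=value` / `KEY=value` lines into os.environ.
    Skips blanks and comments; does not expand ${...} references."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"').strip("'")

# Project-local config wins over the user-global one, mirroring the shell snippet
for candidate in (".claude/config/env", os.path.expanduser("~/.claude/env")):
    if os.path.exists(candidate):
        load_env_file(candidate)
        break
```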
# Add to env file to use Kimi K2 instead of Claude:
export ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
export ANTHROPIC_AUTH_TOKEN=${MOONSHOT_API_KEY}
export ANTHROPIC_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_OPUS_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_SONNET_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_HAIKU_MODEL=kimi-k2-thinking-turbo
export CLAUDE_CODE_SUBAGENT_MODEL=kimi-k2-thinking-turbo
# Then run Claude Code normally - it will use Kimi K2!
claude
# Check Claude
echo "Claude: ${ANTHROPIC_API_KEY:0:10}..."
# Check OpenAI
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
# Check Moonshot
echo "Moonshot: ${MOONSHOT_API_KEY:0:10}..."
# Check local models
ollama list
curl -s http://localhost:8080/v1/models | jq . # LocalAI
| Task Type | Recommended | Fallback | Reason |
|---|---|---|---|
| Orchestration | claude.opus | moonshot.kimi_k2 | Best reasoning |
| Agentic tasks | moonshot.kimi_k2 | claude.sonnet | 1T MoE, tool use optimized |
| Deep reasoning | moonshot.kimi_k2_thinking | claude.opus | Step-by-step with tools |
| Code review | claude.sonnet | moonshot.kimi_k2 | Code understanding |
| Coding tasks | moonshot.kimi_k2 | claude.sonnet | Strong coding benchmark |
| Documentation | claude.haiku | local.llama3 | Cost-effective |
| Vision/Images | openai.gpt4o | claude.sonnet | Multimodal |
| Private data | local.ollama | local.localai | Data stays local |
| High volume | local.vllm | local.localai | Throughput |
| Offline | local.ollama | - | No internet |
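The table above can be encoded directly so an orchestrator can pick a model programmatically. The identifiers are taken from the table and are purely illustrative:

```python
# Task type -> (recommended, fallback), transcribed from the selection guide table
MODEL_ROUTES = {
    "orchestration":  ("claude.opus",               "moonshot.kimi_k2"),
    "agentic":        ("moonshot.kimi_k2",          "claude.sonnet"),
    "deep_reasoning": ("moonshot.kimi_k2_thinking", "claude.opus"),
    "code_review":    ("claude.sonnet",             "moonshot.kimi_k2"),
    "coding":         ("moonshot.kimi_k2",          "claude.sonnet"),
    "documentation":  ("claude.haiku",              "local.llama3"),
    "vision":         ("openai.gpt4o",              "claude.sonnet"),
    "private_data":   ("local.ollama",              "local.localai"),
    "high_volume":    ("local.vllm",                "local.localai"),
    "offline":        ("local.ollama",              None),
}

def pick_model(task_type, prefer_fallback=False):
    """Return the recommended model, or the fallback when requested and available."""
    recommended, fallback = MODEL_ROUTES[task_type]
    return fallback if prefer_fallback and fallback else recommended
```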
# Check if service is running
curl -s http://localhost:11434/api/tags # Ollama
curl -s http://localhost:8080/v1/models # LocalAI
# Check API key
curl -s https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY" | jq .error
# Use quantized models locally
ollama pull llama3.2:3b-instruct-q4_0 # Smaller, faster
# Or use vLLM for batching
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--tensor-parallel-size 2 # Multi-GPU