From ripple-env
Multi-model orchestration for ARIA. Activate when dispatching tasks to different LLM providers (OpenAI, local models, cloud providers) or when optimizing cost/latency tradeoffs.
npx claudepluginhub flexnetos/ripple-env

This skill uses the workspace's default tool permissions.
This skill enables ARIA to dispatch tasks to multiple LLM providers beyond Claude.
| File | Purpose |
|---|---|
| `.claude/config/models.json` | Model registry and routing rules |
| `.claude/config/env.template` | API key template (copy to secure location) |
| `.claude/config/claude-code-router.template.json` | Router config for hybrid mode |
| `.claude/mcp-servers.json` | MCP server configurations |
- **Full replacement:** All Claude calls redirect to Kimi K2. Simplest setup.
- **Hybrid:** Claude remains the main orchestrator; Kimi K2 handles subagents and background tasks. Requires claude-code-router.
- **On demand:** Keep Claude for everything; use Kimi K2 only when explicitly requested in Task prompts.
# Built-in - no additional config needed
Task(subagent_type="general-purpose", model="opus", prompt="...")
Task(subagent_type="general-purpose", model="sonnet", prompt="...")
Task(subagent_type="general-purpose", model="haiku", prompt="...")
# Via aichat (configured in pixi.toml)
aichat -m openai:gpt-4-turbo "Your prompt here"
# Via direct API
curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": "..."}]}'
# List available models
ollama list
# Run inference
ollama run llama3.2 "Your prompt here"
# Via API
curl http://localhost:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "..."}'
# Start LocalAI server
localai run --models-path ./models
# Query (OpenAI-compatible API)
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [...]}'
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mixtral-8x7B-Instruct-v0.1
# Query (OpenAI-compatible)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "messages": [...]}'
# Using the OpenAI SDK with the Moonshot endpoint
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",
)

# Kimi K2 Instruct - best for agentic tasks
response = client.chat.completions.create(
    model="kimi-k2-instruct",
    messages=[{"role": "user", "content": "Analyze this codebase..."}],
    temperature=0.6,  # Recommended for K2
)

# Kimi K2 Thinking - step-by-step reasoning with tool use
response = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Debug this complex issue..."}],
)
# Via curl (OpenAI-compatible)
curl https://api.moonshot.ai/v1/chat/completions \
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-instruct",
"messages": [{"role": "user", "content": "..."}],
"temperature": 0.6
}'
# Via aichat (if configured)
aichat -m moonshot:kimi-k2-instruct "Your prompt here"
# Access Kimi K2 and 100+ models via single API
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2",
"messages": [{"role": "user", "content": "..."}]
}'
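OpenRouter also works as a fallback when a direct provider endpoint is down. A provider-agnostic retry chain might look like this sketch, where the provider callables are placeholders for real API clients:

```python
def query_with_fallback(prompt, providers):
    """Try each provider callable in order; return the first success.

    `providers` is a list of (name, fn) pairs where fn(prompt) returns
    a response string or raises on failure -- stand-ins for real calls
    to e.g. the Moonshot and OpenRouter endpoints above.
    """
    errors = []
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Listing the direct Moonshot client first and the OpenRouter client second keeps the cheaper direct path as the default while surviving outages.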
routing_strategy:
  # Use cheapest model that can handle the task
  simple_tasks:
    primary: claude.haiku    # Fast, cheap
    fallback: local.ollama   # Free, offline
  analysis_tasks:
    primary: claude.sonnet   # Balanced
    fallback: openai.gpt4    # Alternative
  complex_tasks:
    primary: claude.opus     # Best reasoning
    fallback: openai.gpt4    # Alternative
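The routing table above can be sketched as a small dispatcher. The model identifiers mirror the YAML shorthand, and `is_available` is a stub you would replace with a real health check:

```python
# Routing table mirroring the YAML above: task type -> (primary, fallback).
ROUTES = {
    "simple":   ("claude.haiku",  "local.ollama"),
    "analysis": ("claude.sonnet", "openai.gpt4"),
    "complex":  ("claude.opus",   "openai.gpt4"),
}

def select_model(task_type: str, is_available=lambda m: True) -> str:
    """Return the primary model for a task, falling back if unavailable."""
    primary, fallback = ROUTES[task_type]
    return primary if is_available(primary) else fallback
```

With the default stub, `select_model("simple")` picks `claude.haiku`; passing an availability check that fails for cloud models would route the same task to `local.ollama`.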
# Example: Get multiple perspectives on architecture decision
# Launch tasks to different models in parallel
# Claude perspective
Task(subagent_type="general-purpose", model="sonnet",
prompt="Analyze this architecture from security standpoint...")
# OpenAI perspective (via Bash + aichat)
Bash(command='aichat -m openai:gpt-4 "Analyze this architecture..."')
# Local model (via Bash + ollama)
Bash(command='ollama run llama3.2 "Analyze this architecture..."')
# Get agreement from multiple models before proceeding.
# query_model and check_consensus are placeholder helpers, not a real API.
models = ["claude.sonnet", "openai.gpt4", "local.llama3"]
responses = []
for model in models:
    response = query_model(model, prompt)
    responses.append(response)

# Require 2/3 agreement for critical decisions
consensus = check_consensus(responses, threshold=0.66)
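A hypothetical `check_consensus` could work by majority vote. This sketch assumes responses are normalized verdict strings (e.g. "approve"/"reject"); free-form answers would need semantic similarity instead:

```python
from collections import Counter

def check_consensus(responses: list[str], threshold: float = 0.66) -> bool:
    """True if the most common response meets the agreement threshold.

    Responses are normalized (trimmed, lowercased) before counting, so
    "Approve" and "approve " count as the same verdict.
    """
    if not responses:
        return False
    counts = Counter(r.strip().lower() for r in responses)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(responses) >= threshold
```

With three models and `threshold=0.66`, two matching verdicts (2/3 ≈ 0.67) are enough to proceed.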
Use Claude as the main orchestrator and Kimi K2 for subagent/background tasks.
npm install -g claude-code-router
mkdir -p ~/.claude-code-router
cp .claude/config/claude-code-router.template.json ~/.claude-code-router/config.json
# Edit with your API keys
{
  "providers": [
    {
      "name": "anthropic",
      "api_base_url": "https://api.anthropic.com",
      "api_key": "$ANTHROPIC_API_KEY",
      "models": ["claude-opus-4-5-20251101", "claude-sonnet-4-20250514"]
    },
    {
      "name": "moonshot",
      "api_base_url": "https://api.moonshot.ai/anthropic",
      "api_key": "$MOONSHOT_API_KEY",
      "models": ["kimi-k2-thinking-turbo", "kimi-k2-instruct"]
    }
  ],
  "router": {
    "default": "anthropic,claude-sonnet-4-20250514",
    "think": "anthropic,claude-opus-4-5-20251101",
    "background": "moonshot,kimi-k2-thinking-turbo"
  }
}
# Activate router (sets ANTHROPIC_BASE_URL to local proxy)
ccr activate
# Run Claude Code - it will route to different models based on task
claude
Add the following at the beginning of the Task prompt:
Task(subagent_type="general-purpose",
prompt="""<CCR-SUBAGENT-MODEL>moonshot,kimi-k2-thinking</CCR-SUBAGENT-MODEL>
Analyze this codebase for security vulnerabilities...""")
# Option A: Project-local (gitignored)
cp .claude/config/env.template .claude/config/env
# Option B: User-global
cp .claude/config/env.template ~/.claude/env
# Edit with your actual API keys
# Add to ~/.bashrc or ~/.zshrc
ENV_FILE="${HOME}/.claude/env"
[ -f ".claude/config/env" ] && ENV_FILE=".claude/config/env"
if [ -f "$ENV_FILE" ]; then
set -a
source "$ENV_FILE"
set +a
fi
# Add to env file to use Kimi K2 instead of Claude:
export ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
export ANTHROPIC_AUTH_TOKEN=${MOONSHOT_API_KEY}
export ANTHROPIC_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_OPUS_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_SONNET_MODEL=kimi-k2-thinking-turbo
export ANTHROPIC_DEFAULT_HAIKU_MODEL=kimi-k2-thinking-turbo
export CLAUDE_CODE_SUBAGENT_MODEL=kimi-k2-thinking-turbo
# Then run Claude Code normally - it will use Kimi K2!
claude
# Check Claude
echo "Claude: ${ANTHROPIC_API_KEY:0:10}..."
# Check OpenAI
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
# Check Moonshot
echo "Moonshot: ${MOONSHOT_API_KEY:0:10}..."
# Check local models
ollama list
curl -s http://localhost:8080/v1/models | jq . # LocalAI
| Task Type | Recommended | Fallback | Reason |
|---|---|---|---|
| Orchestration | claude.opus | moonshot.kimi_k2 | Best reasoning |
| Agentic tasks | moonshot.kimi_k2 | claude.sonnet | 1T MoE, tool use optimized |
| Deep reasoning | moonshot.kimi_k2_thinking | claude.opus | Step-by-step with tools |
| Code review | claude.sonnet | moonshot.kimi_k2 | Code understanding |
| Coding tasks | moonshot.kimi_k2 | claude.sonnet | Strong coding benchmark |
| Documentation | claude.haiku | local.llama3 | Cost-effective |
| Vision/Images | openai.gpt4o | claude.sonnet | Multimodal |
| Private data | local.ollama | local.localai | Data stays local |
| High volume | local.vllm | local.localai | Throughput |
| Offline | local.ollama | - | No internet |
# Check if service is running
curl -s http://localhost:11434/api/tags # Ollama
curl -s http://localhost:8080/v1/models # LocalAI
# Check API key
curl -s https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY" | jq .error
# Use quantized models locally
ollama pull llama3.2:3b-instruct-q4_0 # Smaller, faster
# Or use vLLM for batching
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --tensor-parallel-size 2  # Multi-GPU