This skill provides comprehensive knowledge for Ollama integration in the Ahling Command Center, including model management, GPU optimization for AMD RX 7900 XTX, custom Modelfiles, and multi-model orchestration.
```
/plugin marketplace add Lobbi-Docs/claude
/plugin install ahling-command-center@claude-orchestration
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
```yaml
target_gpu: AMD RX 7900 XTX
vram_total: 24GB
rocm_version: ">=6.0"

vram_allocation:
  primary_model: 16GB    # Large models (70B Q4, 34B)
  secondary_model: 4GB   # Fast models (7B, 3B)
  embeddings: 2GB        # nomic-embed-text
  reserved: 2GB          # Whisper, Frigate overlap
```
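This allocation is a planning budget, not something Ollama enforces. A quick sanity check that the plan fits the card is a few lines of Python (the numbers simply mirror the YAML above):

```python
# Sanity-check the VRAM plan against the RX 7900 XTX's 24GB.
VRAM_TOTAL_GB = 24
allocation_gb = {
    "primary_model": 16,   # large models (70B Q4, 34B)
    "secondary_model": 4,  # fast models (7B, 3B)
    "embeddings": 2,       # nomic-embed-text
    "reserved": 2,         # Whisper, Frigate overlap
}
assert sum(allocation_gb.values()) <= VRAM_TOTAL_GB  # 16 + 4 + 2 + 2 = 24
```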
Route requests to appropriate models based on task complexity:
| Task Type | Model | VRAM | Use Case |
|---|---|---|---|
| Complex Reasoning | llama3.2:70b-q4 | 16GB | Planning, analysis, code review |
| Quick Response | llama3.2:7b | 4GB | Simple queries, fast interactions |
| Code Generation | codellama:34b-q4 | 12GB | Code writing, debugging |
| Home Assistant | fixt/home-3b-v3 | 2GB | HA entity control, automation |
| Embeddings | nomic-embed-text | 1GB | Vector generation for RAG |
| Vision | llava:13b | 8GB | Image analysis (when needed) |
```bash
# Text generation (non-streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:70b",
  "prompt": "Explain quantum computing",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_ctx": 8192,
    "num_gpu": 99
  }
}'
```
```bash
# Chat (streaming)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:7b",
  "messages": [
    {"role": "system", "content": "You are the Ahling Command Center AI."},
    {"role": "user", "content": "What is the status of my home?"}
  ],
  "stream": true
}'
```
```bash
# Embeddings
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Home Assistant automation for motion-activated lights"
}'
```
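The same calls are available through the official `ollama` Python client; a minimal sketch, assuming `pip install ollama` and the default localhost endpoint:

```python
import ollama

# Chat completion against the fast model
reply = ollama.chat(
    model="llama3.2:7b",
    messages=[{"role": "user", "content": "What is the status of my home?"}],
)
print(reply["message"]["content"])

# Embedding for RAG indexing
emb = ollama.embeddings(
    model="nomic-embed-text",
    prompt="Home Assistant automation for motion-activated lights",
)
print(len(emb["embedding"]))  # vector dimensionality
```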
```bash
# List models
curl http://localhost:11434/api/tags

# Pull model
curl http://localhost:11434/api/pull -d '{"name": "llama3.2:70b"}'

# Delete model (the delete endpoint expects the DELETE verb)
curl -X DELETE http://localhost:11434/api/delete -d '{"name": "old-model"}'

# Show model info
curl http://localhost:11434/api/show -d '{"name": "llama3.2:70b"}'
```
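For scripting, `/api/tags` returns JSON that pipes cleanly through `jq` (assuming `jq` is installed on the host):

```bash
# Print each installed model with its approximate size in GB
curl -s http://localhost:11434/api/tags |
  jq -r '.models[] | "\(.name)\t\(.size / 1e9 | floor)GB"'
```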
```
# Modelfile.ahling-home
FROM fixt/home-3b-v3

PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER stop "<|im_end|>"

SYSTEM """
You are the Ahling Command Center AI, integrated with Home Assistant.
You control a smart home with these capabilities:
- Lights in all rooms (living room, bedroom, office, kitchen, garage)
- Climate control (HVAC, fans)
- Security (cameras, locks, motion sensors)
- Media (TV, speakers)
- Energy monitoring (solar, battery, consumption)

When asked to control devices, respond with the exact service call needed.
Be concise and action-oriented.
"""
```
```
# Modelfile.ahling-coordinator
FROM llama3.2:7b

PARAMETER temperature 0.8
PARAMETER num_ctx 8192
PARAMETER top_p 0.9

SYSTEM """
You are the Ahling Command Center Coordinator, responsible for:
1. Orchestrating multi-agent workflows
2. Synthesizing information from multiple sources
3. Making decisions that affect the entire home system
4. Providing morning briefings and status reports

You have access to:
- Home Assistant for physical control
- Knowledge graph (Neo4j) for context
- Vector database (Qdrant) for semantic search
- Multiple specialist agents

Always think step-by-step and explain your reasoning.
"""
```
```bash
# ROCm for AMD GPU
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export OLLAMA_NUM_GPU=99
export OLLAMA_GPU_OVERHEAD=256m
export OLLAMA_MAX_LOADED_MODELS=3

# Memory optimization
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
```
```yaml
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
    environment:
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
      - OLLAMA_NUM_GPU=99
      - OLLAMA_FLASH_ATTENTION=1
    ports:
      - "11434:11434"
    group_add:
      - video
      - render
    security_opt:
      - seccomp:unconfined
    cap_add:
      - SYS_PTRACE

volumes:
  ollama_data:  # named volume for model storage
```
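After `docker compose up -d`, a quick way to confirm the container actually sees the GPU is to load a model and check where its layers landed:

```bash
# Load a small model, then inspect placement
docker exec ollama ollama run llama3.2:7b "hello" > /dev/null
docker exec ollama ollama ps   # output should report the model as running on GPU
```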
```python
class ModelRouter:
    """Route requests to appropriate Ollama models."""

    ROUTING_TABLE = {
        "complex": "llama3.2:70b",    # Complex reasoning
        "fast": "llama3.2:7b",        # Quick responses
        "code": "codellama:34b",      # Code tasks
        "home": "ahling-home",        # Home Assistant
        "embed": "nomic-embed-text",  # Embeddings
        "vision": "llava:13b",        # Image analysis
    }

    def route(self, task_type: str, complexity: float = 0.5) -> str:
        """Select model based on task type and complexity."""
        if task_type == "auto":
            if complexity > 0.7:
                return self.ROUTING_TABLE["complex"]
            else:
                return self.ROUTING_TABLE["fast"]
        return self.ROUTING_TABLE.get(task_type, "llama3.2:7b")
```
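Usage is a single lookup; for example (names exactly as defined in the class above):

```python
router = ModelRouter()
router.route("code")                  # -> "codellama:34b"
router.route("auto", complexity=0.9)  # -> "llama3.2:70b"
router.route("auto", complexity=0.3)  # -> "llama3.2:7b"
```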
```yaml
# Maximum 3 models loaded simultaneously
# Priority order: home (always), fast (high), complex (on-demand)
model_priority:
  1: ahling-home   # Always loaded for HA control
  2: llama3.2:7b   # Fast responses, always ready
  3: llama3.2:70b  # Load on-demand for complex tasks

unload_strategy:
  idle_timeout: 300  # Unload after 5 minutes idle
  priority_keep: 2   # Keep top 2 priority models loaded
```
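Ollama implements idle timeouts through the per-request `keep_alive` parameter, so this strategy maps directly onto request options:

```bash
# Pin the HA model in memory indefinitely
curl http://localhost:11434/api/generate -d '{
  "model": "ahling-home",
  "keep_alive": -1
}'

# Let the large model unload after 5 idle minutes
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:70b",
  "keep_alive": "5m"
}'
```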
```python
from ollama import AsyncClient

ollama = AsyncClient()  # async client for the local Ollama server

async def ha_control_with_ollama(user_request: str):
    """Process voice command through Ollama for HA control."""
    # Use the home-optimized model
    response = await ollama.chat(
        model="ahling-home",
        messages=[
            {"role": "user", "content": user_request}
        ],
    )

    # Parse the service call from the response (project helper)
    service_call = parse_ha_service(response["message"]["content"])

    # Execute on Home Assistant (project HA client)
    await ha.call_service(**service_call)
```
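A minimal invocation, assuming the `parse_ha_service` helper and `ha` client above are wired up in the service:

```python
import asyncio

asyncio.run(ha_control_with_ollama("Turn off the garage lights"))
```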
```python
from autogen import AssistantAgent

# Register Ollama as LLM backend for AutoGen
config_list = [
    {
        "model": "llama3.2:70b",
        "base_url": "http://ollama:11434/v1",
        "api_type": "ollama",
        "api_key": "ollama",  # Placeholder
    }
]

# Create AutoGen agent with Ollama
coordinator = AssistantAgent(
    name="coordinator",
    llm_config={"config_list": config_list},
    system_message="You are the Ahling Command Center coordinator...",
)
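```

Driving the agent then follows the usual AutoGen pattern; a sketch assuming the `pyautogen` package with no human in the loop:

```python
from autogen import UserProxyAgent

user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",      # fully automated exchange
    code_execution_config=False,   # no local code execution
)
user.initiate_chat(coordinator, message="Draft this morning's status briefing.")
```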
```python
async def rag_with_ollama(query: str):
    """RAG query using Ollama embeddings and generation."""
    # Generate query embedding (response carries an "embedding" vector)
    query_embedding = await ollama.embeddings(
        model="nomic-embed-text",
        prompt=query,
    )

    # Search Qdrant (async qdrant-client; the parameter is collection_name)
    results = await qdrant.search(
        collection_name="knowledge",
        query_vector=query_embedding["embedding"],
        limit=5,
    )

    # Generate response with retrieved context
    context = "\n".join(r.payload["text"] for r in results)
    response = await ollama.chat(
        model="llama3.2:70b",
        messages=[
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response["message"]["content"]
```
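The `ollama` and `qdrant` handles above are async clients; a plausible setup, assuming the default Qdrant port:

```python
from ollama import AsyncClient
from qdrant_client import AsyncQdrantClient

ollama = AsyncClient()
qdrant = AsyncQdrantClient(url="http://localhost:6333")

# Inside any coroutine:
#   answer = await rag_with_ollama("Which automations use the garage motion sensor?")
```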
```bash
# Check ROCm installation
rocm-smi

# Verify device access
ls -la /dev/kfd /dev/dri

# Check Ollama GPU usage
curl http://localhost:11434/api/ps
```
```bash
# Unload unused models
curl http://localhost:11434/api/generate -d '{
  "model": "large-model",
  "keep_alive": 0
}'

# Check current VRAM usage
rocm-smi --showmeminfo vram

# Enable flash attention
export OLLAMA_FLASH_ATTENTION=1

# Reduce context length
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:70b",
  "options": {"num_ctx": 4096}
}'
```