Manages the LocalAI local inference service via Podman Quadlet, providing an OpenAI-compatible API for running models locally with GPU acceleration. Use when users need to configure, start, or manage the LocalAI service.
Installation:

```bash
/plugin marketplace add atrawog/bazzite-ai-plugins
/plugin install bazzite-ai@bazzite-ai-plugins
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
The localai command manages the LocalAI service using Podman Quadlet containers. It provides an OpenAI-compatible API for running AI models locally with GPU acceleration.
Key Features:
- OpenAI-compatible API for local model inference
- Automatic GPU detection and image selection (NVIDIA, AMD, Intel)
- Multiple isolated instances with persistent model storage
- Cross-container DNS via the bazzite-ai network

Actions:

| Action | Command | Description |
|---|---|---|
| Config | ujust localai config | Configure instance |
| Start | ujust localai start | Start service |
| Stop | ujust localai stop | Stop service |
| Restart | ujust localai restart | Restart service |
| Logs | ujust localai logs | View logs |
| Status | ujust localai status | Show status |
| URL | ujust localai url | Show API URL |
| List | ujust localai list | List instances |
| Shell | ujust localai shell | Container shell |
| Delete | ujust localai delete | Remove service |
All parameters use named syntax (e.g., PORT=8081):
| Parameter | Default | Description |
|---|---|---|
| ACTION | (menu) | Action to perform |
| PORT | 8080 | Host port for API |
| IMAGE | (auto by GPU) | Container image |
| BIND | 127.0.0.1 | Bind address (127.0.0.1 or 0.0.0.0) |
| INSTANCE | 1 | Instance number or all |
| LINES | 50 | Log lines to show |
| CMD | (empty) | Shell command |
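Parameters can be combined freely with any action; for example (a hypothetical combination, assuming INSTANCE applies to logs the same way it does to start and stop):

```bash
# Show the last 100 log lines from instance 2
ujust localai logs INSTANCE=2 LINES=100
```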
LocalAI uses different container images optimized for each GPU type:
| GPU Type | Image | Auto-Selected? |
|---|---|---|
| CPU (none) | localai/localai:latest | Yes |
| NVIDIA | localai/localai:latest-gpu-nvidia-cuda-12 | Yes |
| AMD | localai/localai:latest-gpu-hipblas | Yes |
| Intel | localai/localai:latest-gpu-intel | Yes |
The appropriate image is automatically selected based on detected GPU hardware.
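To preview which image auto-detection will choose, you can check the host GPU with the same detection commands listed in the GPU Support table later in this document (a minimal sketch; adjust for your hardware):

```bash
# NVIDIA is detected when nvidia-smi succeeds
nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null \
  || lspci | grep -iE 'vga|3d'   # AMD/Intel GPUs show up in lspci output
```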
Configuration:

```bash
# Default configuration (auto-detects GPU, port 8080)
ujust localai config

# Custom port
ujust localai config PORT=8081

# Network-wide access (0.0.0.0)
ujust localai config BIND=0.0.0.0

# Force CPU image (ignore GPU)
ujust localai config IMAGE=localai/localai:latest

# Combine parameters
ujust localai config PORT=8081 BIND=0.0.0.0
```
Running config when already configured updates the existing settings:
```bash
# Change only the bind address
ujust localai config BIND=0.0.0.0

# Update port without affecting other settings
ujust localai config PORT=8082
```
Service Management:

```bash
# Start LocalAI
ujust localai start

# Stop service
ujust localai stop

# Restart (apply config changes)
ujust localai restart

# View logs (default 50 lines)
ujust localai logs

# View more logs
ujust localai logs LINES=200

# Check status
ujust localai status

# Show API URL
ujust localai url
```
Multiple Instances:

```bash
# Start all instances
ujust localai start INSTANCE=all

# Stop specific instance
ujust localai stop INSTANCE=2

# Delete all instances
ujust localai delete INSTANCE=all
```
Container Shell:

```bash
# Interactive shell
ujust localai shell

# Run specific command
ujust localai shell CMD="ls -la /models"
ujust localai shell CMD="nvidia-smi"
```
LocalAI uses the bazzite-ai bridge network for cross-container DNS:
```
+-------------------+      DNS      +-------------------+
|    Open WebUI     | ------------> |      LocalAI      |
|   (openwebui)     |               |     (localai)     |
|    Port 3000      |               |     Port 8080     |
+-------------------+               +-------------------+
          |                                   |
          +------- bazzite-ai network --------+
                            |
+-------------------+       |       +-------------------+
|      Ollama       |-------+-------|      Jupyter      |
|     (ollama)      |               |     (jupyter)     |
|    Port 11434     |               |     Port 8888     |
+-------------------+               +-------------------+
```
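The shared network can be inspected with Podman's standard CLI (assuming the network is named bazzite-ai, as shown above):

```bash
# Show network details and attached containers
podman network inspect bazzite-ai
```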
Cross-Pod DNS: other containers on the network can reach LocalAI at http://localai:8080.

API Endpoints:

| Endpoint | Description |
|---|---|
| /v1/models | List available models |
| /v1/chat/completions | Chat completions |
| /v1/completions | Text completions |
| /v1/embeddings | Generate embeddings |
| /v1/images/generations | Image generation |
| /v1/audio/transcriptions | Speech-to-text |
```bash
# List models
curl http://localhost:8080/v1/models

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
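The other endpoints follow the same OpenAI-compatible request shape; for example, embeddings (the model name here is a placeholder for a model you have installed):

```bash
# Generate embeddings (model name is hypothetical)
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-embedding-model",
    "input": "Hello, world!"
  }'
```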
Storage:

| Path | Description |
|---|---|
| ~/.config/localai/<INSTANCE>/models | Model files |
Models persist across container restarts. Each instance has isolated storage.
Place model files (GGUF, GGML) in the models directory:
```bash
# Copy a model
cp my-model.gguf ~/.config/localai/1/models/

# Or download directly
curl -L -o ~/.config/localai/1/models/model.gguf \
  https://huggingface.co/.../model.gguf
```
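After adding a model, restarting the service and querying the models endpoint should show it (assuming LocalAI scans the models directory on startup):

```bash
# Pick up newly added model files
ujust localai restart

# Confirm the model is visible to the API
curl http://localhost:8080/v1/models
```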
Quick Start:

```bash
# 1. Configure LocalAI (auto-detects GPU)
ujust localai config

# 2. Start the service
ujust localai start

# 3. Check the API
ujust localai url
# Output: http://127.0.0.1:8080

# 4. Test the API
curl http://localhost:8080/v1/models
```
OpenWebUI can use LocalAI as an OpenAI-compatible backend:
```bash
# Start LocalAI
ujust localai start

# In OpenWebUI settings, add connection:
#   URL: http://localai:8080/v1   (cross-pod DNS)
#   Or:  http://host.containers.internal:8080/v1   (from host)
```
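To verify cross-pod DNS before wiring up OpenWebUI, a throwaway container on the same network can probe the API (a sketch assuming the curlimages/curl image and the bazzite-ai network name):

```bash
# Should return the model list if DNS and the service are up
podman run --rm --network bazzite-ai curlimages/curl \
  -s http://localai:8080/v1/models
```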
Remote Access:

```bash
# Configure for network access
ujust localai config BIND=0.0.0.0

# Start the service
ujust localai start

# Or use Tailscale for secure access
ujust tailscale serve localai
```
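From another machine on the network, the API should then answer on the host's address (replace the placeholder with the host's actual IP):

```bash
# <host-ip> is a placeholder for the machine running LocalAI
curl http://<host-ip>:8080/v1/models
```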
GPU Support:

GPU is automatically detected and the appropriate image is selected:
| GPU Type | Detection | Device Passthrough |
|---|---|---|
| NVIDIA | nvidia-smi | CDI (nvidia.com/gpu=all) |
| AMD | lspci | /dev/dri + /dev/kfd |
| Intel | lspci | /dev/dri |
Verify GPU access inside the container:

```bash
# NVIDIA
ujust localai shell CMD="nvidia-smi"

# Check GPU environment
ujust localai shell CMD="env | grep -i gpu"
```
Troubleshooting:

Service won't start:

```bash
# Check status
ujust localai status

# View logs
ujust localai logs LINES=100

# Check image was pulled
podman images | grep localai
```

Common causes: the image failed to pull, the port is already in use, or GPU passthrough is misconfigured.
GPU not detected:

NVIDIA:

```bash
# Check CDI configuration
nvidia-ctk cdi list

# Regenerate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
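After regenerating the spec, restarting the service and re-running the in-container check (both commands documented above) should confirm passthrough, assuming the restart picks up the new CDI spec:

```bash
# Apply the regenerated CDI spec
ujust localai restart

# GPU should now be visible inside the container
ujust localai shell CMD="nvidia-smi"
```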
AMD:

```bash
# Check /dev/kfd exists
ls -la /dev/kfd

# Check ROCm
rocminfo
```
API not responding:

```bash
# Test API endpoint
curl http://localhost:8080/v1/models

# Check logs for errors
ujust localai logs LINES=100
```
Complete reset:

```bash
# Delete everything
ujust localai delete INSTANCE=all

# Reconfigure
ujust localai config
ujust localai start
```
Related Skills:
- ollama (simpler model management, different API)
- openwebui (can use LocalAI as a backend)

Use when the user asks about configuring, starting, or managing LocalAI, or about running an OpenAI-compatible API locally.