By kengbailey
Tune llama-server for optimal performance and GPU utilization. Analyzes GPU VRAM, model architecture (dense/MoE), and generates launch commands for maximum tok/s.
```
npx claudepluginhub kengbailey/bailey-marketplace --plugin llama-tune
```

Personal Claude Code plugin marketplace.
Add this marketplace to Claude Code:

```
/plugin marketplace add <owner>/bailey-claude-marketplace
```

Then install individual plugins:

```
/plugin install <plugin-name>@bailey-marketplace
```
| Plugin | Description | Source |
|---|---|---|
| claude-mem | Persistent memory system for Claude Code. Captures tool usage, compresses observations with AI, and re-injects relevant context into future sessions. | External (thedotmack/claude-mem) |
| llama-tune | Tune llama-server for optimal performance and GPU utilization. Supports dense and MoE models. | In-repo |
Persistent memory across Claude Code sessions. Automatically captures everything Claude does, compresses it with AI, and provides continuity in future sessions.
Dependencies are auto-installed on first run.

Runtime: a background worker serves `http://localhost:37777`; plugin data lives under `~/.claude-mem/`.

Install:
```
/plugin install claude-mem@bailey-marketplace
```
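Once installed, you can sanity-check that the worker is running. This is a generic reachability probe; no specific claude-mem routes are assumed:

```sh
# Probe the claude-mem worker port. Any HTTP status code in the output
# means the service is listening; specific endpoints are not assumed.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:37777
```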
Tunes llama-server (llama.cpp) launch parameters for maximum tok/s on your hardware. Auto-detects GPU VRAM, CPU cores, and system RAM. Inspects GGUF model files to determine architecture (dense vs MoE), then calculates optimal flags including KV cache quantization, flash attention, expert offloading (MoE), and partial GPU layer placement.
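To make that concrete, here is a sketch of the kind of launch command such tuning produces for a MoE model on a VRAM-limited GPU. The flags are standard llama.cpp llama-server options, but this is illustrative rather than llama-tune's literal output, and flag spellings vary across llama.cpp builds:

```sh
# Illustrative llama-server launch for a MoE model on a VRAM-limited GPU.
# A sketch, not llama-tune's actual output; flags vary by llama.cpp build.
#   -ngl 99                       offload all layers to the GPU
#   --override-tensor "exps=CPU"  keep MoE expert tensors in system RAM
#   --flash-attn                  enable flash attention
#   --cache-type-k/-v q8_0        quantize the KV cache to 8-bit
#   --parallel 2                  serve two concurrent request slots
llama-server -m model.gguf -c 16384 --port 8080 \
  -ngl 99 --override-tensor "exps=CPU" \
  --flash-attn --cache-type-k q8_0 --cache-type-v q8_0 \
  --parallel 2
```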
Dependency: `llama-gguf`

Skill:

```
/llama-tune <model.gguf> [--ctx SIZE] [--slots N] [--port PORT] [--launch]
```
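For example, a run against a local model might look like this (the model path and values are placeholders):

```
/llama-tune ~/models/example-32b-q4_k_m.gguf --ctx 16384 --slots 2 --launch
```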
Install:
```
/plugin install llama-tune@bailey-marketplace
```
In-repo plugins go in the `plugins/` directory. External plugins are referenced by source in `.claude-plugin/marketplace.json`.
License: MIT
Use this when setting up local LLM inference without cloud APIs, running GGUF models locally, needing an OpenAI-compatible API from a local model, building offline or air-gapped AI tools, or troubleshooting local LLM server connections.
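Once llama-server is up, any OpenAI-compatible client can point at it. A minimal check with curl, assuming the server's default port 8080:

```sh
# Minimal request to llama-server's OpenAI-compatible chat endpoint,
# assuming the default port 8080.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```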
Share bugs, ideas, or general feedback.