Personal Claude Code plugin marketplace
Add this marketplace to Claude Code:

```
/plugin marketplace add <owner>/bailey-claude-marketplace
```
Then install individual plugins:

```
/plugin install <plugin-name>@bailey-marketplace
```
| Plugin | Description | Source |
|---|---|---|
| claude-mem | Persistent memory system for Claude Code. Captures tool usage, compresses observations with AI, and re-injects relevant context into future sessions. | External (thedotmack/claude-mem) |
| llama-tune | Tune llama-server for optimal performance and GPU utilization. Supports dense and MoE models. | In-repo |
Persistent memory across Claude Code sessions. Automatically captures everything Claude does, compresses it with AI, and provides continuity in future sessions.
Dependencies are auto-installed on first run.

Runtime: a local service at `http://localhost:37777`; data is stored in `~/.claude-mem/`.

Install:

```
/plugin install claude-mem@bailey-marketplace
```
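The capture/compress/re-inject flow described above can be sketched as a minimal pipeline. All names here are illustrative, not the plugin's actual code; the real claude-mem hooks into Claude Code sessions and keeps its data under `~/.claude-mem/`.

```python
# Illustrative sketch of the claude-mem pipeline: capture tool-use
# observations, compress them into a summary, and re-inject that
# summary as context at the start of the next session.
import json
import tempfile
from pathlib import Path

# Stand-in for ~/.claude-mem/ so the sketch is self-contained.
MEM_DIR = Path(tempfile.gettempdir()) / "claude-mem-demo"


def capture(observations, session_id):
    """Persist raw tool-use observations for one session."""
    MEM_DIR.mkdir(exist_ok=True)
    (MEM_DIR / f"{session_id}.json").write_text(json.dumps(observations))


def compress(session_id):
    """Reduce observations to a compact summary.

    The real plugin compresses with an AI model; here we naively keep
    one line per observation to show the shape of the output.
    """
    obs = json.loads((MEM_DIR / f"{session_id}.json").read_text())
    return "\n".join(f"- {o['tool']}: {o['note']}" for o in obs)


def inject(summary):
    """Build the context block prepended to the next session."""
    return f"<memory>\n{summary}\n</memory>"


capture([{"tool": "Edit", "note": "refactored parser"},
         {"tool": "Bash", "note": "tests passing"}], "s1")
print(inject(compress("s1")))
```

The key design point is that only the compressed summary, not the raw observation log, is carried into future sessions, which keeps the injected context small.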
Tunes llama-server (llama.cpp) launch parameters for maximum tok/s on your hardware. Auto-detects GPU VRAM, CPU cores, and system RAM. Inspects GGUF model files to determine architecture (dense vs MoE), then calculates optimal flags including KV cache quantization, flash attention, expert offloading (MoE), and partial GPU layer placement.
Features:
- `llama-gguf`

Skill:

```
/llama-tune <model.gguf> [--ctx SIZE] [--slots N] [--port PORT] [--launch]
```

Install:

```
/plugin install llama-tune@bailey-marketplace
```
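The core calculation the plugin performs (fit as many layers as VRAM allows, then emit flags) can be sketched roughly as follows. The function, its headroom constant, and the sample numbers are hypothetical stand-ins; in practice the layer count and model size come from the GGUF metadata, and flag spellings vary across llama.cpp versions.

```python
# Hypothetical sketch of a llama-tune-style planner: given VRAM and
# model metadata, estimate how many transformer layers fit on the GPU
# and assemble a llama-server launch command.
def plan_launch(vram_gb, model_size_gb, n_layers, ctx=8192, is_moe=False):
    # Reserve headroom for the KV cache and scratch buffers (illustrative).
    headroom_gb = 2.0
    usable_gb = max(vram_gb - headroom_gb, 0.0)

    # Approximate per-layer weight size, then place whole layers on GPU.
    per_layer_gb = model_size_gb / n_layers
    gpu_layers = min(n_layers, int(usable_gb / per_layer_gb))

    flags = [
        f"--n-gpu-layers {gpu_layers}",
        f"--ctx-size {ctx}",
        "--flash-attn",          # flash attention, per the description above
        "--cache-type-k q8_0",   # quantized KV cache to save VRAM
        "--cache-type-v q8_0",
    ]
    if is_moe and gpu_layers < n_layers:
        # For MoE models that don't fully fit, keep expert tensors on the
        # CPU so attention and shared weights stay on the GPU.
        flags.append("--override-tensor exps=CPU")
    return "llama-server " + " ".join(flags)


# Example: a 40 GB model with 60 layers on a 24 GB GPU.
print(plan_launch(vram_gb=24, model_size_gb=40, n_layers=60))
```

With these sample numbers the planner offloads 33 of 60 layers; the real plugin's estimates also account for context size and KV cache growth.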
In-repo plugins go in the `plugins/` directory. External plugins are referenced by source in `.claude-plugin/marketplace.json`.
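As a sketch of how the two source types might sit side by side in the manifest (field names here are illustrative; consult the Claude Code plugin documentation for the exact schema):

```json
{
  "name": "bailey-marketplace",
  "plugins": [
    {
      "name": "llama-tune",
      "source": "./plugins/llama-tune"
    },
    {
      "name": "claude-mem",
      "source": { "source": "github", "repo": "thedotmack/claude-mem" }
    }
  ]
}
```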
License: MIT