From systems-design
Provides LLM serving optimization recommendations for latency, inference costs, and throughput. Scans configs, detects stacks like vLLM/TGI, suggests quantization, batching, KV cache, and framework changes.
npx claudepluginhub melodic-software/claude-code-plugins --plugin systems-design
How this skill is triggered: by the user, by Claude, or both
Slash command
/systems-design:optimize-llm
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Get quick, actionable recommendations for LLM serving optimization.
/sd:optimize-llm [focus]
focus (optional): Optimization priority
latency - Focus on reducing response time
cost - Focus on reducing inference costs
throughput - Focus on maximizing requests/second
/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost
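The optional focus argument can be validated before any analysis starts. The following is a minimal sketch, not the plugin's actual argument handling: `parse_focus` and `VALID_FOCUS` are hypothetical names, and returning `None` when no focus is given is an assumption, since the page does not say what the default behaviour is.

```python
from typing import Optional

# The three focus values listed in the usage section.
VALID_FOCUS = ("latency", "cost", "throughput")


def parse_focus(args: str) -> Optional[str]:
    """Return the requested focus, or None when the command has no argument.

    Hypothetical helper: the real command may parse arguments differently.
    """
    tokens = args.split()
    if not tokens:
        return None  # assumption: no focus means no single priority
    if len(tokens) > 1:
        raise ValueError("expected at most one focus argument")
    focus = tokens[0].lower()
    if focus not in VALID_FOCUS:
        expected = ", ".join(VALID_FOCUS)
        raise ValueError(f"unknown focus {focus!r}; expected one of: {expected}")
    return focus
```

With this sketch, `parse_focus("latency")` yields `"latency"`, an empty argument string yields `None`, and any other word raises a `ValueError` that names the accepted values.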
Gather Context
Spawn LLM Optimization Advisor Agent
Use the llm-optimization-advisor agent to analyze the gathered context and provide recommendations. The agent specializes in:
Present Recommendations
Display optimization opportunities organized by:
## LLM Optimization Report
### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]
### Quick Wins
1. [Optimization] - [Expected impact]
2. ...
### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
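To illustrate how the template above could be filled in, here is a minimal sketch that groups recommendations by effort tier and numbers them within each tier. `Recommendation`, `TIERS`, and `render_report` are hypothetical names, not part of the plugin. The "Estimated Total Impact" section is left out because the page does not say how the individual estimates are combined.

```python
from dataclasses import dataclass
from typing import List

# Section headings taken from the report template, in display order.
TIERS = ("Quick Wins", "Medium Effort Optimizations", "Advanced Optimizations")


@dataclass
class Recommendation:
    tier: str             # must be one of TIERS
    optimization: str     # what to change
    expected_impact: str  # free-text estimate supplied by the advisor


def render_report(model: str, framework: str, hardware: str,
                  recs: List[Recommendation]) -> str:
    """Render the setup and per-tier sections of the report as markdown."""
    for rec in recs:
        if rec.tier not in TIERS:
            raise ValueError(f"unknown tier: {rec.tier!r}")
    lines = [
        "## LLM Optimization Report",
        "### Current Setup",
        f"- Model: {model}",
        f"- Framework: {framework}",
        f"- Hardware: {hardware}",
    ]
    for tier in TIERS:
        lines.append(f"### {tier}")
        # Numbering restarts at 1 inside every tier, as in the template.
        tier_recs = [rec for rec in recs if rec.tier == tier]
        for i, rec in enumerate(tier_recs, start=1):
            lines.append(f"{i}. {rec.optimization} - {rec.expected_impact}")
    return "\n".join(lines)
```

A tier with no recommendations still gets its heading, so the report keeps the same shape from run to run. Whether the real command does this or omits empty sections is not stated on the page.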