Skill
optimize-llm
Get LLM optimization recommendations for serving latency, inference costs, and throughput improvements
From systems-design
Install
Run in your terminal:
npx claudepluginhub melodic-software/claude-code-plugins --plugin systems-design
Tool Access
This skill is limited to using the following tools: Read, Glob, Grep, Task
Skill Content
Optimize LLM Command
Get quick, actionable recommendations for LLM serving optimization.
Usage
/sd:optimize-llm [focus]
Arguments
focus (optional): Optimization priority
- latency - Focus on reducing response time
- cost - Focus on reducing inference costs
- throughput - Focus on maximizing requests/second
- If omitted: Provide balanced recommendations
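The focus argument takes one of three values or nothing at all. A minimal Python sketch of that contract follows; `resolve_focus` is a hypothetical helper, not part of the plugin, and the case-insensitive matching and whitespace trimming are assumptions rather than documented behavior.

```python
from typing import Optional

# Focus values listed under Arguments; no value means "balanced".
VALID_FOCUS = ("latency", "cost", "throughput")


def resolve_focus(arg: Optional[str]) -> str:
    """Normalize the optional focus argument (hypothetical helper).

    Assumption: matching is case-insensitive and ignores surrounding spaces.
    """
    if arg is None or arg.strip() == "":
        return "balanced"
    focus = arg.strip().lower()
    if focus not in VALID_FOCUS:
        raise ValueError(
            f"unknown focus {arg!r}; expected one of {', '.join(VALID_FOCUS)}"
        )
    return focus


if __name__ == "__main__":
    print(resolve_focus(None))       # balanced
    print(resolve_focus("Latency"))  # latency
```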
Examples
/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost
Workflow
1. Gather Context
   - Search for LLM-related configuration files
   - Look for: model configs, serving configs, inference scripts
   - Identify current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
2. Spawn LLM Optimization Advisor Agent
   Use the llm-optimization-advisor agent to analyze and provide recommendations. The agent specializes in:
   - Quantization strategies (INT8, INT4, FP16)
   - Batching optimization (continuous, dynamic)
   - KV cache optimization (PagedAttention)
   - Serving framework selection
   - Cost reduction strategies
3. Present Recommendations
   Display optimization opportunities organized by:
   - Quick Wins - Low effort, high impact changes
   - Medium Effort - Moderate changes with significant benefits
   - Advanced - Architectural changes for maximum performance
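Quantization and KV cache optimization both come down to GPU memory arithmetic. The Python sketch below estimates weight memory at each precision and compares KV cache reservation under contiguous per-request pre-allocation against fixed-size blocks, the idea behind PagedAttention. The model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128), the 16-token block size, and the request lengths are illustrative assumptions, and the contiguous baseline is a simplification of how non-paged servers reserve memory.

```python
import math

GIB = 1024 ** 3

# Bytes per weight for the precisions named above. Ignores the small
# overhead of quantization scales/zero-points for INT8/INT4.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(n_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[precision] / GIB


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: float = 2.0) -> float:
    """Bytes for the K and V vectors of one token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def contiguous_kv_tokens(seq_lens, max_len: int) -> int:
    """Token slots reserved if every sequence pre-allocates max_len."""
    return len(seq_lens) * max_len


def paged_kv_tokens(seq_lens, block_size: int = 16) -> int:
    """Token slots reserved when KV memory is handed out in fixed-size
    blocks: each sequence wastes at most part of its last block."""
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)


if __name__ == "__main__":
    # Hypothetical 7B-class model shape.
    n_params = 7e9
    for p in BYTES_PER_PARAM:
        print(f"weights @ {p}: {weight_gib(n_params, p):.2f} GiB")

    per_tok = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
    print(f"KV cache per token (fp16): {per_tok / 1024:.0f} KiB")

    lens = [500, 1200, 300, 4096, 800, 2000, 150, 3000]  # made-up request mix
    reserved = contiguous_kv_tokens(lens, max_len=4096)
    paged = paged_kv_tokens(lens)
    print(f"contiguous: {reserved} tokens = {reserved * per_tok / GIB:.1f} GiB")
    print(f"paged:      {paged} tokens = {paged * per_tok / GIB:.1f} GiB")
```

With these inputs, weights shrink from about 13.0 GiB at FP16 to about 3.3 GiB at INT4, each token needs 512 KiB of FP16 KV cache, and the paged layout reserves 12,080 token slots (about 5.9 GiB) instead of 32,768 (16 GiB). Weight-only INT8/INT4 quantization does not change the KV cache precision, so long contexts or large batches can remain KV-bound after the weights are quantized.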
Output Format
## LLM Optimization Report
### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]
### Quick Wins
1. [Optimization] - [Expected impact]
2. ...
### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...
### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
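The Estimated Total Impact figures should not be a plain sum of per-optimization estimates. Assuming the optimizations act independently, each fractional reduction applies to what the previous ones left, and throughput multipliers multiply. A small Python sketch; the helper names and input numbers are hypothetical:

```python
from math import prod


def combined_reduction(fractions):
    """Combine independent fractional reductions (0.2 means 20% lower).

    Each reduction applies to what remains after the others, so they
    compound multiplicatively instead of adding up.
    """
    for f in fractions:
        if not 0.0 <= f < 1.0:
            raise ValueError(f"reduction must be in [0, 1), got {f}")
    return 1.0 - prod(1.0 - f for f in fractions)


def combined_speedup(multipliers):
    """Combine independent throughput multipliers (1.5 means 1.5x)."""
    return prod(multipliers)


if __name__ == "__main__":
    latency = [0.20, 0.30]   # hypothetical per-optimization estimates
    throughput = [2.0, 1.5]
    print(f"Latency: {combined_reduction(latency):.0%} improvement")
    print(f"Throughput: {combined_speedup(throughput):.1f}x increase")
```

Here 20% and 30% latency reductions combine to about 44%, not 50%. The independence assumption rarely holds exactly, so the result is a rough estimate rather than a prediction.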
Stats
Parent Repo Stars: 40
Parent Repo Forks: 6
Last Commit: Feb 15, 2026