Skill

optimize-llm

Get LLM optimization recommendations for serving latency, inference costs, and throughput improvements

From systems-design
Install

Run in your terminal:

npx claudepluginhub melodic-software/claude-code-plugins --plugin systems-design
Tool Access

This skill is limited to using the following tools:

Read, Glob, Grep, Task
Skill Content

Optimize LLM Command

Get quick, actionable recommendations for LLM serving optimization.

Usage

/sd:optimize-llm [focus]

Arguments

  • focus (optional): Optimization priority
    • latency - Focus on reducing response time
    • cost - Focus on reducing inference costs
    • throughput - Focus on maximizing requests/second
    • If omitted: balanced recommendations across latency, cost, and throughput

Examples

/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost

Workflow

  1. Gather Context

    • Search for LLM-related configuration files
    • Look for: model configs, serving configs, inference scripts
    • Identify current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
  2. Spawn LLM Optimization Advisor Agent

    Use the llm-optimization-advisor agent to analyze the current setup and produce recommendations. The agent specializes in:

    • Quantization strategies (INT8, INT4, FP16)
    • Batching optimization (continuous, dynamic)
    • KV cache optimization (PagedAttention)
    • Serving framework selection
    • Cost reduction strategies
  3. Present Recommendations

    Display optimization opportunities organized by:

    • Quick Wins - Low effort, high impact changes
    • Medium Effort - Moderate changes with significant benefits
    • Advanced - Architectural changes for maximum performance
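To make the first specialty above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the simplest of the listed strategies. The weights and helper names are invented for the example; real serving stacks use calibrated, often per-channel schemes:

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization
# (example values only; not the advisor agent's actual code).
def quantize_int8(weights):
    """Map floats to the int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero weights
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.9931]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The same idea underlies INT4 at a coarser step (range [-7, 7] for a symmetric scheme); the advisor weighs that accuracy/step trade-off against the memory and latency gains.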

Output Format

```markdown
## LLM Optimization Report

### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]

### Quick Wins
1. [Optimization] - [Expected impact]
2. ...

### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
```
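For the cost line in particular, a back-of-envelope sketch (all numbers hypothetical) shows how a throughput multiplier translates into a cost reduction at a fixed hourly GPU price:

```python
# Hypothetical numbers: at a fixed GPU rental price, cost per token is
# inversely proportional to sustained throughput.
GPU_PRICE_PER_HOUR = 2.00      # USD, invented for the example
BASELINE_TOK_PER_SEC = 1_000   # invented baseline throughput
SPEEDUP = 2.5                  # e.g. continuous batching + quantization

def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """USD cost to generate one million tokens on one GPU."""
    return price_per_hour / (tokens_per_second * 3600) * 1_000_000

before = cost_per_million_tokens(GPU_PRICE_PER_HOUR, BASELINE_TOK_PER_SEC)
after = cost_per_million_tokens(GPU_PRICE_PER_HOUR, BASELINE_TOK_PER_SEC * SPEEDUP)
reduction = 1 - after / before
print(f"${before:.3f} -> ${after:.3f} per 1M tokens ({reduction:.0%} cheaper)")
```

Note the general rule this illustrates: a throughput increase of Nx is a cost reduction of 1 - 1/N at fixed hardware spend, so a 2.5x speedup cuts per-token cost by 60%.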
Stats

Parent Repo Stars: 40
Parent Repo Forks: 6
Last Commit: Feb 15, 2026