Deploy vLLM OpenAI-compatible inference servers locally (with hardware detection), via Docker images, or with Kubernetes YAML manifests with GPU support; then benchmark throughput, TTFT, TPOT, inter-token latency, and prefix caching using synthetic data, ShareGPT, or fixed prompts.
A collection of skills for deploying and benchmarking vLLM. This project follows the anthropics/skills template format and is installable as a Claude Code plugin marketplace.
This repository provides modular, reusable agent skills for operating and benchmarking vLLM, following the Anthropic SKILL.md specification. Each skill is a self-contained directory containing the automation scripts and metadata for a specific operational task.
| Skill | Description |
|---|---|
| vllm-deploy-docker | Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server. |
| vllm-deploy-k8s | Deploy vLLM to Kubernetes with GPU support, health probes, and OpenAI-compatible API endpoint. |
| vllm-deploy-simple | Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API. |
| vllm-prefix-cache-bench | Benchmark the efficiency of vLLM automatic prefix caching using fixed prompts, real datasets, or synthetic prefix/suffix patterns. |
| vllm-bench-random-synthetic | Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT, TPOT, and other key performance metrics without downloading external datasets. |
| vllm-bench-serve | Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. |
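Several of the benchmark skills report TTFT, TPOT, and inter-token latency. How these metrics relate can be shown with a minimal sketch using made-up token arrival timestamps; this illustrates the common definitions, not the skills' actual implementation:

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute per-request latency metrics from token arrival timestamps.

    request_start: time the request was sent (seconds).
    token_times:   arrival time of each generated token, in order (seconds).
    """
    ttft = token_times[0] - request_start  # Time To First Token
    # Inter-token latencies: gaps between consecutive tokens
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    # Time Per Output Token: mean gap over all tokens after the first
    tpot = sum(itls) / len(itls) if itls else 0.0
    e2e = token_times[-1] - request_start  # end-to-end latency
    return {"ttft": ttft, "tpot": tpot, "itl": itls, "e2e": e2e}

# Hypothetical timestamps: first token at 0.25 s, then one token every 50 ms
metrics = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
print(metrics["ttft"], round(metrics["tpot"], 3))
```

Throughput-oriented runs aggregate these per-request numbers (e.g., median and p99 TTFT) across many concurrent requests.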
Install directly from the plugin marketplace in Claude Code:
/plugin marketplace add vllm-project/vllm-skills
/plugin install vllm-skills@vllm-skills
Clone the repository and copy skills to your Claude Code skills directory:
git clone https://github.com/vllm-project/vllm-skills.git
cd vllm-skills
Copy to the global skills folder:
cp -r plugins/vllm-skills/skills/vllm-deploy-simple ~/.claude/skills/
Or copy to the project-local skills folder:
cp -r plugins/vllm-skills/skills/vllm-deploy-simple .claude/skills/
Once installed, use the skills with slash commands or natural language:
/vllm-deploy-simple
Deploy vLLM with Qwen2.5-1.5B-Instruct on port 8000
Install and start a vLLM server using the vllm-deploy-simple skill
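Once a server is running, the OpenAI-compatible endpoint can be exercised with a plain HTTP request. A minimal sketch, assuming the server from the example above is listening on localhost:8000; `build_chat_request` is an illustrative helper, not part of the skills:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style /v1/chat/completions request payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Qwen/Qwen2.5-1.5B-Instruct", "Say hello")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once a server is running on port 8000
```

The same payload works with any OpenAI-compatible client library by pointing its base URL at the vLLM server.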
See the vLLM documentation for the full list.
This project follows the anthropics/skills template. When adding new skills:
Create a skill directory under plugins/vllm-skills/skills/ (e.g., plugins/vllm-skills/skills/your-skill/) and add a SKILL.md file with YAML frontmatter:
---
name: your-skill
description: Brief description of what this skill does
---
Optionally include scripts/, references/, and assets/ directories.

Licensed under the Apache License 2.0. See LICENSE.