Help us improve
Share bugs, ideas, or general feedback.
Share bugs, ideas, or general feedback.
Share bugs, ideas, or general feedback.
By vllm-project
Deploy vLLM OpenAI-compatible inference servers locally with hardware detection, via Docker images, or Kubernetes YAML manifests with GPU support, then benchmark throughput, TTFT, TPOT, inter-token latency, and prefix caching using synthetic data, ShareGPT, or fixed prompts.
npx claudepluginhub vllm-project/vllm-skills --plugin vllm-skillsRun vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.
Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.
Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server.
Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.
Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API.
Share bugs, ideas, or general feedback.
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge.
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge.
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
Agent-ready playbooks for LLM serving benchmarks, capacity planning, torch-profiler triage, pipeline analysis, compute simulation, SGLang/vLLM SOTA Humanize loops, human code review, production incident triage, and model PR-history dossiers.
Run AI models locally with Ollama - free alternative to OpenAI, Anthropic, and other paid LLM APIs. Zero-cost, privacy-first AI infrastructure.
Agent Skills for NeMo Evaluator SDK
Deploy ML models with FastAPI, Docker, Kubernetes. Use for serving predictions, containerization, monitoring, drift detection, or encountering latency issues, health check failures, version conflicts.
AI-assisted inference on NVIDIA DGX Spark - run, manage, and stop LLM workloads
SkyPilot agent skill for launching cloud VMs, Kubernetes pods, and Slurm jobs across 25+ clouds
A collection of skills for deploying and benchmarking vLLM. This project follows the anthropics/skills template format and is installable as a Claude Code plugin marketplace.
This repository provides modular, reusable agent skills required to operate and benchmark vLLM, following the Anthropics SKILL.md specification. Each skill is a self-contained directory implementing automation, scripts, and metadata for a specific operational task.
| Skill | Description |
|---|---|
| vllm-deploy-docker | Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server. |
| vllm-deploy-k8s | Deploy vLLM to Kubernetes with GPU support, health probes, and OpenAI-compatible API endpoint. |
| vllm-deploy-simple | Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API. |
| vllm-prefix-cache-bench | Benchmark the efficiency of vLLM automatic prefix caching using fixed prompts, real datasets, or synthetic prefix/suffix patterns. |
| vllm-bench-random-synthetic | Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT, TPOT, and other key performance metrics without downloading external datasets. |
| vllm-bench-serve | Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. |
Install directly from the plugin marketplace in Claude Code:
/plugin marketplace add vllm-project/vllm-skills
/plugin install vllm-skills@vllm-skills
Clone the repository and copy skills to your Claude Code skills directory:
git clone https://github.com/vllm-project/vllm-skills.git
cd vllm-skills
Copy to global skill folder:
cp -r plugins/vllm-skills/skills/vllm-deploy-simple ~/.claude/skills/
Or copy to the project skill folder:
cp -r plugins/vllm-skills/skills/vllm-deploy-simple .claude/skills/
Once installed, use the skills with slash commands or natural language:
/vllm-deploy-simple
Deploy vLLM with Qwen2.5-1.5B-Instruct on port 8000
Install and start a vLLM server using the vllm-deploy-simple skill
See vLLM documentation for the full list.
This project follows the anthropics/skills template. When adding new skills:
plugins/vllm-skills/skills/ (e.g., plugins/vllm-skills/skills/your-skill/)SKILL.md file with YAML frontmatter:
---
name: your-skill
description: Brief description of what this skill does
---
scripts/, references/, and assets/ directoriesLicensed under the Apache License 2.0. See LICENSE.