Production LLM inference. Includes vLLM (PagedAttention, continuous batching), TensorRT-LLM (NVIDIA optimization), llama.cpp (CPU/Apple Silicon), and SGLang (structured generation, RadixAttention). Use when deploying models for production inference.
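For a sense of what the serving stack looks like, here is a minimal offline-inference sketch using vLLM's Python API (the model name and sampling settings are illustrative, not prescribed by the plugin):

```python
# Minimal vLLM offline inference sketch; model choice is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

# generate() batches prompts internally via continuous batching
outputs = llm.generate(["Summarize PagedAttention in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```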
```
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install inference-serving@ai-research-skills
```

Comprehensive feature development workflow with specialized agents for codebase exploration, architecture design, and quality review
Interactive learning mode that requests meaningful code contributions at decision points (mimics the unshipped Learning output style)
Automated code review for pull requests using multiple specialized agents with confidence-based scoring
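One plausible shape for confidence-based scoring, sketched with hypothetical names (the plugin's actual interfaces are not shown here): each specialized agent emits findings with a confidence value, and only findings clearing a threshold are surfaced on the PR.

```python
# Hypothetical sketch of confidence-based aggregation across review agents.
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str         # which specialized agent raised it
    message: str       # the review comment
    confidence: float  # 0.0-1.0, the agent's confidence in the finding

def surface(findings: list[Finding], threshold: float = 0.8) -> list[Finding]:
    """Keep only findings confident enough to report on the PR."""
    return [f for f in findings if f.confidence >= threshold]

findings = [
    Finding("test-reviewer", "Missing test for empty input", 0.9),
    Finding("type-reviewer", "Consider a narrower return type", 0.55),
]
for f in surface(findings):
    print(f"[{f.agent}] {f.message} ({f.confidence:.2f})")
```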
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification