Production LLM inference. Includes vLLM (PagedAttention, continuous batching), TensorRT-LLM (NVIDIA optimization), llama.cpp (CPU/Apple Silicon), and SGLang (structured generation, RadixAttention). Use when deploying models for production inference.
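For a sense of what the serving stack looks like, here is a minimal offline-inference sketch using vLLM's Python API (the model name and sampling settings are illustrative, not prescribed by the plugin):

```python
# Minimal vLLM offline inference sketch; model choice is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

# generate() batches prompts internally via continuous batching
outputs = llm.generate(["Summarize PagedAttention in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```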
```
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install inference-serving@ai-research-skills
```

Comprehensive feature development workflow with specialized agents for codebase exploration, architecture design, and quality review
Interactive learning mode that requests meaningful code contributions at decision points (mimics the unshipped Learning output style)
Automated code review for pull requests using multiple specialized agents with confidence-based scoring
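One plausible shape for confidence-based scoring, sketched with hypothetical names (the plugin's actual interfaces are not shown here): each specialized agent emits findings with a confidence value, and only findings clearing a threshold are surfaced on the PR.

```python
# Hypothetical sketch of confidence-based aggregation across review agents.
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str         # which specialized agent raised it
    message: str       # the review comment
    confidence: float  # 0.0-1.0, the agent's confidence in the finding

def surface(findings: list[Finding], threshold: float = 0.8) -> list[Finding]:
    """Keep only findings confident enough to report on the PR."""
    return [f for f in findings if f.confidence >= threshold]

findings = [
    Finding("test-reviewer", "Missing test for empty input", 0.9),
    Finding("type-reviewer", "Consider a narrower return type", 0.55),
]
for f in surface(findings):
    print(f"[{f.agent}] {f.message} ({f.confidence:.2f})")
```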
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification