By flagos-ai
Orchestrate end-to-end LLM inference pipelines on multi-chip GPUs and NPUs using FlagOS agent skills: automate stack installation, environment verification, model migration, kernel generation and optimization, performance benchmarking, and structured reporting.
npx claudepluginhub flagos-ai/skills --plugin flagos-skills

Automatically detect GPU vendor, find an appropriate PyTorch container image, launch it with correct mounts, and validate GPU functionality. Supports NVIDIA, Ascend, Metax, Iluvatar, and AMD/ROCm. Use when the user says "setup container", "start pytorch container", or invokes /gpu-container-setup.
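For reference, on an NVIDIA host the launch this skill performs is roughly equivalent to the following (a minimal sketch: the image tag, mount, and validation command are illustrative choices, not the skill's exact output):

docker run --gpus all -it --rm -v "$PWD:/workspace" nvcr.io/nvidia/pytorch:24.05-py3 \
  python -c "import torch; print(torch.cuda.is_available())"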
Full FlagRelease pipeline orchestrator. Runs the complete LLM deployment, verification, and benchmarking pipeline for multi-chip GPU backends. Executes: install-stack → env-verify → model-verify → perf-test in sequence, passing state between steps and producing a final structured report. Assumes gpu-container-setup (Step 1) is already done — a running container with PyTorch + GPU access must exist.
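A typical end-to-end invocation might look like the following (the slash-command name and model argument are hypothetical; consult the plugin's command list for the exact form):

/flagrelease-pipeline qwen3_8b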
Install the 5-package multi-chip software stack (vLLM, FlagTree, FlagGems, FlagCX, vllm-plugin-FL) inside a GPU container. Handles network mirror detection, dependency ordering, wheel selection, and per-package validation. Use after gpu-container-setup has produced a running container with PyTorch + GPU access.
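As a sketch of what mirror detection and per-package validation involve (the mirror URL is one common choice for mainland-China networks, and flag_gems as FlagGems' import name is an assumption to verify against your installed version):

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
python -c "import flag_gems" && echo "FlagGems OK"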
Unified GPU kernel operator generation and optimization skill. Automatically detects the target repository type (FlagGems, vLLM, or general Python/Triton) and dispatches to the appropriate specialized sub-skill. Includes operator generation, MCP-based iterative optimization, and feedback submission sub-skills. Use this skill when the user wants to generate or optimize a GPU kernel operator, create a Triton kernel, or says things like "generate an operator", "create a kernel for X", "optimize triton kernel", or "/kernelgen-flagos".
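Invocation is natural-language driven; for example (the quoted task is illustrative):

/kernelgen-flagos "generate a Triton kernel for fused add and relu"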
Migrate a model from the latest vLLM upstream repository into the vllm-plugin-FL project (pinned at vLLM v0.13.0). Use this skill whenever someone wants to add support for a new model to vllm-plugin-FL, port model code from upstream vLLM, or backport a newly released model. Trigger when the user says things like "migrate X model", "add X model support", "port X from upstream vLLM", "make X work with the FL plugin", or simply "/model-migrate-flagos model_name". The model_name argument uses snake_case (e.g. qwen3_5, kimi_k25, deepseek_v4). Do NOT use for models already supported by vLLM 0.13.0 core, or for multimodal-only components that don't need backporting.
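For example, to backport a model named qwen3_5:

/model-migrate-flagos qwen3_5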
Verify the serving stack with a user-specified target model. Runs twice: first with FlagGems/FlagCX disabled (isolate model-specific errors), then with full multi-chip stack enabled. Diffs the two runs to pinpoint which layer caused any failure.
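Conceptually, the two runs differ only in whether the plugin layers are toggled on, along these lines (the environment-variable names are hypothetical placeholders for whatever switches the stack exposes, and the model is illustrative):

vllm serve Qwen/Qwen2.5-7B-Instruct                                      # run 1: FlagGems/FlagCX disabled
FLAGGEMS_ENABLE=1 FLAGCX_ENABLE=1 vllm serve Qwen/Qwen2.5-7B-Instruct    # run 2: full multi-chip stack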
Run accuracy benchmarks (FlagEval, when available) and performance benchmarks (vllm bench serve) against a served model. Covers 5 workload profiles: short/long prefill × short/long decode, plus high concurrency. Collects throughput, latency, TTFT (time to first token), and TPOT (time per output token) metrics.
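One of the five profiles, long prefill with short decode under high concurrency, maps to a vllm bench serve run along these lines (flag names follow vLLM's benchmarking CLI and may vary across versions; the model and sizes are illustrative):

vllm bench serve --model Qwen/Qwen2.5-7B-Instruct \
  --dataset-name random --random-input-len 2048 --random-output-len 128 \
  --num-prompts 200 --max-concurrency 64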
Create new skills, modify existing skills, and validate skill quality for the FlagOS skills repository. Use this skill whenever someone wants to create a skill from scratch, improve or edit an existing skill, scaffold a new skill directory, validate skill structure, or run test cases against a skill. Trigger when the user says things like "create a skill", "make a new skill for X", "scaffold a skill", "improve this skill", "validate my skill", or simply "/skill-creator-flagos". Also trigger when users mention turning a workflow into a reusable skill, or want to package a repeated process as a skill.
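For example (the quoted request is illustrative):

/skill-creator-flagos "scaffold a skill that packages our nightly perf-regression workflow"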
Self-contained orchestration skill for writing high-performance TLE kernels and shipping TLE feature changes with reproducible validation. Use when the user wants to write/optimize TLE kernels, implement TLE API/verifier/lowering features, or debug TLE correctness/performance issues. Trigger on phrases like "write a TLE kernel", "optimize TLE operator", and "debug TLE local_ptr".
Install and configure vLLM-Plugin-FL for multiple hardware backends, including NVIDIA, Ascend, and others. Use when setting up vllm-plugin-fl, configuring the environment for a specific hardware backend, installing dependencies, checking whether dependencies installed successfully, resolving runtime issues, or launching inference to verify successful model serving. Trigger when the user says things like "setup vllm-plugin-fl", "install vllm-plugin-fl", "configure FL plugin", "set up FlagGems", or "set up FlagCX".
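A minimal post-install check might look like this (the flag_gems import name is an assumption; the 0.13.0 expectation follows from the vLLM pin noted in the model-migration skill above):

python -c "import vllm; print(vllm.__version__)"    # expect 0.13.0 for vllm-plugin-FL
python -c "import flag_gems" && echo "FlagGems importable"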