End-to-end LLM deployment + testing pipeline for multi-chip GPU backends. Orchestrates 4 sub-skills in sequence and produces a final report.
```
flagrelease/
├── SKILL.md                 # This file — orchestration flow
└── references/
    └── pipeline-state.md    # Pipeline state schema, gate logic, data flow
```
Sub-skills (each independently invokable):
```
../install-stack/                 # Step 2: Install 5 packages
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── detect_network.py         # Probe GitHub/PyPI, return mirror config
│   │   ├── collect_env_info.py       # Python/glibc/arch/vendor/disk info
│   │   ├── select_flagtree_wheel.py  # Match vendor+python+glibc → wheel
│   │   └── validate_packages.py      # Import-test all 5 packages
│   └── references/
│       ├── vendor-mappings.md        # FlagCX make flags, adaptor names
│       └── network-mirrors.md        # Mirror config rules
../env-verify/                    # Step 3: Qwen3-0.6B smoke test
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── run_offline_inference.py  # Phase A: offline inference test
│   │   └── test_serve_mode.py        # Phase B: serve + health + chat test
│   └── references/
│       └── error-classification.md   # Layer-based error classification
../model-verify/                  # Step 4: Target model ± multi-chip
│   ├── SKILL.md
│   ├── scripts/
│   │   └── diff_analysis.py          # Compare Run A vs Run B results
│   └── references/
│       └── multichip-errors.md       # Multi-chip error patterns
../perf-test/                     # Steps 5+6: Accuracy + Performance
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── run_benchmark.py          # Run single benchmark profile
│   │   └── run_all_benchmarks.py     # Run all profiles + summarize
│   └── references/
│       └── benchmark-profiles.md     # Profile definitions and metrics
```
```
[Prerequisite: /gpu-container-setup already done by another team]
        │
        ▼
install-stack  →  Install 5 packages (vLLM, FlagTree, FlagGems, FlagCX, plugin)
        │         scripts: detect_network, collect_env_info, select_flagtree_wheel
        │
        │   GATE: vLLM + plugin must succeed
        ▼
env-verify     →  Smoke test with Qwen3-0.6B (FlagGems/CX OFF)
        │         scripts: run_offline_inference, test_serve_mode
        │
        │   Verify Layers 0-3
        ▼
model-verify   →  Target model test (OFF then ON), diff analysis
        │         scripts: run_offline_inference, test_serve_mode, diff_analysis
        │
        │   Determine which stack works (full vs base)
        ▼
perf-test      →  Accuracy (placeholder) + Performance benchmarks
        │         scripts: run_benchmark, run_all_benchmarks
        ▼
Final Report
```
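The same flow as a hedged orchestration sketch in Python. The run_step helper, the step-result field names, and the state dict are illustrative assumptions, not part of any sub-skill's interface; in practice each step is driven by reading the corresponding sub-skill's SKILL.md, and the authoritative gate logic lives in references/pipeline-state.md.

```python
# Illustrative orchestration sketch only; names and field shapes are assumptions.

def run_step(skill: str, container: str) -> dict:
    """Placeholder: in practice, read ../<skill>/SKILL.md and follow it."""
    raise NotImplementedError(skill)

def run_pipeline(container: str) -> dict:
    state = {"container": container, "steps": {}, "errors": []}

    # Step 2: install the 5-package stack; vLLM + plugin are the gating packages.
    install = run_step("install-stack", container)
    state["steps"]["install_stack"] = install
    if not install.get("gate_passed"):
        state["status"] = "FAIL"  # nothing can be served without vLLM + plugin
        return state

    # Step 3: Qwen3-0.6B smoke test with FlagGems/FlagCX OFF.
    env = run_step("env-verify", container)
    state["steps"]["env_verify"] = env
    if env.get("fatal"):
        state["status"] = "FAIL"
        return state

    # Step 4: target model with the base (OFF) then full (ON) stack, plus diff analysis.
    model = run_step("model-verify", container)
    state["steps"]["model_verify"] = model
    if model.get("recommended_stack") == "none":
        state["status"] = "FAIL"  # Run A failed: the model cannot serve at all
        return state

    # Steps 5+6: accuracy placeholder + performance benchmarks.
    perf = run_step("perf-test", container)
    state["steps"]["perf_test"] = perf

    full_stack = model["recommended_stack"] == "full"
    perf_ok = perf.get("profiles_passed") == "5/5"
    state["status"] = "PASS" if (full_stack and perf_ok) else "PARTIAL"
    return state
```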
A running Docker container with a known name (e.g., flagrelease-worker). This container is produced by /gpu-container-setup (maintained by another team).
Read references/pipeline-state.md for the full state schema and gate logic.
Ask the user for the container name, or detect running containers:

```bash
docker ps --format '{{.Names}}' | head -10
```

Verify the container is running:

```bash
docker inspect --format='{{.State.Status}}' <CONTAINER> | grep -q running
```

Initialize pipeline state (see references/pipeline-state.md); a minimal sketch of this step follows.
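Below is a minimal Python sketch of this step. The field names in the initial state dict are assumptions drawn from the final-report schema later in this document; the authoritative state schema and gate logic live in references/pipeline-state.md.

```python
import subprocess

def verify_container(name: str) -> None:
    """Fail fast if the named container is not running."""
    status = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Status}}", name],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if status != "running":
        raise RuntimeError(f"container {name!r} is {status}, expected 'running'")

def init_pipeline_state(container: str) -> dict:
    """Assumed initial shape; see references/pipeline-state.md for the real schema."""
    return {
        "pipeline": "flagrelease",
        "container": container,
        "vendor": None,            # filled by install-stack (collect_env_info.py)
        "model_path": None,        # filled by model-verify (asked from the user)
        "tp_size": None,
        "recommended_stack": None,
        "steps": {},
        "errors": [],
    }
```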
Read and follow ../install-stack/SKILL.md.
The install-stack skill will:
- Copy scripts/collect_env_info.py into the container → get vendor, Python, glibc
- Copy scripts/detect_network.py into the container → get mirror config
- Run scripts/select_flagtree_wheel.py for FlagTree
- Run scripts/validate_packages.py inside the container → get final status
- Gate check (sketched after this list): if gate_passed is false (vLLM or plugin failed) → STOP the pipeline and report FAIL with the install errors.

Store the result in pipeline state.
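How the gate might be evaluated, sketched under the assumption that validate_packages.py reports a per-package status mapping; the exact output format and the import names below are defined by the install-stack skill, not here.

```python
# Assumed import names for the two gating packages; the other three
# (FlagTree, FlagGems, FlagCX) may fail without stopping the pipeline.
GATING_PACKAGES = ("vllm", "vllm_plugin_fl")

def check_install_gate(package_status: dict) -> dict:
    """package_status: e.g. {"vllm": "ok", "flag_gems": "import_error"} (assumed shape)."""
    failed = [pkg for pkg, status in package_status.items() if status != "ok"]
    gate_passed = all(package_status.get(pkg) == "ok" for pkg in GATING_PACKAGES)
    return {
        "status": "ok" if not failed else ("degraded" if gate_passed else "fail"),
        "gate_passed": gate_passed,
        "packages": package_status,
        "failed": failed,
    }
```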
Read and follow ../env-verify/SKILL.md.
The env-verify skill will:
- Copy scripts/run_offline_inference.py into the container → Phase A
- Copy scripts/test_serve_mode.py into the container → Phase B
- On failure, classify the error using references/error-classification.md
- Decision: fatal error → STOP; non-fatal → record and continue.

Store the result in pipeline state. (The copy-and-run pattern is sketched below.)
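Both phases follow the same copy-then-run pattern. A hedged sketch of that pattern: the script names are real, but the run_script_in_container helper and its destination path are illustrative, not prescribed by the env-verify skill.

```python
import subprocess

def run_script_in_container(container: str, script: str, *args: str) -> subprocess.CompletedProcess:
    """Copy a sub-skill script into the container and execute it there."""
    dest = f"/tmp/{script.rsplit('/', 1)[-1]}"
    subprocess.run(["docker", "cp", script, f"{container}:{dest}"], check=True)
    return subprocess.run(
        ["docker", "exec", container, "python3", dest, *args],
        capture_output=True, text=True,
    )

# Phase A: offline inference smoke test (Qwen3-0.6B, FlagGems/FlagCX OFF)
# phase_a = run_script_in_container("flagrelease-worker", "scripts/run_offline_inference.py")
# Phase B: serve + health + chat test
# phase_b = run_script_in_container("flagrelease-worker", "scripts/test_serve_mode.py")
```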
Read and follow ../model-verify/SKILL.md.
This step is interactive — it will ask the user for the model path.
The model-verify skill will:
- Run run_offline_inference.py and test_serve_mode.py for Run A and Run B
- Run scripts/diff_analysis.py to compare the results
- Determine recommended_stack (full/base/none); see the decision sketch below
- Decision: if recommended_stack is none (Run A failed) → STOP.

Store the result in pipeline state (including model_path, tp_size, recommended_stack).
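The stack recommendation, roughly, assuming Run A is the target model on the base stack (FlagGems/FlagCX OFF) and Run B is the same model with them ON; scripts/diff_analysis.py defines the actual comparison.

```python
def recommend_stack(run_a_ok: bool, run_b_ok: bool) -> str:
    """Run A = FlagGems/FlagCX OFF, Run B = FlagGems/FlagCX ON."""
    if not run_a_ok:
        return "none"   # model cannot serve at all -> stop the pipeline
    if run_b_ok:
        return "full"   # full multi-chip stack works
    return "base"       # fall back to the base stack (FlagGems/FlagCX off)
```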
Read and follow ../perf-test/SKILL.md.
The perf-test skill will:
- Copy scripts/run_all_benchmarks.py into the container → run 5 profiles (a rough sketch follows)

Store the result in pipeline state.
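From the orchestrator's point of view, the summary it gets back might look like this. The profile names and the result shape are placeholders; the real profiles and their metrics are defined in references/benchmark-profiles.md.

```python
# Hypothetical profile names -- the real set lives in references/benchmark-profiles.md.
PROFILES = ["profile_1", "profile_2", "profile_3", "profile_4", "profile_5"]

def summarize_benchmarks(results: dict) -> dict:
    """results: {profile_name: {"ok": bool, ...}} (assumed shape)."""
    passed = [p for p in PROFILES if results.get(p, {}).get("ok")]
    return {
        "status": "ok" if len(passed) == len(PROFILES) else "degraded",
        "profiles_passed": f"{len(passed)}/{len(PROFILES)}",
        "results": results,
    }
```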
Compile all results from pipeline state into a final report:
```json
{
  "status": "PASS | PARTIAL | FAIL",
  "pipeline": "flagrelease",
  "container": "<name>",
  "vendor": "<vendor>",
  "model": "<path>",
  "tensor_parallel_size": 8,
  "steps": {
    "install_stack": { "status": "...", "packages": {...} },
    "env_verify": { "status": "...", "phase_a": "...", "phase_b": "..." },
    "model_verify": { "status": "...", "run_a": "...", "run_b": "...", "recommended_stack": "..." },
    "perf_test": { "status": "...", "profiles_passed": "5/5", "summary_table": "..." }
  },
  "errors": [...],
  "conclusion": "Pipeline completed. ..."
}
```
Present to the user with a clear summary. Overall status:

- PASS — all steps pass, the full multi-chip stack works
- PARTIAL — the model works with a degraded stack, or some perf profiles failed
- FAIL — the model cannot serve (gate or Run A failure)

A decision-function sketch of these rules follows.
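The same rules as a small decision function, using the step-result fields assumed throughout this document (they mirror the final-report schema above, not a fixed API).

```python
def overall_status(state: dict) -> str:
    """Derive PASS / PARTIAL / FAIL from the pipeline state sketched earlier."""
    steps = state["steps"]
    if not steps.get("install_stack", {}).get("gate_passed"):
        return "FAIL"   # stack could not be installed (gate failure)
    if steps.get("model_verify", {}).get("recommended_stack", "none") == "none":
        return "FAIL"   # model cannot serve (Run A failed)
    full = steps["model_verify"]["recommended_stack"] == "full"
    perf_ok = steps.get("perf_test", {}).get("profiles_passed") == "5/5"
    return "PASS" if (full and perf_ok) else "PARTIAL"
```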