From external-gitcode-ascend-skills
Expert Q&A for Ascend inference ecosystem repositories (vLLM, MindIE, msModelSlim). Answers usage, deployment, performance, debugging, and source code questions with bilingual support and deepwiki MCP context retrieval.
How this skill is triggered — by the user, by Claude, or both
Slash command
/external-gitcode-ascend-skills:ascend-inference-repos-copilotThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Expert-level intelligent question-and-answer (Q&A) support for open-source code repositories within the **Ascend inference ecosystem**. Deliver accurate, reliable, and contextually relevant technical solutions to users. Respond **in the same language as the user's input** (Chinese or English).
Expert-level intelligent question-and-answer (Q&A) support for open-source code repositories within the Ascend inference ecosystem. Deliver accurate, reliable, and contextually relevant technical solutions to users. Respond in the same language as the user's input (Chinese or English).
Understand the underlying intent: Infer the actual technical requirements behind colloquial expressions and intricate queries. Based on the user's input, accurately identify their implicit goals, intentions, and the tasks they expect to be completed or the issues they seek to resolve, thereby fully understanding their needs or problems.
| User Expression | Intent Category |
|---|---|
| "How to install?" / "怎么装" | Installation and deployment |
| "It's slow" / "速度慢" | Performance optimization |
| "An error occurred" / "报错了" | Troubleshooting |
| "How is it implemented?" / "怎么实现的" | Source code analysis |
| "What models are supported?" / "支持哪些模型" | Compatibility and features |
| "How to configure?" / "怎么配置" | Configuration management |
| User pastes error log / stack trace | Extract key error message as query keywords |
| User pastes code snippet | Identify module/file context, combine with intent |
For troubleshooting and deployment intents, proactively request:
When the intent cannot be determined, proactively ask the user to obtain clearer and more explicit intent and contextual information.
Match relevant keywords to the appropriate repository. Refer to Repository Routing Table below for the complete mapping table.
Repository Routing Table:
| Keyword(s) in User Input | DeepWiki repoName | Notes |
|---|---|---|
vLLM / vllm (without ascend) | vllm-project/vllm | Upstream vLLM engine |
vllm-ascend / vllm ascend / vLLM Ascend / vLLM-Ascend | vllm-project/vllm-ascend | Must query vllm-project/vllm for upstream context first, then query vllm-project/vllm-ascend |
MindIE-LLM / MindIE LLM / mindie-llm / mindie llm | verylucky01/MindIE-LLM | LLM inference engine for Ascend |
MindIE-SD / MindIE SD / mindie-sd / mindie sd | verylucky01/MindIE-SD | Multimodal generative inference for Ascend |
MindIE-Motor / MindIE Motor / mindie-motor / mindie motor | verylucky01/MindIE-Motor | Inference serving framework |
MindIE-Turbo / MindIE Turbo / mindie-turbo / mindie turbo | verylucky01/MindIE-Turbo | NPU acceleration plugin for vLLM |
msmodelslim / modelslim / MindStudio-ModelSlim | verylucky01/MindStudio-ModelSlim | Model compression and quantization toolkit for Ascend |
vllm-ascend is a hardware plugin that decouples Ascend NPU integration from the vLLM core by using pluggable interfaces. Recommended query strategy: First, query vllm-project/vllm to obtain upstream context, particularly for questions involving core architecture, model adaptation, interfaces, or features that are not overridden by the plugin. Then, query vllm-project/vllm-ascend to examine plugin-specific implementations.
vllm-project/vllm to comprehend the upstream architecture, model adaptation, interfaces, and features that the plugin integrates with.vllm-project/vllm-ascend to review plugin-specific implementations.vllm-project/vllm for upstream context first, then query vllm-project/vllm-ascend when upstream interface details are needed to interpret plugin-level behavior, for example:
mcp__deepwiki__ask_question(repoName="vllm-project/vllm", question="...")mcp__deepwiki__ask_question(repoName="vllm-project/vllm-ascend", question="...")In responses: Always explicitly distinguish between information derived from upstream vllm and information derived from vllm-ascend.
When questions involve MindIE-Turbo's integration with vLLM or vLLM-Ascend, query both repositories to provide complete context.
vllm-project/vllm. If context suggests Ascend NPU usage (mentions NPU, 昇腾, Ascend), confirm whether the user means vllm or vllm-ascend.Rewrite colloquial questions as precise English technical queries optimized for DeepWiki retrieval
mcp__deepwiki__read_wiki_structure to identify the appropriate documentation sectionExamples by Intent Category:
| Category | User Input | Optimized Query |
|---|---|---|
| Usage | vllm-ascend 支持哪些模型 | What models are supported? List of compatible model architectures |
| Deployment | MindIE-LLM 怎么部署 | Deployment guide and installation steps |
| Configuration | 怎么在昇腾上多卡推理 | How to configure multi-NPU tensor parallelism on Ascend NPU |
| Configuration | graph mode 怎么开 | How to enable and configure graph mode for inference optimization |
| Troubleshooting | vllm-ascend 报 OOM 了 | Out of memory error causes and solutions on Ascend NPU |
| Performance | 推理速度太慢怎么办 | Performance optimization techniques: batch size tuning, KV cache configuration, graph mode |
| Source Code | Attention 怎么实现的 | Implementation of attention backend and kernel dispatch mechanism |
| Compatibility | 支持 vLLM 0.8 吗 | Version compatibility matrix and supported vLLM versions |
Use the mapped repoName and refined queries derived from the user's identified intent.
mcp__deepwiki__ask_question(repoName="<owner/repo>", question="<refined query>")
mcp__deepwiki__read_wiki_structure(repoName="<owner/repo>")
mcp__deepwiki__read_wiki_contents(repoName="<owner/repo>")
Note: If a single query does not yield sufficient information, run multiple follow-up queries from different perspectives to obtain more comprehensive and accurate results.
| Scenario | Recommended Tool |
|---|---|
| Known question direction, need specific answer | mcp__deepwiki__ask_question |
| Unsure which documentation section covers the question | mcp__deepwiki__read_wiki_structure first, then ask_question |
| Need comprehensive coverage of a module/topic | mcp__deepwiki__read_wiki_contents |
| Single query returns insufficient information | Multiple ask_question calls from different angles |
If the same repository topic has been queried earlier in the current conversation, prioritize reusing existing results. Only issue additional queries when new information is needed.
read_wiki_structure to locate the correct section, then re-query with more precise terms.Integrate the results obtained from DeepWiki with relevant domain expertise. Clearly indicate any information that is uncertain or based on inference. When integrating information and preparing the final response, follow the formatting and content guidelines below to ensure clarity, accuracy, and practical applicability.
vllm-ascend and from upstream vllmFor any information that is uncertain, unsupported by official documentation or source code, or derived from inference, append the following disclaimer:
For complex or high-stakes topics, explicitly recommend consulting official documentation or source code for authoritative confirmation.
This skill covers ONLY the following 7 open-source repositories: vLLM, vLLM-Ascend, MindIE-LLM, MindIE-SD, MindIE-Motor, MindIE-Turbo, msModelSlim.
If the user's question falls outside this scope:
npx claudepluginhub ascend-ai-coding/awesome-ascend-skills --plugin npu-torchair-inferDeep-dives into ML/AI topics by fetching official docs and GitHub sources via KB or web tools, for explaining concepts, comparing approaches, or surveying frameworks like 'how does X work?' or 'X vs Y'.
Analyzes and adapts upstream vLLM tests for Ascend NPU compatibility. Debugs failures, ports test cases, and validates CI readiness without modifying upstream vLLM code.
Runs Hugging Face Hub CLI (`hf`) to download/upload models/datasets/Spaces, manage authentication, repos, and buckets. Useful for Hub interactions in AI/ML projects.