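Migrate the Boltz2 protein structure prediction model to Huawei Ascend NPU, with guidance on environment setup, source adaptation, weight preparation, and end-to-end inference validation.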
This skill provides the complete steps for migrating Boltz2 from CUDA to Ascend NPU.
| Item | Requirement |
|---|---|
| NPU chip | Ascend 910/910B/910C (at least 1 card) |
| Driver | npu-smi info runs normally |
| CANN Toolkit | 8.2.RC1 |
| Operating system | Linux aarch64 |
| Python environment | conda/miniconda |

| Item | Version/Info |
|---|---|
| npu-smi | 25.0.rc1.3 |
| CANN Toolkit | 8.2.RC1 |
| CANN path | /usr/local/Ascend/ascend-toolkit/set_env.sh |
| Python | 3.11.14 |
| torch | 2.5.1 (matched to the CANN version) |
| torch_npu | 2.5.1 (matched to the CANN version) |
| conda environment | Boltz-2 |

Note: the default CANN installation path is /usr/local/Ascend/ascend-toolkit/.
Boltz2 inference requires the following files in ~/.boltz/:

| File name | Purpose | Suggested source |
|---|---|---|
| boltz2_conf.ckpt | Boltz2 structure prediction main weights | boltz-community/boltz-2 |
| boltz2_aff.ckpt | Boltz2 affinity head weights | boltz-community/boltz-2 |
| boltz1_conf.ckpt | Boltz1 compatibility weights (optional) | boltz-community/boltz-1 |
| ccd.pkl | CCD molecule dictionary | boltz-community/boltz-1 |
| mols.tar | Molecule data package | boltz-community/boltz-2 |
Suggested directory layout:

~/.boltz/
  boltz1_conf.ckpt
  boltz2_conf.ckpt
  boltz2_aff.ckpt
  ccd.pkl
  mols.tar

Check command:

ls -lh ~/.boltz
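
The same check can be scripted. The sketch below reports which files from the table are missing; the helper name `missing_assets` is hypothetical (it is not part of Boltz), and it treats boltz1_conf.ckpt as optional, as the table does.

```python
from pathlib import Path

# File names taken from the table above; boltz1_conf.ckpt is optional there,
# so it is not listed as required.
REQUIRED = ("boltz2_conf.ckpt", "boltz2_aff.ckpt", "ccd.pkl", "mols.tar")


def missing_assets(boltz_home: str = "~/.boltz") -> list[str]:
    """Return the required asset names that are absent from boltz_home."""
    home = Path(boltz_home).expanduser()
    return [name for name in REQUIRED if not (home / name).is_file()]
```

Calling `missing_assets()` with no argument inspects ~/.boltz; an empty list means every required file is present.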
source /root/anaconda3/etc/profile.d/conda.sh
conda create -n Boltz-2 python=3.11 -y
conda activate Boltz-2
Note: before creating the virtual environment, check whether miniconda or anaconda is already installed on the machine.
pip install torch==2.5.1 torch_npu==2.5.1
pip install pyyaml numpy decorator cloudpickle ml-dtypes tornado
source /usr/local/Ascend/ascend-toolkit/set_env.sh
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg | head -20
source /root/anaconda3/etc/profile.d/conda.sh
conda activate Boltz-2
python -c "import sys, torch, torch_npu; print(sys.version); print(torch.__version__, torch_npu.__version__)"
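Note: if you hit torch_npu-related errors, the CANN and torch_npu versions may be mismatched. Run cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg | head -20 to check the current CANN version, and check whether another CANN installation that matches the current environment exists locally.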
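
As a first sanity check on the Python side, the two version strings printed above can be compared programmatically. This is a minimal sketch under one assumption: that torch and torch_npu should share the same major.minor.patch release, as the version table suggests. It does not verify the CANN version, and the helper names are hypothetical.

```python
def release_prefix(version: str) -> tuple[str, ...]:
    """Return the leading major, minor, patch fields of a version string."""
    # Drop a local build tag such as '+cpu', then keep the first three fields.
    base = version.split("+", 1)[0]
    return tuple(base.split(".")[:3])


def versions_match(torch_version: str, torch_npu_version: str) -> bool:
    """Heuristic: torch and torch_npu share the same major.minor.patch."""
    return release_prefix(torch_version) == release_prefix(torch_npu_version)
```

In the activated environment this would be called as `versions_match(torch.__version__, torch_npu.__version__)`.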
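- transfer_to_npu automatically remaps CUDA APIs to the NPU.
- npu_adapter provides a device-agnostic wrapper that replaces hard-coded CUDA calls in the code.
- npu accelerator.

Download the source code: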
cd /home/workspace
git clone https://github.com/jwohlwend/boltz.git
cd boltz
| File | Purpose |
|---|---|
| src/boltz/model/npu_adapter.py | NPU compatibility layer (device detection, autocast device type, capability probing, etc.) |
| src/boltz/model/npu_accelerator.py | Registration of the Lightning npu accelerator |
| src/boltz/model/layers/npu_kernels.py | NPU-equivalent implementations of triangle attention / triangular mult |
| tests/test_npu_migration.py | Migration regression tests |
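src/boltz/main.py: enable torch_npu and transfer_to_npu at the entry point

Add at the top of the file:

try:
    import torch_npu
    # Imported for its side effect: remaps torch.cuda.* calls to the NPU.
    from torch_npu.contrib import transfer_to_npu
except ImportError:
    # torch_npu is not installed; keep the stock CUDA/CPU behavior.
    pass

Effect: automatically maps common torch.cuda.* APIs to the NPU.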
src/boltz/model/models/boltz2.py and src/boltz/model/models/boltz1.py: fix the major attribute problem

In setup() of both files, replace the original capability check:

if stage == "predict" and not (
    torch.cuda.is_available()
    and torch.cuda.get_device_properties(torch.device("cuda")).major >= 8.0
):
    self.use_kernels = False

with:

if stage == "predict":
    try:
        has_capability = (
            torch.cuda.is_available()
            and torch.cuda.get_device_properties(torch.device("cuda")).major >= 8.0
        )
    except (AttributeError, RuntimeError):
        # The device properties expose no CUDA-style major field:
        # fall back to plain availability.
        has_capability = torch.cuda.is_available()
    if not has_capability:
        self.use_kernels = False

Reason: NPU device properties do not have CUDA's major field.
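
The fallback logic can be tried without any accelerator by probing stand-in objects. This sketch uses `getattr` with a default instead of try/except, which gives the same result when the attribute is missing but does not cover the RuntimeError branch. The function name and the `SimpleNamespace` stand-ins are illustrative assumptions, not Boltz or torch APIs.

```python
from types import SimpleNamespace


def kernels_supported(props, device_available: bool, min_major: int = 8) -> bool:
    """Mirror the capability check while tolerating a missing 'major' field."""
    if not device_available:
        return False
    major = getattr(props, "major", None)
    if major is None:
        # No CUDA-style compute capability: fall back to availability alone.
        return True
    return major >= min_major


# Stand-ins for device property objects (illustrative only).
cuda_like = SimpleNamespace(name="cuda-like", major=8)
old_cuda_like = SimpleNamespace(name="older-cuda-like", major=7)
npu_like = SimpleNamespace(name="npu-like")  # deliberately has no 'major'
```

With these stand-ins, `kernels_supported(npu_like, True)` keeps the kernels enabled, while `kernels_supported(old_cuda_like, True)` disables them.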
src/boltz/model/layers/npu_kernels.py: provide NPU-equivalent implementations of the CUDA kernels

Create the file with the following content:

import torch
import torch.nn.functional as F


def npu_triangle_attention(q, k, v, tri_bias, mask, scale):
    # Plain scaled dot-product attention with an additive triangle bias.
    q = q * scale
    attn = torch.matmul(q, k.transpose(-1, -2))
    # Drop the extra leading dimension of the bias when it has one.
    if tri_bias.dim() > attn.dim():
        tri_bias = tri_bias.squeeze(-4)
    attn = attn + tri_bias
    if mask is not None:
        # mask must be a boolean tensor (True = keep) for ~mask to work.
        if mask.dim() < attn.dim():
            mask = mask.unsqueeze(-3)
        attn = attn.masked_fill(~mask, float("-inf"))
    attn = F.softmax(attn, dim=-1)
    return torch.matmul(attn, v)


def npu_triangle_multiplicative_update(
    x,
    direction,
    mask,
    norm_in_weight,
    norm_in_bias,
    p_in_weight,
    g_in_weight,
    norm_out_weight,
    norm_out_bias,
    p_out_weight,
    g_out_weight,
    eps=1e-5,
):
    # Input layer norm, then gated projection masked per pair.
    x_normed = F.layer_norm(x, [x.shape[-1]], norm_in_weight, norm_in_bias, eps)
    proj = F.linear(x_normed, p_in_weight)
    gate = F.linear(x_normed, g_in_weight).sigmoid()
    x_gated = (proj * gate) * mask.unsqueeze(-1)
    # The contraction runs in float32.
    a, b = x_gated.float().chunk(2, dim=-1)
    if direction == "outgoing":
        x_tri = torch.einsum("bikd,bjkd->bijd", a, b)
    else:
        x_tri = torch.einsum("bkid,bkjd->bijd", a, b)
    # Output layer norm, projection, and output gate.
    x_out = F.layer_norm(x_tri, [x_tri.shape[-1]], norm_out_weight, norm_out_bias, eps)
    x_out = F.linear(x_out, p_out_weight)
    return x_out * F.linear(x_normed, g_out_weight).sigmoid()
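
The two einsum patterns differ only in which index is summed over. The toy sketch below drops the batch (b) and channel (d) dimensions and spells the contraction out with plain Python lists; it illustrates the index pattern only and is not a replacement for the tensor code. The function name is hypothetical.

```python
def triangle_contract(a, b, direction):
    """Contract two N x N matrices the way the two einsum patterns do.

    outgoing: out[i][j] = sum_k a[i][k] * b[j][k]   (like "ik,jk->ij")
    incoming: out[i][j] = sum_k a[k][i] * b[k][j]   (like "ki,kj->ij")
    """
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if direction == "outgoing":
                    out[i][j] += a[i][k] * b[j][k]
                else:
                    out[i][j] += a[k][i] * b[k][j]
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
outgoing = triangle_contract(a, b, "outgoing")  # [[17, 23], [39, 53]]
incoming = triangle_contract(a, b, "incoming")  # [[26, 30], [38, 44]]
```

In matrix terms, the outgoing case computes a times the transpose of b, and the incoming case computes the transpose of a times b.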
src/boltz/model/layers/triangular_mult.py: dispatch the kernel by device

In kernel_triangular_mult, change the body to the following structure (the core idea is NPU first, plus an ImportError fallback):

@torch.compiler.disable
def kernel_triangular_mult(x, direction, mask, ..., eps):
    # NPU tensors always use the pure-PyTorch implementation.
    if x.device.type == "npu":
        from boltz.model.layers.npu_kernels import npu_triangle_multiplicative_update
        return npu_triangle_multiplicative_update(x, ...)
    try:
        from cuequivariance_torch.primitives.triangle import triangle_multiplicative_update
    except ImportError:
        # The CUDA-only library is missing: fall back to the same implementation.
        from boltz.model.layers.npu_kernels import npu_triangle_multiplicative_update
        return npu_triangle_multiplicative_update(x, ...)
    return triangle_multiplicative_update(x, ...)
src/boltz/model/layers/triangular_attention/primitives.py: dispatch the attention kernel by device

In kernel_triangular_attn, change the body to:

@torch.compiler.disable
def kernel_triangular_attn(q, k, v, tri_bias, mask, scale):
    # NPU tensors always use the pure-PyTorch implementation.
    if q.device.type == "npu":
        from boltz.model.layers.npu_kernels import npu_triangle_attention
        return npu_triangle_attention(q, k, v, tri_bias=tri_bias, mask=mask, scale=scale)
    try:
        from cuequivariance_torch.primitives.triangle import triangle_attention
        return triangle_attention(q, k, v, tri_bias, mask=mask, scale=scale)
    except ImportError:
        # The CUDA-only library is missing: fall back to the same implementation.
        from boltz.model.layers.npu_kernels import npu_triangle_attention
        return npu_triangle_attention(q, k, v, tri_bias=tri_bias, mask=mask, scale=scale)
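
The dispatch order used in both functions (device type first, then an availability check on the CUDA-only library) can be sketched on its own. This version uses `importlib.util.find_spec` rather than try/except ImportError so that it runs without either backend installed; the function name and the returned labels are hypothetical.

```python
import importlib.util


def pick_backend(device_type: str, cuda_module: str = "cuequivariance_torch") -> str:
    """Choose a kernel backend: NPU first, then the CUDA library, else fallback."""
    if device_type == "npu":
        return "npu_fallback"
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(cuda_module) is None:
        return "npu_fallback"
    return "cuda_kernel"
```

On a machine without cuequivariance_torch, `pick_backend("cuda")` returns the fallback label, which mirrors the ImportError branch above.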
src/boltz/model/npu_accelerator.py: register the Lightning NPU accelerator

Create the file with the following content:

import torch
from pytorch_lightning.accelerators import AcceleratorRegistry
from pytorch_lightning.accelerators.accelerator import Accelerator

# torch.npu is provided by torch_npu, which main.py imports at startup.


class NPUAccelerator(Accelerator):
    def setup_device(self, device):
        # Default to card 0 when a non-NPU device is passed in.
        if device.type != "npu":
            device = torch.device("npu", 0)
        torch.npu.set_device(device)

    def teardown(self):
        torch.npu.empty_cache()

    @staticmethod
    def parse_devices(devices):
        # An int N selects the first N cards; a string is a comma-separated index list.
        if isinstance(devices, int):
            return list(range(devices))
        if isinstance(devices, str):
            return [int(d) for d in devices.split(",")]
        return devices

    @staticmethod
    def get_parallel_devices(devices):
        return [torch.device("npu", i) for i in devices]

    @staticmethod
    def auto_device_count():
        return torch.npu.device_count()

    @staticmethod
    def is_available():
        return torch.npu.device_count() > 0


def register_npu_accelerator():
    # Idempotent: register only once per process.
    if "npu" not in AcceleratorRegistry:
        AcceleratorRegistry.register("npu", NPUAccelerator, description="Ascend NPU")
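
The device-parsing rules are easy to misread: an integer means "the first N cards", not "card N". The standalone copy below reproduces `parse_devices` without torch so the behavior can be checked directly; `device_names` is a hypothetical helper added only for illustration.

```python
def parse_devices(devices):
    """Standalone copy of NPUAccelerator.parse_devices for experimentation."""
    if isinstance(devices, int):
        return list(range(devices))  # 2 selects cards 0 and 1
    if isinstance(devices, str):
        return [int(d) for d in devices.split(",")]  # "0,2" selects cards 0 and 2
    return devices  # already a list of indices


def device_names(devices):
    """Render the parsed indices as npu:<index> strings."""
    return [f"npu:{i}" for i in parse_devices(devices)]
```

For example, `device_names(2)` yields two entries (cards 0 and 1), whereas `device_names("2")` yields only card 2.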
src/boltz/main.py: CLI and Trainer adaptation

Add npu to the --accelerator option:

type=click.Choice(["gpu", "cpu", "tpu", "npu"])

At the start of predict(), add registration and auto-detection:

from boltz.model.npu_accelerator import register_npu_accelerator
register_npu_accelerator()
# Switch the default "gpu" request to "npu" when only an NPU is usable.
if accelerator == "gpu" and not torch.cuda.is_available():
    try:
        import torch_npu
        if torch.npu.is_available():
            accelerator = "npu"
    except ImportError:
        pass
# Single-card NPU runs use an explicit single-device strategy.
if accelerator == "npu":
    from pytorch_lightning.strategies import SingleDeviceStrategy
    if isinstance(devices, int) and devices == 1:
        strategy = SingleDeviceStrategy(device=torch.device("npu", 0))
# Boltz1 runs in fp32; other models use bf16 mixed precision.
precision_val = 32 if model == "boltz1" else "bf16-mixed"
plugins = []
if accelerator == "npu" and precision_val == "bf16-mixed":
    # On NPU, pass mixed precision as a plugin bound to the npu device
    # and leave the Trainer's own precision argument unset.
    from pytorch_lightning.plugins.precision import MixedPrecision
    plugins.append(MixedPrecision(precision_val, device="npu"))
    precision_val = None
trainer_kwargs = dict(
    default_root_dir=out_dir,
    strategy=strategy,
    callbacks=[pred_writer],
    accelerator=accelerator,
    devices=devices,
)
if precision_val is not None:
    trainer_kwargs["precision"] = precision_val
if plugins:
    trainer_kwargs["plugins"] = plugins
trainer = Trainer(**trainer_kwargs)
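
The decision logic in this snippet can be isolated into two pure functions and exercised without Lightning or an NPU. This is a sketch only: both function names are hypothetical, and the tuple standing in for the `MixedPrecision` plugin is a placeholder, not a real Lightning object.

```python
def resolve_accelerator(requested: str, cuda_ok: bool, npu_ok: bool) -> str:
    """Switch the default 'gpu' request to 'npu' when only an NPU is usable."""
    if requested == "gpu" and not cuda_ok and npu_ok:
        return "npu"
    return requested


def plan_precision(accelerator: str, model: str) -> dict:
    """Return which Trainer arguments would carry the precision setting."""
    precision = 32 if model == "boltz1" else "bf16-mixed"
    if accelerator == "npu" and precision == "bf16-mixed":
        # Placeholder tuple standing in for MixedPrecision(precision, device="npu");
        # the 'precision' key is left out, as in the snippet above.
        return {"plugins": [("MixedPrecision", precision, "npu")]}
    return {"precision": precision}
```

Note that an explicit `--accelerator cpu` request is never overridden; only the default `gpu` value is redirected.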
cd /home/workspace/boltz
pip install -e .
source /usr/local/Ascend/ascend-toolkit/set_env.sh
conda activate Boltz-2
cd /home/workspace/boltz
boltz predict examples/prot_no_msa.yaml \
--devices 1 \
--out_dir ./boltz_output_e2e \
--num_workers 0
Note: --num_workers 0 is the safe setting on NPU; it avoids crashes caused by DataLoader multiprocessing.
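Check the log for:
- Auto-detected Ascend NPU (when the default accelerator is used)
- Number of failed examples: 0

ls ./boltz_output_e2e/boltz_results_prot_no_msa/predictions/prot_no_msa/

The directory should contain at least:
- prot_no_msa_model_0.cif
- plddt_prot_no_msa_model_0.npz
- pae_prot_no_msa_model_0.npz
- pde_prot_no_msa_model_0.npz
- confidence_prot_no_msa_model_0.json

ValueError: invalid accelerator name: npu
The Lightning NPU accelerator is not registered. Check that src/boltz/model/npu_accelerator.py exists, and confirm that predict() calls register_npu_accelerator().

ModuleNotFoundError: cuequivariance_torch
A CUDA-only dependency is missing; the NPU fallback path should be taken instead. Check that triangular_mult.py and triangular_attention/primitives.py include the NPU dispatch logic.

If DataLoader worker processes crash, add --num_workers 0 to the inference command.

torch.cuda.* attribute errors
These indicate that some hard-coded CUDA calls have not been migrated yet. Locate them with: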
rg -n "torch\.cuda|autocast\(\"cuda\"\)" src/boltz
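
Where ripgrep is not installed, an equivalent scan can be sketched in Python with the same regular expression. Unlike the rg command, this version only looks at *.py files; the function name is hypothetical.

```python
import re
from pathlib import Path

# Same pattern as the rg command: torch.cuda usages or autocast("cuda").
PATTERN = re.compile(r'torch\.cuda|autocast\("cuda"\)')


def find_cuda_hardcoding(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running `find_cuda_hardcoding("src/boltz")` from the repository root lists the remaining call sites to review.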
The migration is considered successful when the following 4 conditions are met:
- boltz predict runs on the NPU and reports Number of failed examples: 0.

python scripts/check_boltz2_assets.py --boltz-home ~/.boltz --check-runtime

references/migration-checklist.md