MSAdapter is a utility that adapts PyTorch interfaces to MindSpore. It aims to let PyTorch-style code run efficiently on Ascend hardware without changing the programming habits of native PyTorch users. Simply replace import torch in the PyTorch source code with import msadapter.pytorch, adapt a small amount of training code, and the model can be trained on Ascend hardware.
Porting existing native PyTorch code to MindSpore with MSAdapter currently takes the following three steps:
Step1: Replace the imported modules
# import torch
import msadapter.pytorch as torch
# import torchvision as vision
import msadapter.torchvision as vision
MSAdapter already supports the native expression of most PyTorch syntax, as well as some torchvision and torchaudio data-processing interfaces. The support status of the high-level APIs used in your model can be found in the Supported List. If any necessary interfaces or features are missing, you can report them to us via an ISSUE and we will prioritize supporting them.
Step2: Replace the network training script
The training flow cannot yet be adapted fully automatically (an auto-adaptation feature is under development, stay tuned!). Please adapt your code following the examples below.
Native PyTorch training expression:
import torch
import torch.nn as nn

net = model().to(config_args.device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)
net.train()
# iterate over the training data
for i in range(epochs):
    for X, y in train_data:
        X, y = X.to(config_args.device), y.to(config_args.device)
        out = net(X)
        loss = criterion(out, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print("------>epoch:{}, loss:{:.6f}".format(i, loss))
Replace it with the MindSpore functional training expression:
import mindspore as ms
import msadapter.pytorch.nn as nn

net = model().to(config_args.device)
criterion = nn.CrossEntropyLoss()
optimizer = ms.nn.SGD(net.trainable_params(), learning_rate=0.01, momentum=0.9, weight_decay=0.0005)

# define the forward pass
def forward_fn(data, label):
    logits = net(data)
    loss = criterion(logits, label)
    return loss, logits

# define the backward gradient function
grad_fn = ms.ops.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

# define a single training step
def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    loss = ms.ops.depend(loss, optimizer(grads))
    return loss

net.train()
# iterate over the training data
for i in range(epochs):
    for X, y in train_data:
        X, y = X.to(config_args.device), y.to(config_args.device)
        res = train_step(X, y)
    print("------>epoch:{}, loss:{:.6f}".format(i, res.asnumpy()))
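The functional pattern above hinges on a convention: the gradient function returns the loss value together with the gradients with respect to the parameters, which the optimizer then consumes. That convention can be mimicked with a toy finite-difference sketch in plain Python (illustration only, not the MindSpore API):

```python
def value_and_grad(fn, params, eps=1e-6):
    """Return fn(params) and a finite-difference gradient (toy stand-in)."""
    val = fn(params)
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        grads.append((fn(shifted) - val) / eps)
    return val, grads

# scalar loss with a single parameter; gradient of (p - 3)^2 at p = 0 is -6
loss_fn = lambda p: (p[0] - 3.0) ** 2
val, grads = value_and_grad(loss_fn, [0.0])
assert val == 9.0
assert abs(grads[0] + 6.0) < 1e-3
```

As in the MindSpore version, the caller gets the loss and gradients from one call, then hands the gradients to the optimizer in a separate step.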
If you want to accelerate training with more advanced techniques such as distributed training, graph-mode acceleration, data sinking, and mixed precision, refer to 3. Advanced Training Guide.
Step3: A small amount of manual adaptation
We have identified some interfaces that cannot yet fully match PyTorch. We are actively improving them; in the meantime you can work around them by following 4. Manual Adaptation Guide (this does not affect normal training). If you encounter new problems or behavior that cannot be matched, feel free to reach us via an ISSUE.
After completing the three steps above, your script can run on Ascend and the other hardware platforms supported by MindSpore!
MSAdapter currently defaults to MindSpore's PYNATIVE mode for training. To use static graph mode for acceleration, refer to the static graph tutorial and enable GRAPH mode:
ms.set_context(mode=ms.GRAPH_MODE)
Note that for some networks, GRAPH-mode training cannot be enabled with a single switch; corresponding code adjustments may be required, mainly around inplace-style operations and restrictions of the native MindSpore framework. For details, refer to the static graph syntax support documentation.
For mixed precision, refer to the automatic mixed precision tutorial.
Refer to the distributed parallel training quick start to choose a suitable distributed training approach. We recommend the OpenMPI approach, which behaves similarly to PyTorch's DistributedDataParallel data-parallel training:
# distributed data processing
from msadapter.pytorch.utils.data import DataLoader, DistributedSampler
# initialize the communication environment
from mindspore.communication import init
...
train_images = datasets.CIFAR10('./', train=True, download=True, transform=transform)
sampler = DistributedSampler(train_images)
train_data = DataLoader(train_images, batch_size=32, num_workers=2, drop_last=True, sampler=sampler)
...
Run the script with:
mpirun -n DEVICE_NUM python train.py
For dynamic learning rates, refer to the dynamic learning rate tutorial.
Besides the functional training expression recommended above, two other training styles are available:
Option 2: train with MindSpore's Model.train
import mindspore as ms
import msadapter.pytorch.nn as nn
from mindspore.dataset import GeneratorDataset
from mindspore.train.callback import LossMonitor, TimeMonitor

model = LeNet()
criterion = nn.CrossEntropyLoss()
optimizer = ms.nn.SGD(model.trainable_params(), learning_rate=0.1, momentum=0.9, weight_decay=1e-4)
model = ms.Model(model, criterion, optimizer, metrics={'accuracy'})
dataset = GeneratorDataset(source=train_data, column_names=["data", "label"])
model.train(epochs, dataset, callbacks=[TimeMonitor(), LossMonitor()])
Option 3: iterate with WithLossCell and TrainOneStepCell
import mindspore as ms
from msadapter.pytorch import nn
import msadapter.pytorch as torch

model = LeNet()
criterion = nn.CrossEntropyLoss()
optimizer = ms.nn.SGD(model.trainable_params(), learning_rate=0.1, momentum=0.9, weight_decay=1e-4)
loss_net = ms.nn.WithLossCell(model, criterion)
train_net = ms.nn.TrainOneStepCell(loss_net, optimizer)
for i in range(epochs):
    for X, y in train_data:
        loss = train_net(X, y)
In most cases, migrating the PyTorch data-processing code only requires changing the relevant imports to msadapter, as shown below:
from msadapter.pytorch.utils.data import DataLoader
from msadapter.torchvision import datasets, transforms
from msadapter.torchvision.transforms import InterpolationMode

transform = transforms.Compose([
    transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.247, 0.2435, 0.2616])
])
train_images = datasets.CIFAR10('./', train=True, download=True, transform=transform)
train_data = DataLoader(train_images, batch_size=128, shuffle=True, num_workers=2, drop_last=True)
Note that pin_memory=True in MSAdapter's DataLoader currently has no effect.
In addition, if you hit data-processing interfaces that are not yet fully adapted, you can temporarily use the native PyTorch data pipeline and convert the resulting PyTorch tensors into tensor objects supported by MSAdapter; refer to the convert_tensor tool tutorial.
from msadapter.pytorch.nn import Module, Linear, Flatten

class MLP(Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.flatten = Flatten()
        self.line1 = Linear(in_features=1024, out_features=64)
        self.line2 = Linear(in_features=64, out_features=128, bias=False)
        self.line3 = Linear(in_features=128, out_features=10)

    def forward(self, inputs):
        x = self.flatten(inputs)
        x = self.line1(x)
        x = self.line2(x)
        x = self.line3(x)
        return x
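The shape flow through this MLP can be sketched with NumPy, assuming inputs that flatten to the 1024 features line1 expects (e.g. a 1x32x32 image; the batch size 8 here is arbitrary):

```python
import numpy as np

x = np.random.randn(8, 1, 32, 32)                # batch of 8, 1x32x32 each
flat = x.reshape(x.shape[0], -1)                 # Flatten -> (8, 1024)
w1 = np.zeros((1024, 64)); b1 = np.zeros(64)     # line1
w2 = np.zeros((64, 128))                         # line2, bias=False
w3 = np.zeros((128, 10)); b3 = np.zeros(10)      # line3
out = (flat @ w1 + b1) @ w2 @ w3 + b3
assert flat.shape == (8, 1024)
assert out.shape == (8, 10)
```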
Custom modules are written the same way as in native PyTorch, but note the following issues:
Attribute names that conflict with MindSpore's internal Cell attributes, such as self.phase, must be renamed by the user.
Custom autograd functions (torch.autograd.Function) must be rewritten as a Module with a bprop method, for example:
# PyTorch style
class GdnFunction(Function):
    @staticmethod
    def forward(ctx, x, gamma, beta):
        # save variables for backprop
        ctx.save_for_backward(x, gamma, beta)
        ...
        return y

    @staticmethod
    def backward(ctx, grad_output):
        x, gamma, beta = ctx.saved_variables
        ...
        return grad_input, grad_gamma, grad_beta

# MSAdapter style
class GdnFunction(nn.Module):
    def __init__(self):
        super(GdnFunction, self).__init__()

    def forward(self, x, gamma, beta):
        ...
        return y

    def bprop(self, x, gamma, beta, out, grad_output):
        x = torch.Tensor(x)
        gamma = torch.Tensor(gamma)
        beta = torch.Tensor(beta)
        grad_output = torch.Tensor(grad_output)
        ...
        return grad_input, grad_gamma, grad_beta
PyTorch has some polymorphic interfaces that are flexible to use. As a Python-level adaptation middleware, MSAdapter can currently only cover the mainstream scenarios; in some cases the user must fill in default arguments or substitute an equivalent interface. Interfaces identified so far:
torch.max(tensor1, tensor2) must be replaced with the equivalent torch.maximum(tensor1, tensor2);
torch.min(tensor1, tensor2) must be replaced with the equivalent torch.minimum(tensor1, tensor2);
torch.randint(10, (2, 2)) must be written with its default arguments filled in, as torch.randint(0, 10, (2, 2)); similar interfaces include torch.arange / torch.normal / torch.randint_like;
the torch.view operation is currently equivalent to creating a new tensor of the given shape and does not actually share memory, so users must write updated values back themselves (a memory-sharing view interface is under development, stay tuned!).
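Because the "view" is really a copy, any update made through it must be written back explicitly. A NumPy sketch of the write-back pattern, using a forced copy to mimic the non-sharing behavior:

```python
import numpy as np

a = np.zeros((2, 3))
v = a.reshape(6).copy()      # mimics a non-sharing "view": a copy of the data
v[0] = 7.0
assert a[0, 0] == 0.0        # the original tensor did not change

# write the updated values back explicitly:
a = v.reshape(2, 3)
assert a[0, 0] == 7.0
```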
Inplace-style operations cannot be matched yet and do not actually share memory. We therefore recommend rewriting torch.xxx(*, out=output) as output = torch.xxx(*), and tensor_a.xxx_(*) as tensor_b = tensor_a.xxx(*); in these forms the interfaces also run correctly in graph mode.
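The recommended rewrites can be illustrated with NumPy as a stand-in for the tensor API:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# instead of torch.add(a, b, out=output), bind the result directly:
output = np.add(a, b)
assert np.array_equal(output, np.array([4.0, 6.0]))

# instead of the inplace tensor_a.add_(b), keep the original and bind a new name:
tensor_b = a + b
assert np.array_equal(a, np.array([1.0, 2.0]))   # a is untouched
assert np.array_equal(tensor_b, np.array([4.0, 6.0]))
```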
Inplace operators applied to a slice do not take effect and must be rewritten as follows:
# native PyTorch style
boxes[i,:,0::4].clamp_(0, im_shape[i, 1]-1)
# recommended MSAdapter style
a = boxes[i,:,0::4].clamp_(0, im_shape[i, 1]-1)
boxes[i, :, 0::4] = a
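A NumPy sketch of the same compute-then-assign-back pattern, with np.clip standing in for clamp_ (the box data here is a made-up toy example):

```python
import numpy as np

boxes = np.array([[[5.0, -2.0, 9.0, 1.0]]])   # shape (1, 1, 4), toy data
im_w = 4
i = 0
# compute the clamped slice first, then assign it back into the original array:
a = np.clip(boxes[i, :, 0::4], 0, im_w - 1)
boxes[i, :, 0::4] = a
assert boxes[0, 0, 0] == 3.0                  # 5.0 was clamped to im_w - 1
```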
Native PyTorch copies data to a specified device through interfaces such as to, but MSAdapter does not yet support selecting the execution device this way; the actual hardware backend is determined by the context. If your program runs on Cloudbrain 2, it executes on Ascend hardware by default; to run on another backend, use code such as:
ms.context.set_context(device_target="CPU")
Some interface features cannot yet be matched; please delete the related code or adapt it accordingly. For example:
When using the ms.ops.value_and_grad interface with has_aux set to True, multi-level nested outputs are not allowed (being improved), and the position being differentiated must be the first output;
torch.nn.utils.clip_grad_norm_ can be replaced with ms.ops.clip_by_global_norm as an equivalent implementation of gradient clipping;
after calling native MindSpore interfaces, convert the output tensor with the msadapter.pytorch.cast_to_adapter_tensor interface into an MSAdapter tensor before continuing to call torch-style interfaces; outside of the network training part, beginners are advised not to mix MSAdapter and MindSpore interfaces;
formatting tensor values directly into strings, e.g. label = f"{class_names[labels[i]]}: {probs[i]:.2f}", should be avoided; convert the tensors to numpy first before printing;
the torch.autograd.Variable interface is not supported; simply replace it with torch.tensor;
the torch.save interface only supports saving network weights, not the network structure.
Q: After setting context.set_context(mode=context.GRAPH_MODE), an error like the following appears:
"Tensor.add_" is an in-place operation and "x.add_()" is not encouraged to use in MindSpore static graph mode. Please use "x = x.add()" or other API instead.
A: Interfaces involving in-place operations are currently not supported in GRAPH mode; modify the code as the error message suggests. Note that in-place interfaces are discouraged even in PYNATIVE mode, because in MSAdapter they currently bring no memory savings and they introduce uncertainty into backward gradient computation.
Q: Running the code produces an error like:
AttributeError: module 'msadapter.pytorch' has no attribute 'xxx'.
A: First check whether 'xxx' is an interface supported by torch 1.12. MSAdapter will not support interfaces and parameters that the official PyTorch documentation explicitly marks as deprecated or soon to be deprecated; please use another interface with equivalent functionality instead. If the interface is supported by the corresponding PyTorch version but not yet available in MSAdapter, you are welcome to contribute code to the MSAdapter project, or file a request by creating a task (New issue).