Repository layout:

    dataset/example/
    doc/
    scripts/
    src/
    README.md
    create_data.py
    eval.py
    export.py
    ma-pre-start.sh
    mindspore_hub_conf.py
    train.py
Model source: MindSpore:r1.1>Model_zoo>official>nlp>transformer
The hybrid parallel strategy is built on MindSpore's semi-automatic parallelism.
A test dataset ships with the project, so training can be launched without any code changes.
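As a rough illustration of the semi-automatic parallelism mentioned above, MindSpore's parallel mode is enabled through the global context before the network is built. The snippet below is a minimal sketch of that setup using standard MindSpore r1.1 APIs, not this project's exact code:

```python
# Minimal sketch: enable semi-automatic parallelism in MindSpore r1.1.
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_group_size

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()  # initialize HCCL communication across Ascend devices
context.set_auto_parallel_context(
    parallel_mode=ParallelMode.SEMI_AUTO_PARALLEL,
    device_num=get_group_size(),   # total number of devices in the job
    gradients_mean=True)           # average gradients across devices
```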
MindSpore >= 1.1.1
Huawei Ascend 910
For launch scripts, refer to the usage instructions in MindSpore:r1.1>Model_zoo>official>nlp>transformer.
To train with pure data parallelism, set the following in train.py:
args.distribute = True (False for single-node training)
args.Hybrid_Parallel = False
and in ./src/config.py:
model_parallel = False
batchsize = 128 (128*device_num)
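For reference, the pure data-parallel setting above maps onto MindSpore's DATA_PARALLEL mode. The sketch below is illustrative; the variable names, and the reading of 128*device_num as the effective global batch, are our assumptions, not the project's config fields:

```python
# Minimal sketch of the pure data-parallel setting described above.
from mindspore import context
from mindspore.context import ParallelMode

device_num = 8                       # e.g. one node with 8 Ascend cards
context.set_auto_parallel_context(
    parallel_mode=ParallelMode.DATA_PARALLEL,
    device_num=device_num,
    gradients_mean=True)             # average gradients across devices

per_device_batch = 128               # batchsize in ./src/config.py
global_batch = per_device_batch * device_num  # matches the 128*device_num note
```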
To train with hybrid parallelism, set the following in train.py:
args.distribute = True
args.Hybrid_Parallel = True
and in ./src/config.py, where dp*mp must equal device_num:
model_parallel = True
dp = 2 (op-level data-parallel dimension)
mp = 2 (op-level model-parallel dimension)
batchsize = 1024 (1024*device_num)
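In semi-automatic parallel mode, dp and mp are consumed by op-level sharding strategies. The snippet below is a hedged sketch of MindSpore's generic shard API applied to a MatMul, not necessarily how this project annotates its operators:

```python
# Sketch of op-level sharding with dp/mp in semi-auto parallel mode.
# dp splits the input's batch dimension; mp splits the weight's output
# dimension; dp * mp must equal the total device count.
import mindspore.ops as ops

dp, mp = 2, 2
assert dp * mp == 4  # dp * mp == device_num (4 here, purely illustrative)

matmul = ops.MatMul()
# x: (batch, hidden) -> batch split dp ways
# w: (hidden, ffn)   -> ffn split mp ways
matmul.shard(((dp, 1), (1, mp)))
```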
Devices used | Operator sharding dimensions
---|---
1 node, 8 devices | dp=2, mp=4
2 nodes, 16 devices | dp=4, mp=4
4 nodes, 32 devices | dp=4, mp=8
8 nodes, 64 devices | dp=8, mp=8
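The table follows a single pattern, which a small hypothetical helper (the names are ours, not the project's) can capture while enforcing the dp*mp = device_num constraint:

```python
# Hypothetical helper encoding the recommended (dp, mp) splits above.
RECOMMENDED_SPLITS = {8: (2, 4), 16: (4, 4), 32: (4, 8), 64: (8, 8)}

def get_split(device_num):
    dp, mp = RECOMMENDED_SPLITS[device_num]
    assert dp * mp == device_num, "dp * mp must equal device_num"
    return dp, mp
```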
python train.py --distribute True --Hybrid_Parallel True
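Note that `--distribute True` passes the literal string "True" on the command line; MindSpore model_zoo scripts commonly convert such flags with ast.literal_eval. The sketch below shows that pattern; the project's train.py may differ in detail:

```python
# Sketch of parsing string-valued boolean flags like "--distribute True".
import argparse
import ast

parser = argparse.ArgumentParser(description="Transformer training")
parser.add_argument("--distribute", type=ast.literal_eval, default=False,
                    help="Run distributed training (True/False).")
parser.add_argument("--Hybrid_Parallel", type=ast.literal_eval, default=False,
                    help="Enable hybrid (data + model) parallelism.")
args = parser.parse_args()
```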
The goals of this project are to make model training more efficient (for models under 10B parameters), to support training of larger-scale models (>10B, 50B, 100B), and to build a representative model case for distributed hybrid parallelism.
License: Apache-2.0