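Problem description
How do I configure parallel execution for my model? I requested a resource specification of NPU: 4*Ascend 910, CPU: 96, device memory: 32GB, RAM: 256GB, but I found that my model runs independently on each of the 4 devices and does not behave as a single model executing in parallel.
Environment (GPU/NPU)
NPU
Cluster (Qizhi/Intelligent Computing)
Intelligent Computing
Task type (debugging/training/inference)
Training
Task name
lld20202307202262237
Log description or screenshot of the problem
Expected solution or suggestion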
How to do parallel training depends on your code. You can refer to MindSpore's official distributed training tutorial: https://www.mindspore.cn/tutorials/experts/zh-CN/r2.0/parallel/introduction.html?&highlight=%E5%88%86%E5%B8%83%E5%BC%8F%E5%B9%B6%E8%A1%8C%E6%80%BB%E8%A7%88
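One thing data-parallel code usually has to do is give each device a different slice of the dataset, otherwise all 4 devices simply repeat the same work. The following is a minimal plain-Python sketch of that idea, not MindSpore code. The helper `shard_indices` is hypothetical; the names `num_shards` and `shard_id` are chosen to resemble the sharding parameters of MindSpore dataset loaders as far as I know, so please check the tutorial above for the exact API.

```python
def shard_indices(num_samples, num_shards, shard_id):
    """Return the sample indices that worker `shard_id` reads (strided split).

    Hypothetical helper for illustration only; a real framework would do
    this inside its dataset loader.
    """
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(range(shard_id, num_samples, num_shards))


# Assumed toy setup: 10 samples split across 4 devices.
num_samples, num_devices = 10, 4
shards = [shard_indices(num_samples, num_devices, rank)
          for rank in range(num_devices)]

for rank, indices in enumerate(shards):
    print(f"device {rank} reads samples {indices}")
```

Each sample is read by exactly one device, so one pass over all shards covers the whole dataset once.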
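Has the original poster managed to get parallel training working?
I just set up data-parallel training today, and it is running now.
But it feels like the 4 devices each run separately: the model parameters are not shared, and the loss is not decreasing through gradient descent. I still need to learn more about MindSpore's parallelism.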
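To illustrate the difference between "4 devices running separately" and synchronous data parallelism, here is a toy simulation in plain Python (no MindSpore involved). It assumes a one-parameter model y = w * x and made-up data in which every shard prefers a different w. The function `all_reduce_mean` is a hypothetical stand-in for the gradient-averaging collective that a framework runs between devices. In data parallelism each device keeps its own copy of the parameters; the copies stay identical only because every device applies the same averaged gradient.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: every worker gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train(shards, steps, lr, synchronize):
    """Run gradient descent with one weight copy per worker."""
    weights = [0.0] * len(shards)  # all workers start from the same weight
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        if synchronize:
            grads = all_reduce_mean(grads)  # the step that links the workers
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Made-up shards: shard k follows y = slope_k * x with slopes 2, 3, 4, 3.
slopes = [2.0, 3.0, 4.0, 3.0]
shards = [[(1.0, s * 1.0), (2.0, s * 2.0)] for s in slopes]

independent = train(shards, steps=50, lr=0.1, synchronize=False)
synced = train(shards, steps=50, lr=0.1, synchronize=True)

print("without gradient sync:", independent)
print("with gradient sync:   ", synced)
```

Without synchronization each copy drifts toward the optimum of its own shard, which matches the symptom described above. With synchronization all copies remain equal and fit the combined data.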