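Problem description
Using the mindspore_2.1.0-cann_6.3.2-py_3.7-euler_2.10.7-aarch64-d910b-train image, the training task stays in the waiting state. Training tasks created earlier in this project were stuck in waiting for 1 hour.
Environment (GPU/NPU)
NPU
Cluster (Qizhi/Zhisuan)
Zhisuan
Task type (debug/training/inference)
Training
Task name
nocol202310251934125
Log details or screenshot of the problem
Expected solution or suggestion
I originally wanted to use the MindSpore 1.10.0 image, but the project needs to inherit from train.Metric, which raises an error on the 1.10.0 image, so I tried the 2.1.0 image instead.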
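It looks like this was caused by selecting a 910b image without selecting a 910b resource specification (OpenI/aiforge#4772 will improve this). Try creating the task again with both the image and the specification set to 910b.
Also, you do not need to add the multi_data_url run parameter yourself; just select the dataset files in the UI.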
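The image/specification check described above could be sketched as a small pre-submission helper. This is only an illustration: the functions `image_chip` and `matches_spec` are hypothetical and not part of the platform, the `-d<chip>-` naming pattern is inferred from the single image name quoted in this issue, and the specification labels used below are made up.

```python
import re


def image_chip(image_name):
    """Return the chip tag embedded in an image name (e.g. '910b'), or None.

    Assumes the tag appears as '-d<digits><optional letter>-', as in the
    image name quoted in this issue; other images may not follow this pattern.
    """
    match = re.search(r"-d(\d+[a-z]?)(?:-|$)", image_name.lower())
    return match.group(1) if match else None


def matches_spec(image_name, spec_name):
    """Check whether a (hypothetical) specification label names the same chip."""
    chip = image_chip(image_name)
    if chip is None:
        return False
    # Compare whole tokens so that '910' does not count as a match for '910b'.
    spec_tokens = re.findall(r"\d+[a-z]?", spec_name.lower())
    return chip in spec_tokens


IMAGE = "mindspore_2.1.0-cann_6.3.2-py_3.7-euler_2.10.7-aarch64-d910b-train"

if __name__ == "__main__":
    for spec in ("Ascend 910B", "Ascend 910"):  # hypothetical labels
        print(spec, matches_spec(IMAGE, spec))
```

Comparing whole tokens rather than substrings is deliberate: a plain substring test would treat a 910 image as compatible with a 910b specification.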
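The problem is solved. It was indeed a mismatch between the image and the specification.
A new problem has come up. When I train with the matching image and specification, it shows:
I have not run into this problem before. Is it caused by my code? The latest task name is nocol202310261918278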
This is a problem in the code. This issue will be closed; if you have further questions, please open a new issue.