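Problem description
The training task I created stays in WAITING and never reaches RUNNING.
Environment (GPU/NPU)
GPU
Cluster (Qizhi/Zhisuan)
Zhisuan (intelligent computing)
Task type (debug/training/inference)
Training
Task name
train02
Log output or screenshot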
[FailedScheduling] 2023/12/12 16:12:05
all nodes are unavailable: 19 node(s) resource fit failed, 8 node(s) selector fit queue failed.
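Expected solution or suggestion
I don't understand what is causing this. Is the task stuck in the queue because there are not enough GPU resources?
The Zhisuan cluster seems harder to use than the Qizhi cluster. I created the training task on the Zhisuan cluster more than two hours ago and its status is still WAITING. I don't understand why the Qizhi cluster was taken offline. Sigh!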
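Resources have been reallocated, and the Zhisuan cluster will be the primary cluster from now on. A100 and V100 have now been brought back online.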