#1224 CPU debug task is killed every time it reaches a particular iteration

Closed
created 3 months ago by NoColorZheng · 2 comments
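### Problem description
A CPU debug task is killed every time it reaches a particular iteration.

### Environment (GPU/NPU)
GPU

### Cluster (启智/智算)
智算

### Task type (debug/training/inference)
Debug

### Task name
nocol202401202054280

### Log description or screenshot
![image](/attachments/e1d20c09-9f98-487f-ab13-7cb112dcc086)

### Expected solution or suggestion
Could the process be running out of memory? It always gets stuck at acc_iter=10. By printing intermediate variables, I found that the "killed" message appears at acc_iter=11, from inside grad_fn.

![image](/attachments/a3ce0255-d737-4d1a-ad51-167e23ff86b5)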
liuzx commented 3 months ago
Collaborator
This may be a memory problem. Try selecting a resource specification with more memory.
NoColorZheng commented 3 months ago
Poster
Every pass through the network consumes a large amount of memory.
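One way to check whether memory really grows on each pass is to log the process's peak resident memory once per iteration. The following is a minimal sketch, not code from the task: `peak_rss_mib`, `log_memory`, and the `held` list are hypothetical names, and the loop only simulates memory being retained across iterations. It uses the standard-library `resource` module, which is available on Unix-like systems only.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(acc_iter: int) -> float:
    """Print and return the peak RSS observed up to this iteration."""
    peak = peak_rss_mib()
    print(f"acc_iter={acc_iter} peak_rss={peak:.1f} MiB")
    return peak


if __name__ == "__main__":
    # Hypothetical stand-in for objects kept alive across iterations.
    held = []
    for acc_iter in range(3):
        # Simulate roughly 10 MiB being retained on every step.
        held.append(bytearray(b"x" * (10 * 1024 * 1024)))
        log_memory(acc_iter)
```

Calling `log_memory(acc_iter)` at the end of each iteration of the real loop would show whether the peak keeps climbing toward the limit of the chosen resource specification. A steady climb would be consistent with the out-of-memory explanation, but it would not prove it.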
liuzx closed this issue 6 days ago