#1105 Pod ephemeral local storage usage exceeds the total limit of containers 40Gi.

Closed
created 9 months ago by trainer · 5 comments
trainer commented 9 months ago
<!-- To help us identify and resolve your issue more effectively, please provide as much of the following information as possible -->

### Problem description

- [Scheduled] 2023/07/30 22:10:21 Successfully assigned fccb038c23234b9e80105d4ccd152117/v07036d51c7b4a61a2807ee313b816ac-task0-0 to pclv100s-6
- [Pulling] 2023/07/30 22:10:23 Pulling image "swr.cn-south-1.myhuaweicloud.com/openi/cuda111_python37_pytorch191:v1"
- [Pulled] 2023/07/30 22:10:33 Successfully pulled image "swr.cn-south-1.myhuaweicloud.com/openi/cuda111_python37_pytorch191:v1" in 9.890531236s
- [Created] 2023/07/30 22:10:42 Created container task0
- [Started] 2023/07/30 22:10:44 Started container task0
- [Evicted] 2023/07/30 22:15:51 Pod ephemeral local storage usage exceeds the total limit of containers 40Gi.
- [Killing] 2023/07/30 22:15:51 Stopping container task0
- [ExceededGracePeriod] 2023/07/30 22:16:01 Container runtime did not kill the pod within specified grace period.
- [Exit] failed

### Environment (GPU/NPU)

Specification: GPU: 1*V100, CPU: 8, GPU memory: 32GB, RAM: 50GB

### Cluster (OpenI/Intelligent Computing)

Intelligent Computing Center

### Task type (debug/training/inference)

Training

### Task name

train202307302226210

### Log description or screenshot

Pod ephemeral local storage usage exceeds the total limit of containers 40Gi.

### Expected solution or suggestion

Support more storage space.
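For anyone hitting the same eviction, here is a minimal sketch of a pre-flight check a training script could run before unpacking a dataset. It is not part of the platform: the function names are hypothetical, the 40 GiB figure is simply copied from the eviction message above, and which directories actually count toward the pod's ephemeral storage depends on how the cluster mounts them. File sizes are summed from apparent size, so the result only approximates what the kubelet measures.

```python
import os

GIB = 1024 ** 3
# Assumption: the limit quoted in the eviction message (40Gi).
LIMIT_BYTES = 40 * GIB


def dir_usage_bytes(root):
    """Sum the apparent sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                pass
    return total


def fits(used_bytes, extra_bytes, limit_bytes=LIMIT_BYTES):
    """Return True if writing extra_bytes keeps usage at or below the limit."""
    return used_bytes + extra_bytes <= limit_bytes
```

A script might call `fits(dir_usage_bytes(work_dir), expected_dataset_bytes)` before extracting, where `work_dir` and `expected_dataset_bytes` are values the caller supplies, and stop early with a clear message instead of being evicted mid-run.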
Is this because the storage space ran out?
Images larger than 100G have also started to be cleaned up.
kkrun commented 9 months ago
It seems to work now; large datasets can be extracted.
trainer commented 9 months ago
Poster
OK, I'll rerun it. Thanks.
liuzx commented 9 months ago
Collaborator
This issue has been resolved.
liuzx closed this issue 6 months ago