#224 Model file write-back failure

Closed
created 1 year ago by jiayu_neu · 1 comment
jiayu_neu commented 1 year ago
### Problem description

When a Zhisuan (智算) GPU training job writes its model back, the upload fails with `FPutObject(efficientnet_b0-650_142.ckpt) failed: The difference between the request time and the server's time is too large.` As a result, the model output can be neither displayed nor downloaded.

### Environment (GPU/NPU)

+ GPU

### Cluster (Qizhi 启智 / Zhisuan 智算)

+ Zhisuan

### Task type (debugging/training/inference)

+ Training

### Task name

+ efficientv1_11
+ [Task link](https://git.openi.org.cn/jiayu_neu/EfficientNetV1_MindSpore/grampus/train-job/c938fa7e0ebc457fb461a56eb993e487)

### Log excerpt or screenshot

+ The log reports the following error:

```text
2022/10/10 00:07:51 start uploading model
2022/10/10 00:07:51 file name:/tmp/output/efficientnet_b0-200_142.ckpt
2022/10/10 00:16:23 file name:/tmp/output/efficientnet_b0-graph.meta
2022/10/10 00:16:50 file name:/tmp/output/efficientnet_b0-650_142.ckpt
2022/10/10 00:41:45 FPutObject(efficientnet_b0-650_142.ckpt) failed: The difference between the request time and the server's time is too large.
2022/10/10 00:41:45 finish uploading model
```

### Expected solution or suggestion
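The gaps between the timestamps in the original log can be measured with a short script. This is a minimal sketch, assuming that each `file name:` line marks the start of that file's upload and the next log line marks its end, and assuming the 15-minute tolerance that S3-compatible servers such as MinIO apply when checking a signed request's time. Under those assumptions, `efficientnet_b0-650_142.ckpt` was in flight for about 25 minutes before the error appeared; whether that duration, rather than a wrong clock on the training node, is what triggered the error is only a hypothesis that this thread does not confirm.

```python
from datetime import datetime, timedelta

# Assumption: S3-compatible servers (e.g. MinIO) reject signed requests whose
# timestamp differs from the server's clock by more than 15 minutes.
MAX_SKEW = timedelta(minutes=15)

LOG = """\
2022/10/10 00:07:51 start uploading model
2022/10/10 00:07:51 file name:/tmp/output/efficientnet_b0-200_142.ckpt
2022/10/10 00:16:23 file name:/tmp/output/efficientnet_b0-graph.meta
2022/10/10 00:16:50 file name:/tmp/output/efficientnet_b0-650_142.ckpt
2022/10/10 00:41:45 FPutObject(efficientnet_b0-650_142.ckpt) failed: The difference between the request time and the server's time is too large.
2022/10/10 00:41:45 finish uploading model
"""


def parse(log):
    """Yield (timestamp, message) pairs from lines shaped like 'YYYY/MM/DD HH:MM:SS msg'."""
    for line in log.splitlines():
        if not line.strip():
            continue
        date, time, msg = line.split(" ", 2)
        yield datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M:%S"), msg


def upload_durations(log):
    """Return {path: duration}, assuming each 'file name:' line starts an upload
    and the following log line marks when that upload ended."""
    events = list(parse(log))
    durations = {}
    for (start, msg), (end, _) in zip(events, events[1:]):
        if msg.startswith("file name:"):
            durations[msg[len("file name:"):]] = end - start
    return durations


if __name__ == "__main__":
    for path, took in upload_durations(LOG).items():
        flag = "longer than 15 min" if took > MAX_SKEW else "ok"
        print(f"{path}: {took} ({flag})")
```

The same function can be fed the second log posted below, where `efficientnet_b0-770_142.ckpt` shows a gap of roughly 44 minutes before the same error.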
jiayu_neu (poster) commented 1 year ago
The same problem happened again today:

```text
2022/10/11 01:07:34 start uploading model
2022/10/11 01:07:34 file name:/tmp/output/efficientnet_b0-770_142.ckpt
2022/10/11 01:51:45 FPutObject(efficientnet_b0-770_142.ckpt) failed: The difference between the request time and the server's time is too large.
2022/10/11 01:51:45 finish uploading model
```

Not a single ckpt file was written back, so this training run was wasted once again! I hope the problem can be investigated soon, to find out whether it is caused by my settings or by the server's network.
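One way to approach the poster's question (local settings or the server side) is to compare the training node's clock with the `Date` header returned by the object-storage endpoint. The sketch below uses only the Python standard library; the endpoint URL is a placeholder because the thread does not name it, `measure_skew` and `is_too_skewed` are hypothetical helper names, and the 15-minute limit is an assumption about the server's tolerance rather than something stated in this issue.

```python
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: S3-compatible servers (e.g. MinIO) reject signed requests whose
# timestamp is more than 15 minutes away from the server's clock.
MAX_SKEW = timedelta(minutes=15)


def skew_from_date_header(date_header: str, local_now: datetime) -> timedelta:
    """Return local clock minus server clock, given an HTTP Date header."""
    server_now = parsedate_to_datetime(date_header)
    return local_now - server_now


def is_too_skewed(skew: timedelta) -> bool:
    """True if the offset, in either direction, exceeds the assumed tolerance."""
    return abs(skew) > MAX_SKEW


def measure_skew(url: str) -> timedelta:
    """Send a HEAD request to `url` and compare its Date header with local UTC time."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            date_header = response.headers["Date"]
    except urllib.error.HTTPError as err:
        # Anonymous requests are often refused (e.g. 403), but the error
        # response usually still carries a Date header.
        date_header = err.headers["Date"]
    if date_header is None:
        raise RuntimeError("response has no Date header")
    return skew_from_date_header(date_header, datetime.now(timezone.utc))
```

Run from inside the training container, for example as `is_too_skewed(measure_skew("https://<object-storage-endpoint>"))`. A `True` result would point at the node's clock; `False` would suggest looking elsewhere, such as how long each upload request stays open.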
lewis was assigned by zeizei 1 year ago
liuzx closed this issue 6 months ago