#116 Training results cannot be downloaded

Closed
created 1 year ago by cquyh · 6 comments
cquyh commented 1 year ago
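Following the tutorial, I used moxing to save the data, syncing it to the folder "/home/work/user-job-dir/outputs/model/". The log reports success, but the result download option is empty.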
liuzx commented 1 year ago
Collaborator
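For result download, the model has to be synced from the training image to OBS through train_url; only then does the Qizhi (OpenI) platform offer it for download. See the example code for details. Judging from the log in your screenshot, your code copies the contents of the training image into another folder inside the same training image. You can modify it as follows:

```python
import argparse

import moxing as mox

###Copy the output model to obs###
def EnvToObs(train_dir, obs_train_url):
    try:
        mox.file.copy_parallel(train_dir, obs_train_url)
        print("Successfully Upload {} to {}".format(train_dir, obs_train_url))
    except Exception as e:
        print('moxing upload {} to {} failed: '.format(train_dir, obs_train_url) + str(e))

###Copy the trained model data from the local running environment back to obs,
###and download it in the training task corresponding to the Qizhi platform
# WorkEnvironment() and workroot must be defined earlier in the script, as in the example train.py
parser = argparse.ArgumentParser()
parser.add_argument('--train_url',
                    help='model folder to save/load',
                    default=WorkEnvironment('train') + '/model/')
train_dir = workroot + '/model'
args = parser.parse_args()
EnvToObs(train_dir, args.train_url)
```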
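As the reply points out, the copy destination must be the OBS location received as train_url, not another local folder such as /home/work/user-job-dir/outputs/model/. A minimal sketch of a pre-upload guard follows, assuming OBS locations are written with the `obs://` scheme as they are when used with moxing; `check_obs_destination` and `upload_target_or_fail` are hypothetical helpers, not part of moxing.

```python
def check_obs_destination(dest: str) -> bool:
    """Return True if dest looks like an OBS URL rather than a local path.

    Assumption: OBS locations are written as obs://bucket/path.
    """
    return dest.strip().startswith("obs://")


def upload_target_or_fail(train_url: str) -> str:
    # Refuse to "upload" into another local folder: per the thread, the platform
    # only offers downloads for files that reach OBS through train_url.
    if not check_obs_destination(train_url):
        raise ValueError(
            "train_url {!r} is a local path; results copied there will not "
            "appear under result download".format(train_url)
        )
    return train_url
```

Calling `upload_target_or_fail(args.train_url)` just before `EnvToObs(...)` makes a local-to-local copy fail loudly instead of logging a misleading success.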
JeffDing commented 1 year ago
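The result download page has no content, probably because the generated training results were never copied back to OBS. I suggest first setting up one folder for the training results and having the training code save its results only into that folder; at the end, use mox to copy the results to train_url, and the trained files should then show up on the results page. Example code:

```python
import argparse

import moxing as mox

parser = argparse.ArgumentParser()
parser.add_argument('--train_url', help='OBS location the results are copied to')
args = parser.parse_args()

local_train_url = '/cache/train'  # local folder the training code writes its results into
### Training code omitted here
mox.file.copy_parallel(local_train_url, args.train_url)  # if your parsed arguments are not named args, adjust this line
```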
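The workflow suggested above (write every result into one local folder, then copy that whole folder to train_url in a single step at the end) can be sketched without the platform. This is an illustration only: `shutil.copytree` stands in for `mox.file.copy_parallel`, the destination is a local temporary directory rather than an OBS path, and `train_and_save` / `copy_results` are hypothetical names.

```python
import os
import shutil
import tempfile


def train_and_save(local_train_url: str) -> None:
    # Hypothetical training step: everything it produces goes into local_train_url.
    os.makedirs(local_train_url, exist_ok=True)
    with open(os.path.join(local_train_url, "model.ckpt"), "w") as f:
        f.write("weights")


def copy_results(local_train_url: str, train_url: str) -> None:
    # Stand-in for mox.file.copy_parallel(local_train_url, train_url):
    # copy the whole results folder to the destination in one call.
    shutil.copytree(local_train_url, train_url, dirs_exist_ok=True)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    local_train_url = os.path.join(work, "cache", "train")
    train_url = os.path.join(work, "obs_stand_in")  # would be the obs:// URL on the platform
    train_and_save(local_train_url)
    copy_results(local_train_url, train_url)
    print(sorted(os.listdir(train_url)))  # → ['model.ckpt']
```

Keeping a single results folder means the final copy does not need to know which files training produced.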
cquyh commented 1 year ago
Poster
> For result download, the model has to be synced from the training image to OBS through train_url; only then does the Qizhi (OpenI) platform offer it for download. ...

The method WorkEnvironment is not defined, so the path cannot be found.
liuzx commented 1 year ago
Collaborator
```python
### Defines whether the task is a training environment or a debugging environment ###
def WorkEnvironment(environment):
    if environment == 'train':
        workroot = '/home/work/user-job-dir'
    elif environment == 'debug':
        workroot = '/home/work'
    else:
        raise ValueError('unknown environment: ' + environment)
    print('current work mode:' + environment + ', workroot:' + workroot)
    return workroot
```

For reference, see the example code: https://git.openi.org.cn/OpenIOSSG/MNIST_Example/src/branch/master/train.py
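One consequence worth checking: in the snippet from the first reply, train_dir is workroot + '/model' and the default train_url is WorkEnvironment('train') + '/model/'. If workroot is the value returned by WorkEnvironment('train'), both point at the same local folder, so nothing reaches OBS unless a real train_url is passed in. A minimal sketch, assuming the platform normally supplies the OBS location as --train_url on the command line; `work_environment`, `parse_paths` and the obs://bucket/job/model/ URL are hypothetical.

```python
import argparse


def work_environment(environment: str) -> str:
    # Mirrors the mapping in WorkEnvironment(), but fails loudly on unknown modes.
    roots = {"train": "/home/work/user-job-dir", "debug": "/home/work"}
    if environment not in roots:
        raise ValueError("unknown environment: {!r}".format(environment))
    return roots[environment]


def parse_paths(argv=None):
    """Return (train_dir, train_url) computed the way the snippet in the thread does."""
    workroot = work_environment("train")
    parser = argparse.ArgumentParser()
    # The default is a local path: unless --train_url is passed,
    # source and destination end up being the same local folder.
    parser.add_argument("--train_url", default=workroot + "/model/")
    args = parser.parse_args(argv)
    train_dir = workroot + "/model"
    return train_dir, args.train_url


print(parse_paths([]))
# → ('/home/work/user-job-dir/model', '/home/work/user-job-dir/model/')
print(parse_paths(["--train_url", "obs://bucket/job/model/"]))
# → ('/home/work/user-job-dir/model', 'obs://bucket/job/model/')
```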
cquyh commented 1 year ago
Poster
I still cannot download the results after making the changes.
liuzx commented 1 year ago
Collaborator
Please check whether your code actually uploads the contents of /model to OBS. A successful upload path looks like this: ![image](/attachments/d825fff3-52ef-4b79-9d40-313bb99ead9a)
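Along the lines of the check suggested above, a quick sanity test before calling EnvToObs is to confirm that train_dir actually contains the model files: if the training code saves somewhere else, the copy has nothing to upload. A minimal standard-library sketch; `list_outputs` and `assert_has_outputs` are hypothetical helpers.

```python
import os


def list_outputs(train_dir: str) -> list:
    """Return the files under train_dir as sorted relative paths."""
    found = []
    for root, _dirs, files in os.walk(train_dir):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), train_dir))
    return sorted(found)


def assert_has_outputs(train_dir: str) -> list:
    # An empty (or missing) folder leaves nothing to download,
    # even if the copy step itself reports no error.
    files = list_outputs(train_dir)
    if not files:
        raise RuntimeError("no files found under {}; nothing would reach OBS".format(train_dir))
    return files
```

Printing the returned list next to the "Successfully Upload" log line makes it easy to tell an empty upload from a wrong destination.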
liuzx closed this issue 11 months ago