#868 ImportError: Failed importing torch. This likely means that some torch modules require additional dependencies that have to be manually installed (usually with `pip install torch`).

Closed
created 2 months ago by zzc0208 · 11 comments
zzc0208 commented 2 months ago
https://openi.pcl.ac.cn/zzc0208/so-vits-svc-npu/src/branch/feat-loguru-4.1-stable/train.py

```shell
load old checkpoint failed...
Epoch 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- *Elapsed 0:00:00
Traceback (most recent call last):
  File "/home/ma-user/so-vits-svc-npu/train.py", line 319, in <module>
    main()
  File "/home/ma-user/so-vits-svc-npu/train.py", line 42, in main
    run(0,1,hps)
  File "/home/ma-user/so-vits-svc-npu/train.py", line 114, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler,
  File "/home/ma-user/so-vits-svc-npu/train.py", line 137, in train_and_evaluate
    for batch_idx, items in enumerated_train_loader:
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/dataloader.py", line 682, in __next__
    data = self._next_data()
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/dataloader.py", line 967, in _next_data
    return self._process_data(data)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/dataloader.py", line 996, in _process_data
    data.reraise()
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/_utils.py", line 100, in reraise
    raise exception
ImportError: Caught ImportError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/serialization.py", line 374, in load
    return _legacy_load(opened_file, pickle_module, **pickle_load_args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/serialization.py", line 405, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/serialization.py", line 89, in try_import
    mod = importlib.import_module(module_name)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch/__init__.py", line 235, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch/lib/../../torch.libs/libgomp-6e1a1d1b.so.1.0.0: cannot allocate memory in static TLS block

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ma-user/so-vits-svc-npu/data_utils.py", line 125, in __getitem__
    return self.random_slice(*self.get_audio(self.audiopaths[index][0]))
  File "/home/ma-user/so-vits-svc-npu/data_utils.py", line 60, in get_audio
    spec = torch.load(spec_filename)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/serialization.py", line 378, in load
    pt = try_import('torch')
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0.dev0-py3.9.egg/mindtorch/torch/serialization.py", line 97, in try_import
    raise ImportError(err_msg) from error
ImportError: Failed importing torch. This likely means that some torch modules require additional dependencies that have to be manually installed (usually with `pip install torch`).

(MindSpore) [root@v70d1d2bfa2d48b1a06a430915d572c4-task0-0 so-vits-svc-npu]#
```

torch is installed correctly. I tried `pip install torch==2.1.0`, which reports that it is already installed, and `pip list` also shows it as expected.

```shell
(MindSpore) [root@v70d1d2bfa2d48b1a06a430915d572c4-task0-0 so-vits-svc-npu]# pip3 list
Package                       Version
----------------------------- ---------------
...
torch                         2.1.0
torch-npu                     2.1.0
torchaudio                    2.1.0
...
```
hanjr commented 2 months ago
Collaborator
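When you run train.py, does the checkpoint loading step (the part marked with the red box in the screenshot) load correctly without raising an error?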
zzc0208 commented 2 months ago
Poster
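No, it cannot load them; it just prints `load old checkpoint failed`. However, these two weight files work fine on a GPU platform. I am not sure whether they need to be converted to the MindSpore weight format.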
hanjr commented 2 months ago
Collaborator
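The torch version used to save a weight file has to match the torch version used to load it. This problem may occur because the weights were saved with an older torch version and are now being loaded with a newer one.

`However, these two weight files work fine on a GPU platform`

Are you using the same torch version on the GPU machine and in the NPU image?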
zzc0208 commented 2 months ago
Poster
Yes, they are the same. The GPU environment uses torch 2.1.0 and loads the files without problems.
zzc0208 commented 2 months ago
Poster
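Hello, I have uploaded the pretrained base models `G_0.pth` and `D_0.pth` to https://openi.pcl.ac.cn/zzc0208/so-vits-svc-npu/modelmanage/model_filelist_tmpl?name=so-vits-svc-pretrained

If you need to test this project, you can use `data.tar.gz` from https://openi.pcl.ac.cn/zzc0208/so-vits-svc-npu/datasets . It is already preprocessed, so you can extract it in the project directory and start training right away.

To make debugging easier, you can also use `so-vits-svc-npu.zip` from https://openi.pcl.ac.cn/zzc0208/so-vits-svc-npu/datasets , which includes the vocoder and the other model files the project needs.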
hanjr commented 2 months ago
Collaborator
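After investigation, the model fails to load for the same reason as in your earlier issue: https://openi.pcl.ac.cn/OpenI/MSAdapter/issues/859

When an NPU is used on the OpenI platform, torch cannot be imported correctly after mindspore has been imported, so loading a torch model fails. The root cause is still being investigated.

As a workaround, you can run load locally, then use the save interface to store the D and G models as pt files in a format that mindtorch supports, and load those files directly on the NPU.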
hanjr commented 2 months ago
Collaborator
After further investigation, the error is raised while importing the PyTorch library. It is known to occur in some environments when torch is imported after other third-party libraries (including but not limited to opencv and sklearn). **https://github.com/pytorch/pytorch/issues/2575**

The error output contains a message similar to this:

```
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch/__init__.py", line 235, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch/lib/../../torch.libs/libgomp-6e1a1d1b.so.1.0.0: cannot allocate memory in static TLS block
```

You can work around it by preloading the `.so.1.0.0` files through the `LD_PRELOAD` environment variable. For the current issue, run this command in a terminal:

```
export LD_PRELOAD=/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0:/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch.libs/libgomp-6e1a1d1b.so.1.0.0
```

For the general case, find the location of the .so file named in the error and add that file to the environment variable:

```
export LD_PRELOAD={first .so file path}:{second .so file path}:{}...
```
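For the general case, the library paths can be pulled out of the error text mechanically. The following is a minimal sketch, not part of MSAdapter or PyTorch: the helper names `tls_libraries` and `ld_preload_value` are hypothetical, and it assumes the loader message has the `<path>: cannot allocate memory in static TLS block` shape shown in this thread. Paths are normalized textually (collapsing `lib/../..`), which is an assumption that no symlinks are involved.

```python
import posixpath
import re

# A path ending in ".so" with an optional numeric version suffix,
# immediately followed by the static TLS error text.
_SO_PATH = re.compile(
    r"(/[^\s:]+\.so(?:\.\d+)*)(?=:\s*cannot allocate memory in static TLS block)"
)


def tls_libraries(traceback_text):
    """Return the shared-library paths blamed for static TLS failures.

    Paths keep their order of first appearance and are de-duplicated.
    """
    found = []
    for match in _SO_PATH.finditer(traceback_text):
        # Textual normalization only; symlinks are not resolved.
        path = posixpath.normpath(match.group(1))
        if path not in found:
            found.append(path)
    return found


def ld_preload_value(paths):
    """Join library paths with ':' as in the export command above."""
    return ":".join(paths)


if __name__ == "__main__":
    import sys

    # Example: python train.py 2>&1 | python this_script.py
    libs = tls_libraries(sys.stdin.read())
    if libs:
        print("export LD_PRELOAD=" + ld_preload_value(libs))
```

The printed line still has to be run in the shell before Python starts, because `LD_PRELOAD` is read when the process is launched. Each run only reveals the library that failed at that point, so the value may need to be extended if a second library (such as the faiss one here) fails afterwards.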
zzc0208 commented 2 months ago
Poster
Got it, thanks.
zzc0208 commented 2 months ago
Poster
Hello, when I use the Xinjiang University NPU image, this solution does not seem to work:

```shell
(MindSpore) [root@a1b89e31fe0f43fe81ab53df5705b778-task0-0 ma-user]# export LD_PRELOAD=/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0:/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch.libs/libgomp-6e1a1d1b.so.1.0.0
ERROR: ld.so: object '/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
(MindSpore) [root@a1b89e31fe0f43fe81ab53df5705b778-task0-0 ma-user]#
```
hanjr commented 2 months ago
Collaborator
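You need to confirm where the faiss .so file is located on your system. I found the .so file locations from the paths in the error message, so you need to check your own error output.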
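When the hashed file name differs between images (for example `libgomp-d22c30c5.so.1.0.0` not existing in another environment), one option is to search the environment for the libraries and to check an `LD_PRELOAD` value before exporting it. This is only a sketch under stated assumptions: the helper names `find_shared_libraries` and `split_preload` are hypothetical, the default pattern `libgomp*.so*` is a guess based on the file names in this thread, and the value is split on `:` only, as in the command above.

```python
from pathlib import Path


def find_shared_libraries(root, pattern="libgomp*.so*"):
    """Return sorted paths of regular files under `root` matching `pattern`."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


def split_preload(value):
    """Split an LD_PRELOAD value into (existing, missing) path lists."""
    entries = [entry for entry in value.split(":") if entry]
    existing = [entry for entry in entries if Path(entry).is_file()]
    missing = [entry for entry in entries if not Path(entry).is_file()]
    return existing, missing


if __name__ == "__main__":
    import os
    import sys

    # Pass the site-packages directory of the active environment as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for lib in find_shared_libraries(root):
        print(lib)

    # Report LD_PRELOAD entries that ld.so would fail to open.
    _, missing = split_preload(os.environ.get("LD_PRELOAD", ""))
    for entry in missing:
        print("missing:", entry)
```

In this thread the directory to search would be `/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages`. Entries reported as missing correspond to the `cannot open shared object file` messages from `ld.so` and should be replaced with paths that actually exist in the image.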
hanjr commented 1 week ago
Collaborator
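The problem described above can occur on ARM Linux systems. To work around it, determine where the .so files are on your own system and preload them with the export command given above. If there are no further questions, I will close this issue for now; please open a new issue if other problems come up later.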
hanjr closed this issue 1 week ago