We list some common issues faced by many users and their corresponding solutions here. Feel free to enrich the list if you find any frequent issues and have ways to help others solve them. If the contents here do not cover your issue, please create an issue using the provided templates and make sure you fill in all required information in the template.
Compatible MMCV, MMDetection and MMRotate versions are shown below. Please install the correct versions of them to avoid installation issues.
| MMRotate version | MMCV version             | MMDetection version     |
| :--------------- | :----------------------- | :---------------------- |
| main             | mmcv-full>=1.5.3, <1.8.0 | mmdet >= 2.25.1, <3.0.0 |
| 0.3.4            | mmcv-full>=1.5.3, <1.8.0 | mmdet >= 2.25.1, <3.0.0 |
| 0.3.3            | mmcv-full>=1.5.3, <1.7.0 | mmdet >= 2.25.1, <3.0.0 |
| 0.3.2            | mmcv-full>=1.5.3, <1.7.0 | mmdet >= 2.25.1, <3.0.0 |
| 0.3.1            | mmcv-full>=1.4.5, <1.6.0 | mmdet >= 2.22.0, <3.0.0 |
| 0.3.0            | mmcv-full>=1.4.5, <1.6.0 | mmdet >= 2.22.0, <3.0.0 |
| 0.2.0            | mmcv-full>=1.4.5, <1.5.0 | mmdet >= 2.19.0, <3.0.0 |
| 0.1.1            | mmcv-full>=1.4.5, <1.5.0 | mmdet >= 2.19.0, <3.0.0 |
| 0.1.0            | mmcv-full>=1.4.5, <1.5.0 | mmdet >= 2.19.0, <3.0.0 |
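To compare your installed versions against the table, here is a quick stdlib-only sketch; the helper names `parse` and `version_ok` are illustrative, not part of any MM library:

```python
# A minimal sketch of checking a version string against a row of the
# compatibility table, e.g. "mmcv-full>=1.5.3, <1.8.0" for MMRotate main.

def parse(v):
    """Turn a dotted version like '1.5.3' into a comparable tuple (1, 5, 3)."""
    return tuple(int(x) for x in v.split('.'))

def version_ok(installed, lower, upper):
    """True if lower <= installed < upper (the form every row above uses)."""
    return parse(lower) <= parse(installed) < parse(upper)

# mmcv-full 1.6.0 satisfies >=1.5.3, <1.8.0; 1.4.5 does not.
print(version_ok('1.6.0', '1.5.3', '1.8.0'))  # True
print(version_ok('1.4.5', '1.5.3', '1.8.0'))  # False
```

In practice you would pass `mmcv.__version__` and `mmdet.__version__` as the `installed` argument.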
- "No module named 'mmcv.ops'"; "No module named 'mmcv._ext'".

  1. Uninstall the existing mmcv in your environment with `pip uninstall mmcv`.
  2. Install `mmcv-full` instead: the lite `mmcv` package is built without the CUDA ops, and having both packages installed causes these import errors.
- "invalid device function" or "no kernel image is available for execution".

  1. Check if your CUDA runtime version (under `/usr/local/`), `nvcc --version`, and `conda list cudatoolkit` versions match.
  2. Run `python mmdet/utils/collect_env.py` to check whether PyTorch, torchvision, and MMCV are built for the correct GPU architecture. You may need to set `TORCH_CUDA_ARCH_LIST` to reinstall MMCV, e.g. `TORCH_CUDA_ARCH_LIST=7.0 pip install mmcv-full` to build MMCV for Volta GPUs.

- "undefined symbol" or "cannot open xxx.so".
  1. Run `python mmdet/utils/collect_env.py` to see if "MMCV Compiler"/"MMCV CUDA Compiler" is the same as "GCC"/"CUDA_HOME".
  2. Run `python mmdet/utils/collect_env.py` to check whether PyTorch, torchvision, and MMCV are built by and running on the same environment.

- "setuptools.sandbox.UnpickleableException: DistutilsSetupError(\"each element of 'ext_modules' option must be an Extension instance or 2-tuple\")"
  1. Run `pip install -r requirements.txt`, and check the versions of `setuptools`, `Cython`, and `PyTorch` in your environment.

- "Segmentation fault".
  1. Check your GCC version and use GCC 5.4. This is usually caused by an incompatibility between PyTorch and the environment (e.g., GCC < 4.9 for PyTorch). We also recommend avoiding GCC 5.5, because many users report that GCC 5.5 causes "segmentation fault" and simply switching to GCC 5.4 can solve the problem.

  2. Check whether PyTorch is correctly installed and can use CUDA ops, e.g. type the following command in your terminal and see whether it prints `True`.

     ```shell
     python -c 'import torch; print(torch.cuda.is_available())'
     ```

  3. If PyTorch is correctly installed, check whether MMCV is correctly installed.

     ```shell
     python -c 'import mmcv; import mmcv.ops'
     ```

     If MMCV is correctly installed, the above command raises no error.

  4. If MMCV and PyTorch are correctly installed, you may use `ipdb` or `pdb` to set breakpoints, or directly add `print` statements in the mmdetection code, to see which part leads to the segmentation fault.
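Besides breakpoints and prints, one further option (an addition not mentioned above, using only the Python standard library) is `faulthandler`, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV:

```python
# Enable faulthandler early (e.g. at the top of your training script) so a
# later segmentation fault prints the Python stack that triggered it.
import faulthandler

faulthandler.enable()
print(faulthandler.is_enabled())  # True
```

You can also enable it without code changes by running `python -X faulthandler train.py` or setting the `PYTHONFAULTHANDLER=1` environment variable.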
- "ImportError: cannot import name 'container_abcs' from 'torch._six'"

  1. This is because `container_abcs` has been removed since PyTorch 1.9. Replace

     ```python
     from torch._six import container_abcs
     ```

     in `python3.7/site-packages/e2cnn/nn/modules/module_list.py` with

     ```python
     TORCH_MAJOR = int(torch.__version__.split('.')[0])
     TORCH_MINOR = int(torch.__version__.split('.')[1])
     if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
         from torch._six import container_abcs
     else:
         import collections.abc as container_abcs
     ```

  2. Or downgrade the version of PyTorch.
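The version gate above can be checked in isolation. This sketch (the helper `pick_import` is hypothetical, written for this FAQ) reproduces the branch logic on plain version strings, so it runs without PyTorch installed:

```python
# A self-contained sketch of the version-gated import above: given a PyTorch
# version string, decide which module would provide container_abcs.

def pick_import(torch_version):
    """Return the module path the patch would import container_abcs from."""
    major, minor = (int(x) for x in torch_version.split('.')[:2])
    if major == 1 and minor < 8:
        return 'torch._six.container_abcs'
    return 'collections.abc'

print(pick_import('1.7.1'))   # torch._six.container_abcs
print(pick_import('1.10.0'))  # collections.abc
```

Note that `'1.10'.split('.')[1]` must be compared numerically, not lexically, which is why the snippet converts to `int` first.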
- "Loss goes Nan"

  1. Extend the warmup iterations, e.g. change `warmup_iters` from 500 to 1000 or 2000.
  2. Add gradient clipping: if `grad_clip` is `None`, you can add gradient clipping to avoid gradients that are too large, i.e., set `optimizer_config=dict(_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))` in your config file. If your config does not inherit from any basic config that contains `optimizer_config=dict(grad_clip=None)`, you can simply add `optimizer_config=dict(grad_clip=dict(max_norm=35, norm_type=2))`.

- "GPU out of memory"
  1. Set `gpu_assign_thr=N` in the config of the assigner, so the assigner will calculate box overlaps on CPU when there are more than N GT boxes.
  2. Set `with_cp=True` in the backbone. This uses the sublinear strategy in PyTorch to reduce GPU memory cost in the backbone.
  3. Use mixed precision training by setting `fp16 = dict(loss_scale='dynamic')` in the config file.

- "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one"
  1. Set `find_unused_parameters = True` in the config to solve the above problem, or find those unused parameters manually.
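The `find_unused_parameters` fix is a single top-level line in the config, shown here as a fragment (the rest of the config is omitted):

```python
# Config fragment: tells DistributedDataParallel to tolerate parameters that
# receive no gradient in an iteration, at a small extra traversal cost per step.
find_unused_parameters = True
```

Finding and removing the genuinely unused parameters instead avoids that per-iteration overhead.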