Partial Quantization

The accuracy of YOLOv6s degrades heavily, from 42.4% to 35.6%, after traditional PTQ, which is unacceptable. To resolve this issue, we propose partial quantization: first we analyze the quantization sensitivity of every layer, and then we keep the most sensitive layers in full precision as a compromise.

With partial quantization, we reach 42.1%, a loss of only 0.3 points in accuracy, while the throughput of the partially quantized model is about 1.56 times that of the FP16 model at a batch size of 32. This method achieves a good tradeoff between accuracy and throughput.
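The selection step described above (rank layers by sensitivity, then keep the most sensitive ones in full precision) can be illustrated with a minimal sketch. It is not the code in `partial_quant.py`: the function name `select_full_precision_layers`, the layer names, and the scores are all hypothetical, and it assumes sensitivity is expressed as the accuracy drop observed when a layer is quantized on its own.

```python
# Illustrative sketch only; names and numbers are hypothetical,
# not taken from this repository.

def select_full_precision_layers(accuracy_drop, num_keep):
    """Return the names of the `num_keep` most sensitive layers.

    `accuracy_drop` maps a layer name to the accuracy lost when only
    that layer is quantized (assumed metric: larger means more sensitive).
    """
    ranked = sorted(accuracy_drop, key=accuracy_drop.get, reverse=True)
    return set(ranked[:num_keep])


# Made-up sensitivity scores for four made-up layers.
drops = {"stem": 0.1, "backbone.1": 2.3, "neck.0": 0.4, "head.2": 3.1}

keep_fp = select_full_precision_layers(drops, num_keep=2)
quantized = [name for name in drops if name not in keep_fp]

print(sorted(keep_fp))  # ['backbone.1', 'head.2']
print(quantized)        # ['stem', 'neck.0']
```

Choosing `num_keep` is the tradeoff knob in this sketch: keeping more layers in full precision recovers accuracy but gives up some of the throughput gain.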

Prerequisites

pip install --extra-index-url=https://pypi.ngc.nvidia.com --trusted-host pypi.ngc.nvidia.com nvidia-pyindex
pip install --extra-index-url=https://pypi.ngc.nvidia.com --trusted-host pypi.ngc.nvidia.com pytorch_quantization

Sensitivity analysis

Use the following command to perform sensitivity analysis. Because 128 images are randomly sampled from the training dataset on each run, the resulting sensitivity files will differ slightly between runs.

 python3 sensitivity_analyse.py --weights yolov6s_reopt.pt \
                                --batch-size 32 \
                                --batch-number 4 \
                                --data-root train_data_path
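The random sampling mentioned above is why results vary from run to run. The sketch below shows one way such sampling could work, assuming the 128 images correspond to `--batch-size 32` times `--batch-number 4`. The function `sample_calibration_batches`, its `seed` parameter, and the file names are hypothetical and are not part of `sensitivity_analyse.py`; the seed is included only to show how a run could be made repeatable.

```python
import random

# Illustrative sketch only; this helper is hypothetical and is not
# the sampling code used by sensitivity_analyse.py.

def sample_calibration_batches(image_paths, batch_size=32, batch_number=4, seed=None):
    """Randomly pick batch_size * batch_number images and split them into batches."""
    rng = random.Random(seed)  # seed=None gives a different sample each run
    chosen = rng.sample(image_paths, batch_size * batch_number)
    return [chosen[i:i + batch_size] for i in range(0, len(chosen), batch_size)]


# Made-up file names standing in for the training set.
paths = [f"img_{i:05d}.jpg" for i in range(1000)]
batches = sample_calibration_batches(paths, seed=0)

print(len(batches), len(batches[0]))  # 4 32
```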