Model Compression for Big Models
Overview • Documentation • Installation • Quick Start • 简体中文
BMCook is a model compression toolkit for large-scale pre-trained language models (PLMs) that integrates multiple model compression methods, which you can combine freely to reach the desired speedup. Specifically, BMCook implements four compression methods: knowledge distillation, model pruning, model quantization, and model MoEfication.
Our documentation provides more information about the package.
To use BMCook, first install BMTrain.
From PyPI (Recommended)
$ pip install bmtrain
From Source
$ git clone https://github.com/OpenBMB/BMTrain.git
$ cd BMTrain
$ python3 setup.py install
Please refer to the installation guide of BMTrain for more details.
Then, clone the repository.
$ git clone git@github.com:OpenBMB/BMCook.git
The example folder provides example code based on GPT-J (6B).
Quantization-aware training:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8 \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin
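Conceptually, quantization-aware training simulates int8 arithmetic during training by quantizing and immediately dequantizing values in the forward pass, so the model learns to tolerate the rounding error. A minimal pure-Python sketch of symmetric int8 fake quantization (illustrative only, not BMCook's implementation):

```python
def fake_quantize_int8(weights):
    """Simulate symmetric int8 quantization: map values to the integer
    grid [-127, 127] via a per-tensor scale, round, then dequantize."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / 127.0
    return [round(w / scale) * scale for w in weights]

# Small values collapse to the nearest grid point (here, zero).
w_q = fake_quantize_int8([0.5, -1.27, 0.003])
```

In real quantization-aware training this round-trip is applied inside the forward pass while gradients flow through unchanged (the straight-through estimator).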
Quantization-aware training with knowledge distillation:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8-distill \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
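The `--kd-mse-last-hidden` flag above suggests the distillation signal is a mean squared error between the student's and the teacher's last hidden states, scaled by `--kd-loss-scale`. A toy sketch of such a loss over flat hidden-state vectors (names are illustrative, not BMCook's API):

```python
def kd_mse_loss(student_hidden, teacher_hidden, loss_scale=1.0):
    """Mean squared error between student and teacher last hidden states,
    scaled by loss_scale (cf. --kd-loss-scale above)."""
    n = len(student_hidden)
    return loss_scale * sum(
        (s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)
    ) / n

# Student matches the teacher on the first unit, is off by 2 on the second.
loss = kd_mse_loss([1.0, 2.0], [1.0, 0.0])  # (0 + 4) / 2 = 2.0
```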
Model pruning:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-prune \
--model gpt-j-full \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-pruning \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
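A common pruning strategy is magnitude pruning: the weights with the smallest absolute values are zeroed, on the assumption that they contribute least to the output. A minimal sketch of the idea (illustrative; BMCook's actual pruning method may differ):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.
    Ties at the threshold may prune slightly more than requested."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune half of the weights: the two smallest magnitudes become zero.
pruned = magnitude_prune([0.1, -2.0, 0.5, -0.05], sparsity=0.5)
```

In practice pruning is applied during training (as in the command above) so the remaining weights can adapt, often with a distillation loss to recover quality.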
MoEfication (save the hidden states and then use the MoEfication toolkit):
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-moe \
--model gpt-j-full-relu \
--start-lr 1e-4 \
--load gpt-j-relu.bin \
--save-hidden
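MoEfication groups the neurons of a ReLU feed-forward network into "experts" and, at inference, evaluates only the experts a router predicts will be active, exploiting the sparsity of ReLU activations (this is why the command above uses the ReLU variant of the model and saves hidden states for the toolkit). A toy sketch of the routing idea (all names are illustrative, not the MoEfication toolkit's API):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moefied_ffn(x, experts, router_scores, top_k):
    """experts: one weight matrix per expert (rows are neuron weights).
    Only the top_k experts by router score are evaluated; the rest are
    assumed to produce zero activations and are skipped entirely."""
    chosen = sorted(range(len(experts)), key=lambda i: -router_scores[i])[:top_k]
    activations = []
    for i in chosen:
        activations.extend(relu([dot(row, x) for row in experts[i]]))
    return sum(activations)  # toy aggregation of the active neurons' outputs

# With top_k=1, only the first expert (higher router score) is computed.
out = moefied_ffn([2.0, 3.0], [[[1.0, 0.0]], [[0.0, 1.0]]], [0.9, 0.1], top_k=1)
```

The speedup comes from never computing the skipped experts; quality depends on how well the router predicts which neurons would have fired.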
Combine quantization, pruning and knowledge distillation:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8-prune-distill \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-pruning \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
Based on GPT-J, we evaluate different combinations of compression techniques. The corpus is OpenWebText.
| Model | LM Loss | Relative Performance | Speedup |
|---|---|---|---|
| GPT-J | 3.37 | - | 1x |
| GPT-J (P+D) | 3.57 | 94.4% | 2x |
| GPT-J (P+D+Q) | 3.58 | 94.1% | 8x |
| GPT-J (P+D+Q+M) | 3.69 | 91.3% | 10x |
D denotes knowledge distillation. P denotes pruning. Q denotes quantization. M denotes MoEfication.
| Toolkit | Model Quantization | Model Pruning | Knowledge Distillation | Model MoEfication |
|---|---|---|---|---|
| TextPruner | - | ✅ | - | - |
| TensorFlow Lite | ✅ | ✅ | - | - |
| PyTorch | ✅ | ✅ | - | - |
| TextBrewer | - | ✅ | ✅ | - |
| BMCook | ✅ | ✅ | ✅ | ✅ |
In the next version, we will provide a one-line interface for the compression of arbitrary PLMs, which could further simplify the code. Stay tuned!
We welcome everyone to contribute code following our contributing guidelines.
You can also find us on other platforms:
The package is released under the Apache 2.0 License.
We thank Zhengyan Zhang, Yingfa Chen, Guoyang Zeng, Jie Zhou, and Zhi Zheng for their contributions. More contributors are welcome!