Model Compression for Big Models
Overview • Documentation • Installation • Quick Start • 简体中文
BMCook is a model compression toolkit for large-scale pre-trained language models (PLMs) that integrates multiple model compression methods, which you can combine freely to achieve the desired speedup. Specifically, we implement four compression methods: knowledge distillation, model pruning, model quantization, and model MoEfication.
Our documentation provides more information about the package.
To use BMCook, first install BMTrain.
From PyPI (Recommended)
$ pip install bmtrain
From Source
$ git clone https://github.com/OpenBMB/BMTrain.git
$ cd BMTrain
$ python3 setup.py install
Please refer to the installation guide of BMTrain for more details.
Then, clone the repository.
$ git clone git@github.com:OpenBMB/BMCook.git
The `example` folder provides example code based on GPT-J (6B).
Quantization-aware training:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8 \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin
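Here, `--model gpt-j-full-int8` trains the model with simulated int8 quantization. As a rough illustration of the idea in plain PyTorch (a minimal sketch, not BMCook's implementation), quantization-aware training rounds weights to an int8 grid in the forward pass while letting gradients flow straight through:

```python
import torch

def fake_quantize_int8(x: torch.Tensor) -> torch.Tensor:
    """Simulate int8 quantization in the forward pass (round-trip through int8).
    Gradients bypass the rounding via the straight-through estimator."""
    scale = x.abs().max() / 127.0 + 1e-8                 # symmetric per-tensor scale
    q = torch.clamp(torch.round(x / scale), -127, 127)   # snap to the int8 grid
    x_q = q * scale                                      # dequantize back to float
    return x + (x_q - x).detach()                        # straight-through estimator

# Usage: wrap a weight before the matmul so training "sees" the int8 error.
w = torch.randn(1024, 1024, requires_grad=True)
h = torch.randn(8, 1024)
out = h @ fake_quantize_int8(w).t()
out.sum().backward()                                     # gradients still reach w
```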
Quantization-aware training with knowledge distillation:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8-distill \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
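The `--use-kd`, `--kd-mse-last-hidden`, and `--kd-loss-scale` flags add a distillation term that matches the student's last hidden states against a frozen teacher's. A minimal sketch of such a loss term (illustrative only; the tensor names and shapes are assumptions, not BMCook's API):

```python
import torch
import torch.nn.functional as F

def kd_mse_last_hidden(student_hidden: torch.Tensor,
                       teacher_hidden: torch.Tensor,
                       kd_loss_scale: float = 1.0) -> torch.Tensor:
    """MSE between the student's and the (frozen) teacher's last-layer hidden
    states, scaled by kd_loss_scale and added to the usual LM loss."""
    return kd_loss_scale * F.mse_loss(student_hidden, teacher_hidden.detach())

# Usage with hypothetical [batch, seq, hidden] activations:
student_h = torch.randn(2, 16, 4096, requires_grad=True)
teacher_h = torch.randn(2, 16, 4096)
loss = kd_mse_last_hidden(student_h, teacher_h, kd_loss_scale=1.0)
loss.backward()
```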
Model pruning:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-prune \
--model gpt-j-full \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-pruning \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
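The `--use-pruning` flag enables sparsity-aware training. As a generic illustration (not BMCook's actual pruning criterion or schedule), magnitude pruning zeroes the smallest-magnitude weights and keeps them at zero through a binary mask:

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of `weight` in place and return
    the binary mask. During sparsity-aware training the mask is reapplied after
    every optimizer step so pruned weights stay at zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    weight.mul_(mask)
    return mask

# Usage:
w = torch.randn(4096, 1024)
mask = magnitude_prune_(w, sparsity=0.5)   # roughly 50% of entries set to zero
```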
MoEfication (save the hidden states and then use the MoEfication toolkit):
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-moe \
--model gpt-j-full-relu \
--start-lr 1e-4 \
--load gpt-j-relu.bin \
--save-hidden
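The `--save-hidden` flag dumps intermediate FFN activations, which the external MoEfication toolkit then clusters into experts. A toy sketch of collecting such activations with a forward hook (the module, shapes, and file name are illustrative stand-ins, not the actual GPT-J hook points):

```python
import torch
import torch.nn as nn

# A toy FFN block standing in for GPT-J's ReLU-activated MLP.
ffn = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

hidden_states = []

def record_relu_output(module, inputs, output):
    # MoEfication groups FFN neurons into experts by which ones co-activate,
    # so the quantity of interest is the post-ReLU activation.
    hidden_states.append(output.detach().cpu())

handle = ffn[1].register_forward_hook(record_relu_output)

with torch.no_grad():
    for _ in range(4):                      # stand-in for a calibration corpus
        ffn(torch.randn(8, 64))

handle.remove()
torch.save(torch.cat(hidden_states), "hidden_states.pt")  # fed to the MoEfication toolkit
```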
Combine quantization, pruning and knowledge distillation:
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
--save-dir results/gpt-j-int8-prune-distill \
--model gpt-j-full-int8 \
--start-lr 1e-4 \
--load gpt-j.bin \
--use-pruning \
--use-kd \
--kd-mse-last-hidden \
--kd-loss-scale 1 \
--load-teacher gpt-j.bin
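Conceptually, the combination amounts to applying the pruning mask and simulated quantization in the same forward pass and adding the distillation term to the loss. A compact, illustrative sketch (toy shapes, LM loss omitted; not BMCook's code):

```python
import torch
import torch.nn.functional as F

def fake_quant(x):
    scale = x.abs().max() / 127.0 + 1e-8
    return x + ((torch.clamp(torch.round(x / scale), -127, 127) * scale) - x).detach()

w = torch.randn(64, 64, requires_grad=True)       # a single student weight matrix
mask = (w.abs() > w.abs().median()).float()       # ~50% magnitude pruning mask
h = torch.randn(8, 64)                            # input activations
teacher_h = torch.randn(8, 64)                    # frozen teacher hidden states

student_h = h @ fake_quant(w * mask).t()          # pruned + quantized forward pass
loss = F.mse_loss(student_h, teacher_h)           # distillation term
loss.backward()                                   # gradients reach w
```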
Based on GPT-J, we evaluate different combinations of compression techniques. The corpus is OpenWebText.
| Model | LM Loss | Relative Performance | Speedup |
|---|---|---|---|
| GPT-J | 3.37 | - | 1x |
| GPT-J (P+D) | 3.57 | 94.4% | 2x |
| GPT-J (P+D+Q) | 3.58 | 94.1% | 8x |
| GPT-J (P+D+Q+M) | 3.69 | 91.3% | 10x |
D denotes knowledge distillation. P denotes pruning. Q denotes quantization. M denotes MoEfication.
| Toolkit | Model Quantization | Model Pruning | Knowledge Distillation | Model MoEfication |
|---|---|---|---|---|
| TextPruner | - | ✅ | - | - |
| TensorFlow Lite | ✅ | ✅ | - | - |
| PyTorch | ✅ | ✅ | - | - |
| TextBrewer | - | ✅ | ✅ | - |
| BMCook | ✅ | ✅ | ✅ | ✅ |
In the next version, we will provide a one-line interface for the compression of arbitrary PLMs, which could further simplify the code. Stay tuned!
We welcome everyone to contribute code following our contributing guidelines.
You can also find us on other platforms:
The package is released under the Apache 2.0 License.
We thank Zhengyan Zhang, Yingfa Chen, Guoyang Zeng, Jie Zhou, and Zhi Zheng for their contributions. More contributors are welcome!