# Model-Compression

In the inference phase, the **PanGu-α 13B & 2.6B** models are compressed from 8 NPUs to 1 NPU with only about 2% performance fluctuation. To achieve this, a variety of model compression techniques are applied and the MindSpore source code is adapted. Please modify the policy file and model file paths of PanGu-α-13B/2.6B when using it. The implementation is based on the settings of PengCheng Cloud Brain II. For local use, please modify the file paths and the code related to file replication.


- [Methods](#Methods)
- [Codes](#Codes)
- [Performances](#Performances)
- [Environments](#Environments)

## Methods
- **Quantization**
By loading the model at low precision, most of the float32 parameters are converted to float16 and the corresponding quantization noise is handled.
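As a minimal sketch of this idea (plain NumPy with a hypothetical weight tensor, not the repository's actual loading code), the float32-to-float16 cast and the quantization noise it introduces look like this:

```python
import numpy as np

# Hypothetical weight tensor standing in for one model parameter.
w_fp32 = np.random.default_rng(0).standard_normal((2560, 2560)).astype(np.float32)

# Cast to float16, halving the memory footprint.
w_fp16 = w_fp32.astype(np.float16)

# Quantization noise: the rounding error introduced by the cast.
noise = np.abs(w_fp32 - w_fp16.astype(np.float32))

print(w_fp32.nbytes // w_fp16.nbytes)  # memory ratio: 2
print(float(noise.max()))              # worst-case rounding error (small but nonzero)
```

The noise is small relative to the weights (float16 keeps about 11 bits of mantissa), which is why the end-to-end accuracy only fluctuates by roughly 2%.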

- **Parameter sharing**
This model has been adapted to share the output layer parameters with the embedding layer parameters.
When the embedding size is 2560 and the vocabulary size is 40000, 40000 * 2560 parameters are saved.
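This kind of weight tying can be sketched as follows (plain NumPy with the shapes quoted above; the names are illustrative, not the repository's actual variables):

```python
import numpy as np

vocab_size, embed_dim = 40000, 2560

# Embedding matrix; the output projection reuses (ties) the very same
# array instead of allocating a second vocab_size x embed_dim matrix.
embedding = np.zeros((vocab_size, embed_dim), dtype=np.float16)
output_weight = embedding  # shared reference, not a copy

saved = vocab_size * embed_dim
print(saved)  # 102400000 parameters saved by sharing
```

Because `output_weight` is the same array as `embedding`, the output layer costs no extra storage at inference time.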

- **MindSpore source code adaptation**
The model parallelism strategies during training and loading are inconsistent: semi-automatic model parallelism is used during training, while no model parallelism is used during loading.
In addition, the parameter types saved after training are inconsistent with the model parameter types used during inference, so the underlying support of MindSpore needs to be modified.
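The dtype mismatch can be illustrated with a toy checkpoint (a plain dict of NumPy arrays standing in for MindSpore's checkpoint format; this is a hedged sketch, not the project's actual loading code): cast every float32 parameter saved during training to the float16 inference dtype before loading.

```python
import numpy as np

# Hypothetical checkpoint: parameters were saved as float32 during training.
ckpt = {"embedding.weight": np.ones((4, 8), dtype=np.float32)}

# Before single-NPU inference, cast every parameter to the inference
# dtype so loading does not fail on a parameter-type mismatch.
ckpt_fp16 = {name: p.astype(np.float16) for name, p in ckpt.items()}

print(ckpt_fp16["embedding.weight"].dtype)  # float16
```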

## Codes
There are three main differences in the inference code.
- **Model definition**
```bash
pangu_dropout_recompute_eos_fp16.py
```
- **MindSpore modification**
```bash
mindspore_ascend-1.1.0-cp37-cp37m-linux_aarch64.whl
```
- **Model loading**
```bash
from eval_task-13b-fp16 import get_model
model = get_model(args)
```

## Performances

| Model | Memory occupation | Inference speed |
| :---: | :----------------: | :---------------: |
| PanGu-α-13B (Before compression) | 8 NPUs | ~150ms |
| PanGu-α-13B (After compression) | 1 NPU | ~250ms |

- **Downstream tasks**

| | WebQA.v1.0 (em/f1) | CLUEWSC2020 (acc) |
| :---: | :----------------: | :----------------: |
| Zero-shot | | |
| PanGu-α-13B (Before compression) | 5.126/14.470 | 75.000 |
| PanGu-α-13B (After compression) | 5.060/14.466 | 73.684 |

## Environments
[MindSpore](https://git.openi.org.cn/PCL-Platform.Intelligence/Model-Compression/datasets?type=0)
[PanGu-α](https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-Alpha)
