Efficient Inference for Big Models
Overview • Installation • Quick Start • 简体中文
- The new implementation no longer depends on `cupy` and supports PyTorch backpropagation.
- The `generate` interface was updated and a new CPM 2.1 demo was added.

Note: the README for BMInf-1 can be found in the `old_docs` directory. Examples of CPM-1/2 and EVA will be published soon.
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).
At minimum, BMInf can run models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU; better GPUs yield better performance. Even on GPUs whose memory can hold the full model (such as the V100 or A100), BMInf still delivers a significant performance improvement over the existing PyTorch implementation.
If you use the code, please cite the following paper:

```
@inproceedings{han2022bminf,
  title={BMInf: An Efficient Toolkit for Big Model Inference and Tuning},
  author={Han, Xu and Zeng, Guoyang and Zhao, Weilin and Liu, Zhiyuan and Zhang, Zhengyan and Zhou, Jie and Zhang, Jun and Chao, Jia and Sun, Maosong},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  pages={224--230},
  year={2022}
}
```
- From pip: `pip install bminf`
- From source: download the package and run `python setup.py install`
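After installing, a quick sanity check can confirm the package is importable and report the version pip recorded (a minimal sketch using only the standard library; the distribution name `bminf` is taken from the pip command above):

```python
# Post-install check: import the package and print the installed version.
from importlib.metadata import version

import bminf  # fails here if the installation is broken

print("bminf", version("bminf"))
```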
Here we list the minimum and recommended configurations for running BMInf.
| | Minimum Configuration | Recommended Configuration |
|---|---|---|
| Memory | 16GB | 24GB |
| GPU | NVIDIA GeForce GTX 1060 6GB | NVIDIA Tesla V100 16GB |
| PCI-E | PCI-E 3.0 x16 | PCI-E 3.0 x16 |
BMInf supports GPUs with compute capability 6.1 or higher. Refer to the NVIDIA compute capability table to check whether your GPU is supported.
BMInf requires CUDA version >= 10.1; all other dependencies are installed automatically during installation.
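To verify this locally, the snippet below queries the GPU's compute capability through PyTorch's standard CUDA introspection (a sketch using plain PyTorch, not part of BMInf):

```python
# Check whether the local GPU meets BMInf's requirement of
# compute capability >= 6.1.
import torch

if not torch.cuda.is_available():
    raise RuntimeError("No CUDA device found; BMInf needs an NVIDIA GPU.")

major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
if (major, minor) < (6, 1):
    print("Warning: below 6.1, this GPU is not supported by BMInf.")
```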
Use `bminf.wrapper` to automatically convert your model.
```python
import torch
import bminf

# initialize your model on CPU
model = MyModel()
# load state_dict before using the wrapper
model.load_state_dict(model_checkpoint)
# apply the wrapper
with torch.cuda.device(CUDA_DEVICE_INDEX):
    model = bminf.wrapper(model)
```
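For instance, here is a hedged end-to-end sketch wrapping a Hugging Face model. The model name `gpt2` is an assumption purely to illustrate the call pattern (it is small enough not to need BMInf); substitute your own large transformer-based model:

```python
import torch
import bminf
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the model on CPU first, then hand it to bminf.wrapper
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

with torch.cuda.device(0):
    model = bminf.wrapper(model)

# generate with the converted model as usual
inputs = tokenizer("BMInf runs big models on", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```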
If `bminf.wrapper` does not fit your model well, you can adapt it manually with the following replacements, illustrated in the sketch after this list.

- Replace `torch.nn.ModuleList` with `bminf.TransformerBlockList`:

```python
module_list = bminf.TransformerBlockList([
    # ...
], [CUDA_DEVICE_INDEX])
```

- Replace `torch.nn.Linear` with `bminf.QuantizedLinear`:

```python
linear = bminf.QuantizedLinear(torch.nn.Linear(...))
```
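Put together, a hedged sketch of manual adaptation on a toy model. The block class `ToyBlock` and its sizes are hypothetical; only the two replacement points come from the instructions above, and only construction is shown (hidden states flow through `bminf.TransformerBlockList` according to your blocks' own forward signatures):

```python
import torch
import bminf

CUDA_DEVICE_INDEX = 0

class ToyBlock(torch.nn.Module):
    """A stand-in transformer block, purely for illustration."""
    def __init__(self, dim):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.ff = torch.nn.Linear(dim, dim)

    def forward(self, x):
        x = x + self.attn(x, x, x)[0]
        return x + self.ff(x)

class ToyModel(torch.nn.Module):
    def __init__(self, num_layers=4, dim=256, vocab=1000):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, dim)
        # was: torch.nn.ModuleList([ToyBlock(dim) for _ in range(num_layers)])
        self.blocks = bminf.TransformerBlockList(
            [ToyBlock(dim) for _ in range(num_layers)],
            [CUDA_DEVICE_INDEX],
        )
        # was: torch.nn.Linear(dim, vocab)
        self.lm_head = bminf.QuantizedLinear(torch.nn.Linear(dim, vocab))
```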
Here we report the encoder and decoder speeds of CPM-2 measured on different platforms. You can also run `benchmark/cpm2/encoder.py` and `benchmark/cpm2/decoder.py` to test the speed on your own machine; a minimal timing sketch follows the table below.
Implementation | GPU | Encoder Speed (tokens/s) | Decoder Speed (tokens/s) |
---|---|---|---|
BMInf | NVIDIA GeForce GTX 1060 | 718 | 4.4 |
BMInf | NVIDIA GeForce GTX 1080Ti | 1200 | 12 |
BMInf | NVIDIA GeForce RTX 2080Ti | 2275 | 19 |
BMInf | NVIDIA Tesla V100 | 2966 | 20 |
BMInf | NVIDIA Tesla A100 | 4365 | 26 |
PyTorch | NVIDIA Tesla V100 | - | 3 |
PyTorch | NVIDIA Tesla A100 | - | 7 |
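If you write your own timing loop instead of using the benchmark scripts, a minimal sketch looks like this (`decode_step` is a hypothetical callable that generates one token per call; the benchmark scripts remain the authoritative reference):

```python
import time
import torch

def tokens_per_second(decode_step, steps=64, warmup=8):
    # warm-up iterations exclude one-time initialization costs
    for _ in range(warmup):
        decode_step()
    torch.cuda.synchronize()          # wait for queued kernels to finish
    start = time.perf_counter()
    for _ in range(steps):
        decode_step()                 # one generated token per call
    torch.cuda.synchronize()
    return steps / (time.perf_counter() - start)
```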
We welcome everyone to contribute code following our contributing guidelines.
You can also find us on other platforms.
The package is released under the Apache 2.0 License.