Tsinghua/Temporary DeepSpeed (TDS) is a plug-in for Microsoft DeepSpeed that fixes bugs in the DeepSpeed PipelineEngine.
Although DeepSpeed provides interfaces to support pipeline-parallel training, there are still bugs and hack implementations in its code, especially in the code that sends tensors between different stages. We therefore reimplement the PipelineEngine of DeepSpeed in TDS.
The first step is to install DeepSpeed; see DeepSpeed Installation for instructions.
Copy the folder `tds` into your project, and use `import tds as deepspeed` instead of `import deepspeed` in your code.
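Because the alias binds the TDS module to the name `deepspeed`, the rest of your training script keeps working unchanged. A minimal sketch of the aliasing pattern, using stdlib modules as stand-ins since `tds` and `deepspeed` may not be installed here:

```python
# Stand-in for: import tds as deepspeed
# (binding one module object under another module's familiar name)
import json as serializer

# Code written against the old name keeps working unchanged:
payload = serializer.dumps({'stage': 0})
print(payload)  # -> '{"stage": 0}'
```

In your project the line is simply `import tds as deepspeed`, and every later call such as `deepspeed.initialize(...)` resolves to the TDS implementation.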
If you want to use pipeline-parallel training, you must add code that passes your model some essential settings for its forward and backward operations: the types of the tensors (both input data and hidden states), whether these tensors need to save gradients, and whether they can be partitioned across GPUs to save memory. We take training GPT-2 as an example; the detailed code can be found in GPT-2.
The original `model_provider` function is:

```python
def model_provider():
    """Build the model for GPT-2."""
    args = get_args()
    print_rank_0('building GPT2 model ...')
    if args.pipe_parallel_size == 0:
        model = GPT2Model(num_tokentypes=0, parallel_output=True)
    else:
        model = GPT2ModelPipe(num_tokentypes=0, parallel_output=True, topology=mpu.get_topology())
        model._megatron_batch_fn = get_batch_pipe
    return model
```
After adding the pipeline settings, it becomes:

```python
def model_provider():
    """Build the model for GPT-2."""
    args = get_args()
    print_rank_0('building GPT2 model ...')
    if args.pipe_parallel_size == 0:
        model = GPT2Model(num_tokentypes=0, parallel_output=True)
    else:
        model = GPT2ModelPipe(num_tokentypes=0, parallel_output=True, topology=mpu.get_topology())
        model._megatron_batch_fn = get_batch_pipe
        # The first input tensor holds the input embeddings / hidden states and
        # needs to save its gradients; the second is the attention mask, which does not.
        model._input_grad = [True, False]
        # The embeddings / hidden states are float tensors; the attention mask is boolean.
        model._input_type = ['float', 'bool']
        # The embeddings / hidden states can be partitioned across GPUs to save memory.
        model._input_pipe_partitioned = [True, False]
    return model
```
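The three lists are positional: the i-th entry of each list describes the i-th tensor sent between pipeline stages. A minimal sketch of how the entries line up (the names `input_names` and `settings` are illustrative, not part of the TDS API):

```python
# Hypothetical sketch: aligning the per-tensor setting lists with the tensors
# exchanged between stages. Only the three lists come from the example above.
input_names = ['hidden_states', 'attention_mask']  # illustrative names
input_grad = [True, False]               # model._input_grad
input_type = ['float', 'bool']           # model._input_type
input_pipe_partitioned = [True, False]   # model._input_pipe_partitioned

# Zip the parallel lists into one per-tensor settings table.
settings = {
    name: {'requires_grad': g, 'dtype': t, 'partitioned': p}
    for name, g, t, p in zip(input_names, input_grad,
                             input_type, input_pipe_partitioned)
}
print(settings['attention_mask']['dtype'])  # -> bool
```

Keeping the lists the same length and in the same order as the tensors your `get_batch_pipe` function emits is what lets the engine reconstruct each tensor correctly on the receiving stage.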
All other operations can directly follow DeepSpeed and DeepSpeedExamples.
For more examples of using TDS, such as for GPT-2 and T5, see CPM-Pretrain.
If you use the code, please cite the following paper:
```
@article{cpm-v1,
  title={CPM: A Large-scale Generative Chinese Pre-trained Language Model},
  author={Zhang, Zhengyan and Han, Xu and Zhou, Hao and Ke, Pei and Gu, Yuxian and Ye, Deming and Qin, Yujia and Su, Yusheng and Ji, Haozhe and Guan, Jian and Qi, Fanchao and Wang, Xiaozhi and Zheng, Yanan and Zeng, Guoyang and Cao, Huanqi and Chen, Shengqi and Li, Daixuan and Sun, Zhenbo and Liu, Zhiyuan and Huang, Minlie and Han, Wentao and Tang, Jie and Li, Juanzi and Sun, Maosong},
  year={2020}
}
```