FastMoE works with different versions of Megatron-LM.
See fmoe/megatron/utils.py for the arguments of FastMoE.
An example patch is provided for the v2.2 release.
The patch can be applied directly to add FastMoE support if you are using
Megatron-LM v2.2.
Otherwise, you may need to enable FastMoE manually in your codebase.
The patch includes the following modifications.
In megatron/arguments.py, add _add_fmoe_args to the parser.
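As a rough illustration of this step, the sketch below mimics the way Megatron-LM chains argument helpers onto a single parser. The body of _add_fmoe_args shown here is a hypothetical stand-in: the --fmoefy and --num-experts flags and the build_parser helper are assumptions made for the example, and the real argument list is the one defined in fmoe/megatron/utils.py.

```python
import argparse


def _add_fmoe_args(parser):
    """Hypothetical stand-in for the FastMoE argument hook."""
    group = parser.add_argument_group(title="fastmoe")
    # Illustrative flags only; see fmoe/megatron/utils.py for the real ones.
    group.add_argument("--fmoefy", action="store_true")
    group.add_argument("--num-experts", type=int, default=None)
    return parser


def build_parser():
    """Hypothetical helper that builds a parser in the Megatron-LM style."""
    parser = argparse.ArgumentParser(description="Megatron-LM Arguments")
    # Each argument group is registered by passing the parser through a
    # helper; the FastMoE hook is one more link in that chain.
    parser = _add_fmoe_args(parser)
    return parser


args = build_parser().parse_args(["--fmoefy", "--num-experts", "4"])
```

Keeping the FastMoE flags in their own argument group leaves the rest of the parser untouched, which makes the change easy to port to other Megatron-LM versions.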
In megatron/training.py, replace load_checkpoint and save_checkpoint with the
functions of the same name in fmoe.megatron.checkpointing.
In megatron/training.py, the fmoe.megatron.fmoefy function is used as a
one-step entry point that replaces the MLP layers in the transformer language
models with FastMoE layers.
from fmoe.megatron import fmoefy
model = fmoefy(model, num_experts=4)
Note that the fmoefy function currently only accepts a standard Megatron-LM
top-level raw model as input, i.e. the MLP layers must be available at
model.language_model.transformer.layers[i].mlp.
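To make this layout requirement concrete, here is a minimal sketch of a pre-flight check that could be run before calling fmoefy. The has_fmoefy_layout helper is hypothetical and not part of FastMoE, and the SimpleNamespace objects are toy stand-ins for real models.

```python
from types import SimpleNamespace


def has_fmoefy_layout(model):
    """Return True if every transformer layer exposes an `mlp` attribute
    at model.language_model.transformer.layers[i].mlp."""
    try:
        layers = model.language_model.transformer.layers
    except AttributeError:
        return False
    return len(layers) > 0 and all(hasattr(layer, "mlp") for layer in layers)


# Toy stand-in for a raw top-level model with two transformer layers.
raw = SimpleNamespace(
    language_model=SimpleNamespace(
        transformer=SimpleNamespace(
            layers=[SimpleNamespace(mlp=object()) for _ in range(2)]
        )
    )
)
# Toy stand-in for a model hidden behind an extra wrapper object.
wrapped = SimpleNamespace(module=raw)

print(has_fmoefy_layout(raw))      # True
print(has_fmoefy_layout(wrapped))  # False
```

A wrapped model fails the check because the attribute path no longer starts at the top-level object, which is why fmoefy should be applied to the raw model.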
In megatron/training.py, the LocalDDP module is replaced by the one in
fmoe.megatron to enable the sophisticated data parallel strategies that can
parallelize the experts across both the data parallel group and the (tensor)
model parallel group.
# from megatron.model import DistributedDataParallel as LocalDDP
from fmoe.megatron import DistributedDataParallel as LocalDDP
Start training with FastMoE by using the scripts provided by Megatron-LM.