WuDao-Algorithm_Tool



Add arguments to Megatron's argparser
Patch checkpoint
Building the model in FastMoE style
Using FastMoE's model parallelization
Train as usual

FastMoE works with different versions of
Megatron-LM.
See fmoe/megatron/utils.py for arguments of FastMoE.

An example patch, fmoefy-v2.2.patch, is provided for the Megatron-LM v2.2
release. If you are using Megatron-LM v2.2, the patch can be applied directly
to add FastMoE support.
Otherwise, you may need to enable FastMoE in your codebase manually.
The patch includes the following modifications.

Add arguments to Megatron's argparser

In megatron/arguments.py, add _add_fmoe_args to the parser.
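
As a rough illustration of this step, the sketch below chains an extra argument group onto an existing argparse parser. The `_add_fmoe_args` body here is a stand-in written for this example, and the flag names `--fmoefy` and `--num-experts` are assumptions; the authoritative list of arguments is in fmoe/megatron/utils.py. `build_parser` and `--hidden-size` are likewise hypothetical placeholders for Megatron-LM's own parser setup.

```python
import argparse


def _add_fmoe_args(parser):
    # Stand-in for the helper shipped with FastMoE. The flags below are
    # illustrative assumptions; see fmoe/megatron/utils.py for the real ones.
    group = parser.add_argument_group(title="fastmoe (illustrative)")
    group.add_argument("--fmoefy", action="store_true")
    group.add_argument("--num-experts", type=int, default=None)
    return parser


def build_parser():
    # Hypothetical miniature of the parser built in megatron/arguments.py.
    parser = argparse.ArgumentParser(description="Megatron-LM style arguments")
    parser.add_argument("--hidden-size", type=int, default=1024)
    # The added step: register the FastMoE arguments on the same parser.
    parser = _add_fmoe_args(parser)
    return parser


args = build_parser().parse_args(["--fmoefy", "--num-experts", "4"])
```

Returning the parser from the helper keeps it composable with the other argument-group helpers that are chained in the same way.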

Patch checkpoint

In megatron/training.py, replace load_checkpoint and save_checkpoint with
the functions of the same name in fmoe.megatron.checkpointing.

Building the model in FastMoE style

In megatron/training.py, the fmoe.megatron.fmoefy function serves as a
one-call entry point that replaces the MLP layers in the transformer language
model with FastMoE layers.

from fmoe.megatron import fmoefy
model = fmoefy(model, num_experts=4)

Note that the fmoefy function currently only accepts a standard Megatron-LM
top-level raw model as input, i.e. the MLP layers must be available at
model.language_model.transformer.layers[i].mlp.
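
To make the required attribute path concrete, here is a toy sketch in plain Python. It is not FastMoE's implementation: `ToyMLP`, `ToyMoE`, `toy_fmoefy` and `build_toy_model` are hypothetical names invented for this example, and only the path model.language_model.transformer.layers[i].mlp is taken from the text above.

```python
from types import SimpleNamespace


class ToyMLP:
    """Stand-in for a dense MLP layer in a transformer block."""


class ToyMoE:
    """Stand-in for a mixture-of-experts layer (not FastMoE's layer)."""

    def __init__(self, num_experts):
        self.num_experts = num_experts


def toy_fmoefy(model, num_experts):
    # Walk the attribute path the README requires and swap each MLP in place.
    for layer in model.language_model.transformer.layers:
        layer.mlp = ToyMoE(num_experts)
    return model


def build_toy_model(num_layers):
    # Build an object tree with the same shape as the expected raw model.
    layers = [SimpleNamespace(mlp=ToyMLP()) for _ in range(num_layers)]
    transformer = SimpleNamespace(layers=layers)
    language_model = SimpleNamespace(transformer=transformer)
    return SimpleNamespace(language_model=language_model)


model = toy_fmoefy(build_toy_model(num_layers=2), num_experts=4)
```

A model wrapped in another module would not expose language_model at the top level, which is why the raw model has to be passed in.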

Using FastMoE's model parallelization

In megatron/training.py, the LocalDDP module is replaced by the one in
fmoe.megatron to enable the sophisticated data parallel strategies that can
parallelize the experts across both the data parallel group and the (tensor)
model parallel group.

# from megatron.model import DistributedDataParallel as LocalDDP
from fmoe.megatron import DistributedDataParallel as LocalDDP
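
The sketch below illustrates one way experts could be laid out when they are spread over both groups. It assumes that every rank in the combined data parallel and model parallel world holds its own set of local experts, and that global expert indices are assigned rank by rank. Both points are assumptions made for this example; `expert_placement` is a hypothetical helper, not part of FastMoE.

```python
def expert_placement(num_local_experts, data_parallel_size, model_parallel_size):
    # Assumption: experts are distributed over every rank in both groups,
    # so the number of workers is the product of the two group sizes.
    world_size = data_parallel_size * model_parallel_size
    placement = {}
    for rank in range(world_size):
        for local_idx in range(num_local_experts):
            # Assumption: global indices are assigned contiguously per rank.
            global_idx = rank * num_local_experts + local_idx
            placement[global_idx] = (rank, local_idx)
    return placement


# 4 local experts per rank, 2-way data parallel, 2-way model parallel.
placement = expert_placement(4, 2, 2)
```

Under these assumptions the example yields 16 distinct experts over 4 ranks, rather than 4 experts replicated on every rank.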

Train as usual

Start training with FastMoE by using the scripts provided by Megatron-LM.

Open-source algorithms and tools of the "WuDao" project
