Transformer-XL improves on the Transformer, primarily to handle long sequences. It combines the strengths of RNN sequence modeling and the Transformer's self-attention by introducing a Recurrence Mechanism and Relative Positional Encoding: the Transformer attention module is applied to each segment of the input, while the recurrence mechanism learns dependencies across consecutive segments. It achieved state-of-the-art results on language-modeling datasets such as enwik8 and text8.
Paper: Dai Z, Yang Z, Yang Y, et al. Transformer-XL: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
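The recurrence mechanism described above can be sketched as attention over the current segment concatenated with a cached copy of the previous one. This is a toy NumPy illustration only; the function and variable names are mine, not taken from this repository's code, and projections/heads are omitted for brevity:

```python
import numpy as np

def attend_with_memory(segment, memory):
    """Toy sketch of Transformer-XL's segment-level recurrence:
    keys/values come from [memory; segment], so attention can reach
    past the current segment boundary (illustrative names only)."""
    context = np.concatenate([memory, segment], axis=0)  # [memory; segment] on the time axis
    q, k, v = segment, context, context                  # no learned projections, for brevity
    scores = q @ k.T / np.sqrt(q.shape[-1])              # scaled dot-product attention
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    new_memory = segment                                 # cache this segment for the next step
    return weights @ v, new_memory

# Process two consecutive segments of length 4 with model dim 8.
rng = np.random.default_rng(0)
seg1, seg2 = rng.normal(size=(2, 4, 8))
out1, mem = attend_with_memory(seg1, np.zeros((0, 8)))  # no memory yet
out2, _ = attend_with_memory(seg2, mem)                 # seg2 attends over seg1 + seg2
print(out2.shape)  # (4, 8)
```

In the real model the memory is detached from the gradient graph and relative positional encodings replace absolute ones, so the cached segment remains valid when reused.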
- enwik8
- text8
- One Billion Word
- WT-103
- PTB (w/o finetuning)
After the dataset is prepared, start training and evaluation as follows:
# Run the training example
bash scripts/
# Run the distributed training example
bash scripts/
# Run the evaluation example
python eval.py > eval.log 2>&1 &
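The commands above follow a common launch pattern: run the job in the background, redirecting stdout and stderr to a log file, then inspect the log. A runnable sketch of the pattern, with `echo` standing in for a real launch script (in actual use you would run one of the scripts listed under `scripts/`, e.g. `run_enwik8_base.sh`):

```shell
# Background the job and capture both stdout and stderr in a log.
echo "training step 1" > train.log 2>&1 &
wait                  # wait for the background job to finish
cat train.log         # inspect the captured output
```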
.
└─Transformer-XL
  ├─README.md
  ├─scripts
  │ ├─run_enwik8_base.sh
  │ ├─run_enwik8_large.perl
  │ ├─run_lm1b_base.sh
  │ ├─run_lm1b_large.sh
  │ ├─run_text8_base.sh
  │ ├─run_wt103_base.sh
  │ └─run_wt103_large.sh
  ├─src
  │ ├─callback
  │ │ ├─eval.py
  │ │ ├─flag.py
  │ │ └─log.py
  │ ├─loss_fn
  │ │ ├─ProjectedAdaptiveLogSoftmaxLoss.py
  │ │ └─SampleSoftmaxLoss.py
  │ ├─metric
  │ │ └─calc.py
  │ ├─model
  │ │ ├─args_parser.py
  │ │ ├─attn.py
  │ │ ├─dataset.py
  │ │ ├─embedding.py
  │ │ ├─layer.py
  │ │ ├─mem_transformer.py
  │ │ ├─positionwiseFF.py
  │ │ └─vocabulary.py
  │ ├─model_utils
  │ │ ├─config.py
  │ │ ├─device_adapter.py
  │ │ ├─local_adapter.py
  │ │ └─moxing_adapter.py
  │ └─utils
  │   ├─additional_algorithms.py
  │   ├─dataset_util.py
  │   ├─nnUtils.py
  │   └─vocabulary.py
  ├─default_config.yaml
  ├─default_config_large.yaml
  ├─default_config_large_gpu.yaml
  ├─eval.py
  ├─hccl_tools.py
  └─train.py
There are three sources of randomness: dataset shuffling, weight initialization, and dropout. train.py already sets seeds to avoid randomness in dataset shuffling and weight initialization. To disable dropout, set the corresponding dropout_prob parameter in default_config.yaml to 0.
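Seed setting along these lines makes two runs produce identical draws. This is a generic sketch using the Python and NumPy RNGs only; the actual seed calls live in this repository's train.py and the helper name here is mine:

```python
import random
import numpy as np

def set_seed(seed: int = 1) -> None:
    """Fix the Python and NumPy RNGs so dataset shuffling and weight
    initialization are reproducible (generic sketch, not the repo's code)."""
    random.seed(seed)
    np.random.seed(seed)

# Two runs from the same seed yield identical random draws.
set_seed(1)
a = np.random.rand(3)
set_seed(1)
b = np.random.rand(3)
print(np.allclose(a, b))  # True
```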
Please visit the official homepage.