Transformer-XL is an improvement over the Transformer that mainly addresses the problem of modeling long sequences. It combines the advantages of RNN-style sequence modeling with the Transformer's self-attention mechanism: it introduces a recurrence mechanism and relative positional encoding, applies the Transformer attention module to each segment of the input data, and uses the recurrence mechanism to learn dependencies between consecutive segments. It achieved state-of-the-art results on language modeling datasets such as enwik8 and text8.
Paper: Dai Z, Yang Z, Yang Y, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv preprint arXiv:1901.02860, 2019.
The backbone of Transformer-XL is the Transformer, with a recurrence mechanism and relative positional encoding added on top of the original architecture.
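The recurrence mechanism can be summarized as follows: when processing the current segment, the attention keys and values also cover cached hidden states from previous segments, while no gradient flows into that cache. Below is a minimal NumPy sketch of this idea; it is purely illustrative and is not the implementation in src/model/mem_transformer.py (which also applies relative positional encoding).

```python
# Illustrative single-head attention with segment-level memory (assumed shapes,
# not the repository's code). Memory is treated as a constant, mirroring the
# stop-gradient on cached states in Transformer-XL.
import numpy as np

def attend_with_memory(h_current, memory, w_q, w_k, w_v):
    """h_current: (cur_len, d_model), memory: (mem_len, d_model),
    w_q / w_k / w_v: (d_model, d_head) projection matrices."""
    # Keys and values see both the cached memory and the current segment;
    # queries come only from the current segment.
    context = np.concatenate([memory, h_current], axis=0)
    q = h_current @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def update_memory(memory, h_current, mem_len):
    # Append the newest hidden states and keep only the last mem_len rows.
    return np.concatenate([memory, h_current], axis=0)[-mem_len:]
```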
The following datasets contain both the training data and the evaluation data.
The datasets used are enwik8 and text8. enwik8 contains 100MB of unprocessed Wikipedia text; text8 also contains 100MB of Wikipedia text, the difference being that all characters other than the 26 lowercase letters and spaces have been removed.
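As a purely illustrative sketch of that cleaning rule (the text8 file downloaded by getdata.sh is already preprocessed, so nothing like this needs to be run; the function name is hypothetical):

```python
# Keep only the 26 lowercase letters and single spaces, as described above.
import re

def clean_text8_style(raw: str) -> str:
    lowered = raw.lower()
    letters_only = re.sub(r"[^a-z ]", " ", lowered)  # drop everything else
    return re.sub(r" +", " ", letters_only).strip()  # collapse repeated spaces

print(clean_text8_style("Hello, World! 123"))  # -> "hello world"
```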
After dataset preparation, you can start training and evaluation as follows:
# run training example
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
# run distributed training example
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
# run evaluation example
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
.
└─Transformer-XL
  ├─README.md
  ├─scripts
  │ └─run_enwik8_base.sh
  ├─src
  │ ├─callback
  │ │ ├─eval.py
  │ │ ├─flag.py
  │ │ └─log.py
  │ ├─common
  │ │ └─ops.py
  │ ├─loss_fn
  │ │ └─ProjectedAdaptiveLogSoftmaxLoss.py
  │ ├─metric
  │ │ └─calc.py
  │ ├─model
  │ │ ├─attn.py
  │ │ ├─dataset.py
  │ │ ├─embedding.py
  │ │ ├─layer.py
  │ │ ├─mem_transformer.py
  │ │ ├─positionwiseFF.py
  │ │ └─vocabulary.py
  │ ├─model_utils
  │ │ ├─config.py
  │ │ ├─device_adapter.py
  │ │ ├─local_adapter.py
  │ │ └─moxing_adapter.py
  │ └─utils
  │   ├─additional_algorithms.py
  │   ├─dataset_util.py
  │   └─nnUtils.py
  ├─default_config.yaml
  ├─hccl_tools.py
  ├─getdata.sh
  ├─eval.py
  └─train.py
usage:
train.py [--ascend] [--data DATA_PATH]
         [--dataset NAME] [--optim adam]
options:
--ascend     run training on Ascend devices
--data       path to the dataset directory: DATA_PATH
--dataset    dataset name, e.g. enwik8
--optim      optimizer, default is adam
Parameters for dataset and network (Training/Evaluation):
n_layer number of total layers: N, default is 12
d_model dimension of model, default is 512
n_head number of heads, default is 8
d_head head dimension, default is 64
d_inner inner dimension in FF, default is 2048
dropout global dropout rate: Q, default is 0.1
dropatt attention probability dropout rate: Q, default is 0.0
max_step maximum number of training steps: N, default is 400000
tgt_len number of tokens to predict, default is 512
mem_len length of memory retained from previous segments, default is 512
eval_tgt_len number of tokens to predict for evaluation, default is 128
batch_size batch size of input dataset: N, default is 22
Parameters for learning rate:
lr value of learning rate: Q, default is 0.00025
warmup_step steps of the learning rate warm up: N, default is 0
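For illustration, the sketch below builds a linear-warmup plus cosine-decay schedule over max_step steps with the defaults listed above. The original Transformer-XL recipe uses cosine annealing, but whether the curve stored in lr_of_40w_steps.npy (generated by static_lr.py) matches this exactly is an assumption.

```python
# Hypothetical learning-rate schedule: linear warmup followed by cosine decay.
import math
import numpy as np

def build_lr_schedule(base_lr=0.00025, warmup_step=0, max_step=400000, min_lr=0.0):
    lrs = []
    for step in range(max_step):
        if warmup_step > 0 and step < warmup_step:
            lr = base_lr * (step + 1) / warmup_step            # linear warmup
        else:
            progress = (step - warmup_step) / max(1, max_step - warmup_step)
            lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
        lrs.append(lr)
    return np.array(lrs, dtype=np.float32)

# e.g. np.save("lr_of_40w_steps.npy", build_lr_schedule())
```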
Set options in default_config.yaml, including loss_scale, learning rate and network hyperparameters.
Run run_enwik8_base.sh for non-distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
Run run_enwik8_base.sh for distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
Set options in default_config.yaml. Make sure the 'data' option is set to your own dataset path.
Run eval.py for evaluation of the Transformer-XL model.
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
Parameters | Ascend |
---|---|
Resource | Ascend 910; OS Euler2.8 |
Uploaded Date | 18/02/2022 (day/month/year) |
MindSpore Version | 1.6.0 |
Dataset | enwik8 |
Training Parameters | batch_size=22 |
Optimizer | Adam |
Loss Function | Softmax Cross Entropy |
BPC Score | 1.11 |
Speed | 30ms/batch |
Loss | 0.78 |
Params (K) | 15.33 |
Checkpoint for inference | 1.45G (.ckpt file) |
Scripts | Transformer-XL scripts |
Parameters | Ascend |
---|---|
Resource | Ascend 910; OS Euler2.8 |
Uploaded Date | 18/02/2022 (day/month/year) |
MindSpore Version | 1.6.0 |
Dataset | enwik8 |
batch_size | 22 |
outputs | loss |
Loss | 0.78 |
BPC Score | 1.11 |
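For reference, the BPC score is the cross-entropy loss expressed in bits, i.e. loss / ln 2 (assuming the reported loss uses natural logarithms). A quick check with the rounded numbers above:

```python
# BPC = cross-entropy loss (natural log) / ln(2)
import math
print(round(0.78 / math.log(2), 2))  # ~1.13, roughly matching the 1.11 BPC above
```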
There are three sources of randomness: dataset shuffling, weight initialization, and dropout.
Seeds have already been set in train.py to remove the randomness of dataset shuffling and weight initialization. If you want to disable dropout, set the corresponding dropout_prob parameter to 0 in default_config.yaml.
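A hypothetical sketch of fixing those seeds (the exact calls and seed values used in train.py may differ):

```python
# Fix NumPy and MindSpore seeds so dataset shuffling and weight initialization
# are reproducible; the seed value 1 is only an example.
import numpy as np
import mindspore as ms

def set_seed(seed: int = 1):
    np.random.seed(seed)  # NumPy-based shuffling (assumption about where shuffling happens)
    ms.set_seed(seed)     # MindSpore weight initialization and random ops
```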
Please check the official homepage.