Transformer-XL is an improvement on the Transformer that mainly addresses the problem of modeling long sequences. It combines the strengths of RNN-style sequence modeling with the Transformer's self-attention mechanism: it introduces a segment-level recurrence mechanism and relative positional encoding, applies the Transformer's attention module to each segment of the input, and uses the recurrence mechanism to learn dependencies between consecutive segments. Transformer-XL achieved state-of-the-art results on language-modeling datasets such as enwik8 and text8.
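The recurrence mechanism can be sketched as follows: hidden states from the previous segment are cached and concatenated with the current segment before computing keys and values, so attention can reach beyond the segment boundary. This is a minimal single-head NumPy illustration, not the repository's implementation (which lives in `src/model/mem_transformer.py`); all names here are hypothetical.

```python
import numpy as np

def attend_with_memory(h, mem, w_q, w_k, w_v):
    """Single-head attention over the current segment plus cached memory.

    h:   (tgt_len, d_model) hidden states of the current segment
    mem: (mem_len, d_model) cached hidden states from the previous segment
    """
    ctx = np.concatenate([mem, h], axis=0)           # keys/values also see the memory
    q, k, v = h @ w_q, ctx @ w_k, ctx @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)            # softmax over memory + segment
    return probs @ v

# Process a long sequence segment by segment, carrying the memory forward.
rng = np.random.default_rng(0)
d, tgt_len, mem_len = 8, 4, 4
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
mem = np.zeros((mem_len, d))
for _ in range(3):                                   # three consecutive segments
    seg = rng.standard_normal((tgt_len, d))
    out = attend_with_memory(seg, mem, w_q, w_k, w_v)
    mem = seg[-mem_len:]                             # cached states; no gradient flows through them
```

In the real model the memory is detached from the computation graph, so gradients only flow within the current segment while information still propagates across segments.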
Paper: Dai Z, Yang Z, Yang Y, et al. Transformer-xl: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
The backbone of Transformer-XL is the Transformer, extended with a segment-level Recurrence Mechanism and Relative Positional Encoding.
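Relative positional encoding replaces absolute positions with sinusoidal embeddings of the query–key distance, which is what makes cached states from earlier segments reusable. A minimal NumPy sketch of the embedding table follows; the learned projection matrices and bias terms of the full attention score are omitted, and the function name is hypothetical.

```python
import numpy as np

def relative_positional_embedding(klen, d_model):
    """Sinusoidal embeddings for relative distances klen-1 .. 0,
    in the style of Transformer-XL (farthest position first)."""
    pos = np.arange(klen - 1, -1, -1.0)                           # relative distances
    inv_freq = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    sinusoid = np.outer(pos, inv_freq)                            # (klen, d_model/2)
    return np.concatenate([np.sin(sinusoid), np.cos(sinusoid)], axis=-1)

emb = relative_positional_embedding(klen=6, d_model=8)            # shape (6, 8)
```

Because only distances enter the encoding, the same table applies to every segment regardless of its absolute offset in the sequence.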
The following four datasets each contain a training set and an evaluation set.
After dataset preparation, you can start training and evaluation as follows:
# run training example
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
# run distributed training example
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
# run evaluation example
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
.
└─Transformer-XL
  ├─README.md
  ├─scripts
  │ └─run_enwik8_base.sh
  ├─src
  │ ├─callback
  │ │ ├─eval.py
  │ │ ├─flag.py
  │ │ └─log.py
  │ ├─common
  │ │ └─ops.py
  │ ├─loss_fn
  │ │ └─ProjectedAdaptiveLogSoftmaxLoss.py
  │ ├─metric
  │ │ └─calc.py
  │ ├─model
  │ │ ├─attn.py
  │ │ ├─dataset.py
  │ │ ├─embedding.py
  │ │ ├─layer.py
  │ │ ├─mem_transformer.py
  │ │ ├─positionwiseFF.py
  │ │ └─vocabulary.py
  │ ├─model_utils
  │ │ ├─config.py
  │ │ ├─device_adapter.py
  │ │ ├─local_adapter.py
  │ │ └─moxing_adapter.py
  │ └─utils
  │   ├─additional_algorithms.py
  │   ├─dataset_util.py
  │   └─nnUtils.py
  ├─default_config.yaml
  ├─hccl_tools.py
  ├─getdata.sh
  ├─eval.py
  └─train.py
usage:
    train.py [--ascend] [--data DATA_PATH]
             [--dataset NAME] [--optim adam]

options:
    --ascend     run on the Ascend device
    --data       path to the dataset file: PATH, default is ""
    --dataset    dataset name: NAME
    --optim      optimizer, default is adam
Parameters for dataset and network (Training/Evaluation):
n_layer        number of hidden layers: N, default is 12
d_model        dimension of the model hidden states
n_head         number of attention heads: N
d_head         dimension of each attention head
d_inner        dimension of the inner (feed-forward) layer
dropout        dropout probability for TransformerOutput: Q, default is 0.1
dropatt        dropout probability for TransformerAttention: Q, default is 0.0
max_step       maximum number of training steps: N
tgt_len        length of the target segment
mem_len        length of the cached memory
eval_tgt_len   length of the target segment during evaluation
batch_size     batch size of the input dataset: N, default is 22
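For orientation, a `default_config.yaml` fragment covering these keys might look like the following. Only the `n_layer`, `dropout`, `dropatt`, and `batch_size` defaults are documented above; the remaining values follow the paper's enwik8 base configuration and may differ from this repository's actual defaults.

```yaml
# Hypothetical fragment; values not documented in this README follow the
# paper's enwik8 base model and may not match this repo's default_config.yaml.
n_layer: 12        # number of hidden layers (documented default)
d_model: 512       # model hidden size (paper's base setting)
n_head: 8          # attention heads (paper's base setting)
d_head: 64         # per-head dimension (paper's base setting)
d_inner: 2048      # feed-forward inner dimension (paper's base setting)
dropout: 0.1       # documented default
dropatt: 0.0       # documented default
tgt_len: 512       # segment length (paper's base setting)
mem_len: 512       # cached memory length (paper's base setting)
batch_size: 22     # documented default
```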
Parameters for learning rate:
lr value of learning rate: Q
warmup_step steps of the learning rate warm up: N
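The interaction of `lr`, `warmup_step`, and `max_step` can be sketched as a schedule function. The warmup phase (linear ramp to `lr`) follows directly from the parameters above; the cosine decay afterwards is an assumption for illustration, since this README does not document the post-warmup decay, and the function name is hypothetical.

```python
import math

def lr_at(step: int, lr: float, warmup_step: int, max_step: int) -> float:
    """Linear warmup to `lr` over `warmup_step` steps, then (assumed)
    cosine decay to 0 by `max_step`."""
    if step < warmup_step:
        return lr * step / warmup_step
    progress = (step - warmup_step) / max(1, max_step - warmup_step)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with `warmup_step=1000` the learning rate reaches its peak exactly at step 1000 and decreases smoothly afterwards.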
Set options in default_config.yaml, including loss_scale, learning rate, and network hyperparameters.
Run run_enwik8_base.sh for non-distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
Run run_enwik8_base.sh for distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
Set options in default_config.yaml. Make sure 'data' is set to your own dataset path.
Run eval.py (via run_enwik8_base.sh) for evaluation of the Transformer-XL model.
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
| Parameters | Ascend |
| --- | --- |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 18/02/2022 (day/month/year) |
| MindSpore Version | 1.6.0 |
| Dataset | enwik8 |
| Training Parameters | batch_size=22 |
| Optimizer | Adam |
| Loss Function | Softmax Cross Entropy |
| BPC Score | 4.90 |
| Speed | 30 ms/batch |
| Loss | 4.14 |
| Params (K) | 15.33 |
| Checkpoint for inference | 1.45G (.ckpt file) |
| Scripts | Transformer-XL scripts |
| Parameters | Ascend |
| --- | --- |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 18/02/2022 (day/month/year) |
| MindSpore Version | 1.6.0 |
| Dataset | enwik8 |
| batch_size | 22 |
| outputs | loss |
| Loss | 4.14 |
There are three sources of randomness: dataset shuffling, weight initialization, and dropout. Seeds are already set in train.py so that dataset shuffling and weight initialization are deterministic. If you want to disable dropout, set the corresponding dropout_prob parameter to 0 in default_config.yaml.
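The seeding in train.py typically looks like the following minimal sketch. Only the Python-stdlib and NumPy parts are shown here; the actual script would also seed MindSpore itself, and the function name is hypothetical.

```python
import random
import numpy as np

def set_global_seed(seed: int) -> None:
    """Seed the RNGs used during training so that dataset shuffling and
    weight initialization are reproducible across runs."""
    random.seed(seed)
    np.random.seed(seed)

set_global_seed(1)
a = np.random.rand(3)
set_global_seed(1)
b = np.random.rand(3)   # identical to `a`, since the seed was reset
```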
Please check the official homepage.