Transformer-XL is an improvement on the Transformer that mainly addresses the problem of modeling long sequences. It combines the strengths of RNN-style sequence modeling with the Transformer's self-attention mechanism: it introduces a segment-level recurrence mechanism and relative positional encoding, applies the Transformer's attention module to each segment of the input, and uses the recurrence mechanism to learn dependencies between consecutive segments. Transformer-XL achieved state-of-the-art results on language-modeling datasets such as enwik8 and text8.
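The recurrence mechanism can be sketched as follows: hidden states from the previous segment are cached and concatenated with the current segment before computing keys and values, so attention can reach beyond the segment boundary. This is a minimal single-head NumPy illustration, not the repository's implementation (which lives in `src/model/mem_transformer.py`); all names here are hypothetical.

```python
import numpy as np

def attend_with_memory(h, mem, w_q, w_k, w_v):
    """Single-head attention over the current segment plus cached memory.

    h:   (tgt_len, d_model) hidden states of the current segment
    mem: (mem_len, d_model) cached hidden states from the previous segment
    """
    ctx = np.concatenate([mem, h], axis=0)           # keys/values also see the memory
    q, k, v = h @ w_q, ctx @ w_k, ctx @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)            # softmax over memory + segment
    return probs @ v

# Process a long sequence segment by segment, carrying the memory forward.
rng = np.random.default_rng(0)
d, tgt_len, mem_len = 8, 4, 4
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
mem = np.zeros((mem_len, d))
for _ in range(3):                                   # three consecutive segments
    seg = rng.standard_normal((tgt_len, d))
    out = attend_with_memory(seg, mem, w_q, w_k, w_v)
    mem = seg[-mem_len:]                             # cached states; no gradient flows through them
```

In the real model the memory is detached from the computation graph, so gradients only flow within the current segment while information still propagates across segments.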
Paper: Dai Z, Yang Z, Yang Y, et al. Transformer-xl: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
The backbone of Transformer-XL is the Transformer, extended with a segment-level Recurrence Mechanism and Relative Positional Encoding.
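Relative positional encoding replaces absolute positions with sinusoidal embeddings of the query–key distance, which is what makes cached states from earlier segments reusable. A minimal NumPy sketch of the embedding table follows; the learned projection matrices and bias terms of the full attention score are omitted, and the function name is hypothetical.

```python
import numpy as np

def relative_positional_embedding(klen, d_model):
    """Sinusoidal embeddings for relative distances klen-1 .. 0,
    in the style of Transformer-XL (farthest position first)."""
    pos = np.arange(klen - 1, -1, -1.0)                           # relative distances
    inv_freq = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    sinusoid = np.outer(pos, inv_freq)                            # (klen, d_model/2)
    return np.concatenate([np.sin(sinusoid), np.cos(sinusoid)], axis=-1)

emb = relative_positional_embedding(klen=6, d_model=8)            # shape (6, 8)
```

Because only distances enter the encoding, the same table applies to every segment regardless of its absolute offset in the sequence.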
The following four datasets each contain a training set and an evaluation set.
After dataset preparation, you can start training and evaluation as follows:
# run training example
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
# run distributed training example
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
# run evaluation example
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
.
└─Transformer-XL
  ├─README.md
  ├─scripts
  │ └─run_enwik8_base.sh
  ├─src
  │ ├─callback
  │ │ ├─eval.py
  │ │ ├─flag.py
  │ │ └─log.py
  │ ├─common
  │ │ └─ops.py
  │ ├─loss_fn
  │ │ └─ProjectedAdaptiveLogSoftmaxLoss.py
  │ ├─metric
  │ │ └─calc.py
  │ ├─model
  │ │ ├─attn.py
  │ │ ├─dataset.py
  │ │ ├─embedding.py
  │ │ ├─layer.py
  │ │ ├─mem_transformer.py
  │ │ ├─positionwiseFF.py
  │ │ └─vocabulary.py
  │ ├─model_utils
  │ │ ├─config.py
  │ │ ├─device_adapter.py
  │ │ ├─local_adapter.py
  │ │ └─moxing_adapter.py
  │ └─utils
  │   ├─additional_algorithms.py
  │   ├─dataset_util.py
  │   └─nnUtils.py
  ├─default_config.yaml
  ├─hccl_tools.py
  ├─getdata.sh
  ├─eval.py
  └─train.py
usage:
    train.py [--ascend] [--data DATA_PATH]
             [--dataset NAME] [--optim adam]

options:
    --ascend     run on the Ascend device
    --data       path to the dataset file: PATH, default is ""
    --dataset    dataset name: NAME
    --optim      optimizer, default is adam
Parameters for dataset and network (Training/Evaluation):
n_layer        number of hidden layers: N, default is 12
d_model        dimension of the model hidden states
n_head         number of attention heads: N
d_head         dimension of each attention head
d_inner        dimension of the inner (feed-forward) layer
dropout        dropout probability for TransformerOutput: Q, default is 0.1
dropatt        dropout probability for TransformerAttention: Q, default is 0.0
max_step       maximum number of training steps: N
tgt_len        length of the target segment
mem_len        length of the cached memory
eval_tgt_len   length of the target segment during evaluation
batch_size     batch size of the input dataset: N, default is 22
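For orientation, a `default_config.yaml` fragment covering these keys might look like the following. Only the `n_layer`, `dropout`, `dropatt`, and `batch_size` defaults are documented above; the remaining values follow the paper's enwik8 base configuration and may differ from this repository's actual defaults.

```yaml
# Hypothetical fragment; values not documented in this README follow the
# paper's enwik8 base model and may not match this repo's default_config.yaml.
n_layer: 12        # number of hidden layers (documented default)
d_model: 512       # model hidden size (paper's base setting)
n_head: 8          # attention heads (paper's base setting)
d_head: 64         # per-head dimension (paper's base setting)
d_inner: 2048      # feed-forward inner dimension (paper's base setting)
dropout: 0.1       # documented default
dropatt: 0.0       # documented default
tgt_len: 512       # segment length (paper's base setting)
mem_len: 512       # cached memory length (paper's base setting)
batch_size: 22     # documented default
```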
Parameters for learning rate:
lr value of learning rate: Q
warmup_step steps of the learning rate warm up: N
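The interaction of `lr`, `warmup_step`, and `max_step` can be sketched as a schedule function. The warmup phase (linear ramp to `lr`) follows directly from the parameters above; the cosine decay afterwards is an assumption for illustration, since this README does not document the post-warmup decay, and the function name is hypothetical.

```python
import math

def lr_at(step: int, lr: float, warmup_step: int, max_step: int) -> float:
    """Linear warmup to `lr` over `warmup_step` steps, then (assumed)
    cosine decay to 0 by `max_step`."""
    if step < warmup_step:
        return lr * step / warmup_step
    progress = (step - warmup_step) / max(1, max_step - warmup_step)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with `warmup_step=1000` the learning rate reaches its peak exactly at step 1000 and decreases smoothly afterwards.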
Set options in default_config.yaml, including loss_scale, learning rate, and network hyperparameters.
Run run_enwik8_base.sh for non-distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_ID]
Run run_enwik8_base.sh for distributed training of the Transformer-XL model.
bash scripts/run_enwik8_base.sh train [DEVICE_NUM]
Set options in default_config.yaml. Make sure 'data' is set to your own dataset path.
Run eval.py (via run_enwik8_base.sh) for evaluation of the Transformer-XL model.
bash scripts/run_enwik8_base.sh eval [DEVICE_ID]
| Parameters | Ascend |
| --- | --- |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 18/02/2022 (day/month/year) |
| MindSpore Version | 1.6.0 |
| Dataset | enwik8 |
| Training Parameters | batch_size=22 |
| Optimizer | Adam |
| Loss Function | Softmax Cross Entropy |
| BPC Score | 4.90 |
| Speed | 30 ms/batch |
| Loss | 4.14 |
| Params (K) | 15.33 |
| Checkpoint for inference | 1.45G (.ckpt file) |
| Scripts | Transformer-XL scripts |
| Parameters | Ascend |
| --- | --- |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 18/02/2022 (day/month/year) |
| MindSpore Version | 1.6.0 |
| Dataset | enwik8 |
| batch_size | 22 |
| outputs | loss |
| Loss | 4.14 |
There are three sources of randomness: dataset shuffling, weight initialization, and dropout. Seeds are already set in train.py so that dataset shuffling and weight initialization are deterministic. If you want to disable dropout, set the corresponding dropout_prob parameter to 0 in default_config.yaml.
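The seeding in train.py typically looks like the following minimal sketch. Only the Python-stdlib and NumPy parts are shown here; the actual script would also seed MindSpore itself, and the function name is hypothetical.

```python
import random
import numpy as np

def set_global_seed(seed: int) -> None:
    """Seed the RNGs used during training so that dataset shuffling and
    weight initialization are reproducible across runs."""
    random.seed(seed)
    np.random.seed(seed)

set_global_seed(1)
a = np.random.rand(3)
set_global_seed(1)
b = np.random.rand(3)   # identical to `a`, since the seed was reset
```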
Please check the official homepage.