
update

pull/12/head
zhangy03 1 month ago
parent commit 503eac628f
5 changed files with 47 additions and 8 deletions
  1. README.md (+2 -3)
  2. run_pangu_alpha_predict.py (+2 -2)
  3. run_pangu_alpha_train.py (+3 -3)
  4. scripts/run_distribute_predict.sh (+22 -0)
  5. scripts/run_distribute_train.sh (+18 -0)

README.md (+2 -3)

@@ -10,6 +10,7 @@

- **The industry's first 200-billion-parameter Chinese autoregressive language model, PanGu-α**
- **Code and models to be fully open-sourced step by step**
- **The first sequential autoregressive pretrained language model, ALM**
- **MindSpore ultra-large-scale automatic parallelism technology**
- **Built on a domestic full-stack software/hardware collaborative ecosystem (MindSpore + CANN + Ascend 910 + ModelArts)**

@@ -22,7 +23,7 @@
### Model Architecture

<img src="./docs/model.png" width="850" height="420"/><br/>
-The query layer is stacked on top of the transformer layers. Its basic structure is similar to a transformer layer, except that an additional Query layer is introduced to predict the position of the next query Q to be generated. The model introduces random word-order generation, which raises pretraining difficulty and strengthens the model, and adds a prediction module (Predictor) whose output is induced through position vectors during pretraining. It supports both understanding and generation tasks. Compared with GPT, PanGu-α was designed from the outset for continual learning and evolution: partly to save compute, it also supports incremental training from a sequential autoregressive model to a random-word-order autoregressive model, and this staged continual learning gives the model random-word-order generation and stronger NLU capability.
+The query layer is stacked on top of the transformer layers. Its basic structure is similar to a transformer layer, except that an additional Query layer is introduced to predict the position of the next query Q to be generated.

### MindSpore Ultra-Large-Scale Automatic Parallelism

@@ -135,5 +136,3 @@ Generate2: Feiyun: Ahem, a young person should choose his words carefully; since I said I
Peng Cheng Laboratory, Peking University, and other participating organizations are the main members of the PanGu-α joint development team.
<img src="./docs/logos.png" width="266" height="132"/><br/>
-
-

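The query-layer paragraph above is terse, so here is a rough sketch of the idea in plain NumPy. This is a hypothetical illustration, not the repository's MindSpore implementation; all function and variable names are invented for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_layer(hidden, next_pos_emb, w_q, w_k, w_v):
    """Illustrative query layer stacked on top of the transformer outputs.

    hidden:       (seq_len, d) hidden states from the last transformer layer
    next_pos_emb: (seq_len, d) embedding of the position to be generated,
                  used as the attention query Q instead of the hidden state
    """
    q = next_pos_emb @ w_q            # Q comes from the target position
    k = hidden @ w_k                  # K/V come from the hidden states
    v = hidden @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # A causal mask would be applied here in a real autoregressive model.
    return softmax(scores) @ v        # features used to predict the token
```

The point of the design is that the attention query is derived from the embedding of the position to be predicted, so the same stack can be steered to generate at arbitrary positions, which is what enables random word-order generation.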
run_pangu_alpha_predict.py (+2 -2)

@@ -73,11 +73,11 @@ if __name__ == "__main__":
                        default=None,
                        help="predict file path.")
    parser.add_argument('--data_url',
-                       required=True,
+                       required=False,
                        default=None,
                        help='Location of data.')
    parser.add_argument('--train_url',
-                       required=True,
+                       required=False,
                        default=None,
                        help='Location of training outputs.')
    parser.add_argument("--run_type",


run_pangu_alpha_train.py (+3 -3)

@@ -24,11 +24,11 @@ if __name__ == "__main__":
    """train function for PanGu-Alpha"""
    parser = argparse.ArgumentParser(description="PanGu training")
    parser.add_argument('--train_url',
-                       required=True,
+                       required=False,
                        default=None,
                        help='Location of training outputs.')
    parser.add_argument('--data_url',
-                       required=True,
+                       required=False,
                        default="/cache_pangu_alpha/V1-sample60-baike-math-bpe-1024",
                        help='Location of data.')
    parser.add_argument("--distribute",
@@ -125,7 +125,7 @@
                        help="The run type")
    parser.add_argument("--mode",
                        type=str,
-                       default="200B",
+                       default="2.6B",
                        choices=["200B", "13B", "2.6B", "self_define"],
                        help="The train/eval mode")



scripts/run_distribute_predict.sh (+22 -0)

@@ -0,0 +1,22 @@
#!/bin/bash
# Launch a distributed PanGu-Alpha prediction job: one process per Ascend device.
# Usage: bash run_distribute_predict.sh RANK_SIZE RANK_TABLE_FILE STRATEGY TOKENIZER CKPT_PATH CKPT_NAME MODE
execute_path=$(pwd)
script_self=$(readlink -f "$0")
self_path=$(dirname "${script_self}")
export RANK_SIZE=$1          # number of devices/processes to launch
export RANK_TABLE_FILE=$2    # Ascend rank table describing the devices
export STRATEGY=$3           # parallel-strategy checkpoint path
export TOKENIZER=$4          # tokenizer path
export CKPT_PATH=$5          # directory containing model checkpoints
export CKPT_NAME=$6          # checkpoint file name
export MODE=$7               # model mode, e.g. 2.6B/13B/200B

for ((i = 0; i < RANK_SIZE; i++)); do
  # Give each rank a clean working directory and its own device identity.
  rm -rf "${execute_path}/device_$i/"
  mkdir "${execute_path}/device_$i/"
  cd "${execute_path}/device_$i/" || exit
  export RANK_ID=$i
  export DEVICE_ID=$i
  python -s "${self_path}/../run_pangu_alpha_predict.py" \
    --strategy_load_ckpt_path="$STRATEGY" --tokenizer_path="$TOKENIZER" \
    --load_ckpt_path="$CKPT_PATH" --load_ckpt_name="$CKPT_NAME" \
    --mode="$MODE" >"train_deep$i.log" 2>&1 &
done
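A hypothetical invocation on an 8-device node (every path below is a placeholder): `bash scripts/run_distribute_predict.sh 8 /path/to/rank_table_8p.json /path/to/strategy.ckpt /path/to/tokenizer /path/to/ckpt_dir model.ckpt 2.6B`. Each rank then writes its output to `device_$i/train_deep$i.log`.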

scripts/run_distribute_train.sh (+18 -0)

@@ -0,0 +1,18 @@
#!/bin/bash
# Launch a distributed PanGu-Alpha training job: one process per Ascend device.
# Usage: bash run_distribute_train.sh RANK_SIZE DATASET RANK_TABLE_FILE MODE
execute_path=$(pwd)
script_self=$(readlink -f "$0")
self_path=$(dirname "${script_self}")
export RANK_SIZE=$1        # number of devices/processes to launch
export DATASET=$2          # dataset location, passed through as --data_url
export RANK_TABLE_FILE=$3  # Ascend rank table describing the devices
export MODE=$4             # model mode, e.g. 2.6B/13B/200B

for ((i = 0; i < RANK_SIZE; i++)); do
  # Give each rank a clean working directory and its own device identity.
  rm -rf "${execute_path}/device_$i/"
  mkdir "${execute_path}/device_$i/"
  cd "${execute_path}/device_$i/" || exit
  export RANK_ID=$i
  export DEVICE_ID=$i
  python -s "${self_path}/../run_pangu_alpha_train.py" \
    --data_url="$DATASET" --mode="$MODE" >"train_deep$i.log" 2>&1 &
done
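Both scripts follow the same launch pattern: a clean per-device working directory plus `RANK_ID`/`DEVICE_ID` exported to each process. A minimal Python sketch of that pattern (hypothetical, not part of the repository):

```python
import os
import shutil
import subprocess
from pathlib import Path

def launch_per_device(rank_size: int, script: str, args: list) -> None:
    """Start one process per device, mirroring the shell loops above."""
    root = Path.cwd()
    for i in range(rank_size):
        device_dir = root / f"device_{i}"
        shutil.rmtree(device_dir, ignore_errors=True)   # rm -rf device_$i
        device_dir.mkdir()                              # mkdir device_$i
        env = dict(os.environ, RANK_ID=str(i), DEVICE_ID=str(i))
        # Log handle is left open for the child's lifetime, like the shell redirect.
        log = open(device_dir / f"train_deep{i}.log", "w")
        subprocess.Popen(["python", "-s", script, *args],
                         cwd=device_dir, env=env,
                         stdout=log, stderr=subprocess.STDOUT)
```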
