root@17b3a541b30c:/workspace# /workspace/pangu-alpha-applications/app/chat/scripts/pangu_dialog_tune.sh
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/workspace/pangu-alpha-applications
/workspace/pangu-alpha-applications
using world size: 2 and model-parallel size: 2
using torch.float16 for parameters ...
WARNING: overriding default arguments for tokenizer_type:GPT2BPETokenizer with tokenizer_type:GPT2BPETokenizer
-------------------- arguments --------------------
adlr_autoresume ................. False
adlr_autoresume_interval ........ 1000
apply_query_key_layer_scaling ... False
apply_residual_connection_post_layernorm False
attention_dropout ............... 0.1
attention_softmax_in_fp32 ....... False
batch_size ...................... 64
bert_load ....................... None
bias_dropout_fusion ............. False
bias_gelu_fusion ................ False
block_data_path ................. None
checkpoint_activations .......... True
checkpoint_num_layers ........... 1
clip_grad ....................... 1.0
data_impl ....................... mmap
data_path ....................... /workspace/finetune/data/
DDP_impl ........................ torch
distribute_checkpointed_activations True
distributed_backend ............. nccl
dynamic_loss_scale .............. True
eod_mask_loss ................... False
eval_interval ................... 500
eval_iters ...................... 200
exit_interval ................... None
faiss_use_gpu ................... False
finetune ........................ True
fp16 ............................ True
fp16_lm_cross_entropy ........... True
fp32_allreduce .................. False
gradient_accumulation_steps ..... 1
hidden_dropout .................. 0.1
hidden_size ..................... 2560
hysteresis ...................... 2
ict_head_size ................... None
ict_load ........................ None
indexer_batch_size .............. 128
indexer_log_interval ............ 1000
init_method_std ................. 0.02
layernorm_epsilon ............... 1e-05
lazy_mpu_init ................... None
load ............................ /workspace/pangu_dialog_fp16_2b6/
local_rank ...................... 0
log_interval .................... 100
loss_scale ...................... None
loss_scale_window ............... 1000
lr .............................. 5e-05
lr_decay_iters .................. 1000
lr_decay_style .................. cosine
make_vocab_size_divisible_by .... 1
mask_prob ....................... 0.15
max_position_embeddings ......... 1024
merge_file ...................... gpt2-merges.txt
min_lr .......................... 1e-06
min_scale ....................... 1
mmap_warmup ..................... False
model_parallel_size ............. 2
no_load_optim ................... False
no_load_rng ..................... True
no_save_optim ................... False
no_save_rng ..................... False
num_attention_heads ............. 32
num_layers ...................... 31
num_unique_layers ............... None
num_workers ..................... 2
onnx_safe ....................... None
openai_gelu ..................... False
override_lr_scheduler ........... False
param_sharing_style ............. grouped
params_dtype .................... torch.float16
query_in_block_prob ............. 0.1
rank ............................ 0
report_topk_accuracies .......... []
reset_attention_mask ............ False
reset_position_ids .............. False
save ............................ /workspace/finetune/model/pangu_dialog_fp16_2b6_new/
save_interval ................... 10000
scaled_upper_triang_masked_softmax_fusion False
seed ............................ 1234
seq_length ...................... 1024
short_seq_prob .................. 0.1
split ........................... 949,50,1
tensorboard_dir ................. None
titles_data_path ................ None
tokenizer_type .................. GPT2BPETokenizer
train_iters ..................... 50000
use_checkpoint_lr_scheduler ..... False
use_cpu_initialization .......... True
use_one_sent_docs ............... False
vocab_file ...................... /workspace/pangu-alpha-applications/megatron/bpe_4w_pcl/vocab
warmup .......................... 0.01
weight_decay .................... 0.01
world_size ...................... 2
---------------- end of arguments ----------------
It looks like a problem with the vocab file:
size mismatch for weight: copying a param with shape torch.Size([40000, 2560]) from checkpoint, the shape in current model is torch.Size([20000, 2560]).
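One way to sanity-check this is to compare the number of entries in the vocab file against the word-embedding shape stored in the checkpoint. The sketch below is only illustrative: the checkpoint file name under the load directory and the key layout are assumptions based on common Megatron-style checkpoints, and may differ in the actual PanGu-Alpha files.

import torch

# Hypothetical checkpoint path; adjust to the actual file under /workspace/pangu_dialog_fp16_2b6/
CKPT = "/workspace/pangu_dialog_fp16_2b6/mp_rank_00/model_optim_rng.pt"
VOCAB = "/workspace/pangu-alpha-applications/megatron/bpe_4w_pcl/vocab"

# Count the entries in the vocab file (assuming one token per line).
with open(VOCAB, encoding="utf-8") as f:
    vocab_entries = sum(1 for _ in f)
print("vocab file entries:", vocab_entries)

# Walk the (possibly nested) state dict and print any embedding-like tensors,
# so the stored shape (e.g. [40000, 2560]) can be compared with the model's.
def walk(d, prefix=""):
    for key, val in d.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            walk(val, name + ".")
        elif torch.is_tensor(val) and "embedding" in name.lower():
            print(name, tuple(val.shape))

state = torch.load(CKPT, map_location="cpu")
walk(state.get("model", state))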
Many thanks to the experts in the PanGu-Alpha WeChat group for the answer.
Experiment environment: 2× RTX 3090
RAM: 64 GB
With GPUS_PER_NODE=1 and NNODES=1 (single GPU), this step runs through; for the multi-GPU case, the expert's answer was:
"If you are using model parallelism, split the model first, and then load it."