b8dd4e9e9d update predict
d4dfbf85a3 update predict
taoht closed issue PCL-Platform.Inte.../AISynergy#18
[bug]: 2.6B NPU two-node collaborative training "src/core/lib/iomgr/ev_epollex_linux.cc","file_line":321,"referenced_errors":
1 week ago
taoht commented on issue PCL-Platform.Inte.../AISynergy#19
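[Large-model application scenario validation] Performance comparison of 2.6B large-model collaborative training under non-IID data

## Standalone: evaluation of local independent training results

### 1. PanGu Alpha 2.6B fine-tuned on the CMRC2017 dataset, 3 epochs

| PanGu 2.6B model | Metric | cmrc2017 | PD |
| -------- | -------- | -------- | -------- |
| cmrc2017 finetune | Acc | 66.62 % | 57.57 % |
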
1 week ago
taoht opened issue PCL-Platform.Inte.../AISynergy#19
[Large-model application scenario validation] Performance comparison of 2.6B large-model collaborative training under non-IID data
1 week ago
taoht commented on issue PCL-Platform.Inte.../AISynergy#18
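[bug]: 2.6B NPU two-node collaborative training "src/core/lib/iomgr/ev_epollex_linux.cc","file_line":321,"referenced_errors":

### [Bug fix and comparison test] --> Resolved --> native bug in the underlying gRPC library

Force grpc==1.46.0 when building AIsyncore:

1. Rebuild and reinstall AIsyncore (grpcio==1.46.0)
2. Start collaborative training on the 2 NPU nodes

* The screenshot in the original comment shows training proceeding normally.
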
2 weeks ago
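
The fix described in the comment above depends on the environment actually running grpcio 1.46.0. A minimal sketch of a pre-launch check follows. The helper names (`parse_version`, `matches_pin`, `installed_grpcio`) are hypothetical and not part of AIsyncore, and the version parsing is deliberately simplistic (numeric components only).

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.46.0' into (1, 46, 0); any non-numeric suffix is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_pin(installed, pinned="1.46.0"):
    """True when the installed version equals the pinned one numerically."""
    return parse_version(installed) == parse_version(pinned)


def installed_grpcio():
    """Return the installed grpcio version string, or None if it is absent."""
    try:
        return metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_grpcio()
    if found is None:
        print("grpcio is not installed")
    elif matches_pin(found):
        print(f"grpcio {found} matches the pinned version")
    else:
        print(f"grpcio {found} differs from the pinned 1.46.0")
```

Running this on each node before starting the two-node job would show whether the rebuild picked up the intended version.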
taoht commented on issue PCL-Platform.Inte.../AISynergy#18
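[bug]: 2.6B NPU two-node collaborative training "src/core/lib/iomgr/ev_epollex_linux.cc","file_line":321,"referenced_errors":
The latest versions of AIsyncore and flwr both require grpcio<=1.43.0, so there is no way to test whether grpc>=1.46.0 fixes or avoids this bug.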
2 weeks ago
taoht opened issue PCL-Platform.Inte.../AISynergy#18
[bug]: 2.6B NPU two-node collaborative training "src/core/lib/iomgr/ev_epollex_linux.cc","file_line":321,"referenced_errors":
2 weeks ago
taoht closed issue PCL-Platform.Inte.../AISynergy#16
Bug: UnboundLocalError: local variable 'compression' referenced before assignment
3 weeks ago
taoht commented on issue PCL-Platform.Inte.../AISynergy#16
Bug: UnboundLocalError: local variable 'compression' referenced before assignment
The master branch does not have this problem; users should not use other development branches.
3 weeks ago
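
For readers unfamiliar with this error class, here is a generic illustration of how an `UnboundLocalError` on a variable such as `compression` typically arises and how it is usually avoided. This is not the AISynergy code; both function names are hypothetical, and the actual cause in the development branch is not stated in the issue.

```python
def pick_compression_buggy(use_gzip):
    # 'compression' is only assigned on one branch, so the return
    # statement fails with UnboundLocalError when use_gzip is False.
    if use_gzip:
        compression = "gzip"
    return compression


def pick_compression_fixed(use_gzip):
    # Assigning a default before the branch guarantees the name is bound.
    compression = None
    if use_gzip:
        compression = "gzip"
    return compression


if __name__ == "__main__":
    print(pick_compression_fixed(False))
    try:
        pick_compression_buggy(False)
    except UnboundLocalError as exc:
        print(f"UnboundLocalError: {exc}")
```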
taoht opened issue PCL-Platform.Inte.../AISynergy#16
Bug: UnboundLocalError: local variable 'compression' referenced before assignment
3 weeks ago
taoht closed issue PCL-Platform.Inte.../mPanGu-Alpha-53#1
GPU inference in a Docker container fails with AttributeError: The 'VocabEmbedding' object has no attribute 'compile_cache'
4 weeks ago
taoht commented on issue PCL-Platform.Inte.../mPanGu-Alpha-53#1
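GPU inference in a Docker container fails with AttributeError: The 'VocabEmbedding' object has no attribute 'compile_cache'
Follow, like, and share. If you have any questions, feel free to ask at any time. Once I see them and have a suitable environment available, I will investigate and reply promptly. Thank you for your understanding.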
4 weeks ago
taoht pushed to master at PCL-Platform.Inte.../mPanGu-Alpha-53
4 weeks ago
taoht commented on issue PCL-Platform.Inte.../mPanGu-Alpha-53#1
GPU inference in a Docker container fails with AttributeError: The 'VocabEmbedding' object has no attribute 'compile_cache'

**In testing, a single card is enough for inference (single-card inference is recommended) and the efficiency is about the same. Change the configuration so that args_opt.distribute == "false", or launch with the following command (note `--distribute false`):**

```
mpirun --allow-run-as-root \
  -x PATH \
  -x LD_LIBRARY_PATH \
  -x PYTHONPATH \
  -x NCCL_DEBUG \
  -x GLOG_v \
  -n 1 \
  --hostfile hostfile_1gpus \
  --output-filename log_output \
  --merge-stderr-to-stdout \
  python -s /path/to/predict.py \
  --mode 2.6B \
  --run_type predict \
  --distribute false \
  --language_idx $LANGUAGE_IDX \
  --op_level_model_parallel_num 1 \
  --load_ckpt_path /path/to/ckpt_path/ \
  --load_ckpt_name /ckpt_name \
  --param_init_type "fp16"
```

For distributed inference on multiple cards, add the load_ckpt_path setting (the runtime strategy file) at line 110 of predict.py:

```
config = PanguAlphaConfig(
    load_ckpt_path="/path/to/ckpt_strategy_exp4.ckpt"
)
```

4 weeks ago
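
The single-card launch command in the comment above can also be assembled programmatically, which makes the `--distribute false` setting harder to omit. The following is only a sketch: `build_single_card_command` is a hypothetical helper, every flag and placeholder path is copied from the comment as written, and the `language_idx` value passed in the usage lines is an arbitrary example.

```python
import shlex


def build_single_card_command(language_idx):
    """Build the argument list for single-card PanGu inference via mpirun."""
    cmd = ["mpirun", "--allow-run-as-root"]
    # Environment variables forwarded to the launched process.
    for var in ("PATH", "LD_LIBRARY_PATH", "PYTHONPATH", "NCCL_DEBUG", "GLOG_v"):
        cmd += ["-x", var]
    cmd += [
        "-n", "1",                       # one process, one card
        "--hostfile", "hostfile_1gpus",
        "--output-filename", "log_output",
        "--merge-stderr-to-stdout",
        "python", "-s", "/path/to/predict.py",   # placeholder path from the comment
        "--mode", "2.6B",
        "--run_type", "predict",
        "--distribute", "false",         # the setting the comment recommends
        "--language_idx", str(language_idx),
        "--op_level_model_parallel_num", "1",
        "--load_ckpt_path", "/path/to/ckpt_path/",  # placeholder
        "--load_ckpt_name", "/ckpt_name",           # placeholder
        "--param_init_type", "fp16",
    ]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready string; pass the list to subprocess.run to execute.
    print(shlex.join(build_single_card_command(1)))
```

Keeping the command as a list avoids shell-quoting mistakes if it is later handed to `subprocess.run`.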
taoht pushed to master at PCL-Platform.Inte.../mPanGu-Alpha-53
4 weeks ago
taoht commented on issue PCL-Platform.Inte.../mPanGu-Alpha-53#1
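GPU inference in a Docker container fails with AttributeError: The 'VocabEmbedding' object has no attribute 'compile_cache'
Try modifying predict.py as shown in the screenshot attached to this comment first. If that does not work, paste the complete log file here so I can help troubleshoot. Thanks.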
4 weeks ago
taoht uploaded dataset wiki_mindrecord_test_gpu.zip
1 month ago