#19 [Large-model application scenario validation] Performance comparison of collaborative training for a 2.6B model under non-IID data

Open
created 1 year ago by taoht · 1 comments
taoht commented 1 year ago
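![image](/attachments/06382d63-e20e-4cbd-9196-33acbf24b2b3)

## Scenario

### [Task description] Explore whether collaborative training is effective for a concrete task when the two parties' data are non-IID

Server node: Pengcheng Cloud Brain 1

Client nodes: Pengcheng Cloud Brain 2 (single node, 8 NPUs / 16 NPUs ...) × 2

Datasets

Train sets: 1. CMRC2017 train set; 2. PD train set

Test sets: 1. CMRC2017 test set; 2. PD test set

Model

PanGu Alpha 2.6B, mp=8, dp=device_num/mp; the collaborative setting uses 8-slice, per-channel fusion (8片分通道融合)

Training mode: continued fine-tuning starting from the pretrained PanGu Alpha model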
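The relation `dp=device_num/mp` above can be illustrated with a minimal sketch. The function name `data_parallel_degree` is hypothetical and is not part of PanGu Alpha or any training framework; it only restates the arithmetic for the 8-NPU and 16-NPU client configurations mentioned in the task description.

```python
def data_parallel_degree(device_num: int, mp: int = 8) -> int:
    """Return dp = device_num / mp (hypothetical helper, not a framework API).

    mp is the model-parallel degree; the issue fixes it at 8.
    """
    if device_num <= 0 or mp <= 0:
        raise ValueError("device_num and mp must be positive")
    if device_num % mp != 0:
        # Assumption: the devices must split evenly into model-parallel groups.
        raise ValueError("device_num must be a multiple of mp")
    return device_num // mp


if __name__ == "__main__":
    for n in (8, 16):
        print(f"device_num={n}, mp=8 -> dp={data_parallel_degree(n)}")
```

With mp=8, a single 8-NPU node has no data parallelism (dp=1), and a 16-NPU node runs two data-parallel replicas (dp=2).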
taoht commented 1 year ago
Owner
## Three-way comparison: standalone training vs. AIsyn collaborative training vs. standalone training on merged data

In the tables below, `w` denotes 10,000 steps (万), `ft` means fine-tuning, and time is computed as steps × seconds per step.

### Standalone training

| PanGu 2.6B | device | epoch | steps | time | Loss | metric | cmrc2017 | PD |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| without ft | - | - | - | - | - | Acc | 8.44 % | 18.08 % |
| cmrc2017 ft | NPU×8 | 1 | 0.4 w | 0.4w × 4.83 s = 5.37 h | 0.747 | Acc | 65.26 % | 58.25 % |
| cmrc2017 ft | NPU×8 | 3 | 1.1 w | 1.1w × 4.83 s = 14.76 h | 0.689 | Acc | 66.62 % | 57.57 % |
| PD ft | NPU×8 | 1 | 1.1 w | 1.1w × 4.81 s = 14.70 h | 1.087 | Acc | 52.35 % | 71.46 % |
| PD ft | NPU×8 | 3 | 4.0 w | 4.0w × 4.81 s = 53.44 h | 0.763 | Acc | 51.42 % | 71.26 % |

### Standalone training on merged data

| PanGu 2.6B | device | epoch | steps | time | Loss | metric | cmrc2017 | PD |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| cmrc+PD ft | NPU×8 | 0.5 | 1.1 w | 1.1w × 4.87 s = 14.88 h | 1.257 | Acc | 68.09 % | 71.50 % |
| cmrc+PD ft | NPU×8 | 0.82 | 1.8 w | 1.8w × 4.87 s = 24.35 h | 1.1 | Acc | 68.59 % | 71.63 % |
| cmrc+PD ft | NPU×8 | 1 | 2.4 w | 2.4w × 4.87 s = 32.47 h | 1.06 | Acc | 68.69 % | 72.00 % |

### AIsyn collaborative training

| PanGu 2.6B | device | round | steps | time | Loss | metric | cmrc2017 | PD |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| AIsyn ft | NPU×8 | 5 | 1.1 w | 0.55w × 4.91 s = 7.5 h + 0.618 h × 5 = 10.59 h | - | Acc | 37.65 % | 55.60 % |
| AIsyn ft | NPU×8 | 10 | 2.2 w | 1.1w × 4.91 s = 15 h + 0.618 h × 10 = 21.18 h | - | Acc | 37.15 % | 41.45 % |
| AIsyn ft | NPU×8 | 30 | 6.6 w | 3.3w × 4.91 s = 45 h + 0.618 h × 30 = 63.54 h | 0.70 | Acc | 30.34 % | 47.29 % |
| client1 ft | NPU×8 | 10 | 2.2 w | 1.1w × 4.91 s = 15 h + 0.618 h × 9 = 20.56 h | 1.07 | Acc | 35.38 % | 52.94 % |
| client2 ft | NPU×8 | 10 | 2.2 w | 1.1w × 4.91 s = 15 h + 0.618 h × 9 = 20.56 h | 1.93 | Acc | 41.81 % | 42.39 % |
| client1 ft | NPU×8 | 27 | 5.94 w | 2.97w × 4.91 s = 40.51 h + 0.618 h × 26 = 56.58 h | - | Acc | 38.08 % | 46.39 % |

**CMRC2017 finetune & PD finetune**

![image](/attachments/bf8406d0-dc15-4cdb-9b26-72a8c9b5966b)

**AIsyn collaborative training, node1 (cmrc2017)**

![image](/attachments/ca685076-0e3e-43b1-bff4-50d0667250c0)

**AIsyn collaborative training, node2 (pd)**

![image](/attachments/8a3bebc1-7cc2-4cad-b1b5-e09da84c5c7c)
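The time column in the tables can be reproduced with a short sketch. It assumes that the 0.618 term is a fixed per-round overhead in hours (the issue does not say what it covers; communication and aggregation are a plausible guess) and that the AIsyn step count in the time formula is per client, i.e. half of the total in the steps column. The function names are hypothetical.

```python
def standalone_hours(steps: int, sec_per_step: float) -> float:
    """Wall-clock hours for standalone fine-tuning: steps * seconds per step."""
    return steps * sec_per_step / 3600.0


def aisyn_hours(steps_per_client: int, sec_per_step: float,
                overhead_rounds: int, overhead_h: float = 0.618) -> float:
    """Wall-clock hours for AIsyn: compute time plus per-round overhead.

    Assumption: overhead_h (0.618 h in the tables) is a fixed cost paid once
    per counted round; the issue does not state what it consists of.
    """
    return standalone_hours(steps_per_client, sec_per_step) + overhead_h * overhead_rounds


if __name__ == "__main__":
    # cmrc2017 ft, 1 epoch: 0.4w steps at 4.83 s/step
    print(f"{standalone_hours(4_000, 4.83):.2f} h")
    # AIsyn ft, 5 rounds: 0.55w steps per client at 4.91 s/step
    print(f"{aisyn_hours(5_500, 4.91, 5):.2f} h")
```

The client1/client2 rows count one overhead term fewer than the number of rounds (9 for round 10, 26 for round 27), so pass `overhead_rounds = round - 1` when reproducing those rows.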