#10 The input_size of output.mapping.weight in the 2.6B model does not match the source code

Closed
created 3 years ago by mymusise · 1 comment
mymusise commented 3 years ago
Hello! I have a question. When loading each `*.ckpt` file, I noticed that every `Mapping_output.weight.shape` is (40, 1280), which seems odd:

```
...
Parameter (name=backbone.blocks.30.attention.dense3.weight, shape=(5, 2560), dtype=Float32, requires_grad=True),
Parameter (name=backbone.blocks.30.attention.dense3.bias, shape=(5,), dtype=Float32, requires_grad=True),
Parameter (name=backbone.blocks.30.output.mapping.weight, shape=(40, 1280), dtype=Float32, requires_grad=True),
Parameter (name=backbone.blocks.30.output.mapping.bias, shape=(20,), dtype=Float32, requires_grad=True)
...
```

Shouldn't it be (20, 2560)? Could you explain how the model parameters were split into 512 shards?
taoht commented 2 years ago
Owner
512 devices: mp=8, dp=512/8=64.

Before splitting, the backbone.blocks.30.output.mapping.weight parameter should be (2560, 1280). With 64-way data parallelism it is split into (2560/64=40, 1280), i.e. (40, 1280).
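The shape arithmetic in the reply can be checked with a small sketch. It assumes, as the reply's numbers imply, that each per-device shard is an even split of the (2560, 1280) tensor along its first axis across the 64 data-parallel ranks; `shard_shape` and `merged_shape` are hypothetical helpers for illustration, not part of the PanGu or MindSpore code.

```python
def shard_shape(shape, num_shards, axis=0):
    """Per-device shape when `shape` is split evenly into `num_shards` along `axis`."""
    if shape[axis] % num_shards != 0:
        raise ValueError(f"dim {shape[axis]} is not divisible by {num_shards}")
    out = list(shape)
    out[axis] //= num_shards
    return tuple(out)


def merged_shape(shard, num_shards, axis=0):
    """Inverse of shard_shape: shape after concatenating `num_shards` shards along `axis`."""
    out = list(shard)
    out[axis] *= num_shards
    return tuple(out)


total_devices = 512
mp = 8                      # model-parallel degree, from the reply
dp = total_devices // mp    # data-parallel degree: 512 / 8 = 64

# Shape of backbone.blocks.30.output.mapping.weight before the 64-way split, per the reply
unsplit = (2560, 1280)
per_device = shard_shape(unsplit, dp, axis=0)

print(dp, per_device)                   # 64 (40, 1280)
print(merged_shape(per_device, dp))     # (2560, 1280)
```

Under this assumption, recovering the (2560, 1280) tensor from the per-device checkpoints means concatenating the 64 data-parallel shards along axis 0 in rank order.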
taoht closed this issue 2 years ago