#890 adapter to new autograd

Merged
Erpim merged 45 commits from frelam/MSAdapter:master0311 into master 1 month ago
frelam commented 1 month ago
Erpim reviewed 1 month ago
testing/ut/pytorch/optim/test_optim.py
@@ -6,3 +6,3 @@

from ...utils import set_mode_by_env_config, SKIP_ENV_GRAPH_MODE, SKIP_ENV_PYNATIVE_MODE
set_mode_by_env_config()
set_mode_by_env_config(), enable_backward
Erpim commented 1 month ago
`, enable_backward` should be appended to the import line, not placed after `set_mode_by_env_config()`.
frelam commented 1 month ago
done
frelam changed title from [WIP]adapter to new autograd to adapter to new autograd 1 month ago
Erpim reviewed 1 month ago
@@ -41,2 +47,3 @@
self.modules_buffers.append(buffer)
self.broadcast = ops.Broadcast(0, pg_name)

self.broadcast_bucket_size = int(250 * 1024 * 1024)
Erpim commented 1 month ago
What is this value based on? Is it tied to the hardware environment, or is there a common industry default?
frelam commented 1 month ago
There is a common industry default; PyTorch picks the same default value.
frelam commented 1 month ago
The bucket mechanism is not in effect yet, so this parameter currently has no practical impact. Bucketed communication needs further investigation later.
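For reference, a minimal sketch of the value in question (assuming it mirrors PyTorch's DistributedDataParallel, where the same 250 MiB constant is used to coalesce the broadcast of module states):

```python
# Hedged sketch: the bucket size is expressed in bytes.
BROADCAST_BUCKET_SIZE = int(250 * 1024 * 1024)  # 250 MiB

print(f"broadcast bucket size: {BROADCAST_BUCKET_SIZE / (1024 * 1024):.0f} MiB")
```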
Erpim reviewed 1 month ago
mindtorch/torch/optim/optimizer.py
@@ -258,0 +260,4 @@
raise ValueError("Under GraphMode, has to pass `grads` to optimizer.step.")
grads = [param.grad if param.grad is not None
else ms.ops.zeros_like(param) for param in self.parameters]
grads = tuple(grads)
Erpim commented 1 month ago
Add a comment explaining this, to avoid the repeated graph compilation issue and to prevent the line from being deleted by mistake later.
frelam commented 1 month ago
done
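For reference, a hedged sketch of the tuple conversion with such an explanatory comment (the exact wording in the PR may differ; names follow the hunk above):

```python
import mindspore as ms

def _collect_grads(parameters):
    # Substitute a zeros tensor for parameters without a gradient so the
    # gradient structure stays constant from step to step.
    grads = [param.grad if param.grad is not None
             else ms.ops.zeros_like(param) for param in parameters]
    # NOTE: keep the tuple() conversion. Passing a Python list into the graph
    # can trigger repeated compilation, so do not remove this line.
    return tuple(grads)
```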
Erpim reviewed 1 month ago
testing/ut/pytorch/optim/test_optim.py
@@ -88,6 +88,209 @@ def test_sgd():
torch_result2 = opt.param_groups[0]['params'][0].detach().numpy()
assert np.allclose(ms_result2, torch_result2)

@SKIP_ENV_GRAPH_MODE("backward not support graph mode.")
Erpim commented 1 month ago
If a test case exercises backward, wrap it with `with enable_backward()`. If backward is not involved, shouldn't graph mode be supported?
frelam commented 1 month ago
The optimizer intercepts graph mode: if the user does not pass grads in, it requires the user to run under PyNative mode.
Erpim commented 1 month ago
Is that check really necessary? The way AscendSpeed assigns grads directly, shouldn't that also be able to run inside the graph?
frelam commented 1 month ago
OK, removed that check.
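A minimal sketch of a backward test case wrapped in `with enable_backward()`, as suggested above (the SGD setup and the `ms_torch` alias are assumptions; imports follow the test file's own style):

```python
import numpy as np
import mindtorch.torch as ms_torch
from ...utils import SKIP_ENV_GRAPH_MODE, enable_backward

@SKIP_ENV_GRAPH_MODE("backward not support graph mode.")
def test_sgd_backward():
    with enable_backward():
        param = ms_torch.nn.Parameter(ms_torch.tensor([1.0, 2.0]))
        opt = ms_torch.optim.SGD([param], lr=0.1)
        loss = (param * param).sum()
        loss.backward()
        opt.step()
        opt.zero_grad()
    assert not np.allclose(param.detach().numpy(), np.array([1.0, 2.0]))
```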
Erpim reviewed 1 month ago
Erpim reviewed 1 month ago
testing/ut/pytorch/amp/test_clip_grad.py
@@ -56,0 +64,4 @@
for p, g in zip(l.parameters(), ms_grads):
p.grad = g
ms_total_norm = ms_torch.nn.utils.clip_grad_norm_(l.parameters(), max_norm,
norm_type=norm_type)
Erpim commented 1 month ago
Fix the indentation here.
frelam commented 1 month ago
done
frelam reviewed 1 month ago
@@ -64,4 +71,3 @@
def scale(self, outputs):
if not self._enabled:
return outputs
return DynamicLossScaler.scale(self, outputs)
frelam commented 1 month ago
MindSpore's DynamicLossScaler.scale implementation uses jit, but the new autograd cannot run under jit, so a jit-free version has to be re-implemented here.
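A hedged sketch of a jit-free scale helper in the spirit of that change (the helper name is hypothetical; only the eager multiplication is the point):

```python
def scale_without_jit(scale_value, outputs):
    # Multiply each output by the current loss scale eagerly (no ms.jit),
    # so the operation stays visible to the new autograd.
    if isinstance(outputs, (list, tuple)):
        return type(outputs)(scale_without_jit(scale_value, out) for out in outputs)
    return outputs * scale_value

# Usage inside GradScaler.scale (sketch):
#     return scale_without_jit(self.scale_value, outputs)
```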
frelam reviewed 1 month ago
@@ -79,3 +103,2 @@

optimizer_state['found_inf_per_device'] = self._check_inf(grads)
if graph_mode_condition():
raise RuntimeError("Under graph mode, GradScalar not support unscale_(), please use unscale(). "
frelam commented 1 month ago
Moved the error raise earlier.
frelam reviewed 1 month ago
@@ -43,0 +50,4 @@
# TODO: not support 'parameters_to_ignore' now, because it is used by 'delay_all_reduce_named_params',
# but 'delay_all_reduce_named_params' relies on Parameter's hook, which is not support yet.
self.parameters_to_ignore = set()
_sync_module_states(
frelam commented 1 month ago
Same as PyTorch: added synchronization of buffers and parameters across devices. Verified on ResNet-50; the final accuracy improves and is close to PyTorch's accuracy after 90 epochs.
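A hedged sketch of what the added synchronization does (the function name follows the hunk above, while the broadcast and assignment mechanics shown here are assumptions): rank 0's parameters and buffers are broadcast so every process starts from identical module states.

```python
import mindspore as ms
from mindspore.communication import get_group_size

def _sync_module_states(module, process_group=None):
    # Nothing to synchronize when running on a single device.
    if get_group_size() <= 1:
        return
    broadcast = ms.ops.Broadcast(root_rank=0)
    # Broadcast rank 0's parameters to every other rank; buffers (e.g. BatchNorm
    # running statistics) are handled the same way in the real implementation.
    for param in module.parameters():
        synced, = broadcast((param,))
        param.set_data(synced)
```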
frelam reviewed 1 month ago
@@ -63,3 +80,3 @@
if self.will_sync_module_buffers():
self._sync_buffers()
return self.network(*inputs, **kwargs)
return self.module(*inputs, **kwargs)
frelam commented 1 month ago
Renamed the member variable to match PyTorch.
frelam reviewed 1 month ago
@@ -93,15 +114,15 @@ class GradScaler(DynamicLossScaler):
optimizer_state["stage"] = OptState.UNSCALED
return DynamicLossScaler.unscale(self, grads)

def _maybe_opt_step(self, optimizer, grads, optimizer_state, *args, **kwargs):
frelam commented 1 month ago
Internal interface: grads is moved towards the back and detected through args. After the move, the signature matches the PyTorch definition.
zoulq reviewed 1 month ago
@@ -138,2 +159,3 @@
if optimizer_state["stage"] is OptState.READY:
self.unscale_(optimizer, grads)
# To see if grads is pass in.
if len(args) > 0 and isinstance(args[0], tuple) and \
zoulq commented 1 month ago
What problem would keeping the original grads parameter cause?
frelam commented 1 month ago
If the call is step(optimizer, grad), there is no impact and it stays compatible; this code detects whether grads was passed in. If the call is step(optimizer, grads=grad), it would break after the change.
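A hedged sketch of that compatibility check (the surrounding step() logic is simplified):

```python
def step(self, optimizer, *args, **kwargs):
    # To see if grads is passed in.
    if len(args) > 0 and isinstance(args[0], tuple):
        grads = args[0]   # step(optimizer, grads_tuple): still supported
        args = args[1:]
    else:
        grads = None      # step(optimizer): gradients are read from param.grad
    ...
```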
zoulq reviewed 1 month ago
@@ -1042,0 +1042,4 @@
return

for p in self.parameters():
if p.grad is not None:
zoulq commented 1 month ago
If the new autograd scheme is not used and the user attaches grads manually, will the code below raise an error?
frelam commented 1 month ago
No, that scenario is covered by the test case "test_sgd_step_no_grads".
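A minimal sketch of that scenario (the real test lives in testing/ut/pytorch/optim/test_optim.py; the shapes, values and `ms_torch` alias here are assumptions): the user attaches `.grad` by hand and `step()` consumes it without the new autograd.

```python
import numpy as np
import mindtorch.torch as ms_torch

def test_sgd_step_no_grads():
    param = ms_torch.nn.Parameter(ms_torch.tensor([1.0, 2.0]))
    opt = ms_torch.optim.SGD([param], lr=0.1)
    # Attach the gradient manually instead of calling backward().
    param.grad = ms_torch.tensor([0.5, 0.5])
    opt.step()
    expected = np.array([1.0, 2.0]) - 0.1 * np.array([0.5, 0.5])
    assert np.allclose(param.detach().numpy(), expected)
```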
zoulq reviewed 1 month ago
mindtorch/torch/nn/utils/clip_grad.py
@@ -51,3 +52,3 @@
return new_grads, cast_to_adapter_tensor(total_norm)

def clip_grad_norm_(parameters, max_norm, grads, norm_type=2.0, error_if_nonfinite=False, foreach=None):
def clip_grad_norm_(parameters, max_norm, grads=None, norm_type=2.0,
zoulq commented 1 month ago
If nobody has used it before, it would be better to add the grads parameter at the end of the signature.
frelam commented 1 month ago
OK. I checked and the model zoo has not adapted this interface yet either. Moved grads to the end for clip_grad_value_ below as well.
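A hedged usage sketch of the reordered signature (the model setup and `ms_torch` alias are assumptions; the two-value return when grads are passed follows the hunk above):

```python
import mindtorch.torch as ms_torch

model = ms_torch.nn.Linear(4, 2)
for p in model.parameters():
    p.grad = ms_torch.ones_like(p)

# Common call sites are unchanged: grads keeps its default of None and the
# gradients already attached to the parameters are clipped.
total_norm = ms_torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# With explicit gradients, passing them by keyword keeps the call robust
# against the position change.
explicit = tuple(ms_torch.ones_like(p) for p in model.parameters())
new_grads, total_norm = ms_torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=1.0, grads=explicit)
```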
Erpim commented 1 month ago
Please confirm whether this involves documentation changes.
frelam commented 1 month ago
The repository currently has no documentation or code that relies on the position of grads.
zoulq commented 1 month ago
Collaborator
The test cases disabled by PR #891 need to be re-enabled.
frelam commented 1 month ago
Poster
> The test cases disabled by PR #891 need to be re-enabled.
Some cases still depend on a CI version update. yepeng will submit a PR later to re-enable them all at once.
Erpim referenced this issue from a commit 1 month ago
adapter to new autograd (#890) update clip grad grads position add zero grad test in test_optim Merge remote-tracking branch 'upstream/master' into master0311 remove graph condition of optimizer.step update testcase update comment of optimizer.step skip autograd testcase under graph mode uncomment grad scaler testcase update testcase Merge remote-tracking branch 'upstream/master' into master0311 update stream testcase to use less stream for memory saving updatet test_optim Merge remote-tracking branch 'upstream/master' into master0311 Merge remote-tracking branch 'refs/remotes/upstream/master' into master0311 update gradscaler grad list to tuple update optimizer.step gradient from list to tuple update testcast with enable_backawrd Merge remote-tracking branch 'upstream/master' into master0311 update gradscalar.scale to remove ms.jit update optim testcast double backward update grad scale update revert grads=grads revert optimizer grads=grads update grad scalar fix clip grad fix grad scalel update clip norm and value fix ddp pylint Merge remote-tracking branch 'upstream/master' into master0311 add parameter sync for ddp update clip grad testcase update testcase update change ddp update ddp and testcase fix pylint add module.zero_grad and optimizer.zero_grad fix gradscaler and clipgrad update lr_scheduler testcase adapte clip grad update grad scaler testcase update gradscalar update test_optim testcase add optimizer to new autograd Co-authored-by: lvhaoyu <lvhaoyu@huawei.com> Reviewed-on: https://openi.pcl.ac.cn/OpenI/MSAdapter/pulls/890
Erpim merged commit 5ab0904670 into master 1 month ago
The pull request has been merged as 5ab0904670.