
GPU and CPU versions of LayerNorm produce inconsistent backward gradients #35498

@MARD1NO

Description


Hello, I found a problem while testing Paddle's LayerNorm.

Environment: 2080Ti, CUDA 11.2
Installed: paddlepaddle-gpu (CUDA 11.2 build)
Test code:

import paddle 
import numpy as np 


x = np.array([[[[-1.83965693, -1.82964566]]]]).astype(np.float32)

x_tensor = paddle.to_tensor(x).cpu()
# x_tensor = paddle.to_tensor(x).cuda()

x_tensor.stop_gradient = False

layernorm = paddle.nn.LayerNorm(normalized_shape=(1, 1, 2), epsilon=1e-5)

out = layernorm(x_tensor)
print("Out is: ", out)
out = out.sum()
out.backward()

print("X grad is: ", x_tensor.grad)

On the GPU I get:

Out is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[[[-0.84284753,  0.84282744]]]])

X grad is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[[[-21.41206169,  21.41332817]]]])

With the CPU build installed, the result is:

Out is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[[[-0.84543723,  0.84541708]]]])
X grad is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[[[-0.00143835,  0.00143831]]]])

I have verified that the gradient computed by the CPU version is correct, so there appears to be a bug in the GPU computation.
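As a cross-check outside Paddle, the standard LayerNorm backward formula can be evaluated directly in NumPy (a minimal sketch, assuming the default weight=1, bias=0 initialization; the helper name is illustrative). Note that with weight=1 and bias=0 the normalized output sums to zero along the normalized axes, so `sum(layernorm(x))` is constant and the true gradient is analytically zero; values on the order of 1e-3 are plausible float32 roundoff, while ±21.4 is not.

```python
import numpy as np

def layernorm_sum_grad(x, eps=1e-5):
    """Gradient of sum(LayerNorm(x)) w.r.t. x, with weight=1 and bias=0.
    Standard backward formula with upstream gradient g = 1:
        dx = (g - mean(g) - y * mean(g * y)) / sqrt(var + eps)
    """
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    std = np.sqrt(var + eps)
    y = (x - mu) / std                    # normalized output
    g = np.ones_like(x)                   # d(sum)/dy = 1
    dx = (g - g.mean() - y * (g * y).mean()) / std
    return dx

x = np.array([-1.83965693, -1.82964566], dtype=np.float32)

dx64 = layernorm_sum_grad(x.astype(np.float64))  # high-precision reference
dx32 = layernorm_sum_grad(x)                     # float32, like the CPU kernel

print("float64 grad:", dx64)  # essentially zero, as expected analytically
print("float32 grad:", dx32)  # ~1e-3 magnitude, same scale as the CPU output
```

Both results are orders of magnitude smaller than the GPU's ±21.4, which supports the suspicion that the bug is in the CUDA kernel rather than in the CPU one.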
