
GPU and CPU versions of LayerNorm produce inconsistent backward gradients #35498

@MARD1NO

Description


Hello, I found a problem while testing Paddle's LayerNorm.

Environment: 2080Ti, CUDA 11.2
Installed: paddlepaddle-gpu (CUDA 11.2 build)
Test code:

import paddle 
import numpy as np 


x = np.array([[[[-1.83965693, -1.82964566]]]]).astype(np.float32)

x_tensor = paddle.to_tensor(x).cpu()
# x_tensor = paddle.to_tensor(x).cuda()

x_tensor.stop_gradient = False

layernorm = paddle.nn.LayerNorm(normalized_shape=(1, 1, 2), epsilon=1e-5)

out = layernorm(x_tensor)
print("Out is: ", out)
out = out.sum()
out.backward()

print("X grad is: ", x_tensor.grad)

On the GPU I get:

Out is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[[[-0.84284753,  0.84282744]]]])

X grad is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[[[-21.41206169,  21.41332817]]]])

With the CPU build installed, the result is:

Out is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[[[-0.84543723,  0.84541708]]]])
X grad is:  Tensor(shape=[1, 1, 1, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[[[-0.00143835,  0.00143831]]]])

I have verified that the gradient computed by the CPU version is correct, so there appears to be a bug in the GPU computation.
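As a cross-check outside Paddle, the standard LayerNorm backward formula can be evaluated directly in NumPy (a minimal sketch, assuming the default weight=1, bias=0 initialization; the helper name is illustrative). Note that with weight=1 and bias=0 the normalized output sums to zero along the normalized axes, so `sum(layernorm(x))` is constant and the true gradient is analytically zero; values on the order of 1e-3 are plausible float32 roundoff, while ±21.4 is not.

```python
import numpy as np

def layernorm_sum_grad(x, eps=1e-5):
    """Gradient of sum(LayerNorm(x)) w.r.t. x, with weight=1 and bias=0.
    Standard backward formula with upstream gradient g = 1:
        dx = (g - mean(g) - y * mean(g * y)) / sqrt(var + eps)
    """
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    std = np.sqrt(var + eps)
    y = (x - mu) / std                    # normalized output
    g = np.ones_like(x)                   # d(sum)/dy = 1
    dx = (g - g.mean() - y * (g * y).mean()) / std
    return dx

x = np.array([-1.83965693, -1.82964566], dtype=np.float32)

dx64 = layernorm_sum_grad(x.astype(np.float64))  # high-precision reference
dx32 = layernorm_sum_grad(x)                     # float32, like the CPU kernel

print("float64 grad:", dx64)  # essentially zero, as expected analytically
print("float32 grad:", dx32)  # ~1e-3 magnitude, same scale as the CPU output
```

Both results are orders of magnitude smaller than the GPU's ±21.4, which supports the suspicion that the bug is in the CUDA kernel rather than in the CPU one.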
