@Baibaifan Baibaifan commented Jun 22, 2021

PR types

Bug fixes

PR changes

OPs

Describe

1. Fix the NPU matmulv2_grad op so that it supports the 3-D * 3-D -> 2-D case, and add a unit test.
2. Fix the NPU comm_init_hccl op by sending fake data to build the connection.

Precision of matmul_gradv2 on NPU and GPU in fp16 over 5 epochs:

npu:
[image: 3d13fec7496c714cabf60795d02f85ca]

gpu:
[image: f101dcf652d8cffb0b8c9d3aff500daa]

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

@Baibaifan Baibaifan closed this Jun 22, 2021
@Baibaifan Baibaifan reopened this Jun 22, 2021
NpuOpRunner("BatchMatMul", {*x, *dout}, {*dy},
{{"adj_x1", true}, {"adj_x2", false}});
runner_dy.Run(stream);
if ((x->dims().size() == 3) && (dout->dims().size() == 3) &&

Contributor

When x dims is 3 and y dims is 2, is it also the case that the forward pass cannot use BatchMatMul?

Contributor Author

The forward pass can. The dimension check is here because the output is 2-D while the two inputs are 3-D, so a conversion is needed.
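
To illustrate the conversion mentioned in the reply above, here is a minimal sketch in plain C++ with no NPU calls. It assumes the forward pass is out[b] = x[b] * y with x of shape [B, M, K] and y of shape [K, N], so the gradient for y is the sum over the batch of x[b]^T * dout[b]. The function names `GradYBatched` and `GradYFolded` are hypothetical and are not taken from the PR. Folding the batch dimension into the row dimension is one possible way to obtain the 2-D result, and it may differ from what the PR actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: dy[K, N] = sum over b of x[b]^T * dout[b].
// x is row-major [B, M, K], dout is row-major [B, M, N].
std::vector<float> GradYBatched(const std::vector<float>& x,
                                const std::vector<float>& dout,
                                std::size_t B, std::size_t M,
                                std::size_t K, std::size_t N) {
  assert(x.size() == B * M * K && dout.size() == B * M * N);
  std::vector<float> dy(K * N, 0.0f);
  for (std::size_t b = 0; b < B; ++b)
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
          dy[k * N + n] += x[(b * M + m) * K + k] * dout[(b * M + m) * N + n];
  return dy;
}

// Hypothetical folded form: view x as [rows, K] and dout as [rows, N]
// with rows = B * M, then compute a single transposed product x^T * dout.
std::vector<float> GradYFolded(const std::vector<float>& x,
                               const std::vector<float>& dout,
                               std::size_t rows, std::size_t K,
                               std::size_t N) {
  assert(x.size() == rows * K && dout.size() == rows * N);
  std::vector<float> dy(K * N, 0.0f);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t n = 0; n < N; ++n)
        dy[k * N + n] += x[r * K + k] * dout[r * N + n];
  return dy;
}
```

Both functions produce a [K, N] result. A batched product with the first operand transposed would instead yield a [B, K, N] tensor, which would still have to be reduced over the batch to match the shape of y.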

Contributor

@pangyoki pangyoki left a comment

The fp16 matmul v2 op may receive a 3-D input multiplied by a 2-D input. The fp32 BatchMatMul NPU op does not support this case.
At present the 3-D times 2-D case is not used with the fp32 data type, so fp32 support has not been added for now.
Handling of this case for fp32 needs to be added later.

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM for unittest.skipIf


// Build comm
float* buff;
int32_t size = 20;

Contributor

Why is this 20?

Contributor Author

It is only used for initialization.

for (int32_t idx = 0; idx < size; idx++) {
input[idx] = 1.0;
}
aclrtMalloc(reinterpret_cast<void**>(&buff), size * sizeof(float),

Contributor

Calls like this need to be guaranteed to succeed, right? It should be wrapped in ACLCHECK.

Contributor Author

Noted. This will be addressed in the next PR.
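
As a rough illustration of the kind of wrapping the reviewer asks for, the sketch below checks the status code of a runtime call and fails loudly on error. Everything in it is a stand-in: `FakeAclError`, `FakeDeviceMalloc`, `SKETCH_ACL_CHECK` and `AllocateChecked` are hypothetical names, and the convention that 0 means success is an assumption of this sketch. It does not reproduce the real ACLCHECK macro or the ACL runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in status type; this sketch assumes 0 means success.
using FakeAclError = int;

// Hypothetical stand-in for a device allocation call that returns a status.
FakeAclError FakeDeviceMalloc(void** ptr, std::size_t bytes) {
  static char storage[256];
  if (ptr == nullptr || bytes == 0) return 1;  // invalid argument
  if (bytes > sizeof(storage)) return 2;       // out of memory
  *ptr = storage;
  return 0;
}

// Hypothetical check macro: evaluate the call once, throw on a nonzero status.
#define SKETCH_ACL_CHECK(call)                                        \
  do {                                                                \
    const FakeAclError sketch_err_ = (call);                          \
    if (sketch_err_ != 0) {                                           \
      throw std::runtime_error(std::string(#call) +                   \
                               " failed with code " +                 \
                               std::to_string(sketch_err_));          \
    }                                                                 \
  } while (0)

// Returns true if the checked allocation succeeded, false if it threw.
bool AllocateChecked(std::size_t bytes) {
  void* buff = nullptr;
  try {
    SKETCH_ACL_CHECK(FakeDeviceMalloc(&buff, bytes));
  } catch (const std::runtime_error&) {
    return false;
  }
  assert(buff != nullptr);  // holds whenever the check passed
  return true;
}
```

With a wrapper of this shape, a failed allocation surfaces at the call site instead of showing up later as a failure on an invalid buffer.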

@gongweibao gongweibao merged commit 9bf00cd into PaddlePaddle:develop Jun 23, 2021