
@Wong4j (Collaborator) commented Mar 15, 2024

PR Category

Performance Optimization

PR Types

Performance

Description

Add implicit GEMM algorithm support for SubmConv3D and SubmConv2D.
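As a rough mental model of what the implicit GEMM path computes, the sketch below writes submanifold convolution as plain Python loops. It assumes a hashmap from active coordinates to row indices, a per-site kernel map (`kmap`, with -1 where a neighbour is inactive), odd kernel sizes, and a weight laid out as `[K][C_in][C_out]` (a `[kD, kH, kW, C_in, C_out]` weight flattened over the kernel dims). `build_kmap` and `subm_conv_igemm_reference` are hypothetical names; this is a reference for the math, not the CUDA kernels in this PR.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int, int]  # (batch, d, h, w)


def build_kmap(coords: Sequence[Coord], kernel_size: Tuple[int, int, int]) -> List[List[int]]:
    """For each active site i and kernel offset k, return the row index of the
    neighbouring input site, or -1 if that neighbour is not active."""
    table: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}  # coordinate hashmap
    kd, kh, kw = kernel_size
    # Offsets centred on the output site, ordered like a row-major [kd, kh, kw] flatten.
    offsets = [
        (dz - kd // 2, dy - kh // 2, dx - kw // 2)
        for dz in range(kd) for dy in range(kh) for dx in range(kw)
    ]
    return [
        [table.get((b, z + oz, y + oy, x + ox), -1) for (oz, oy, ox) in offsets]
        for (b, z, y, x) in coords
    ]


def subm_conv_igemm_reference(feats, weight, kmap):
    """feats: [nnz][C_in]; weight: [K][C_in][C_out]; kmap: [nnz][K].
    Submanifold: the output has exactly one row per active input site."""
    c_out = len(weight[0][0])
    out = []
    for row in kmap:
        acc = [0.0] * c_out
        for k, j in enumerate(row):
            if j < 0:
                continue  # inactive neighbour contributes zero
            for ci, v in enumerate(feats[j]):
                for co in range(c_out):
                    acc[co] += v * weight[k][ci][co]
        out.append(acc)
    return out
```

Viewed as a GEMM, output row `i` is a reduction over the combined (kernel offset, input channel) dimension, and the rows of the A matrix are gathered through `kmap` on the fly instead of being materialized, which is what makes the GEMM implicit.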

Usage:

  • nn.functional.subm_conv3d_igemm
# 3D
y = paddle.sparse.nn.functional.subm_conv3d(x, weight, key='key1', padding=[0, 1, 1])   # original
y = paddle.sparse.nn.functional.subm_conv3d_igemm(x, weight, key='key2', padding=[0, 1, 1])   # use implicit gemm

# 2D
y = paddle.sparse.nn.functional.subm_conv2d(x, weight, key='key1', padding=[1, 1])   # original
y = paddle.sparse.nn.functional.subm_conv2d_igemm(x, weight, key='key2', padding=[1, 1])   # use implicit gemm
  • nn.SubmConv3D
# 3D
paddle.sparse.nn.SubmConv3D(32, 32, kernel_size=[1, 3, 3], key="key1")  # original
paddle.sparse.nn.SubmConv3D(32, 32, kernel_size=[1, 3, 3], key="key2", backend="igemm")  # use implicit gemm

# 2D
paddle.sparse.nn.SubmConv2D(32, 32, kernel_size=[3, 3], key="key1")  # original
paddle.sparse.nn.SubmConv2D(32, 32, kernel_size=[3, 3], key="key2", backend="igemm")  # use implicit gemm
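Paddle's documentation describes `key` as the name under which a rulebook is saved and reused across layers. The examples above give the original and igemm calls different keys; a plausible reason (an assumption, not stated in this PR) is that the two paths build different index structures. A minimal sketch of key-based reuse, using the hypothetical class `KeyedMapCache` rather than Paddle internals:

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class KeyedMapCache(Generic[T]):
    """Hypothetical cache: every layer that passes the same key reuses one map."""

    def __init__(self) -> None:
        self._store: Dict[str, T] = {}
        self.builds = 0  # number of times a map was actually constructed

    def get_or_build(self, key: str, build: Callable[[], T]) -> T:
        # Build only on the first request for this key; later requests reuse it.
        if key not in self._store:
            self._store[key] = build()
            self.builds += 1
        return self._store[key]
```

Layers sharing a key pay the map-construction cost once, which is the cost the benchmark numbers exclude.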

Perf:
GPU: NVIDIA GeForce RTX 3080
Precision: FP16
Case: a single SubmConv layer
nnz=214202, dense_shape=[1, 1, 4608, 4608, 32], kernel_size=[1, 3, 3], stride=1, in_channels=out_channels=32
(These numbers exclude the overhead of hashmap/rulebook creation, which is assumed to be cached.)

Algorithm   Time (us)
cutlass     703
igemm       194
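The benchmark figures can be put in perspective with a little arithmetic. This assumes `nnz` counts active spatial sites (channels stored densely per site), so density is nnz divided by N·D·H·W.

```python
# Figures copied from the benchmark above.
nnz = 214202
dense_shape = [1, 1, 4608, 4608, 32]  # N, D, H, W, C
time_us = {"cutlass": 703, "igemm": 194}

# Fraction of spatial sites that are active (channels excluded).
spatial_sites = dense_shape[0] * dense_shape[1] * dense_shape[2] * dense_shape[3]
density = nnz / spatial_sites
speedup = time_us["cutlass"] / time_us["igemm"]

print(f"active sites: {density:.2%}")  # active sites: 1.01%
print(f"speedup: {speedup:.2f}x")      # speedup: 3.62x
```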

Note:

  • Implicit GEMM currently supports only the forward pass.
  • I have only verified correctness for SubmConv, so the code asserts subm == True, stride == 1, and dilation == 1.
  • The CUDA kernels are adapted from torchsparse's implementation.
  • The input must be 3D (NDHWC), and the kernel must be 3-dimensional. For the 2D case, insert zeros into the D dimension of the indices and set the kernel size to (1, 3, 3).
  • The input can be 3D (NDHWC) or 2D (NHWC).
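The 2D workaround from the notes (zero D index, depth-1 kernel) and the subm/stride/dilation restrictions can be sketched as follows. The helpers are hypothetical and operate on plain lists shaped like COO indices (`[3, nnz]` rows for N, H, W); they are not part of Paddle.

```python
from typing import List, Sequence, Tuple, Union


def lift_2d_indices_to_3d(indices: Sequence[Sequence[int]]) -> List[List[int]]:
    """[3, nnz] rows (N, H, W) -> [4, nnz] rows (N, D=0, H, W)."""
    if len(indices) != 3:
        raise ValueError("expected NHW indices with 3 rows")
    batch, h, w = (list(r) for r in indices)
    return [batch, [0] * len(batch), h, w]


def lift_2d_shape_to_3d(dense_shape: Sequence[int]) -> List[int]:
    """[N, H, W, C] -> [N, 1, H, W, C]."""
    n, h, w, c = dense_shape
    return [n, 1, h, w, c]


def lift_2d_conv_args(kernel_size, padding) -> Tuple[List[int], List[int]]:
    """A depth-1 kernel with zero depth padding never mixes D slices."""
    kh, kw = kernel_size
    ph, pw = padding
    return [1, kh, kw], [0, ph, pw]


def check_igemm_constraints(subm: bool, stride: Union[int, Sequence[int]],
                            dilation: Union[int, Sequence[int]]) -> None:
    """Mirror the restrictions listed in the notes (hypothetical helper)."""
    strides = [stride] * 3 if isinstance(stride, int) else list(stride)
    dilations = [dilation] * 3 if isinstance(dilation, int) else list(dilation)
    if not subm or any(s != 1 for s in strides) or any(d != 1 for d in dilations):
        raise ValueError("implicit GEMM requires subm=True, stride=1, dilation=1")
```

The benchmark's `dense_shape=[1, 1, 4608, 4608, 32]` with `kernel_size=[1, 3, 3]` has exactly this lifted form.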

@paddle-bot

paddle-bot bot commented Mar 15, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 15, 2024
@Wong4j Wong4j requested a review from Wangzheee March 15, 2024 06:59
@paddle-ci-bot

paddle-ci-bot bot commented Mar 23, 2024

Sorry to inform you that ee0d63d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Wong4j Wong4j force-pushed the jaywan/sparse_conv branch 2 times, most recently from 064892d to 10eb558 Compare April 7, 2024 07:49
@Wong4j Wong4j force-pushed the jaywan/sparse_conv branch from 7a5bec7 to dcc9b0b Compare April 9, 2024 05:13
@Wong4j Wong4j changed the title [WIP] [Sparse conv] Implement implicit gemm algo for SubmConv3D [Sparse conv] Implement implicit gemm algo for SubmConv3D Apr 11, 2024
counter->set_dims({1});
}

void Conv3dImplicitGemmInferMeta(const MetaTensor& x,
Contributor

CI does not cover this OP; please add a unit test for it.

Collaborator Author

Done

// std::vector<int>* spatial_range;

// destructor
~KmapCache() {
Contributor

The unit tests never run this destructor; please add coverage for it.

Collaborator Author

I added a unit test. Running it locally does reach this destructor, but CI still reports it as not covered.

@paddle-ci-bot
Copy link

paddle-ci-bot bot commented Apr 17, 2024

Sorry to inform you that 4851677's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Contributor

@qingqing01 qingqing01 left a comment

The Chinese documentation will need to be updated as a follow-up.

weight_attr=None,
bias_attr=None,
data_format="NDHWC",
backend=None,
Contributor

APIs exposed to users need documentation comments.

weight_attr=None,
bias_attr=None,
data_format="NDHWC",
backend=None,
Contributor

The newly introduced argument backend should be documented in the docstring.
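One possible docstring entry for `backend`, drawing only on the description of this PR. `validate_backend` and the choice to reject unknown values with `ValueError` are assumptions, not Paddle's actual code.

```python
from typing import Optional

# Values that appear in the usage examples of this PR.
_SUPPORTED_BACKENDS = (None, "igemm")


def validate_backend(backend: Optional[str]) -> Optional[str]:
    """Validate the ``backend`` argument of a sparse SubmConv layer.

    Args:
        backend (str, optional): Algorithm used to compute the convolution.
            ``None`` (default) keeps the original implementation.
            ``"igemm"`` selects the implicit GEMM kernels, which currently
            support only the forward pass of submanifold convolution with
            ``stride=1`` and ``dilation=1``.

    Returns:
        The validated backend value.

    Raises:
        ValueError: If ``backend`` is not one of the supported values.
    """
    if backend not in _SUPPORTED_BACKENDS:
        raise ValueError(f"backend must be None or 'igemm', got {backend!r}")
    return backend
```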

Contributor

@jzhang533 jzhang533 left a comment

LGTM

@Wangzheee Wangzheee merged commit 0663608 into PaddlePaddle:develop Apr 22, 2024
co63oc pushed a commit to co63oc/Paddle that referenced this pull request Apr 22, 2024

Labels

contributor External developers NVIDIA

9 participants