Conversation

@minghaoBD

PR types

Performance optimization

PR changes

OPs

Describe

This PR makes several optimizations:

  1. Reduce runtime memory consumption by introducing shared_ptr in spmm_plugin.cu.
  2. Support FP16 inference by converting the bias from FP32 to FP16 when running in FP16 mode.
  3. Apply the spmm_plugin to the fused multihead_matmul op to further improve inference performance.
  4. Add a unit test accordingly.

nv_library(tensorrt_op_teller SRCS op_teller.cc DEPS framework_proto device_context boost)
nv_test(test_tensorrt SRCS test_tensorrt.cc DEPS dynload_cuda device_context dynamic_loader)
nv_test(test_tensorrt_engine SRCS test_engine.cc DEPS dynload_cuda tensorrt_engine)
nv_test(test_tensorrt_engine SRCS test_engine.cc test_dynamic_engine DEPS dynload_cuda tensorrt_engine tensorrt_plugin)
Owner:

Missing `.cc`?

Author:

Done, thanks.

pass_library(gpu_cpu_map_matmul_to_mul_pass inference)
pass_library(mixed_precision_configure_pass inference)
pass_library(replace_dense_with_sparse_pass inference)
pass_library(replace_dense_multihead_matmul_with_sparse_pass inference)
Owner:

Either rename replace_dense_with_sparse_pass to replace_dense_fc_with_sparse_pass, or merge replace_dense_multihead_matmul_with_sparse_pass into replace_dense_with_sparse_pass.

Author:

Renamed: replace_dense_with_sparse_pass -> replace_dense_fc_with_sparse_pass

@minghaoBD minghaoBD requested a review from b3602sss June 1, 2022 02:29
@minghaoBD minghaoBD closed this Jun 2, 2022