
Instantiate only specific RAFT reduction kernels#6780

Merged
rapids-bot[bot] merged 13 commits into rapidsai:branch-25.08 from divyegala:raft-reduce
Jun 12, 2025

Conversation

@divyegala
Member

@divyegala divyegala commented May 21, 2025

Depends on rapidsai/raft#2679, https://github.com/rapidsai/cumlprims_mg/pull/263, and rapidsai/cuvs#925 with reference issue rapidsai/raft#2681.

This reduces the CUDA 12 binary size of libcuml++.so by 11 MB, from ~286 MB to ~275 MB.

@divyegala divyegala self-assigned this May 21, 2025
@divyegala divyegala requested review from a team as code owners May 21, 2025 19:49
@divyegala divyegala requested a review from robertmaynard May 21, 2025 19:49
@divyegala divyegala added the improvement Improvement / enhancement to an existing function label May 21, 2025
@divyegala divyegala requested a review from vyasr May 21, 2025 19:49
@divyegala divyegala added the non-breaking Non-breaking change label May 21, 2025
@divyegala divyegala requested a review from lowener May 21, 2025 19:49
@divyegala divyegala changed the title Instantiate only specified RAFT reduction kernels Instantiate only specific RAFT reduction kernels May 21, 2025
@divyegala divyegala requested a review from a team as a code owner May 28, 2025 05:14
Member

@dantegd dantegd left a comment


Changes look good, had a couple of minor things, but overall looks great!

Comment thread: build.sh
  MSG="${MSG}<br/>parallel build time: $compile_total seconds"
  if [[ -f "${LIBCUML_BUILD_DIR}/libcuml++.so" ]]; then
-   LIBCUML_FS=$(ls -lh ${LIBCUML_BUILD_DIR}/libcuml++.so | awk '{print $5}')
+   LIBCUML_FS=$(stat -c %s ${LIBCUML_BUILD_DIR}/libcuml++.so | awk '{printf "%.2f MB", $1/1024/1024}')
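The suggestion in this thread swaps a human-readable `ls -lh` column for an exact byte count from `stat`, formatted by `awk`. A minimal runnable sketch of the same pattern, using `/bin/ls` as a stand-in for libcuml++.so (GNU `stat` assumed):

```shell
# Report a binary's size in MB with two decimals (GNU stat).
SO_PATH="/bin/ls"  # stand-in for ${LIBCUML_BUILD_DIR}/libcuml++.so
SIZE_MB=$(stat -c %s "$SO_PATH" | awk '{printf "%.2f MB", $1/1024/1024}')
echo "binary size: $SIZE_MB"
```

`stat -c %s` prints the size in bytes, which is more robust than parsing the fifth column of `ls -lh` output, whose layout can vary.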
Member


nice change

norm,
is_row_major_contiguous,
handle.get_stream());
if (is_row_major_contiguous) {
Member


Ugh... I know this is good for binary size, but this `if` is so large! I wish there was a nicer way to do this without introducing templates on the cuML side...

Member Author


I agree, it's an eyesore. But it conveys intentionality by asking the downstream user to instantiate every kernel that they require.

When we update the libcuml API to mdspan, we will have a clear distinction between row and column major APIs by virtue of that being a template type.
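The trade-off being discussed can be sketched outside of RAFT: a runtime layout flag selects between exactly the compile-time instantiations that the downstream library chose to build, so no unused kernel gets compiled. All names below (`row_major`, `col_major`, `reduce_impl`, `reduce`) are hypothetical stand-ins, not the actual cuML/RAFT API:

```cpp
#include <vector>

// Hypothetical layout tags standing in for raft::row_major / raft::col_major.
struct row_major {};
struct col_major {};

// A templated reduction; each Layout used below becomes one compiled kernel.
template <typename Layout>
int reduce_impl(const std::vector<int>& v) {
  int sum = 0;
  for (int x : v) sum += x;  // layout would affect indexing in real code
  return sum;
}

// Runtime dispatch: the large `if` from the review maps the runtime flag
// onto exactly the two instantiations chosen at build time.
int reduce(const std::vector<int>& v, bool is_row_major_contiguous) {
  if (is_row_major_contiguous) {
    return reduce_impl<row_major>(v);
  } else {
    return reduce_impl<col_major>(v);
  }
}
```

Once the API moves to mdspan, the layout becomes a template parameter of the public entry point and this runtime branch disappears from the cuML side.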

Contributor

@msarahan msarahan left a comment


Approved, aside from the revert comment.

Comment thread: cpp/cmake/thirdparty/get_raft.cmake (Outdated)
Member

@jakirkham jakirkham left a comment


Thanks Divye! 🙏

Noticed one stray comment. Otherwise looks good.

Comment thread: cpp/cmake/thirdparty/get_raft.cmake (Outdated)
Co-authored-by: jakirkham <jakirkham@gmail.com>
@divyegala
Member Author

/merge

@rapids-bot rapids-bot Bot merged commit 6d62a8a into rapidsai:branch-25.08 Jun 12, 2025
140 of 152 checks passed

Labels

CMake · CUDA/C++ · improvement (Improvement / enhancement to an existing function) · non-breaking (Non-breaking change)


6 participants