Instantiate only specific RAFT reduction kernels #6780
rapids-bot[bot] merged 13 commits into rapidsai:branch-25.08 from
dantegd left a comment
Changes look good. I had a couple of minor comments, but overall this looks great!
```diff
 MSG="${MSG}<br/>parallel build time: $compile_total seconds"
 if [[ -f "${LIBCUML_BUILD_DIR}/libcuml++.so" ]]; then
-  LIBCUML_FS=$(ls -lh ${LIBCUML_BUILD_DIR}/libcuml++.so | awk '{print $5}')
+  LIBCUML_FS=$(stat -c %s ${LIBCUML_BUILD_DIR}/libcuml++.so | awk '{printf "%.2f MB", $1/1024/1024}')
```
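The updated CI line stops using the rounded, human-readable size column from `ls -lh`. Instead it takes the exact byte count from `stat -c %s` and has `awk` divide it by 1024 twice, printing two decimals. Below is a minimal C++ sketch of the same computation. `format_size_mb` and `file_size_bytes` are hypothetical helper names, and `std::filesystem::file_size` (C++17) stands in for `stat -c %s`. Because the divisor is 1024², the "MB" label strictly means mebibytes (MiB).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>

// Hypothetical helper mirroring `awk '{printf "%.2f MB", $1/1024/1024}'`:
// divide the byte count by 1024 twice and format with two decimals.
std::string format_size_mb(std::uintmax_t bytes) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.2f MB",
                static_cast<double>(bytes) / 1024.0 / 1024.0);
  return std::string(buf);
}

// Hypothetical counterpart of `stat -c %s <path>`: exact size in bytes.
std::uintmax_t file_size_bytes(const std::filesystem::path& p) {
  return std::filesystem::file_size(p);
}
```

Reporting the exact byte count avoids the coarse rounding of `ls -lh` (e.g. "286M"). That rounding would hide a change of a few hundred KiB between builds.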
```cpp
    norm,
    is_row_major_contiguous,
    handle.get_stream());
if (is_row_major_contiguous) {
```
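The `if (is_row_major_contiguous)` in the snippet above turns a runtime layout flag into a choice between separately compiled specializations. Below is a minimal sketch of that dispatch pattern. It assumes a kernel that takes the layout as a compile-time `bool`; the exact RAFT signatures are not shown in this thread. `row_sums` and `row_sums_dispatch` are hypothetical host-side stand-ins, not RAFT or cuML functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction whose memory layout is a compile-time parameter.
template <bool RowMajor>
std::vector<double> row_sums(const std::vector<double>& data,
                             std::size_t n_rows, std::size_t n_cols) {
  std::vector<double> out(n_rows, 0.0);
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t j = 0; j < n_cols; ++j) {
      if constexpr (RowMajor) {
        out[i] += data[i * n_cols + j];  // row-major: (i, j) at i*n_cols + j
      } else {
        out[i] += data[j * n_rows + i];  // column-major: (i, j) at j*n_rows + i
      }
    }
  }
  return out;
}

// Runtime flag -> compile-time template argument. Each branch names exactly
// one instantiation, which is why the call site grows an if/else.
std::vector<double> row_sums_dispatch(const std::vector<double>& data,
                                      std::size_t n_rows, std::size_t n_cols,
                                      bool is_row_major_contiguous) {
  if (is_row_major_contiguous) {
    return row_sums<true>(data, n_rows, n_cols);
  } else {
    return row_sums<false>(data, n_rows, n_cols);
  }
}
```

Only the specializations named in the branches get compiled. The price is an explicit branch at every call site.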
Ugh... I know this is good for binary size, but this `if` is so large! I wish there were a nicer way to do this without introducing templates on the cuML side...
I agree, it's an eyesore. But it conveys intentionality: it asks the downstream user to instantiate every kernel they require.
When we update the libcuml API to mdspan, we will have a clear distinction between row-major and column-major APIs, because the layout is part of the mdspan template type.
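With a layout-aware view such as C++23 `std::mdspan`, the layout is part of the type: `std::layout_right` is row-major and `std::layout_left` is column-major. Template deduction then picks the kernel, and the runtime `if` disappears. To stay compilable on pre-C++23 toolchains, the sketch below uses a hypothetical minimal `matrix_view` with local tag types named after the standard ones. It is an illustration of the idea, not the planned libcuml API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical tags named after std::layout_right (row-major) and
// std::layout_left (column-major) from C++23 <mdspan>.
struct layout_right {};
struct layout_left {};

// Minimal non-owning 2-D view whose layout is encoded in its type.
template <typename T, typename Layout>
struct matrix_view {
  T* data;
  std::size_t rows, cols;

  T& operator()(std::size_t i, std::size_t j) const {
    if constexpr (std::is_same_v<Layout, layout_right>) {
      return data[i * cols + j];  // row-major indexing
    } else {
      return data[j * rows + i];  // column-major indexing
    }
  }
};

// The layout is deduced from the view type at compile time, so callers
// pass no is_row_major flag and write no if/else.
template <typename T, typename Layout>
void row_sums(matrix_view<const T, Layout> m, T* out) {
  for (std::size_t i = 0; i < m.rows; ++i) {
    out[i] = T{};
    for (std::size_t j = 0; j < m.cols; ++j) {
      out[i] += m(i, j);
    }
  }
}
```

Each distinct view type still produces its own instantiation. The difference is that the choice is made by the caller's types rather than by a runtime branch.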
msarahan left a comment
Approved, aside from the revert comment.
jakirkham left a comment
Thanks Divye! 🙏
Noticed one stray comment. Otherwise, looks good.
Co-authored-by: jakirkham <jakirkham@gmail.com>
/merge
Merged commit 6d62a8a into rapidsai:branch-25.08
Depends on rapidsai/raft#2679, https://github.com/rapidsai/cumlprims_mg/pull/263, and rapidsai/cuvs#925. Reference issue: rapidsai/raft#2681.
This reduces the CUDA 12 binary size of libcuml++.so by 11 MB, from ~286 MB to ~275 MB.
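The size reduction comes from instantiating only specific kernels, and the standard C++ mechanism for that is explicit template instantiation. A library compiles the specializations it lists instead of every combination a generic header could produce. Below is a minimal single-file sketch assuming a header-style function template `reduce_sum`, which is hypothetical and not a RAFT API. In a real project, the `extern template` declarations would live in a header and the explicit instantiation definitions in a single source file. Whether RAFT and cuML wire it exactly this way is not shown in this PR page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical header-defined reduction template (stand-in for a kernel).
template <typename T>
T reduce_sum(const T* data, std::size_t n) {
  T acc{};
  for (std::size_t i = 0; i < n; ++i) {
    acc += data[i];
  }
  return acc;
}

// Explicit instantiation declarations (normally in a header): translation
// units that see these do not instantiate these specializations themselves.
extern template float reduce_sum<float>(const float*, std::size_t);
extern template double reduce_sum<double>(const double*, std::size_t);

// Explicit instantiation definitions (normally in exactly one source file):
// the listed specializations are emitted here, once.
template float reduce_sum<float>(const float*, std::size_t);
template double reduce_sum<double>(const double*, std::size_t);
```

This is the trade-off discussed in the review. The consumer must list every specialization it needs, and in exchange the binary carries code only for those.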