Use cuda::stream_ref for stream usage #2372
Conversation
📝 Walkthrough
Replaces usages of rmm::cuda_stream_view with cuda::stream_ref.
Changes: Stream type migration (single cohesive DAG).
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/benchmarks/device_uvector/device_uvector_bench.cu`:
- Line 94: The cudaMemsetAsync call using vec.data(), num_elements and
stream.get() must be wrapped with the RMM_CUDA_TRY macro to surface CUDA errors;
replace the unchecked call cudaMemsetAsync(vec.data(), 0, num_elements *
sizeof(std::int32_t), stream.get()) with
RMM_CUDA_TRY(cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t),
stream.get())) so failures are reported (ensure RMM_CUDA_TRY is available in the
translation unit).
ℹ️ Review info
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6a80af3f-7a8f-4182-886b-2d10727b816d
📒 Files selected for processing (7)
- cpp/benchmarks/cuda_stream_pool/cuda_stream_pool_bench.cpp
- cpp/benchmarks/device_uvector/device_uvector_bench.cu
- cpp/benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu
- cpp/benchmarks/random_allocations/random_allocations.cpp
- cpp/benchmarks/replay/replay.cpp
- cpp/benchmarks/synchronization/synchronization.cpp
- cpp/benchmarks/synchronization/synchronization.hpp
✅ Files skipped from review due to trivial changes (1)
- cpp/benchmarks/cuda_stream_pool/cuda_stream_pool_bench.cpp
```diff
   auto vec = Vector(num_elements, stream);
   if (zero_init) {
-    cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t), stream.value());
+    cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t), stream.get());
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify async memset call sites and wrapper usage in this file
rg -n -C2 'cudaMemsetAsync\(' cpp/benchmarks/device_uvector/device_uvector_bench.cu
rg -n 'RMM_CUDA_TRY\s*\(\s*cudaMemsetAsync' cpp/benchmarks/device_uvector/device_uvector_bench.cu
```
Repository: rapidsai/rmm
Length of output: 248
Wrap cudaMemsetAsync with RMM_CUDA_TRY to detect CUDA errors.
Line 94 has an unchecked cudaMemsetAsync call that could silently fail. Per coding guidelines, all CUDA API calls must be wrapped with RMM_CUDA_TRY to detect errors early.
Suggested patch
```diff
- cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t), stream.get());
+ RMM_CUDA_TRY(
+   cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t), stream.get()));
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
RMM_CUDA_TRY(
  cudaMemsetAsync(vec.data(), 0, num_elements * sizeof(std::int32_t), stream.get()));
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/tests/mr/mr_ref_test_mt_helpers.hpp (1)
Lines 64-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Wrap the event synchronizations with RMM_CUDA_TRY.
Line 65 and Line 86 still ignore the result of cudaEventSynchronize. If either synchronization fails, these helpers can mask the CUDA error and make the MT stream-ordering tests harder to trust.
Suggested fix
```diff
- cudaEventSynchronize(event);
+ RMM_CUDA_TRY(cudaEventSynchronize(event));
...
- cudaEventSynchronize(event);
+ RMM_CUDA_TRY(cudaEventSynchronize(event));
```
As per coding guidelines, "Check all CUDA errors with RMM_CUDA_TRY, RMM_EXPECTS, RMM_FAIL macros; unchecked errors in memory operations, synchronization, and device calls cause silent corruption".
Also applies to: 85-86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mr/mr_ref_test_mt_helpers.hpp` around lines 64-65: The cudaEventSynchronize calls in mr_ref_test_mt_helpers.hpp are unchecked and should be wrapped with RMM_CUDA_TRY to surface CUDA errors; replace occurrences of cudaEventSynchronize(event) (and the second instance later in the file) with RMM_CUDA_TRY(cudaEventSynchronize(event)); ensure you include the RMM header if not already present and keep the synchronization semantics unchanged.

cpp/tests/mr/mr_ref_test.hpp (1)
Lines 245-270: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use the async MR APIs in test_random_async_allocations.
This helper now threads cuda::stream_ref through the signature, but Line 262 and Line 269 still use allocate_sync/deallocate_sync. That means the RandomAllocations* tests keep exercising the synchronous path and won't catch regressions in stream-ordered allocation/deallocation.
Suggested fix
```diff
   [&generator, &distribution, &ref, stream](allocation& alloc) {
     alloc.size = distribution(generator);
-    EXPECT_NO_THROW(alloc.ptr = ref.allocate_sync(alloc.size, rmm::CUDA_ALLOCATION_ALIGNMENT));
+    EXPECT_NO_THROW(
+      alloc.ptr = ref.allocate(stream, alloc.size, rmm::CUDA_ALLOCATION_ALIGNMENT));
     RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));
     EXPECT_NE(nullptr, alloc.ptr);
     EXPECT_TRUE(is_properly_aligned(alloc.ptr));
   });
   std::for_each(allocations.begin(), allocations.end(), [stream, &ref](allocation& alloc) {
-    EXPECT_NO_THROW(ref.deallocate_sync(alloc.ptr, alloc.size, rmm::CUDA_ALLOCATION_ALIGNMENT));
+    EXPECT_NO_THROW(
+      ref.deallocate(stream, alloc.ptr, alloc.size, rmm::CUDA_ALLOCATION_ALIGNMENT));
     RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()));
   });
```
As per coding guidelines, "All async memory operations must accept cuda_stream_view parameter and handle stream synchronization before memory is returned to pool or deallocated".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mr/mr_ref_test.hpp` around lines 245 - 270, In test_random_async_allocations replace the synchronous calls allocate_sync/deallocate_sync with the stream-ordered async MR APIs so the helper actually tests async behavior: update the allocation lambda to call the resource's async allocate (the allocate_async variant that accepts the cuda::stream_ref or cuda_stream_view and alignment) and update the deallocate lambda to call the corresponding async deallocate (deallocate_async with stream + alignment), then ensure you still synchronize the provided stream (RMM_CUDA_TRY(cudaStreamSynchronize(stream.get()))) at the same points so the test waits for the async operations to complete; target the function test_random_async_allocations and the lambdas that currently call alloc.ptr = ref.allocate_sync(...) and ref.deallocate_sync(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/include/rmm/device_scalar.hpp`:
- Around line 219-221: The call in set_value_to_zero_async is passing
value_type{0} to _storage.set_element_to_zero_async but that helper expects an
element index (size_type) not a value; update the call in
set_value_to_zero_async to pass element index 0 (i.e. use 0 as the first
argument) so _storage.set_element_to_zero_async(0, stream) is invoked; reference
functions/fields: set_value_to_zero_async, _storage.set_element_to_zero_async,
and value_type to locate and correct the call.
In `@cpp/tests/device_buffer_tests.cu`:
- Around line 60-80: Multiple tests repeat the null-stream construction
(cuda::stream_ref{cudaStream_t{nullptr}} and
rmm::cuda_stream_view{cudaStream_t{nullptr}}); introduce file-scope constants
(e.g., kNullStreamRef and kNullCudaStreamView) and replace each literal with
those constants in tests that construct rmm::device_buffer or query
buff.stream() (references: DeviceBufferTest, TYPED_TESTs EmptyBuffer and
DefaultMemoryResource, local variables buf/buff), ensuring the constants have
the same types and are initialized once at top of the file so all occurrences
reuse them.
ℹ️ Review info
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 40051d54-d640-4c86-8a96-ad99018c6d69
📒 Files selected for processing (23)
- cpp/include/rmm/device_scalar.hpp
- cpp/tests/container_multidevice_tests.cu
- cpp/tests/cuda_stream_tests.cpp
- cpp/tests/device_buffer_tests.cu
- cpp/tests/device_check_resource_adaptor.hpp
- cpp/tests/device_scalar_tests.cpp
- cpp/tests/device_uvector_tests.cpp
- cpp/tests/mock_resource.hpp
- cpp/tests/mr/aligned_mr_tests.cpp
- cpp/tests/mr/arena_mr_tests.cpp
- cpp/tests/mr/cccl_mr_ref_test_allocation.hpp
- cpp/tests/mr/cccl_mr_ref_test_basic.hpp
- cpp/tests/mr/cccl_mr_ref_test_mt.hpp
- cpp/tests/mr/failure_callback_mr_tests.cpp
- cpp/tests/mr/mr_ref_test.hpp
- cpp/tests/mr/mr_ref_test_allocation.hpp
- cpp/tests/mr/mr_ref_test_basic.hpp
- cpp/tests/mr/mr_ref_test_mt.hpp
- cpp/tests/mr/mr_ref_test_mt_helpers.hpp
- cpp/tests/mr/pool_mr_tests.cpp
- cpp/tests/mr/statistics_mr_tests.cpp
- cpp/tests/mr/thrust_allocator_tests.cu
- cpp/tests/mr/tracking_mr_tests.cpp
```cpp
void set_value_to_zero_async(cuda::stream_ref stream)
{
  _storage.set_element_to_zero_async(value_type{0}, stream);
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify the callee signature and this call site side-by-side.
rg -n -C2 'set_element_to_zero_async\s*\(' cpp/include/rmm/device_uvector.hpp cpp/include/rmm/device_scalar.hpp
```
Repository: rapidsai/rmm
Length of output: 1085
Use element index 0 in zeroing call (Line 221).
set_element_to_zero_async expects an element index as its first argument, not a value. Passing value_type{0} is incorrect—the function signature requires size_type element_index (see line 248 in device_uvector.hpp). This will fail for trivially copyable types that are not implicitly convertible to the index type.
Proposed fix
```diff
- _storage.set_element_to_zero_async(value_type{0}, stream);
+ _storage.set_element_to_zero_async(size_type{0}, stream);
```
📝 Committable suggestion
```cpp
void set_value_to_zero_async(cuda::stream_ref stream)
{
  _storage.set_element_to_zero_async(size_type{0}, stream);
}
```
```diff
   auto buf = rmm::device_buffer(10, cuda::stream_ref{cudaStream_t{nullptr}}, ref);
   EXPECT_EQ(buf.size(), 10);
 }

 TYPED_TEST(DeviceBufferTest, EmptyBuffer)
 {
-  rmm::device_buffer buff(0, rmm::cuda_stream_default);
+  rmm::device_buffer buff(0, cuda::stream_ref{cudaStream_t{nullptr}});
   EXPECT_TRUE(buff.is_empty());
 }

 TYPED_TEST(DeviceBufferTest, DefaultMemoryResource)
 {
-  rmm::device_buffer buff(this->size, rmm::cuda_stream_default);
+  rmm::device_buffer buff(this->size, cuda::stream_ref{cudaStream_t{nullptr}});
   EXPECT_NE(nullptr, buff.data());
   EXPECT_EQ(this->size, buff.size());
   EXPECT_EQ(this->size, buff.ssize());
   EXPECT_EQ(this->size, buff.capacity());
   EXPECT_EQ(rmm::device_async_resource_ref{rmm::mr::get_current_device_resource_ref()},
             buff.memory_resource());
-  EXPECT_EQ(rmm::cuda_stream_default, buff.stream());
+  EXPECT_EQ(rmm::cuda_stream_view{cudaStream_t{nullptr}}, buff.stream());
 }
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Reduce repeated null-stream literals with shared test constants.
cuda::stream_ref{cudaStream_t{nullptr}} and rmm::cuda_stream_view{cudaStream_t{nullptr}} are repeated throughout this file. Please define shared file-scope constants/helpers and reuse them to keep test updates safer and less noisy.
Proposed refactor pattern
```diff
@@
 namespace testing {
 namespace thrust = THRUST_NS_QUALIFIER;
 }  // namespace testing
 using namespace testing;
+
+namespace {
+auto const k_null_stream_ref  = cuda::stream_ref{cudaStream_t{nullptr}};
+auto const k_null_stream_view = rmm::cuda_stream_view{cudaStream_t{nullptr}};
+}  // namespace
@@
-  auto buf = rmm::device_buffer(10, cuda::stream_ref{cudaStream_t{nullptr}}, ref);
+  auto buf = rmm::device_buffer(10, k_null_stream_ref, ref);
@@
-  rmm::device_buffer buff(0, cuda::stream_ref{cudaStream_t{nullptr}});
+  rmm::device_buffer buff(0, k_null_stream_ref);
@@
-  EXPECT_EQ(rmm::cuda_stream_view{cudaStream_t{nullptr}}, buff.stream());
+  EXPECT_EQ(k_null_stream_view, buff.stream());
```
As per coding guidelines, "Detect significant code duplication (3+ occurrences of similar code); refactor into shared utility functions or base classes."
Summary
- Replaces rmm::cuda_stream_view usage with cuda::stream_ref across containers and memory-resource implementations.
- Retains rmm::cuda_stream_view compatibility APIs while converting at boundaries where needed.
- Updates tests and benchmarks to construct streams via cuda::stream_ref.

Validation
```shell
cmake --build cpp/build/conda/cuda-13.1/release -j$(nproc)
ctest --test-dir cpp/build/conda/cuda-13.1/release -R 'DEVICE_UVECTOR|DEVICE_SCALAR|DEVICE_BUFFER|CONTAINER_MULTIDEVICE' --output-on-failure
ctest --test-dir cpp/build/conda/cuda-13.1/release -R 'DEVICE_BUFFER|DEVICE_UVECTOR|PREFETCH|POOL_MR|FIXED_SIZE|ARENA_MR|THRUST_ALLOCATOR|POLYMORPHIC_ALLOCATOR|STREAM_ADAPTOR|CALLBACK_MR|STATISTICS|TRACKING|LIMITING|ALIGNED' --output-on-failure
git diff --name-only --diff-filter=ACM | xargs pre-commit run --files
```