Commit 424a3e9

Fix binary quantizer host transform bounds and stream order (#1473)
Fix two issues with the binary quantizer's host transform:

1. Synchronize the stream after async GPU operations and before a host-only omp loop;
2. Fill the bytes between `minimul_out_dim` and `out_dim` with zeroes to match device transform behavior.

This PR is a prerequisite to merging rapidsai/raft#2835 (otherwise the tests fail due to host mdarray not initializing with zeroes anymore).

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- tsuki (https://github.com/enp1s0)
- Micka (https://github.com/lowener)

URL: #1473
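
The first fix is about ordering: a copy enqueued on a stream may still be in flight when host code starts reading the destination buffer. The sketch below illustrates that hazard in plain C++ rather than CUDA. `std::async` stands in for the asynchronous copy and `future::wait()` stands in for the stream synchronization; the function name `binarize_after_copy` and its parameters are hypothetical and not part of cuVS or RAFT.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Plain C++ stand-in (not CUDA, not the cuVS code):
//   std::async     ~ an asynchronous device-to-host copy enqueued on a stream
//   future::wait() ~ synchronizing that stream
// Assumes device_thresholds and row have the same length.
std::vector<int> binarize_after_copy(const std::vector<float>& device_thresholds,
                                     const std::vector<float>& row)
{
  std::vector<float> host_thresholds(device_thresholds.size());

  // "Enqueue" the copy; it may still be running when std::async returns.
  std::future<void> copy_done = std::async(std::launch::async, [&] {
    std::copy(device_thresholds.begin(), device_thresholds.end(), host_thresholds.begin());
  });

  // Without this wait, the loop below could read host_thresholds
  // before the copy has finished writing it.
  copy_done.wait();

  // Host-only loop that consumes the copied thresholds.
  std::vector<int> bits(row.size());
  for (std::size_t j = 0; j < row.size(); ++j) {
    bits[j] = row[j] > host_thresholds[j] ? 1 : 0;
  }
  return bits;
}
```

In the commit itself the wait is a single stream synchronization placed between the copy and the OpenMP loop.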
1 parent 96b6e3c commit 424a3e9

1 file changed

Lines changed: 7 additions & 1 deletion

cpp/src/preprocessing/quantize/detail/binary.cuh

@@ -432,11 +432,17 @@ void transform(raft::resources const& res,
         raft::copy(
           threshold_ptr, casted_vec.data_handle(), dataset_dim, raft::resource::get_cuda_stream(res));
       }
+      // Populate the threshold_ptr on the host side before the host parallel loop.
+      raft::resource::sync_stream(res);
     }
 
 #pragma omp parallel for collapse(2)
     for (size_t i = 0; i < dataset_size; ++i) {
-      for (uint32_t out_j = 0; out_j < minimul_out_dim; ++out_j) {
+      for (uint32_t out_j = 0; out_j < out_dim; ++out_j) {
+        if (out_j >= minimul_out_dim) {
+          out(i, out_j) = 0;
+          continue;
+        }
         QuantI pack = 0;
         for (uint32_t pack_j = 0; pack_j < bits_per_pack; ++pack_j) {
           const uint32_t in_j = out_j * bits_per_pack + pack_j;
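
The effect of the widened loop bound can be seen in a small standalone sketch. It is not the cuVS implementation: `pack_rows`, the flat row-major buffers, the `> threshold` comparison, and the least-significant-bit-first packing order are all assumptions made for illustration. The output buffer is deliberately pre-filled with `0xFF` to mimic memory that is not zero-initialized, so the padding bytes end up zero only because the loop writes them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not cuVS API): pack one bit per input value into bytes,
// row by row, and zero every output byte past the last one that carries data.
std::vector<std::uint8_t> pack_rows(const std::vector<float>& in,
                                    std::size_t rows,
                                    std::size_t in_dim,
                                    std::size_t out_dim,
                                    float threshold)
{
  constexpr std::size_t bits_per_pack = 8;
  // Number of output bytes that actually hold input bits, capped at out_dim.
  const std::size_t needed          = (in_dim + bits_per_pack - 1) / bits_per_pack;
  const std::size_t minimal_out_dim = needed < out_dim ? needed : out_dim;

  // Poisoned on purpose: stands in for an uninitialized host buffer.
  std::vector<std::uint8_t> out(rows * out_dim, 0xFF);

  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t out_j = 0; out_j < out_dim; ++out_j) {
      if (out_j >= minimal_out_dim) {
        out[i * out_dim + out_j] = 0;  // padding byte: write an explicit zero
        continue;
      }
      std::uint8_t pack = 0;
      for (std::size_t pack_j = 0; pack_j < bits_per_pack; ++pack_j) {
        const std::size_t in_j = out_j * bits_per_pack + pack_j;
        // Assumed rule: set the bit when the value exceeds the threshold.
        if (in_j < in_dim && in[i * in_dim + in_j] > threshold) {
          pack = static_cast<std::uint8_t>(pack | (1u << pack_j));
        }
      }
      out[i * out_dim + out_j] = pack;
    }
  }
  return out;
}
```

With 3 input values per row and 4 output bytes per row, only the first byte of each row carries data; stopping the inner loop at `minimal_out_dim` would leave the remaining three bytes at whatever the buffer held before.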
