Commit f15c1ea

Fix ann-bench dataset blob integer overflow leading to incorrect data copy beyond 4B elems (#671)
ann-bench keeps data dimensions as `uint32_t`. We use `std::fread` to copy the data from a file to host memory and pass `n_rows * n_cols` as the element count, which is converted to `size_t` only after the multiplication. For datasets larger than 4B elements, the 32-bit product overflows and only part of the data is copied.

This PR fixes the bug by casting each dimension to `size_t` before the multiplication. The bug affects only the benchmark cases where the data is requested in host memory that is not backed by a file.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #671
1 parent 4b289a0 commit f15c1ea

1 file changed

Lines changed: 2 additions & 1 deletion

cpp/bench/ann/src/common/blob.hpp

@@ -453,7 +453,8 @@ struct blob_mmap {
     size_t size = data_end - data_start;
     mmap_owner owner{size, flags};
     std::fseek(file_.descriptor().value(), data_start, SEEK_SET);
-    size_t n_elems = file_.rows_limit() * file_.n_cols();
+    auto n_elems =
+      static_cast<size_t>(file_.rows_limit()) * static_cast<size_t>(file_.n_cols());
     if (std::fread(owner.data(), sizeof(T), n_elems, file_.descriptor().value()) != n_elems) {
       throw std::runtime_error{"cuvs::bench::blob_mmap() fread " + file_.path() + " failed"};
     }
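
To illustrate why the cast has to happen before the multiplication, here is a minimal standalone sketch. The helper names `n_elems_narrow` and `n_elems_wide` are hypothetical and are not part of cuVS. The sketch assumes a platform where `size_t` is 64 bits wide and `int` is 32 bits wide, so that a `uint32_t` product is evaluated in 32-bit unsigned arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption of this sketch: a 64-bit size_t, as on typical 64-bit targets.
static_assert(sizeof(std::size_t) >= 8, "sketch assumes a 64-bit size_t");

// Hypothetical helper mirroring the old expression: both operands are
// uint32_t, so the product wraps modulo 2^32 and is widened to size_t
// only afterwards.
constexpr std::size_t n_elems_narrow(std::uint32_t n_rows, std::uint32_t n_cols) {
  return n_rows * n_cols;
}

// Hypothetical helper mirroring the fixed expression: each operand is
// widened first, so the multiplication itself is done in size_t.
constexpr std::size_t n_elems_wide(std::uint32_t n_rows, std::uint32_t n_cols) {
  return static_cast<std::size_t>(n_rows) * static_cast<std::size_t>(n_cols);
}

// Example shape: 100M rows x 128 columns = 12.8B elements, above 2^32.
constexpr std::uint32_t kRows = 100'000'000u;
constexpr std::uint32_t kCols = 128u;

// The widened product is exact; the narrow one keeps only the low 32 bits.
static_assert(n_elems_wide(kRows, kCols) == 12'800'000'000ULL, "exact product");
static_assert(n_elems_narrow(kRows, kCols) == 12'800'000'000ULL % (1ULL << 32),
              "wrapped product");
```

With the narrow count, `std::fread` would be asked for fewer elements than the dataset holds and would report success once it had read that many, which is consistent with the partial copy described in the commit message.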
