BitwiseHamming distance for NN Descent #1101
rapids-bot[bot] merged 19 commits into rapidsai:branch-25.08
Conversation
```cpp
int num_load_elems = (step == raft::ceildiv(data_dim, TILE_COL_WIDTH) - 1)
                       ? data_dim - step * TILE_COL_WIDTH
                       : TILE_COL_WIDTH;
if (metric != cuvs::distance::DistanceType::BitwiseHamming) {
```
The only change here is wrapping the wmma operation in `if (metric != cuvs::distance::DistanceType::BitwiseHamming)` so that we don't perform unnecessary computations for BitwiseHamming. (The diff marks all the extra indentation as changes, which makes it look larger than it is.)
divyegala left a comment
We should be able to allocate just half the fp16 matrix for bitwise hamming, right?
```cpp
res,
nrow_,
build_config.metric == cuvs::distance::DistanceType::BitwiseHamming
  ? (build_config.dataset_dim + 1) / 2
  : build_config.dataset_dim)},
```
Yep that's what we do here!
```cpp
const uint8_t* data_n1 = reinterpret_cast<const uint8_t*>(data) + n1 * data_dim;
const uint8_t* data_n2 = reinterpret_cast<const uint8_t*>(data) + n2 * data_dim;
for (int d = 0; d < data_dim; d++) {
  s_distances[i] += __popc(static_cast<uint32_t>(data_n1[d] ^ data_n2[d]) & 0xff);
```
Wherever you are doing this, have you ensured that the dim you pass along is correctly scaled? If you convert a half to two uint8s, the dim has to be doubled.

Furthermore, I haven't looked into the nn descent logic entirely, but if you are operating in the half space, I don't think you'd have to reinterpret_cast everywhere to uint8. It's more efficient to stay in the half space and do something like:

```cpp
__popc(static_cast<uint32_t>(data_n1[d] ^ data_n2[d]) & 0xffffu)
```
> Wherever you are doing this, have you ensured that the dim you pass along is correctly scaled? If you convert a half to two uint8s, the dim has to be doubled.
I am allocating dim/2 dimensions for the half-type array. To keep things straightforward, `dim` always refers to the dimension of the given dataset (fp32, int8, etc.), and the dimension used to allocate the fp16 device array is configured based on the data type (dim/2 for int8 and uint8, unchanged for other types).
> I don't think you'd have to reinterpret_cast everywhere to uint8.
The issue with keeping the pointer fp16 and looping over data_dim/2 dimensions is that the kernel then has to check whether the last byte is valid (the original int8 data can have an odd number of dimensions). I thought it would be easier to cast to int8 and loop over the original data_dim instead : )
I see. Yes, there can be an odd number of dims, for which we can fall back to int8, but if it's not an odd number of dims, we can do it in the half space. I'd argue that we can do even more: if the dim is divisible by 4, reinterpret_cast to uint32_t so you only have to popcount over one fourth of the dim (I'm doing the same thing with BitwiseHamming in ivf-flat). However, considering the deadlines, we can look into these optimizations later. Can we create a GitHub issue for this and write it as a TODO here?

Regarding the dims, I just wanted to verify whether the dims used everywhere need to be scaled, but it looks like you have already checked those things, so apart from the creation of that GitHub issue this PR looks okay to me.
/merge

merged commit 722b7e6 into rapidsai:branch-25.08
Adding bitwise hamming distance for NN Descent