- `refine_device_float_float.cu.o`: ~45 MB
- `refine_device_int8_t_float.cu.o`: ~26 MB
- `refine_device_uint8_t_float.cu.o`: ~26 MB
- `refine_device_half_float.cu.o`: ~24 MB

`refine_device.cuh` does not appear to instantiate any kernels itself, but it likely instantiates them indirectly by calling into `detail` code in other files. There may be an opportunity to consolidate these kernel instantiations so that each one is compiled only once.
- `refine` and `ivf_flat::extend` both call into `build_index_kernel` (`cuvs/cpp/src/neighbors/ivf_flat/ivf_flat_build.cuh`, lines 514 to 523 in 14d137c):

  ```cuda
  build_index_kernel<T, IdxT, LabelT, true>
    <<<grid_dim, block_dim, 0, stream>>>(new_labels.data(),
                                         dataset,
                                         candidate_idx,
                                         refinement_index->data_ptrs().data_handle(),
                                         refinement_index->inds_ptrs().data_handle(),
                                         list_sizes_ptr,
                                         n_queries * n_candidates,
                                         refinement_index->dim(),
                                         refinement_index->veclen());
  ```

- `refine` and `ivf_flat::search` both call into the `ivfflat_interleaved_scan` kernels.
- `refine`, `ivf_flat::search`, and `ivf_pq::search` all call into `postprocess_neighbors_kernel` (`cuvs/cpp/src/neighbors/ivf_common.cuh`, line 153 in 14d137c):

  ```cuda
  postprocess_neighbors_kernel(IdxT* neighbors_out,  // [n_queries, topk]
  ```
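One conventional way to emit each instantiation only once is C++ explicit instantiation. The shared header keeps the template definition but adds `extern template` declarations for the specializations in use, and a single source file provides the matching explicit instantiation definitions, so every other translation unit links against that one copy instead of compiling its own. Below is a minimal host-only sketch under that assumption; `remap_ids` and the file names in the comments are hypothetical and are not cuVS code. For kernels like the ones listed above, the more conservative assumption is to apply the pattern to a host-side launcher that wraps the `<<<...>>>` launch rather than to the `__global__` template directly, since the kernel is then instantiated only where the launcher is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ---- shared header, e.g. a hypothetical "remap_ids.cuh" ----
// The template definition stays visible to every includer. In a CUDA build
// this would be a host launcher wrapping a <<<...>>> launch; here it runs on
// the host so the sketch compiles with any C++11 compiler.
template <typename IdxT>
void remap_ids(IdxT* ids, const IdxT* lookup, std::size_t n)
{
  assert(n == 0 || (ids != nullptr && lookup != nullptr));
  // Replace each id with the value stored at that position in the lookup table.
  for (std::size_t i = 0; i < n; ++i) {
    ids[i] = lookup[static_cast<std::size_t>(ids[i])];
  }
}

// Explicit instantiation declarations: any translation unit that includes the
// header (e.g. one per search or refine entry point) must not emit its own
// copy of these specializations.
extern template void remap_ids<std::int64_t>(std::int64_t*, const std::int64_t*, std::size_t);
extern template void remap_ids<std::uint32_t>(std::uint32_t*, const std::uint32_t*, std::size_t);

// ---- single source file, e.g. a hypothetical "remap_ids.cu" ----
// Explicit instantiation definitions: the one place where code for these
// specializations is generated; every caller links against it.
template void remap_ids<std::int64_t>(std::int64_t*, const std::int64_t*, std::size_t);
template void remap_ids<std::uint32_t>(std::uint32_t*, const std::uint32_t*, std::size_t);
```

In a real build, the `extern template` lines and the explicit instantiation definitions would live in different files; they share one file here only so the sketch compiles on its own. The cost of the pattern is that every new type combination must be added to both lists, otherwise callers fail at link time.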