CAGRA API update and allow async host refinement #131
CAGRA API update and allow async host refinement #131rapids-bot[bot] merged 24 commits intorapidsai:branch-24.06from
Conversation
tfeher
left a comment
There was a problem hiding this comment.
Thanks Malte for the PR, it looks good overall, just two small comments.
Public API is needed to allow fine control of CAGRA IVF-PQ build options.
| const cuvs::neighbors::cagra::index_params& params, | ||
| raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset) | ||
| -> cuvs::neighbors::cagra::index<uint8_t, uint32_t>; | ||
| auto build( |
There was a problem hiding this comment.
I believe the documentation from cagra.cuh shall be moved here. @cjnolet to confirm.
There was a problem hiding this comment.
Correct- we should document every function / prototype exposed in include/cuvs/**.hpp. It seems repetitive but the API docs are fully indexed and searchable so when a user finds one of these functions through search, it's guaranteed to have the docs carried along with it.
| raft::resources const& handle, | ||
| const cuvs::neighbors::cagra::index_params& params, | ||
| raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset, | ||
| std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt, |
There was a problem hiding this comment.
This is awkward- a user is going to have to set std::nullopt for at least one of nn-descent_params or ivf_pq params. I think we should make these build_params for cagra. Actually, I think what we should do is take away the graph_build_algo option from build_params and add a graph_build_params option that accepts the cuvs::neighbors::build_params base class, documents it approprirately with what's allowed, and then dispatches accordingly. Also remember to raise an exception if anything other than ivf_pq or nn_descent is used. This allows us to add more graph build algos in the future without too much additional effort but it also makes the public APIs super simple for the user- by default we can just set the build_params to the default nn_descent::build_params.
There was a problem hiding this comment.
On second though- since you need refine_ratio on the ivf_pq_build_params, why not have a new type cagra::graph_build_params which can extend cuvs::neighbors::build_params? Then you can have a cagra::graph_build_params::ivf_pq which extends ivf_pq::build_params and cagra::graph_build_params and adds the refine ratio. Then you can have cagra__graph_build_params::nn_descent which does the same for nn-descent. This way you can have cagra::build_params.graph_build_params which accepts anything that extends cagra::graph_build_params and dispatches the build algo accordingly? This would get all this additional stuff out of cagra::build and keep the API clean and easy to follow, while minimizing duplication.
There was a problem hiding this comment.
@cjnolet , i am not sure whether I understand correctly what you are proposing. An abstract graph_build_params struct that is inherited from different variants, e.g.
namespace ivf_pq {
struct graph_build_params : cuvs::neighbors::cagra::index_params {
cuvs::neighbors::ivf_pq::index_params pq_build_params;
cuvs::neighbors::ivf_pq::search_params pq_search_params;
float refinement_rate = 2f;
};
} // namespace
for the ivf_pq variant? To my understanding this would require to either implement interfaces individually or use of polymorphism/pointers that can be casted/dispatched. Do I understand your proposal correctly?
As an alternative cleanup step we could also combine additional variant-specific-parameters into (ivf_pq::)graph_build_params and add them as a single argument to the cagra::index_params
struct index_params : cuvs::neighbors::index_params {
size_t intermediate_graph_degree = 128;
size_t graph_degree = 64;
std::optional<vpq_params> compression = std::nullopt;
bool construct_index_with_dataset = true;
std::variant<ivf_pq::graph_build_params, nn_descent::graph_build_params> graph_build_params;
};
What do you think should we go for?
There was a problem hiding this comment.
I think your last idea looks great. It's clean, explicit, and tells the user right away what graph_build_params variants are supported.
| raft::resources const& handle, | ||
| const cuvs::neighbors::cagra::index_params& params, | ||
| raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset, | ||
| std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt, |
There was a problem hiding this comment.
On second though- since you need refine_ratio on the ivf_pq_build_params, why not have a new type cagra::graph_build_params which can extend cuvs::neighbors::build_params? Then you can have a cagra::graph_build_params::ivf_pq which extends ivf_pq::build_params and cagra::graph_build_params and adds the refine ratio. Then you can have cagra__graph_build_params::nn_descent which does the same for nn-descent. This way you can have cagra::build_params.graph_build_params which accepts anything that extends cagra::graph_build_params and dispatches the build algo accordingly? This would get all this additional stuff out of cagra::build and keep the API clean and easy to follow, while minimizing duplication.
| * NOTE: this is experimental new API, consider it unsafe. | ||
| */ | ||
| std::optional<cuvs::neighbors::vpq_params> compression = std::nullopt; | ||
|
|
There was a problem hiding this comment.
| bool construct_index_with_dataset = true; |
There was a problem hiding this comment.
Removed, will try to put dataset on device. In case of OOM, catch error and return index without added dataset.
|
@tfeher , thanks for the review. I pushed a new version containing the suggestions. |
tfeher
left a comment
There was a problem hiding this comment.
Thanks Malte, just one nitpick.
| */ | ||
| std::optional<cuvs::neighbors::vpq_params> compression = std::nullopt; | ||
|
|
||
| bool construct_index_with_dataset = true; |
There was a problem hiding this comment.
cc @divyegala this is the option to enable not having the dataset vectors copied to device automatically after building the index.
There was a problem hiding this comment.
| index_params.graph_build_params = cuvs::neighbors::cagra::graph_build_params::ivf_pq_params{}; | ||
| break; | ||
| case cuvsCagraGraphBuildAlgo::NN_DESCENT: | ||
| cuvs::neighbors::cagra::graph_build_params::nn_descent_params nn_descent_params{}; |
There was a problem hiding this comment.
Can we consolidate the C++ and C structs so that we don't need to duplicate all of these values, please? It's going to be painful to manage two complete copies of these structs because changes are going to get made to one without being made to the other.
I did this with the DIstanceType enum in cuvs::distance, for example. I think we should ultimately strive to write these once (in C++ or C) and then expose them through the other.
tfeher
left a comment
There was a problem hiding this comment.
Highlighting recent changes.
| * NOTE: this is experimental new API, consider it unsafe. | ||
| */ | ||
| std::optional<cuvs::neighbors::vpq_params> compression = std::nullopt; | ||
|
|
There was a problem hiding this comment.
Removed, will try to put dataset on device. In case of OOM, catch error and return index without added dataset.
| search_params.n_probes = std::max<uint32_t>(10, build_params.n_lists * 0.01); | ||
| search_params.lut_dtype = CUDA_R_16F; | ||
| search_params.internal_distance_dtype = CUDA_R_16F; |
| build_knn_graph(res, dataset, knn_graph->view(), refine_rate, pq_build_params, search_params); | ||
| // Set default value in case knn_build_params is not defined. | ||
| auto knn_build_params = params.graph_build_params; | ||
| if (std::holds_alternative<std::monostate>(params.graph_build_params)) { |
| // We just add the graph. User is expected to update dataset separately. This branch is used | ||
| // if user needs special control of memory allocations for the dataset. | ||
| } catch (std::bad_alloc& e) { | ||
| RAFT_LOG_DEBUG("Insufficient GPU memory to construct CAGRA index with dataset on GPU"); |
There was a problem hiding this comment.
Automatically construct index without dataset if dataset would not fit GPU memory.
| * @return true if enough GPU memory can be allocated | ||
| * @return false otherwise | ||
| */ | ||
| bool has_enough_device_memory(raft::resources const& res, |
There was a problem hiding this comment.
Small nitpick- can we move this to the cuvs::neighbors::nn_descent::helpers namespace please?
We can do it in a follow-on PR for 24.08 so we can get this merged ASAP.
|
/merge |
Fixes handling OOM error during CAGRA index creation, that was introduced in #131. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #167
This PR updates the CAGRA public API, changes defaults, and improves refinement during IVF-PQ build step.
Updated defaults:
By default CAGRA would select NN descent. We fall back to IVF-PQ build algorithm if there is not enough memory for NN descent.
For the IVF-PQ build algo, the search params were updated to use
n_probe = 0.01*nlist, and both LUT and internal distance type is set tofp16(as opposed tofp8/fp32previously)By default build would create the index that contains both the
graphand thedataseton GPU. If thedatasetdoes not fit gpu, then the returned index will only contain the graph (on device). In such case the user is expected to callindex.update_dataset()(for example with dataset in managed memory) before we cansearchthe index.API changes:
Additionally, this PR optimizes the IVF-PQ refinement step within the CAGRA graph creation in case the dataset is in host-memory.