Add filter for cagra::merge #1496
Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging
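To make the idea of compaction concrete, here is a minimal host-side sketch, assuming a row-major dataset and a boolean keep-mask. `compacted_dataset` and `compact_rows` are hypothetical names used only for illustration; they are not part of cuVS, and the real implementation works on device memory with a cuVS filter type.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical result type for this sketch (not a cuVS type).
struct compacted_dataset {
  std::vector<float> rows;               // row-major, surviving rows only
  std::vector<std::int64_t> old_to_new;  // -1 for rows that were filtered out
  std::size_t n_rows = 0;                // number of surviving rows
};

// Copy only the rows whose keep flag is set and record where each old row ended up.
inline compacted_dataset compact_rows(const std::vector<float>& rows,
                                      std::size_t dim,
                                      const std::vector<bool>& keep)
{
  if (dim == 0 || rows.size() != keep.size() * dim) {
    throw std::invalid_argument("rows must hold keep.size() * dim values");
  }
  compacted_dataset out;
  out.old_to_new.assign(keep.size(), -1);
  for (std::size_t i = 0; i < keep.size(); ++i) {
    if (!keep[i]) { continue; }  // deleted item: drop it from the merged dataset
    out.old_to_new[i] = static_cast<std::int64_t>(out.n_rows++);
    const auto first = rows.begin() + static_cast<std::ptrdiff_t>(i * dim);
    out.rows.insert(out.rows.end(), first, first + static_cast<std::ptrdiff_t>(dim));
  }
  return out;
}
```

The `old_to_new` table matters because row ids shift once rows are dropped: anything that compares results against the original ids has to translate them first.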
cagra::index_params output_index_params;

/// Strategy for merging. Defaults to `MergeStrategy::MERGE_STRATEGY_PHYSICAL`.
cuvs::neighbors::MergeStrategy merge_strategy =
Should we just remove the merge params altogether? I don't know if we need them anymore. Maybe we do, which is why I'm asking. It might help to think forward to when we have a SMART merge option (maybe you've thought of this already). But maybe we don't need a "merge strategy" for that? Maybe we just have a flag or parameter for "rebuild" vs "combine"?
reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices));
reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices, filter));
} else if (dtype.code == kDLUInt && dtype.bits == 8) {
output_index->addr =
Ah, I was trying to figure out how you were going to let a user pass in the "index" to populate in the C layer while only having the option to return a newly created index in the C++ layer. Not opposed to this solution at all. Just good that we're not doing a copy!
auto out_layout = raft::make_strided_layout(filtered_dataset.view().extents(),
                                            std::array<int64_t, 2>{stride, 1});

merged_index.update_dataset(handle, owning_t{std::move(filtered_dataset), out_layout});
We really need to prioritize having the indexes all accept "Dataset" instead of mdspans. This is going to get out of control :-(
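For readers unfamiliar with the `{stride, 1}` layout in the snippet above, the following is a minimal sketch of what a row-strided 2D layout means: element (row, col) lives at offset `row * stride + col`, where `stride` may be larger than the row length. `strided_rows` and `make_strided_rows` are hypothetical helpers for illustration only and do not correspond to RAFT or cuVS types.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical owning container with a row stride (not a RAFT/cuVS type).
struct strided_rows {
  std::vector<float> storage;
  std::size_t n_rows = 0;
  std::size_t dim    = 0;  // logical row length
  std::size_t stride = 0;  // elements between the starts of consecutive rows

  // Strides {stride, 1}: step `stride` per row, 1 per column.
  float& at(std::size_t row, std::size_t col) { return storage[row * stride + col]; }
};

inline strided_rows make_strided_rows(std::size_t n_rows, std::size_t dim, std::size_t stride)
{
  if (stride < dim) { throw std::invalid_argument("stride must be at least dim"); }
  strided_rows out;
  out.storage.assign(n_rows * stride, 0.0f);
  out.n_rows = n_rows;
  out.dim    = dim;
  out.stride = stride;
  return out;
}
```

Because the stride has to travel with the buffer, every place that hands a dataset to an index needs both pieces, which is part of why a single "Dataset" abstraction is attractive.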
template <typename T, typename IdxT>
struct index;
struct merge_params;
struct index_params;
These include files have gotten super lightweight now that they only include prototypes and declarations. I'm not a huge fan of all of the CAGRA headers we're creating. Can we just consolidate this into the main header?
Also, for some reason we have a separate header, cagra_optimize.hpp, and I think this was a misunderstanding. Would you be able to copy that into the main cagra.hpp in this PR? It should be a straightforward copy/paste; I'm just trying to keep the number of headers manageable. The whole benefit of no longer being header-only is that the headers are super lightweight.
0.006,
min_recall));

/* TODO: eval_distances doesn't work, potentially because of id translation mismatch
FWIW, this needs to be either fixed (by adding id translations) or removed before this PR is merged.
robertmaynard left a comment
This currently breaks ABI stability between 25.12 and 26.02.
If we are planning on getting this into 26.02 as is, we will need to bump our ABI. If we want to keep ABI compatibility, we need a cuvsCagraMerge_v2.
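A common way to keep a C ABI stable while changing a function is to leave the old symbol untouched and add a versioned sibling, which is what the `cuvsCagraMerge_v2` suggestion amounts to. The sketch below shows the pattern only: `example_merge`, `example_merge_v2`, and `example_filter` are hypothetical stand-ins rather than the cuVS C API, and the "merge" here merely counts surviving rows.

```cpp
#include <stddef.h>

extern "C" {

// Hypothetical filter: keep[i] != 0 means row i of the concatenated input survives.
typedef struct example_filter {
  const unsigned char* keep;
  size_t n;
} example_filter;

// New, versioned entry point that carries the extra argument.
// Returns 0 on success, 1 on invalid input.
int example_merge_v2(const size_t* index_sizes,
                     size_t num_indices,
                     const example_filter* filter,
                     size_t* merged_size)
{
  if (index_sizes == nullptr || merged_size == nullptr) { return 1; }
  size_t total = 0;
  for (size_t i = 0; i < num_indices; ++i) { total += index_sizes[i]; }
  if (filter == nullptr) {  // no filter: every row is kept
    *merged_size = total;
    return 0;
  }
  if (filter->keep == nullptr || filter->n != total) { return 1; }
  size_t kept = 0;
  for (size_t i = 0; i < total; ++i) { kept += (filter->keep[i] != 0) ? 1 : 0; }
  *merged_size = kept;
  return 0;
}

// Original entry point: name and signature are unchanged, so binaries built
// against the old headers keep working. It simply forwards with no filter.
int example_merge(const size_t* index_sizes, size_t num_indices, size_t* merged_size)
{
  return example_merge_v2(index_sizes, num_indices, nullptr, merged_size);
}

}  // extern "C"
```

The alternative discussed in the review, changing the existing signature in place, is simpler for the codebase but requires bumping the ABI version.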
/merge
Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging.

Also remove the 'LOGICAL_MERGE' strategy from the `cagra::merge` API, since this is only supported with the composite merge API. (The `cagra::merge` function ignores this value if set, and only does a physical merge.)

Authors:
- Ben Frederickson (https://github.com/benfred)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#1496
@benfred We are going to have to open a new PR and introduce
@robertmaynard this change is in 26.02 - I messed up the merge on this pr, but we merged into 26.02 with #1755
@benfred That is great news! Thanks
This adds a script that can check for changes that would break our C-ABI. It works by comparing two sets of header files: the `c/include` header files for the published ABI, and the `c/include` header files inside a new pull request. Each set of header files is parsed using libclang, and the differences between the old and new header files are examined for changes that would break the ABI.

Currently the following checks are made and flagged by this tool:

* Functions that have been removed from the C-ABI
* Functions that have extra parameters added
* Functions that have parameters removed
* Functions that have the type of any parameter changed
* Structs that have been removed
* Structs that have had members removed
* Structs that have the types of members changed
* Enum values that have been removed
* Enum values that have their value changed

This is currently just a POC of using libclang to flag breaking C-ABI changes. It is meant as an alternative to tools that require debug symbols for detecting ABI breaks from the compiled library, and isn't ready to merge just yet. To fully integrate this, we will need to store the headers for the stable C-ABI somewhere to use as a baseline, and then download those headers on each PR and run the check in a GHA workflow to flag breaking changes.

When run on a branch that has a breaking ABI change (#1496, which has a number of breaking C-ABI changes), the output is:

```
$ ./check_c_abi.py ~/code/cuvs_stable_abi/c/include ~/code/cuvs/c/include
Error: Function has been removed. Symbol cuvsCagraMergeParamsCreate from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:584
Error: Function has been removed. Symbol cuvsCagraMergeParamsDestroy from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:587
Error: Function has a changed argument type 'cuvsCagraMergeParams_t' to 'cuvsCagraIndexParams_t for argument 'params'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Function has a changed argument type 'cuvsCagraIndex_t' to 'cuvsFilter for argument 'output_index'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Function has a new argument 'cuvsCagraIndex_t output_index'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Struct has been removed. Symbol cuvsCagraMergeParams from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:576
```

Also, this script runs in < 100ms on my workstation, and we could potentially add this as a pre-commit hook.

closes #1739

Authors:
- Ben Frederickson (https://github.com/benfred)
- Mike Sarahan (https://github.com/msarahan)

Approvers:
- Mike Sarahan (https://github.com/msarahan)
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1749
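The function-level checks listed in this message boil down to diffing two tables of signatures. The sketch below illustrates only that comparison step, under the assumption that the signatures have already been extracted; the actual script is written in Python and obtains them by parsing headers with libclang. `signature_table` and `find_breaking_changes` are hypothetical names, and the message strings are simplified.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: function name -> ordered list of parameter types.
using signature_table = std::map<std::string, std::vector<std::string>>;

// Report changes in `candidate` that would break callers built against `stable`.
inline std::vector<std::string> find_breaking_changes(const signature_table& stable,
                                                      const signature_table& candidate)
{
  std::vector<std::string> errors;
  for (const auto& entry : stable) {
    const std::string& name                    = entry.first;
    const std::vector<std::string>& old_params = entry.second;

    const auto it = candidate.find(name);
    if (it == candidate.end()) {
      errors.push_back("Function has been removed: " + name);
      continue;
    }
    const std::vector<std::string>& new_params = it->second;

    // Compare the parameters both versions share, position by position.
    const std::size_t shared = std::min(old_params.size(), new_params.size());
    for (std::size_t i = 0; i < shared; ++i) {
      if (old_params[i] != new_params[i]) {
        errors.push_back("Function has a changed argument type '" + old_params[i] + "' to '" +
                         new_params[i] + "': " + name);
      }
    }
    if (new_params.size() > shared) { errors.push_back("Function has a new argument: " + name); }
    if (old_params.size() > shared) { errors.push_back("Function has an argument removed: " + name); }
  }
  return errors;
}
```

Functions that exist only in `candidate` are deliberately not reported, since adding a new symbol does not break existing callers.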