Add filter for cagra::merge #1496

Merged: rapids-bot[bot] merged 17 commits into rapidsai:main from benfred:filtered_merge on Jan 29, 2026

@benfred (Contributor) commented Nov 5, 2025

Add an optional filter argument to cagra::merge, so that we can compact the index to remove deleted items when merging.

Also remove the 'LOGICAL_MERGE' strategy from the cagra::merge API, since it is only supported with the composite merge API. (The cagra::merge function ignores this value if set, and only does a physical merge.)
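
To make the compaction idea concrete, here is a minimal CPU-only sketch in plain Python. It is not the cuVS implementation and does not touch the CAGRA graph at all; it only illustrates how a keep/drop filter over the concatenated input rows can yield a smaller dataset plus an old-id to new-id mapping. The function name `filtered_merge` and the list-of-bools filter are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def filtered_merge(
    datasets: Sequence[Sequence[Sequence[float]]],
    keep: Sequence[bool],
) -> Tuple[List[List[float]], List[int]]:
    """Concatenate datasets, dropping rows whose keep flag is False.

    Hypothetical helper, not a cuVS API. Returns the compacted rows and,
    for each original (concatenated) row id, its new id in the compacted
    dataset, or -1 if the row was removed.
    """
    total = sum(len(d) for d in datasets)
    if len(keep) != total:
        raise ValueError("filter must have one entry per input row")

    merged: List[List[float]] = []
    old_to_new: List[int] = []
    old_id = 0
    for dataset in datasets:
        for row in dataset:
            if keep[old_id]:
                # Surviving rows get the next free slot in the output.
                old_to_new.append(len(merged))
                merged.append(list(row))
            else:
                # Deleted rows have no id in the merged output.
                old_to_new.append(-1)
            old_id += 1
    return merged, old_to_new


index_a = [[0.0, 0.0], [1.0, 1.0]]
index_b = [[2.0, 2.0], [3.0, 3.0]]
keep = [True, False, True, True]  # row 1 was deleted
rows, old_to_new = filtered_merge([index_a, index_b], keep)
```

In this toy run the merged dataset has three rows, and ids that followed the deleted row shift down by one, which is why callers holding old ids need the mapping.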

copy-pr-bot Bot commented Nov 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cjnolet added the `improvement` and `non-breaking` labels Nov 19, 2025
@cjnolet moved this from Todo to In Progress in Unstructured Data Processing Nov 19, 2025
cagra::index_params output_index_params;

/// Strategy for merging. Defaults to `MergeStrategy::MERGE_STRATEGY_PHYSICAL`.
cuvs::neighbors::MergeStrategy merge_strategy =

Review comment (Member):

Should we just remove the merge params altogether? I don't know if we need them anymore. Maybe we do; that's why I'm asking. It might help to think forward to when we have a SMART merge option (maybe you've thought of this already). But maybe we don't need a "merge strategy" for that? Maybe we just have a flag or parameter for "rebuild" vs "combine"?

Comment thread c/src/neighbors/cagra.cpp
reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices));
reinterpret_cast<uintptr_t>(_merge<int8_t>(res, *params, indices, num_indices, filter));
} else if (dtype.code == kDLUInt && dtype.bits == 8) {
output_index->addr =

Review comment (Member):

Ah, I was trying to figure out how you were going to support allowing a user to pass in the "index" to populate in the C layer while only having the option to return a newly created index in the C++ layer. Not opposed to this solution at all. Just good that we're not doing a copy!

auto out_layout = raft::make_strided_layout(filtered_dataset.view().extents(),
std::array<int64_t, 2>{stride, 1});

merged_index.update_dataset(handle, owning_t{std::move(filtered_dataset), out_layout});

Review comment (Member):

We really need to prioritize having the indexes all accept "Dataset" instead of mdspans. This is going to get out of control :-(

template <typename T, typename IdxT>
struct index;
struct merge_params;
struct index_params;

Review comment (Member):

These include files have gotten super lightweight now that they are only including prototypes and declarations. I'm not a huge fan of all of the CAGRA headers we're creating. Can we just consolidate this into the main header?

Also, for some reason we have a separate header, cagra_optimize.hpp, and I think this was a misunderstanding. Would you be able to copy that into the main cagra.hpp in this PR? It should be a straightforward copy/paste; I'm just trying to keep the number of headers manageable. The whole benefit of no longer being header-only is that the headers are super lightweight.

0.006,
min_recall));

/* TODO: eval_distances doesn't work, potentially because of id translation mismatch

Review comment (Contributor Author):

FWIW, this needs to be either fixed (by adding id translations) or removed before this PR is merged.
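
As a rough illustration of what "adding id translations" could mean here, the sketch below maps expected neighbor ids from the original numbering to the compacted numbering before comparing them with search results. This is an assumption about the nature of the mismatch, not the fix that was applied in the test; `translate_ids`, `recall`, and the `old_to_new` table are hypothetical names.

```python
from typing import List, Sequence


def translate_ids(
    neighbors: Sequence[Sequence[int]], old_to_new: Sequence[int]
) -> List[List[int]]:
    """Map original ids to compacted ids, dropping removed rows (-1)."""
    return [
        [old_to_new[i] for i in row if old_to_new[i] >= 0] for row in neighbors
    ]


def recall(
    expected: Sequence[Sequence[int]], actual: Sequence[Sequence[int]]
) -> float:
    """Fraction of expected neighbor ids that appear in the actual results."""
    hits = 0
    total = 0
    for exp_row, act_row in zip(expected, actual):
        hits += len(set(exp_row) & set(act_row))
        total += len(exp_row)
    return hits / total if total else 1.0


# Hypothetical mapping: original row 1 was filtered out during the merge.
old_to_new = [0, -1, 1, 2]
expected_old = [[0, 1], [2, 3]]  # ground truth in original ids
actual = [[0], [1, 2]]           # search results in compacted ids

expected_new = translate_ids(expected_old, old_to_new)
recall_translated = recall(expected_new, actual)
recall_untranslated = recall(expected_old, actual)
```

Without the translation step, correct results are scored against the wrong ids and the measured recall drops, which is the kind of symptom the TODO describes.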

copy-pr-bot Bot commented Dec 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@benfred changed the base branch from main to release/25.12 December 1, 2025 20:52
@benfred marked this pull request as ready for review December 1, 2025 20:53
@benfred requested review from a team as code owners December 1, 2025 20:53
@benfred requested review from a team as code owners January 6, 2026 00:45
@benfred removed request for a team and gforsyth January 6, 2026 00:49

@robertmaynard (Contributor) left a comment

This currently breaks ABI stability between 25.12 and 26.02.

If we are planning on getting this into 26.02 as is, we will need to bump our ABI. If we want to keep ABI compatibility, we need a cuvsCagraMerge_v2.

@cjnolet changed the base branch from main to release/26.02 January 29, 2026 03:14
@cjnolet changed the base branch from release/26.02 to main January 29, 2026 03:15

@benfred (Contributor Author) commented Jan 29, 2026

/merge

@rapids-bot merged commit a883a25 into rapidsai:main Jan 29, 2026
180 of 193 checks passed
@github-project-automation moved this from In Progress to Done in Unstructured Data Processing Jan 29, 2026
benfred added a commit to benfred/cuvs that referenced this pull request Jan 29, 2026
Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#1496
@benfred deleted the filtered_merge branch January 29, 2026 23:37

@robertmaynard (Contributor) commented:

@benfred We are going to have to open a new PR and introduce cuvsCagraMerge_v2 with this new signature as this breaks ABI between 26.02 and 26.04

@benfred (Contributor Author) commented Feb 26, 2026

@robertmaynard this change is in 26.02 - I messed up the merge on this pr, but we merged into 26.02 with #1755

@robertmaynard (Contributor) commented:

> @robertmaynard this change is in 26.02 - I messed up the merge on this pr, but we merged into 26.02 with #1755

@benfred That is great news! Thanks

rapids-bot pushed a commit that referenced this pull request Mar 24, 2026
This adds a script that can check for changes that would break our C ABI. It works by comparing two sets of header files: the `c/include` header files for the published ABI and the `c/include` header files in a new pull request. Each set of header files is parsed using libclang, and the differences between the old and new header files are examined for changes that would break the ABI.

Currently the following checks are made and flagged by this tool:

* Functions that have been removed from the C-ABI
* Functions that have extra parameters added
* Functions that have parameters removed
* Functions that have the type of any parameter changed
* Structs that have been removed
* Structs that have members removed
* Structs that have the types of members changed
* Enum values that have been removed
* Enum values that have their value changed
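
The function-level checks in the list above can be sketched with plain dictionaries standing in for the parsed headers. This is only an illustration of the comparison logic, not the `check_c_abi.py` script itself: it does not use libclang, and the data layout, the `find_function_breaks` name, and the `example*` symbols are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: function name -> ordered (type, name) parameters.
Signature = List[Tuple[str, str]]


def find_function_breaks(
    old: Dict[str, Signature], new: Dict[str, Signature]
) -> List[str]:
    """Report removed functions and positional parameter changes."""
    errors: List[str] = []
    for name, old_params in old.items():
        if name not in new:
            errors.append(f"Function has been removed: {name}")
            continue
        new_params = new[name]
        # Parameters are compared by position, as the calling convention does.
        for (old_type, _), (new_type, arg) in zip(old_params, new_params):
            if old_type != new_type:
                errors.append(
                    f"Function {name} changed type of argument '{arg}' "
                    f"from '{old_type}' to '{new_type}'"
                )
        for param_type, arg in new_params[len(old_params):]:
            errors.append(f"Function {name} has a new argument '{param_type} {arg}'")
        for param_type, arg in old_params[len(new_params):]:
            errors.append(f"Function {name} has a removed argument '{param_type} {arg}'")
    return errors


old_abi = {
    "exampleParamsCreate": [("exampleParams_t*", "params")],
    "exampleMerge": [("exampleParams_t", "params"), ("exampleIndex_t", "output")],
}
new_abi = {
    "exampleMerge": [
        ("exampleIndexParams_t", "params"),
        ("exampleFilter", "filter"),
        ("exampleIndex_t", "output"),
    ],
}
errors = find_function_breaks(old_abi, new_abi)
for line in errors:
    print(line)
```

Because the comparison is positional, inserting a parameter in the middle of a signature shows up as one changed type plus one new trailing argument, rather than as a single insertion.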

This is currently just a POC of using libclang to flag breaking C ABI changes. It is meant as an alternative to tools that require debug symbols to detect ABI breaks in the compiled library, and it isn't ready to merge just yet.

To fully integrate this, we will need to store the headers for the stable C ABI somewhere to use as a baseline, then download them on each PR and run the check in a GHA workflow to flag breaking changes.

When run on a branch with a breaking ABI change (#1496, which has a number of breaking C ABI changes), the output is:

```
$ ./check_c_abi.py ~/code/cuvs_stable_abi/c/include ~/code/cuvs/c/include 
Error: Function has been removed. Symbol cuvsCagraMergeParamsCreate from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:584
Error: Function has been removed. Symbol cuvsCagraMergeParamsDestroy from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:587
Error: Function has a changed argument type 'cuvsCagraMergeParams_t' to 'cuvsCagraIndexParams_t for argument 'params'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Function has a changed argument type 'cuvsCagraIndex_t' to 'cuvsFilter for argument 'output_index'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Function has a new argument 'cuvsCagraIndex_t output_index'. Symbol cuvsCagraMerge from /home/ben/code/cuvs/c/include/cuvs/neighbors/cagra.h:882
Error: Struct has been removed. Symbol cuvsCagraMergeParams from /home/ben/code/cuvs_stable_abi/c/include/cuvs/neighbors/cagra.h:576
```

Also, this script runs in < 100 ms on my workstation, so we could potentially add it as a pre-commit hook.

closes #1739

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1749
jrbourbeau pushed a commit to jrbourbeau/cuvs that referenced this pull request Mar 25, 2026