feat: impl StructArray -- support searching embeddings in embedding list with element-level filter expression #45830
Signed-off-by: SpadeA <[email protected]>
Codecov Report (❌ patch coverage check)
@@            Coverage Diff             @@
##           master   #45830      +/-   ##
==========================================
+ Coverage   73.29%   76.00%    +2.70%
==========================================
  Files        1369     1905      +536
  Lines      213893   298568    +84675
==========================================
+ Hits       156778   226913    +70135
- Misses      49633    64160    +14527
- Partials     7482     7495       +13
const FieldMeta&
Schema::GetFirstArrayFieldInStruct(const std::string& struct_name) const {
    // Check cache first
    auto cache_it = struct_array_field_cache_.find(struct_name);
I was thinking of caching it in the constructor. With this method, we need to worry about concurrency safety.
Schema is initialized by the default constructor and fields are added one by one, so this can't be handled in the constructor. I think adding a mutex for struct_array_field_cache_ is fine.
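For readers following the thread, here is a minimal sketch of the mutex-guarded lazy cache being discussed. It is not the code from this PR: `SchemaSketch`, the simplified `FieldMeta`, its `struct_name` member, and the index-based cache are all assumptions made for illustration. Fields are stored in a `std::deque` so that returned references stay valid while more fields are appended.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified stand-in for the real FieldMeta.
struct FieldMeta {
    std::string name;
    std::string struct_name;  // empty when the field is not part of a struct
    bool is_array;
};

// Hypothetical schema that is default-constructed and filled field by field.
class SchemaSketch {
 public:
    void AddField(FieldMeta field) {
        std::lock_guard<std::mutex> lock(mutex_);
        fields_.push_back(std::move(field));
    }

    // Returns the first array field that belongs to struct_name.
    // The lookup result is cached lazily; the mutex guards both the
    // field list and the cache, so concurrent readers are safe.
    const FieldMeta& GetFirstArrayFieldInStruct(const std::string& struct_name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        // Check cache first
        auto cache_it = cache_.find(struct_name);
        if (cache_it != cache_.end()) {
            return fields_[cache_it->second];
        }
        for (std::size_t i = 0; i < fields_.size(); ++i) {
            if (fields_[i].is_array && fields_[i].struct_name == struct_name) {
                // Fields are only appended, so the first match never changes.
                cache_.emplace(struct_name, i);
                return fields_[i];
            }
        }
        throw std::out_of_range("no array field in struct: " + struct_name);
    }

 private:
    mutable std::mutex mutex_;
    std::deque<FieldMeta> fields_;
    mutable std::unordered_map<std::string, std::size_t> cache_;
};
```

Only successful lookups are cached, so a struct whose array field is added later is still found on the next call.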
    BuildFromSegment(const void* segment, const FieldMeta& field_meta);

 private:
    const std::vector<int32_t> element_row_ids_;
Remember to report this portion of memory usage to cachinglayer. Reach out to @sparknack for details.
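To make the memory concern concrete, the sketch below shows one way such an element-to-row mapping could be built and how its footprint could be estimated. It assumes `element_row_ids_` stores, for each flattened struct element, the row it came from; the class name, `Build`, `RowOf`, and `ByteSize` are hypothetical. The actual cachinglayer reporting call is not shown because it does not appear in this conversation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: maps each flattened element to its owning row.
class ElementRowIndexSketch {
 public:
    // lengths[i] is the number of struct elements stored in row i.
    explicit ElementRowIndexSketch(const std::vector<std::int32_t>& lengths)
        : element_row_ids_(Build(lengths)) {}

    // Row that owns the given flattened element (throws if out of range).
    std::int32_t RowOf(std::size_t element_id) const {
        return element_row_ids_.at(element_id);
    }

    // Bytes held by the mapping. capacity() is used rather than size()
    // because the allocation may be larger than the element count.
    std::size_t ByteSize() const {
        return element_row_ids_.capacity() * sizeof(std::int32_t);
    }

 private:
    static std::vector<std::int32_t> Build(const std::vector<std::int32_t>& lengths) {
        std::vector<std::int32_t> ids;
        for (std::size_t row = 0; row < lengths.size(); ++row) {
            // Append one entry per element in this row.
            ids.insert(ids.end(),
                       static_cast<std::size_t>(lengths[row]),
                       static_cast<std::int32_t>(row));
        }
        return ids;
    }

    const std::vector<std::int32_t> element_row_ids_;
};
```

The mapping costs four bytes per element, so it grows with the total number of struct elements in the segment rather than with the number of rows.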
For the resource management section, LGTM.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: SpadeA-Tang, zhengbuqian
/lgtm
issue: #42148
For a vector field inside a STRUCT, since a STRUCT can only appear as the element type of an ARRAY field, the vector field in STRUCT is effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose names start with the prefix MAX_SIM_.
This PR allows Milvus to search embeddings inside an embedding list using the same metrics as normal embedding fields. Each embedding in the list is treated as an independent vector and participates in ANN search.
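The idea of treating each embedding as an independent vector can be sketched as follows. This is a brute-force illustration, not the ANN path implemented in the PR; `Hit`, `SquaredL2`, and `NearestElement` are hypothetical names, squared L2 is just one example of a regular metric, and all embeddings are assumed to have the same dimension as the query.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Embedding = std::vector<float>;
// One row's vector field inside the STRUCT array, i.e. an embedding list.
using EmbeddingList = std::vector<Embedding>;

// Identifies a single embedding by its row and its offset inside the list.
struct Hit {
    std::size_t row;
    std::size_t element;
    float distance;
};

inline float SquaredL2(const Embedding& a, const Embedding& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Every embedding in every list competes on its own; the result points at
// one element rather than at a whole list. If rows is empty, the returned
// hit keeps an infinite distance.
inline Hit NearestElement(const std::vector<EmbeddingList>& rows, const Embedding& query) {
    Hit best{0, 0, std::numeric_limits<float>::infinity()};
    for (std::size_t r = 0; r < rows.size(); ++r) {
        for (std::size_t e = 0; e < rows[r].size(); ++e) {
            const float d = SquaredL2(rows[r][e], query);
            if (d < best.distance) {
                best = Hit{r, e, d};
            }
        }
    }
    return best;
}
```

This contrasts with the MAX_SIM_ metrics, which score an embedding list as a whole.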
Further, since STRUCT may contain scalar fields that are highly related to the embedding field, this PR introduces an element-level filter expression to refine search results.
The grammar of the element-level filter is:
element_filter(structFieldName, $[subFieldName] == 3)
where $[subFieldName] refers to the value of subFieldName in each element of the STRUCT array structFieldName.
It can be combined with existing filter expressions, for example:
"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3)"
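One plausible reading of the combined expression above is sketched below: the regular filter selects rows, and the element-level filter then selects individual elements within those rows. This models the semantics only and is not the expression engine from the PR; `Row`, `StructElement`, and `SelectElements` are hypothetical, with member names borrowed from the example expression.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical data model mirroring the example expression.
struct StructElement {
    int struct_int;
};

struct Row {
    std::string varcharField;
    std::vector<StructElement> struct_field;
};

using RowPredicate = std::function<bool(const Row&)>;
using ElementPredicate = std::function<bool(const StructElement&)>;

// Returns (row, element offset) pairs that pass both filters.
// The row filter plays the role of `varcharField == 'aaa'`;
// the element filter plays the role of `$[struct_int] == 3`.
inline std::vector<std::pair<std::size_t, std::size_t>> SelectElements(
    const std::vector<Row>& rows,
    const RowPredicate& row_filter,
    const ElementPredicate& element_filter) {
    std::vector<std::pair<std::size_t, std::size_t>> selected;
    for (std::size_t r = 0; r < rows.size(); ++r) {
        if (!row_filter(rows[r])) {
            continue;  // the whole row is excluded
        }
        for (std::size_t e = 0; e < rows[r].struct_field.size(); ++e) {
            if (element_filter(rows[r].struct_field[e])) {
                selected.emplace_back(r, e);
            }
        }
    }
    return selected;
}
```

Under this reading, only the selected elements would take part in the vector search, so a row can contribute some of its embeddings and not others.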
A full example:
TODO:
- When an element_filter expression is used, a regular filter expression must also be present. Remove this restriction.
- Support element_filter expressions in the query.