Description
PR #3268 fixed the insert path for non-float vectors (Float16, BFloat16, Int8, Binary) in struct arrays. However, the search path is still broken: `EmbeddingList` cannot be constructed from non-float vector data, and the binary `EmbeddingList` placeholder type is never used.
How to reproduce
```python
from pymilvus import EmbeddingList
import numpy as np

# Float16 vector stored as bytes (as returned by the entity helper)
vec = np.random.rand(128).astype(np.float16)
vec_bytes = vec.tobytes()

emb_list = EmbeddingList(dtype=np.float16)
emb_list.add(vec_bytes)  # ValueError: Embedding must be 1D, got shape ()
```
Root Cause
Issue 1: EmbeddingList.add() doesn't handle bytes input
In embedding_list.py:
```python
def add(self, embedding):
    embedding = np.asarray(embedding)  # bytes → 0-D ndarray (shape=())
    if embedding.ndim != 1:  # 0 != 1 → ValueError
        raise ValueError(f"Embedding must be 1D, got shape {embedding.shape}")
```
For non-float vectors, the entity helper stores vectors as `bytes`. `np.asarray(bytes)` produces a 0-D ndarray, so the dimension check fails.
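A NumPy-only check of this root cause, independent of pymilvus: `np.asarray` wraps the whole `bytes` object as a single 0-D scalar, while `np.frombuffer` reinterprets the buffer as a flat array of the declared element type. The 128-dimension Float16 vector matches the reproduction; the Int8 case is an added illustration.

```python
import numpy as np

dim = 128

# Float16 vector serialized to raw bytes, as in the reproduction
f16 = np.random.rand(dim).astype(np.float16)
f16_bytes = f16.tobytes()

# np.asarray treats the bytes object as one scalar value: a 0-D array
as_scalar = np.asarray(f16_bytes)
print(as_scalar.ndim, as_scalar.shape)  # 0 ()

# np.frombuffer reinterprets the buffer element-wise, recovering the 1-D vector
restored = np.frombuffer(f16_bytes, dtype=np.float16)
print(restored.shape, np.array_equal(restored, f16))  # (128,) True

# The same holds for Int8 vectors (one byte per element)
i8 = np.arange(-64, 64, dtype=np.int8)
restored_i8 = np.frombuffer(i8.tobytes(), dtype=np.int8)
print(restored_i8.shape, np.array_equal(restored_i8, i8))  # (128,) True
```

This is why the element dtype must be known before `bytes` input can be accepted: the same buffer decodes to a different number of elements for each dtype.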
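Suggested fix: when `add()` receives `bytes`, convert it to a NumPy array with the list's dtype before the dimension check:
```python
def add(self, embedding):
    if isinstance(embedding, bytes):
        if self._dtype is None:
            raise ValueError("Cannot add bytes embedding without dtype specified")
        # Reinterpret the raw buffer as a 1-D array of the list's element type
        embedding = np.frombuffer(embedding, dtype=self._dtype)
    else:
        embedding = np.asarray(embedding)
    # ... existing ndim/dimension checks continue unchanged
```
Issue 2: _prepare_placeholder_str missing binary EmbeddingList branch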
In prepare.py:
```python
elif dtype == "byte":
    pl_type = PlaceholderType.BinaryVector  # Missing is_embedding_list check!
    pl_values = data
```
And:
```python
elif isinstance(data[0], bytes):
    pl_type = PlaceholderType.BinaryVector  # Missing is_embedding_list check!
    pl_values = data
```
Neither branch checks `is_embedding_list`, so `PlaceholderType.EmbListBinaryVector` (already defined as 300 in `types.py`) is never used.
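A minimal sketch of the missing check, assuming both binary branches only need to choose between the two placeholder types. `choose_binary_placeholder` and the local `PlaceholderType` enum are hypothetical stand-ins for illustration, not pymilvus code; only `EmbListBinaryVector = 300` comes from this issue, and `BinaryVector = 100` is assumed to mirror the Milvus proto value.

```python
from enum import IntEnum


class PlaceholderType(IntEnum):
    # Hypothetical local subset of the placeholder enum
    BinaryVector = 100         # assumed to match the Milvus proto value
    EmbListBinaryVector = 300  # value this issue cites from types.py


def choose_binary_placeholder(data, dtype=None, is_embedding_list=False):
    """Hypothetical stand-in for the two binary branches of _prepare_placeholder_str."""
    # Decide once so both binary branches honor is_embedding_list
    binary_type = (
        PlaceholderType.EmbListBinaryVector
        if is_embedding_list
        else PlaceholderType.BinaryVector
    )
    if dtype == "byte":
        pl_type = binary_type
    elif data and isinstance(data[0], bytes):
        pl_type = binary_type
    else:
        raise ValueError("not binary vector data")
    pl_values = data
    return pl_type, pl_values


vectors = [bytes(16), bytes(16)]  # two 128-bit binary vectors
pl_type, _ = choose_binary_placeholder(vectors)
print(pl_type.name)  # BinaryVector
pl_type, _ = choose_binary_placeholder(vectors, is_embedding_list=True)
print(pl_type.name)  # EmbListBinaryVector
```

Computing the binary type once, before the branches, keeps the `dtype == "byte"` path and the `bytes`-input path from diverging again.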
Server-side verification
Milvus server correctly handles non-float vector search when data is properly serialized:
| Scenario | Result |
|---|---|
| element_filter + float16 ndarray + L2 metric | ✅ |
| MAX_SIM + float16 EmbeddingList (manually constructed) + MAX_SIM_L2 | ✅ |
No server-side changes are needed; this is a pymilvus-only fix.
Related
- PR #3268, "fix: support non-float vectors in struct array" (insert path fix, merged)
- 8 struct array element search test cases are marked `xfail` due to this issue