bug: EmbeddingList search doesn't support non-float vectors in struct array #3269

@zhuwenxing

Description

PR #3268 fixed the insert path for non-float vectors (Float16, BFloat16, Int8, Binary) in struct arrays. The search path still has two problems: EmbeddingList cannot be constructed from non-float vector data, and the binary EmbeddingList placeholder type is never used.

How to reproduce

from pymilvus import EmbeddingList
import numpy as np

# Float16 vector stored as bytes (as returned by entity helper)
vec = np.random.rand(128).astype(np.float16)
vec_bytes = vec.tobytes()

emb_list = EmbeddingList(dtype=np.float16)
emb_list.add(vec_bytes)  # ValueError: Embedding must be 1D, got shape ()

Root Cause

Issue 1: EmbeddingList.add() doesn't handle bytes input

In embedding_list.py:

def add(self, embedding):
    embedding = np.asarray(embedding)  # bytes → 0-D ndarray (shape=())
    if embedding.ndim != 1:            # 0 != 1 → ValueError
        raise ValueError(f"Embedding must be 1D, got shape {embedding.shape}")

For non-float vectors, the entity helper stores vectors as bytes. np.asarray(bytes) produces a 0-D ndarray, causing the dimension check to fail.
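The 0-D behavior can be reproduced with NumPy alone, without pymilvus. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

vec = np.arange(4, dtype=np.float16)
raw = vec.tobytes()  # 8 bytes: 4 elements x 2 bytes each

# np.asarray treats a bytes object as one scalar value, not as a buffer.
as_scalar = np.asarray(raw)
print(as_scalar.ndim, as_scalar.shape)  # 0 ()

# np.frombuffer reinterprets the same bytes as a 1-D array of the given dtype.
decoded = np.frombuffer(raw, dtype=np.float16)
print(decoded.ndim, decoded.shape)  # 1 (4,)
```

This is why the ndim check in add() rejects bytes input even though the payload is a valid vector.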

Suggested fix: Handle bytes input by converting to numpy array with the appropriate dtype:

if isinstance(embedding, bytes):
    if self._dtype is not None:
        embedding = np.frombuffer(embedding, dtype=self._dtype)
    else:
        raise ValueError("Cannot add bytes embedding without dtype specified")
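To show where that check could sit relative to the existing ndim validation, here is a self-contained sketch. BytesAwareList is a hypothetical stand-in written for this issue, not the pymilvus EmbeddingList class, and its internals (the _embeddings list, accepting bytearray) are assumptions:

```python
import numpy as np


class BytesAwareList:
    """Hypothetical stand-in for EmbeddingList; not the pymilvus class."""

    def __init__(self, dtype=None):
        self._dtype = np.dtype(dtype) if dtype is not None else None
        self._embeddings = []

    def add(self, embedding):
        if isinstance(embedding, (bytes, bytearray)):
            if self._dtype is None:
                raise ValueError("Cannot add bytes embedding without dtype specified")
            # Reinterpret the raw buffer as a 1-D array of the declared dtype.
            embedding = np.frombuffer(embedding, dtype=self._dtype)
        else:
            embedding = np.asarray(embedding, dtype=self._dtype)
        # The original 1-D check still guards every input path.
        if embedding.ndim != 1:
            raise ValueError(f"Embedding must be 1D, got shape {embedding.shape}")
        self._embeddings.append(embedding)
        return self


vec = np.random.rand(128).astype(np.float16)
emb_list = BytesAwareList(dtype=np.float16)
emb_list.add(vec.tobytes())  # no longer raises
```

The bytes branch runs before np.asarray so the buffer is decoded rather than wrapped as a scalar. NumPy has no built-in bfloat16 dtype, so BFloat16 input would likely need separate handling beyond this sketch.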

Issue 2: _prepare_placeholder_str missing binary EmbeddingList branch

In prepare.py:

elif dtype == "byte":
    pl_type = PlaceholderType.BinaryVector  # Missing is_embedding_list check!
    pl_values = data

And:

elif isinstance(data[0], bytes):
    pl_type = PlaceholderType.BinaryVector  # Missing is_embedding_list check!
    pl_values = data

Neither branch checks is_embedding_list, so PlaceholderType.EmbListBinaryVector (already defined as 300 in types.py) is never used.
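One possible shape for the missing check, sketched in isolation. The enum below is a local stand-in for the one in types.py, binary_placeholder_type is a hypothetical helper name, and the value 100 for BinaryVector is assumed for illustration (only 300 is stated in this issue):

```python
from enum import IntEnum


class PlaceholderType(IntEnum):
    """Local stand-in enum with only the two members discussed here."""
    BinaryVector = 100          # value assumed for illustration
    EmbListBinaryVector = 300   # value as stated in this issue


def binary_placeholder_type(is_embedding_list: bool) -> PlaceholderType:
    """Pick the placeholder type for bytes-based (binary) search data."""
    if is_embedding_list:
        return PlaceholderType.EmbListBinaryVector
    return PlaceholderType.BinaryVector


print(binary_placeholder_type(False).name)  # BinaryVector
print(binary_placeholder_type(True).name)   # EmbListBinaryVector
```

Both affected branches in _prepare_placeholder_str could apply the same conditional before assigning pl_type.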

Server-side verification

Milvus server correctly handles non-float vector search when data is properly serialized:

Scenarios verified:
element_filter + float16 ndarray + L2 metric
MAX_SIM + float16 EmbeddingList (manually constructed) + MAX_SIM_L2

No server-side changes are needed; this is a pymilvus-only fix.
