Removing from IndexIVFFlat with an IndexIDMap2 wrapper fails #4535

@dbiton

Hello,
In Python 3.12.5, with faiss-cpu 1.11.0.post1, I have the following setup:

import faiss

# Generate some embeds with SBERT...
dim = 384
count_cells = 10  # number of IVF cells (the nlist argument)
coarse_quantizer = faiss.IndexFlatL2(dim)
ivf_index = faiss.IndexIVFFlat(coarse_quantizer, dim, count_cells)
train_embeds = embeds[:5000]
ivf_index.train(train_embeds)
index = faiss.IndexIDMap2(ivf_index)

Afterwards, I have a loop where I add a few embeds each time like this:

# embeds - a list of sbert embeds of dim 384, embeds_ids - their unique ids
self.index.add_with_ids(embeds, embeds_ids)

The IDs are definitely unique: I iterate over the embeds and use each embed's position as its ID.
Once the index reaches a certain size, I start removing embeds by ID, and not in a sliding-window fashion.

I added some safety code on the advice of an LLM. Note, though, that my code works fine with a naive index, i.e. faiss.IndexIDMap2(faiss.IndexFlatL2(dim)), where I simply call remove_ids with the array of IDs.

ids64 = np.ascontiguousarray(np.asarray(embeds_ids, dtype=np.int64))

sel = faiss.IDSelectorBatch(ids64.size, faiss.swig_ptr(ids64))
before = self.index.ntotal
removed = int(self.index.remove_ids(sel))
after = self.index.ntotal

assert after == before - len(embeds_ids), (before, after)
return removed

This function can remove a single ID three times, passing my assertion each time, and then hits the following assertion failure (by the way, throwing an exception would be nice, unless there's a good reason to use an assertion instead):

Faiss assertion 'j == index->ntotal' failed in virtual size_t __cdecl faiss::IndexIDMapTemplate<faiss::Index>::remove_ids(const IDSelector &) [IndexT = faiss::Index] at D:\a\faiss-wheels\faiss-wheels\faiss\faiss\IndexIDMap.cpp:197

I've been troubleshooting for quite some time, and I suspect there might be a bug - since I think (hope) my usage is fine.
