[FEA] IVF-PQ Python API support host dataset for build and extend #858

@tfeher

Description

Is your feature request related to a problem? Please describe.
In pylibraft, ivf_pq supported host datasets for build and extend, and had a corresponding unit test.

Support for host input arrays was removed in cuvs.

When I try to pass a host dataset, I get the following error:

import numpy as np
from cuvs.neighbors import ivf_pq
dataset = np.random.uniform(-1,1, size=(1000,32)).astype(np.float32)
build_params = ivf_pq.IndexParams(metric="sqeuclidean", pq_dim=16, pq_bits=8, n_lists=32)
index = ivf_pq.build(build_params, dataset)

---------------------------------------------------------------------------
CuvsException                             Traceback (most recent call last)
Cell In[21], line 3
      1 dataset = np.random.uniform(-1,1, size=(1000,32)).astype(np.float32)
      2 build_params = ivf_pq.IndexParams(metric="sqeuclidean", pq_dim=16, pq_bits=8, n_lists=32)
----> 3 index = ivf_pq.build(build_params, dataset)

File resources.pyx:110, in cuvs.common.resources.auto_sync_resources.wrapper()

File ivf_pq.pyx:293, in cuvs.neighbors.ivf_pq.ivf_pq.build()

File ivf_pq.pyx:294, in cuvs.neighbors.ivf_pq.ivf_pq.build()

File exceptions.pyx:37, in cuvs.common.exceptions.check_cuvs()

CuvsException: RAFT failure at file=/home/scratch.tfeher_gpu_2/cuvs_2506/cpp/include/cuvs/core/detail/interop.hpp line=92: device_type mismatch between return mdspan and DLTensor

Describe the solution you'd like
Support host input dataset for ivf_pq.build and ivf_pq.extend.

Supporting host input is important for indexing large datasets. Our implementation only requires that the PQ-compressed dataset fit in GPU memory, not the original dataset. Therefore it is important that the Python API supports host arrays.
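To make the memory argument concrete, here is a rough back-of-the-envelope sketch comparing the raw float32 dataset size with the size of its PQ codes. It assumes each vector is stored as `pq_dim * pq_bits` bits and ignores cluster centers, codebooks, row IDs, and any padding, so the real index will be somewhat larger. The helper names and the 100M x 96 dataset are hypothetical and only serve as an illustration.

```python
def raw_bytes(n_rows: int, dim: int, itemsize: int = 4) -> int:
    """Size of the uncompressed dataset (float32 by default)."""
    return n_rows * dim * itemsize


def pq_code_bytes(n_rows: int, pq_dim: int, pq_bits: int) -> int:
    """Approximate size of the PQ codes only.

    Assumption: each vector takes pq_dim * pq_bits bits, rounded up to
    whole bytes. Index metadata is not counted.
    """
    bits_per_vector = pq_dim * pq_bits
    return n_rows * ((bits_per_vector + 7) // 8)


# Parameters from the reproducer above: 1000 x 32, pq_dim=16, pq_bits=8.
small_raw = raw_bytes(1000, 32)
small_codes = pq_code_bytes(1000, pq_dim=16, pq_bits=8)

# Hypothetical large dataset: 100M vectors of dimension 96, pq_dim=48.
large_raw = raw_bytes(100_000_000, 96)
large_codes = pq_code_bytes(100_000_000, pq_dim=48, pq_bits=8)

print(f"small: raw={small_raw} B, codes={small_codes} B")
print(f"large: raw={large_raw / 1e9:.1f} GB, codes={large_codes / 1e9:.1f} GB")
```

In the hypothetical large case the raw data would not fit on many GPUs while the codes would, which is the situation where building from a host array matters.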
