Is your feature request related to a problem? Please describe.
In pylibraft, ivf_pq supported host datasets for build and extend, and had a corresponding unit test.
Support for host input arrays has been removed in cuvs.
When I try to pass a host dataset, I get the following error:
import numpy as np
from cuvs.neighbors import ivf_pq
dataset = np.random.uniform(-1,1, size=(1000,32)).astype(np.float32)
build_params = ivf_pq.IndexParams(metric="sqeuclidean", pq_dim=16, pq_bits=8, n_lists=32)
index = ivf_pq.build(build_params, dataset)
---------------------------------------------------------------------------
CuvsException Traceback (most recent call last)
Cell In[21], line 3
1 dataset = np.random.uniform(-1,1, size=(1000,32)).astype(np.float32)
2 build_params = ivf_pq.IndexParams(metric="sqeuclidean", pq_dim=16, pq_bits=8, n_lists=32)
----> 3 index = ivf_pq.build(build_params, dataset)
File resources.pyx:110, in cuvs.common.resources.auto_sync_resources.wrapper()
File ivf_pq.pyx:293, in cuvs.neighbors.ivf_pq.ivf_pq.build()
File ivf_pq.pyx:294, in cuvs.neighbors.ivf_pq.ivf_pq.build()
File exceptions.pyx:37, in cuvs.common.exceptions.check_cuvs()
CuvsException: RAFT failure at file=/home/scratch.tfeher_gpu_2/cuvs_2506/cpp/include/cuvs/core/detail/interop.hpp line=92: device_type mismatch between return mdspan and DLTensor
Describe the solution you'd like
Support host input datasets for ivf_pq.build and ivf_pq.extend.
Supporting host input is important for indexing large datasets. Our implementation only requires that the PQ-compressed dataset fit in GPU memory, so the raw dataset can be much larger than what the GPU can hold. It is therefore important that the Python API accepts host arrays.
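For a rough sense of scale, here is a back-of-the-envelope sketch comparing the raw float32 dataset size with the size of the PQ codes alone. The row count is a hypothetical example, the helper names are made up for illustration, and the estimate ignores codebooks, cluster centers, stored indices, and any padding, so actual GPU memory use will be somewhat higher.

```python
def raw_bytes(n_rows, dim, itemsize=4):
    """Size of the uncompressed dataset (float32 by default)."""
    return n_rows * dim * itemsize


def pq_code_bytes(n_rows, pq_dim, pq_bits):
    """Size of the PQ codes only: pq_dim codes of pq_bits bits per vector.

    Rounded up to whole bytes per vector. Overheads such as codebooks,
    cluster centers, and indices are deliberately ignored here.
    """
    bits_per_vector = pq_dim * pq_bits
    return n_rows * ((bits_per_vector + 7) // 8)


# Hypothetical dataset size; dim, pq_dim and pq_bits match the snippet above.
n_rows = 1_000_000_000
dim, pq_dim, pq_bits = 32, 16, 8

raw = raw_bytes(n_rows, dim)
codes = pq_code_bytes(n_rows, pq_dim, pq_bits)
ratio = raw / codes

print(f"raw dataset: {raw / 1e9:.0f} GB")
print(f"PQ codes:    {codes / 1e9:.0f} GB")
print(f"ratio:       {ratio:.1f}x")
```

With these assumed numbers the raw data would not fit on a single GPU while the PQ codes plausibly would, which is the case where passing a host array to build and extend matters.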