The following command pulls the nightly container for Python version 3.10 and CUDA version 12.5:

.. code-block:: bash

    docker pull rapidsai/cuvs-bench:24.12a-cuda12.5-py3.10 # substitute cuvs-bench for the exact desired container.
The CUDA and Python versions can be changed to any of the supported values:

- Supported CUDA versions: 11.8 and 12.5

You can see the exact available versions on the Docker Hub site as well.
**Note:** GPU containers use the CUDA toolkit from inside the container; the only requirement is a driver installed on the host machine that supports that version. For example, CUDA 11.8 containers can run on systems with a CUDA 12.x capable driver. Please also note that the NVIDIA Docker runtime from the `Nvidia Container Toolkit <https://github.com/NVIDIA/nvidia-docker>`_ is required to use GPUs inside Docker containers.
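As an illustration, a container can then be started with GPU access enabled. This is only a sketch: the mounted host directory and container mount point below are example values, not requirements of the image.

.. code-block:: bash

    # Run the benchmark container interactively with all GPUs exposed,
    # mounting a host directory for datasets (paths are examples).
    docker run --gpus all --rm -it \
        -v $(pwd)/datasets:/data/benchmarks \
        rapidsai/cuvs-bench:24.12a-cuda12.5-py3.10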
How benchmarks are run
======================
The `cuvs-bench` package contains lightweight Python scripts to run the benchmarks. There are 4 general steps to running the benchmarks and visualizing the results.

#. Prepare Dataset
#. Build Index and Search Index
#. Data Export
#. Plot Results
Step 1: Prepare the dataset
---------------------------
The script `cuvs_bench.get_dataset` will download and unpack the dataset in a directory that the user provides. As of now, only million-scale datasets are supported by this script. For more information, see :doc:`datasets and formats <datasets>`.
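As a sketch, a million-scale dataset can be fetched like this; the dataset name and target directory below are example values:

.. code-block:: bash

    # Download and unpack the glove-100-inner dataset into ./datasets
    python -m cuvs_bench.get_dataset --dataset glove-100-inner --dataset-path datasets/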
Step 2: Build and search index
------------------------------

The script `cuvs_bench.run` accepts the following options (listing abridged):

.. code-block:: text

                          the number of subset rows of the dataset to build the index (default: None)
    -k COUNT, --count COUNT
                          the number of nearest neighbors to search for (default: 10)
    -bs BATCH_SIZE, --batch-size BATCH_SIZE
                          number of query vectors to use in each query trial (default: 10000)
    --dataset-configuration DATASET_CONFIGURATION
                          path to YAML configuration file for datasets (default: None)
    --configuration CONFIGURATION
                          path to YAML configuration file or directory for algorithms. Any run groups found in the specified file/directory will automatically override groups of the same name
                          present in the default configurations, including `base` (default: None)
    --dataset DATASET     name of dataset (default: glove-100-inner)
    --dataset-path DATASET_PATH
                          path to dataset folder, by default will look in RAPIDS_DATASET_ROOT_DIR if defined, otherwise a datasets subdirectory from the calling directory (default:
                          os.getcwd()/datasets/)
    --build
    --search
    --algorithms ALGORITHMS
                          run only comma separated list of named algorithms. If parameters `groups` and `algo-groups` are both undefined, then group `base` is run by default (default: None)
    --groups GROUPS       run only comma separated groups of parameters (default: base)
    --algo-groups ALGO_GROUPS
                          add comma separated <algorithm>.<group> to run. Example usage: "--algo-groups=cuvs_cagra.large,hnswlib.large" (default: None)
    -f, --force           re-run algorithms even if their results already exist (default: False)
    -m SEARCH_MODE, --search-mode SEARCH_MODE
                          run search in 'latency' (measure individual batches) or 'throughput' (pipeline batches and measure end-to-end) mode (default: throughput)
    --search-threads SEARCH_THREADS
                          specify the number of threads to use for the throughput benchmark. Single value or a pair of min and max separated by ':'. Example: --search-threads=1:4. Power of 2 values between 'min' and 'max' will be used. If only 'min' is
                          specified, then a single test is run with 'min' threads. By default min=1, max=<num hyper threads>. (default: None)
    -r, --dry-run         dry-run mode will convert the yaml config for the specified algorithms and datasets to the json format that's consumed by the lower-level c++ binaries, then print the commands used to execute the benchmarks, but
                          will not actually execute them. (default: False)
`dataset`: name of the dataset to be searched in `datasets.yaml`_

`dataset-configuration`: optional filepath to custom dataset YAML config which has an entry for arg `dataset`

`configuration`: optional filepath to YAML configuration for an algorithm or to a directory that contains YAML configurations for several algorithms. Refer to `Dataset.yaml config`_ for more info.

`algorithms`: runs all algorithms that it can find in YAML configs found by `configuration`. By default, only the `base` group will be run.

`groups`: run only specific groups of parameter configurations for an algorithm. Groups are defined in YAML configs (see `configuration`); by default only the `base` group is run.

`algo-groups`: this parameter is helpful to append any specific algorithm+group combination to run the benchmark for, in addition to all the arguments from `algorithms` and `groups`. It is of the format `<algorithm>.<group>`, for example `cuvs_cagra.large`.

For every algorithm run by this script, it outputs an index build statistics JSON file in `<dataset-path/<dataset>/result/build/<{algo},{group}.json>` and an index search statistics JSON file in `<dataset-path/<dataset>/result/search/<{algo},{group},k{k},bs{batch_size}.json>`. NOTE: The filenames will not contain ",{group}" if `group = "base"`.

`dataset-path` :

#. data is read from `<dataset-path>/<dataset>`
#. indices are built in `<dataset-path>/<dataset>/index`
#. build/search results are stored in `<dataset-path>/<dataset>/result`

`build` and `search` : if neither parameter is supplied to the script, then both are assumed to be `True`.

`indices` and `algorithms` : these parameters ensure that the algorithm specified for an index is available in `algos.yaml` and not disabled, as well as having an associated executable.
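Putting the options above together, a typical build-and-search invocation might look like the following; the algorithm name and dataset here are example values:

.. code-block:: bash

    # Build indices and run search for the cuvs_cagra algorithm on
    # glove-100-inner, retrieving the 10 nearest neighbors per query.
    python -m cuvs_bench.run --dataset glove-100-inner --dataset-path datasets/ \
        --algorithms cuvs_cagra -k 10 -bs 10000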
Step 3: Data export
-------------------
The script `cuvs_bench.data_export` will convert the intermediate JSON outputs produced by `cuvs_bench.run` to more easily readable CSV files, which are needed to build charts made by `cuvs_bench.plot`.
The script `cuvs_bench.data_export` accepts the following options:

.. code-block:: text

    --dataset DATASET     dataset to download (default: glove-100-inner)
    --dataset-path DATASET_PATH
                          path to dataset folder (default: ${RAPIDS_DATASET_ROOT_DIR})
The build statistics CSV file is stored in `<dataset-path/<dataset>/result/build/<{algo},{group}.csv>` and the index search statistics CSV file in `<dataset-path/<dataset>/result/search/<{algo},{group},k{k},bs{batch_size},{suffix}.csv>`, where suffix has three values:

#. `raw`: All search results are exported
#. `throughput`: Pareto frontier of throughput results is exported
#. `latency`: Pareto frontier of latency results is exported
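For instance, the results for a dataset can be converted to CSV with an invocation like this (dataset name and path are example values):

.. code-block:: bash

    # Convert the JSON results under datasets/glove-100-inner/result to CSV
    python -m cuvs_bench.data_export --dataset glove-100-inner --dataset-path datasets/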
Step 4: Plot the results
------------------------
The script `cuvs_bench.plot` will plot results for all algorithms found in index search statistics CSV files `<dataset-path/<dataset>/result/search/*.csv`.
The script `cuvs_bench.plot` accepts the following options (listing abridged):

.. code-block:: text

    --algo-groups ALGO_GROUPS
                          add comma separated <algorithm>.<group> to plot. Example usage: "--algo-groups=cuvs_cagra.large,hnswlib.large" (default: None)
    -k COUNT, --count COUNT
                          the number of nearest neighbors to search for (default: 10)
    -bs BATCH_SIZE, --batch-size BATCH_SIZE
                          number of query vectors to use in each query trial (default: 10000)
    --build
    --search
    --x-scale X_SCALE     Scale to use when drawing the X-axis. Typically linear, logit or a2 (default: linear)
    --y-scale {linear,log,symlog,logit}
                          Scale to use when drawing the Y-axis (default: linear)
    --x-start X_START     Recall value to start the x-axis from (default: 0.8)
    --mode {throughput,latency}
                          search mode whose Pareto frontier is used on the y-axis (default: throughput)
    --time-unit {s,ms,us}
                          time unit to plot when mode is latency (default: ms)
    --raw                 Show raw results (not just Pareto frontier) of mode arg (default: False)
`mode`: plots the Pareto frontier of `throughput` or `latency` results exported in the previous step

`algorithms`: plots all algorithms for which it can find results for the specified `dataset`. By default, only the `base` group will be plotted.

`groups`: plot only specific groups of parameter configurations for an algorithm. Groups are defined in YAML configs (see `configuration`); by default only the `base` group is plotted.

`algo-groups`: this parameter is helpful to append any specific algorithm+group combination to plot results for, in addition to all the arguments from `algorithms` and `groups`. It is of the format `<algorithm>.<group>`, for example `cuvs_cagra.large`.
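For example, latency results could be plotted with an invocation like the following; the values shown are illustrative:

.. code-block:: bash

    # Plot the latency Pareto frontier in milliseconds,
    # starting the recall axis at 0.8.
    python -m cuvs_bench.plot --dataset glove-100-inner --dataset-path datasets/ \
        --mode latency --time-unit ms --x-start 0.8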
Running the benchmarks
======================

Creating and customizing dataset configurations
===============================================
A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalized across datasets. We use YAML to define dataset-specific and algorithm-specific configurations.
578
389
579
-
A default `datasets.yaml` is provided by CUVS in`${CUVS_HOME}/python/cuvs-ann-bench/src/cuvs_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
390
+
A default `datasets.yaml` is provided by CUVS in `${CUVS_HOME}/python/cuvs_bench/src/cuvs_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
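Since the entry itself is not reproduced here, the following is only a sketch of the shape such an entry might take; the field names below are assumptions and should be verified against the shipped `datasets.yaml`:

.. code-block:: yaml

    # Hypothetical sketch of a dataset entry; field names are assumed,
    # check the shipped datasets.yaml for the authoritative layout.
    - name: sift-128-euclidean
      base_file: sift-128-euclidean/base.fbin
      query_file: sift-128-euclidean/query.fbin
      groundtruth_neighbors_file: sift-128-euclidean/groundtruth.neighbors.ibin
      dims: 128
      distance: euclidean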