
Prevent nested parallelism in HNSW bench#1895

Merged
rapids-bot[bot] merged 1 commit intorapidsai:release/26.04from
julianmi:hnswlib-bench-threading
Mar 24, 2026

Conversation

@julianmi
Contributor

@julianmi julianmi commented Mar 9, 2026

Setting both the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. This patch enforces either throughput mode (multiple gbench threads) or latency mode (batch parallelism), but never both at once. Additionally, going through the thread pool incurs significant overhead, so it is skipped in the `search` method to handle a batch size of a single query efficiently.
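The single-query fast path mentioned above can be sketched as follows. This is an illustrative sketch, not the actual cuVS code; the names `search_one` and `search_batch` are hypothetical stand-ins, and a plain thread-per-query loop stands in for the real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for hnswlib's per-query knn search (hypothetical name).
void search_one(std::size_t /*query_id*/) {}

// Dispatch a batch of queries. Returns the number of pool workers
// spawned (0 on the single-query fast path) so the behavior is testable.
std::size_t search_batch(std::size_t n_queries) {
  if (n_queries == 1) {
    // Fast path: run directly on the calling thread. The thread-pool
    // hand-off overhead dominates at batch size 1, so we skip it.
    search_one(0);
    return 0;
  }
  // Batch path: fan the queries out across worker threads (a simplified
  // thread-per-query stand-in for a real thread pool).
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n_queries; ++i) {
    workers.emplace_back(search_one, i);
  }
  for (auto& w : workers) { w.join(); }
  return workers.size();
}
```

The design point is that the fast path touches no synchronization machinery at all, which matters when a latency benchmark issues many batch-size-1 searches.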

@julianmi julianmi requested a review from a team as a code owner March 9, 2026 14:18
@aamijar aamijar moved this to In Progress in Unstructured Data Processing Mar 9, 2026
@aamijar aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Mar 9, 2026
Member

@aamijar aamijar left a comment


Hi @julianmi, what is the UX for using multiple threads in HNSW bench? Does the user set the gbench threads parameter, or the num_threads_ parameter?

@achirkin
Contributor

To answer @aamijar

In the latency mode, gbench measures how long it takes to execute a single search call for the given algorithm and batch size. In this mode, gbench is always single-threaded. To make use of the whole CPU, HNSW relies on its own threading logic. This makes the HNSW measurements more realistic and fair against GPU algorithms.

In the throughput mode, gbench measures how many requests the given algorithm can serve per second. To that end, gbench provides independent threads to issue the search calls. This clashes with HNSW's internal threading. Because gbench creates its threads and manages batching outside the measured benchmark loop, HNSW's performance generally looks better with gbench threads than with the internal threads. Hence we simply disable internal batching completely in the throughput mode.
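The mode split described above can be summarized in a small sketch. The names (`Mode`, `ThreadConfig`, `pick_threads`) are illustrative, not the actual cuVS benchmark API; the point is that exactly one level of parallelism is active per mode, so gbench threads and hnswlib threads never nest.

```cpp
#include <cassert>
#include <cstddef>

enum class Mode { kLatency, kThroughput };

struct ThreadConfig {
  std::size_t gbench_threads;    // independent gbench worker threads
  std::size_t internal_threads;  // hnswlib's own per-batch threads
};

// Pick a non-nesting thread configuration for a given benchmark mode
// (hypothetical helper; hw_threads is the available hardware concurrency).
ThreadConfig pick_threads(Mode mode, std::size_t hw_threads) {
  if (mode == Mode::kThroughput) {
    // gbench supplies all the parallelism; internal batching is disabled.
    return {hw_threads, 1};
  }
  // Latency mode: gbench stays single-threaded, and one measured search
  // call spreads the batch over all cores internally.
  return {1, hw_threads};
}
```

In either branch one of the two thread counts is pinned to 1, which is the invariant the patch enforces.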

@tfeher tfeher changed the base branch from main to release/26.04 March 24, 2026 08:43
@tfeher tfeher requested review from a team as code owners March 24, 2026 08:43
@tfeher tfeher requested a review from msarahan March 24, 2026 08:43
- Setting the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. Force either throughput mode using multiple gbench threads or latency mode using batch parallelism.
- Added a check in the `search` method to handle single query batch size efficiently. There is a significant overhead in going through the thread pool.
@julianmi julianmi force-pushed the hnswlib-bench-threading branch from 11d0fc8 to 641227a Compare March 24, 2026 08:58
@tfeher tfeher removed request for a team and msarahan March 24, 2026 10:13
@achirkin
Contributor

/merge

@rapids-bot rapids-bot Bot merged commit c1aa5f3 into rapidsai:release/26.04 Mar 24, 2026
150 of 153 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Mar 24, 2026
jrbourbeau pushed a commit to jrbourbeau/cuvs that referenced this pull request Mar 25, 2026
Setting the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. This patch proposes to either use throughput mode using multiple gbench threads or latency mode using batch parallelism. Additionally, there is a significant overhead in going through the thread pool. It is skipped in the `search` method to handle single query batch size efficiently.

Authors:
  - Julian Miller (https://github.com/julianmi)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#1895
jrbourbeau pushed a commit to jrbourbeau/cuvs that referenced this pull request Mar 25, 2026

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Status: Done


3 participants