Skip to content

SNMG ANN build with OpenMP nested parallelism#1526

Merged
rapids-bot[bot] merged 8 commits intorapidsai:release/25.12from
viclafargue:snmg-ann-build-openmp-nested-parallelism
Dec 3, 2025
Merged

SNMG ANN build with OpenMP nested parallelism#1526
rapids-bot[bot] merged 8 commits intorapidsai:release/25.12from
viclafargue:snmg-ann-build-openmp-nested-parallelism

Conversation

@viclafargue
Copy link
Copy Markdown
Contributor

The SNMG ANN build spawns threads with #pragma omp parallel for, but CAGRA build also uses OpenMP internally. With nested parallelism disabled by default, CAGRA's internal parallel regions ran serially on a single thread, causing severe performance degradation.

  • Enable nested parallelism with omp_set_nested(1)
  • Limit outer loop to num_ranks threads (one per GPU)
  • Inside each rank thread, allocate threads_per_rank for CAGRA's internal parallelism
  • Restore original thread count after parallel region

@viclafargue viclafargue requested a review from a team as a code owner November 10, 2025 16:15
@viclafargue viclafargue changed the title SNMG ANN build OpenMP nested parallelism SNMG ANN build with OpenMP nested parallelism Nov 10, 2025
Comment thread cpp/src/neighbors/mg/snmg.cuh Outdated
Comment thread cpp/src/neighbors/mg/snmg.cuh
Comment thread cpp/src/neighbors/mg/snmg.cuh Outdated
@viclafargue viclafargue added bug Something isn't working non-breaking Introduces a non-breaking change labels Nov 11, 2025
Comment thread cpp/src/neighbors/mg/snmg.cuh Outdated
Comment thread cpp/src/core/omp_wrapper.cpp
Comment thread cpp/src/neighbors/mg/snmg.cuh
Comment thread cpp/src/neighbors/mg/snmg.cuh Outdated
Comment thread cpp/src/neighbors/mg/snmg.cuh Outdated
}

// Restore original thread count
cuvs::core::omp::set_num_threads(saved_omp_threads);
Copy link
Copy Markdown
Contributor

@tarang-jain tarang-jain Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this restoration is a bit confusing. the pool of threads is set inside the for loop above, but its restored after the for loop? Can you verify if this logic is correct?

Copy link
Copy Markdown
Contributor Author

@viclafargue viclafargue Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This is actually not necessary. My worry was that the main thread could be one of the threads in the loop. But, after checking it appears like OpenMP is designed in such a way that omp_set_num_threads calls inside of a parallel region only affect how many threads each thread can use in nested parallel region. The main thread is unaffected, and its number of threads is actually preserved. I made sure that each thread sees its number of thread restored back to the global maximal number of threads within the parallel region.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added the same OpenMP usage for the extend function.

Comment thread cpp/src/neighbors/mg/snmg.cuh
Comment thread cpp/src/neighbors/mg/snmg.cuh
@viclafargue viclafargue changed the base branch from main to release/25.12 November 17, 2025 17:10
@viclafargue
Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 4fa9ab4 into rapidsai:release/25.12 Dec 3, 2025
89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants