SNMG ANN build with OpenMP nested parallelism#1526
Conversation
| } | ||
|
|
||
| // Restore original thread count | ||
| cuvs::core::omp::set_num_threads(saved_omp_threads); |
There was a problem hiding this comment.
this restoration is a bit confusing. the pool of threads is set inside the for loop above, but its restored after the for loop? Can you verify if this logic is correct?
There was a problem hiding this comment.
I agree. This is actually not necessary. My worry was that the main thread could be one of the threads in the loop. But, after checking it appears like OpenMP is designed in such a way that I made sure that each thread sees its number of thread restored back to the global maximal number of threads within the parallel region.omp_set_num_threads calls inside of a parallel region only affect how many threads each thread can use in nested parallel region. The main thread is unaffected, and its number of threads is actually preserved.
There was a problem hiding this comment.
I also added the same OpenMP usage for the extend function.
|
/merge |
The SNMG ANN build spawns threads with
#pragma omp parallel for, but CAGRA build also uses OpenMP internally. With nested parallelism disabled by default, CAGRA's internal parallel regions ran serially on a single thread, causing severe performance degradation.omp_set_nested(1)num_ranksthreads (one per GPU)threads_per_rankfor CAGRA's internal parallelism