Commit 15f1943

Carl Love committed
Unroll loop in lookup_2_lanes
The current loop in function lookup_2_lanes in faiss/utils/simdlib_emulated.h goes from 0 to 31. It has an if statement that does one assignment for j < 16 and a different assignment for j >= 16.

Unrolling the loop so that the j < 16 and the j >= 16 cases are handled in the same iteration eliminates the if (j < 16) test and halves the number of loop iterations. The loop is then unrolled further, to a depth of 2, for both the j < 16 and the j >= 16 halves.

This change results in approximately a 55% reduction in the execution time for the bench_ivf_fastscan.py workload on Power 10 when compiled with CMAKE_INSTALL_CONFIG_NAME=Release.

Removing the if (j < 16) statement and unrolling the loop removes branch cycle stalls and register dependencies on instruction issue. As a result, the unrolled code is able to issue instructions earlier, reducing the total number of cycles required to execute the function.

This patch makes a copy of faiss/utils/simdlib_emulated.h and names it faiss/utils/simdlib_emulated_ppc64.h. The new file contains the new version of lookup_2_lanes. The new file is included if the define __PPC64__ is set by the GCC or XLC clang compiler.
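The transformation described above can be illustrated with a minimal standalone sketch. This is not the code from simdlib_emulated_ppc64.h, which is not shown on this page. The Bytes32 alias, the pick helper, and both function names are hypothetical, and the rule that an index byte with its high bit set yields 0 is an assumption about the emulated lookup semantics.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the 32-byte emulated SIMD register.
using Bytes32 = std::array<uint8_t, 32>;

// Branchy form: one pass over all 32 bytes, choosing the lane with if (j < 16).
inline Bytes32 lookup_2_lanes_branchy(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 32; j++) {
        if (idx[j] & 0x80) {
            c[j] = 0; // assumed: high bit set selects zero
        } else {
            uint8_t i = idx[j] & 15;
            if (j < 16) {
                c[j] = table[i];      // low lane reads table[0..15]
            } else {
                c[j] = table[16 + i]; // high lane reads table[16..31]
            }
        }
    }
    return c;
}

// Hypothetical helper: look up one byte in the lane starting at lane_base.
inline uint8_t pick(const Bytes32& table, uint8_t index, int lane_base) {
    return (index & 0x80) ? uint8_t(0) : table[lane_base + (index & 15)];
}

// Unrolled form: each iteration handles two bytes of the low lane and the
// two matching bytes of the high lane, so no j < 16 test is needed and the
// loop body runs 8 times instead of 32.
inline Bytes32 lookup_2_lanes_unrolled(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 16; j += 2) {
        c[j] = pick(table, idx[j], 0);
        c[j + 1] = pick(table, idx[j + 1], 0);
        c[j + 16] = pick(table, idx[j + 16], 16);
        c[j + 17] = pick(table, idx[j + 17], 16);
    }
    return c;
}
```

The four assignments in the unrolled body do not depend on one another, which is the property the commit message credits for earlier instruction issue. Both functions should return identical results for any table and idx.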
1 parent 252ae16 commit 15f1943

2 files changed

Lines changed: 1091 additions & 0 deletions

faiss/utils/simdlib_emulated.h

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,12 @@
  * LICENSE file in the root directory of this source tree.
  */
 
+#if defined(__PPC64__)
+
+#include <faiss/utils/simdlib_emulated_ppc64.h>
+
+#else
+
 #pragma once
 
 #include <algorithm>
@@ -1043,3 +1049,4 @@ inline void cmplt_min_max_fast(
 } // namespace
 
 } // namespace faiss
+#endif
