Commit 15f1943
Carl Love
Unroll loop in lookup_2_lanes
The current loop in function lookup_2_lanes in faiss/utils/simdlib_emulated.h
iterates j from 0 to 31. It uses an if statement to do one assignment for
j < 16 and a different assignment for j >= 16. Unrolling the loop so that the
j < 16 and the j >= 16 cases are handled in the same iteration eliminates the
if (j < 16) test and halves the number of loop iterations.
The loop is then unrolled further, to a depth of 2, for both the j < 16 and
the j >= 16 cases.
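The transformation described above can be sketched as follows. This is a minimal illustration, not the code in the patch: the Bytes32 alias, the pick helper, and both function names are hypothetical, and the rule that an index byte with its high bit set yields 0 is an assumption about the emulated lookup semantics.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the 32-byte emulated SIMD register.
using Bytes32 = std::array<uint8_t, 32>;

// Baseline shape: one loop over all 32 bytes with a branch on j < 16.
// Assumption: an index byte with the high bit set produces 0.
Bytes32 lookup_2_lanes_baseline(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 32; j++) {
        if (idx[j] & 0x80) {
            c[j] = 0;
        } else {
            uint8_t i = idx[j] & 15;
            if (j < 16) {
                c[j] = table[i];      // low lane reads table[0..15]
            } else {
                c[j] = table[16 + i]; // high lane reads table[16..31]
            }
        }
    }
    return c;
}

// Hypothetical helper: look up one byte within the lane starting at `base`.
static inline uint8_t pick(const Bytes32& table, uint8_t index, int base) {
    return (index & 0x80) ? 0 : table[base + (index & 15)];
}

// Unrolled shape: each iteration handles two low-lane bytes and the two
// matching high-lane bytes, so the j < 16 test disappears and the loop
// runs 8 times instead of 32.
Bytes32 lookup_2_lanes_unrolled(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 16; j += 2) {
        c[j]      = pick(table, idx[j], 0);
        c[j + 1]  = pick(table, idx[j + 1], 0);
        c[j + 16] = pick(table, idx[j + 16], 16);
        c[j + 17] = pick(table, idx[j + 17], 16);
    }
    return c;
}
```

The four assignments in each iteration of the unrolled loop do not depend on one another, which is the property that lets the processor issue them earlier.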
This change results in approximately a 55% reduction in the execution time
for the bench_ivf_fastscan.py workload on Power 10 when compiled with
CMAKE_INSTALL_CONFIG_NAME=Release.
The removal of the if (j < 16) statement and the unrolling of the loop
remove branch stall cycles and register dependencies that delay instruction
issue. As a result, the unrolled code is able to issue instructions earlier,
reducing the total number of cycles required to execute the function.
This patch makes a copy of faiss/utils/simdlib_emulated.h and names it
faiss/utils/simdlib_emulated_ppc64.h. The new file contains the new version
of lookup_2_lanes. The new header is included if __PPC64__ is defined by the
GCC or XLC clang compiler.
1 parent 252ae16 commit 15f1943
2 files changed
Lines changed: 1091 additions & 0 deletions