Commit 4b1f49d
Speedup exhaustive_L2sqr_blas for AVX2, ARM NEON and AVX512 (#2568)
Summary:
Pull Request resolved: #2568
Add a fused kernel for exhaustive_L2sqr_blas() call that combines a computation of dot product and the search for the nearest centroid. As a result, no temporary dot product values are written and read in RAM.
Speeds up the training of PQx[1] indices for dsub = 1, 2, 4, 8, and the effect is higher for higher values of [1]. AVX512 version provides additional overloads for dsub = 12, 16.
The speedup is also beneficial for higher values of pq.cp.max_points_per_centroid (which is 256 by default).
Speeds up IVFPQ training as well.
AVX512 kernel is not enabled, but I've seen it speeding up the training TWICE versus AVX2 version. So, please feel free to use it by enabling AVX512 manually.
Reviewed By: mdouze
Differential Revision: D41166766
fbshipit-source-id: 2d15c8f73e46c8d86d3b9f632ed91c21386deeea1 parent ab13122 commit 4b1f49d
File tree
13 files changed
+1217
-5
lines changed- faiss
- utils
- distances_fused
- tests
13 files changed
+1217
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
89 | 92 | | |
90 | 93 | | |
91 | 94 | | |
| |||
187 | 190 | | |
188 | 191 | | |
189 | 192 | | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
190 | 196 | | |
191 | 197 | | |
192 | 198 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
| 30 | + | |
29 | 31 | | |
30 | 32 | | |
31 | 33 | | |
| |||
229 | 231 | | |
230 | 232 | | |
231 | 233 | | |
232 | | - | |
| 234 | + | |
233 | 235 | | |
234 | 236 | | |
235 | 237 | | |
| |||
311 | 313 | | |
312 | 314 | | |
313 | 315 | | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
314 | 328 | | |
315 | | - | |
316 | | - | |
317 | | - | |
| 329 | + | |
318 | 330 | | |
319 | 331 | | |
320 | 332 | | |
| |||
513 | 525 | | |
514 | 526 | | |
515 | 527 | | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
516 | 531 | | |
517 | 532 | | |
518 | 533 | | |
519 | 534 | | |
520 | 535 | | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
521 | 575 | | |
522 | 576 | | |
523 | 577 | | |
| |||
0 commit comments