Summary
Dear @mdouze and all,
ARM SVE is a newer vector instruction set extension than NEON and is supported on CPUs such as AWS Graviton3 and Fujitsu A64FX.
I've added SVE support to faiss, implemented some functions with SVE, and compared their execution times against the current implementation.
My implementation appears to improve performance in some environments.
This is only a first implementation to show what SVE can do, and I plan to implement SVE versions of other functions that have not yet been ported.
It may not be possible to test this on Circle CI at the moment, but would you mind if I submitted this as a PR?
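For readers unfamiliar with SVE, here is a minimal sketch of the vector-length-agnostic loop style it enables, using a squared L2 distance as the example. This is an illustration only and not the code in my changes: the function names `l2_sqr_scalar` and `l2_sqr_sketch` are hypothetical, and the SVE path is compiled only when the compiler defines `__ARM_FEATURE_SVE` (otherwise the scalar fallback is used).

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Scalar reference: sum of squared differences between x and y.
static float l2_sqr_scalar(const float* x, const float* y, size_t d) {
    float acc = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        acc += diff * diff;
    }
    return acc;
}

// Hypothetical sketch (not the actual faiss kernel): the same computation
// written so that it works for any hardware vector length.
static float l2_sqr_sketch(const float* x, const float* y, size_t d) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    const uint64_t n = static_cast<uint64_t>(d);
    // svcntw() is the number of 32-bit lanes per vector on this CPU.
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // The predicate enables only lanes with index < n, so no scalar
        // tail loop is needed for the last partial vector.
        const svbool_t pg = svwhilelt_b32_u64(i, n);
        const svfloat32_t vx = svld1_f32(pg, x + i);
        const svfloat32_t vy = svld1_f32(pg, y + i);
        const svfloat32_t diff = svsub_f32_x(pg, vx, vy);
        // Merging form: inactive lanes keep their previous accumulator value.
        acc = svmla_f32_m(pg, acc, diff, diff);
    }
    // Horizontal add across all lanes.
    return svaddv_f32(svptrue_b32(), acc);
#else
    return l2_sqr_scalar(x, y, d);
#endif
}
```

The point of the predicate-driven loop is that the same binary can run on CPUs with different SVE vector lengths. Note that the vectorized path sums in a different order than the scalar loop, so results may differ slightly in floating-point rounding for general inputs.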
Platform
OS: Ubuntu 22.04
Faiss version: a3296f4 (upstream), and my modified version
Installed from: compiled by myself
Faiss compilation options: cmake -B build -DFAISS_ENABLE_GPU=OFF -DPython_EXECUTABLE=$(which python3) -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=ON -DFAISS_OPT_LEVEL=sve (-DFAISS_OPT_LEVEL=sve is a new opt level introduced by my changes)
Running on:
Interface:
Reproduction instructions
I am posting only the search results for SIFT1M. If you need more detailed information, please let me know.

- Evaluated on an AWS EC2 c7g.large instance, run faiss on
original is the current (a3296f4) implementation
SVE is the result of my implementation supporting ARM SVE

The image above shows the speedup ratio of SVE over original.
- In the best case,
SVE is approx. 2.26x faster than original (IndexIVFPQ + IndexHNSWFlat, M: 32, nprobe: 16)
original : 0.618 ms
SVE : 0.274 ms