Supporting ARM SVE, the newer extended vector instruction set for aarch64 #2884

@vorj

Summary

Dear @mdouze and all,

ARM SVE is a vector instruction set extension newer than NEON, supported on CPUs such as AWS Graviton3 and Fujitsu A64FX.
I've added SVE support to faiss, implemented some functions with SVE, and compared their execution times against the current implementation.
My implementation seems to improve performance in some environments.
This is just a first implementation to show what SVE can do, and I plan to implement SVE versions of other functions that have not been ported to SVE yet.
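
To illustrate the kind of kernel this refers to, here is a minimal sketch of a squared L2 distance written with SVE intrinsics. It is not taken from the actual changes: the function name `l2_sqr_sketch` is hypothetical, and the scalar fallback is only there so the snippet also builds on machines without SVE.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not part of faiss): squared L2 distance between
// two float vectors x and y of dimension d.
float l2_sqr_sketch(const float* x, const float* y, std::size_t d) {
#if defined(__ARM_FEATURE_SVE)
    // Vector-length-agnostic loop: svcntw() is the number of 32-bit lanes
    // per SVE register on the running CPU.
    svfloat32_t acc = svdup_n_f32(0.0f);
    const std::uint64_t n = d;
    for (std::uint64_t i = 0; i < n; i += svcntw()) {
        // The predicate enables only lanes with index < n, so the tail
        // needs no separate scalar loop.
        const svbool_t pg = svwhilelt_b32_u64(i, n);
        const svfloat32_t vx = svld1_f32(pg, x + i);
        const svfloat32_t vy = svld1_f32(pg, y + i);
        const svfloat32_t diff = svsub_f32_x(pg, vx, vy);
        // acc += diff * diff on active lanes; inactive lanes keep acc.
        acc = svmla_f32_m(pg, acc, diff, diff);
    }
    // Horizontal sum over all lanes of the accumulator.
    return svaddv_f32(svptrue_b32(), acc);
#else
    // Scalar fallback for builds without SVE.
    float acc = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        acc += diff * diff;
    }
    return acc;
#endif
}
```

Because the loop is driven by a predicate rather than a fixed width, the same code should run unchanged on CPUs with different SVE register sizes.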

It may not be possible to test this on Circle CI at the moment, but would you mind if I submit it as a PR?

Platform

OS: Ubuntu 22.04

Faiss version: a3296f4, and mine

Installed from: compiled by myself

Faiss compilation options: cmake -B build -DFAISS_ENABLE_GPU=OFF -DPython_EXECUTABLE=$(which python3) -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=ON -DFAISS_OPT_LEVEL=sve (-DFAISS_OPT_LEVEL=sve is a new opt level introduced by my changes)

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

I am only posting the results of searching SIFT1M here. If you need more detailed information, please let me know.

benchmark result

  • Evaluated by running faiss on an AWS EC2 c7g.large instance
  • original is the current (a3296f4) implementation
  • SVE is the result of my implementation supporting ARM SVE

image

The image above illustrates the speed-up ratio.

  • In the best case, SVE is approx. 2.26x faster than original (IndexIVFPQ + IndexHNSWFlat, M: 32, nprobe: 16)
    • original : 0.618 ms
    • SVE : 0.274 ms
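
For reference, the speed-up figure is simply the original time divided by the SVE time. A minimal sketch of that arithmetic, where the helper name `speedup` is hypothetical:

```cpp
// Hypothetical helper: speed-up factor of an optimized run over a baseline,
// both given as times in the same unit (here milliseconds).
double speedup(double baseline_ms, double optimized_ms) {
    return baseline_ms / optimized_ms;
}
```

With the best-case timings above, `speedup(0.618, 0.274)` gives roughly 2.26.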
