Make HNSW merges cheaper on heap #14208

@benwtrent

Description

I am not sure about other index structures, but HNSW merges can allocate a pretty large chunk of on-heap memory.

For example:

Let's say max_conn is set to 16; the number of connections on the bottom layer is therefore 32.

We eagerly create the neighbor arrays, which means that for 9 million vectors the heap allocation balloons to over 2GB (and, depending on the number of layers and other structures, can exceed 2.5GB of heap).
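The 2GB figure above can be sanity-checked with some back-of-the-envelope arithmetic. This sketch assumes each bottom-layer neighbor entry costs an int node id plus a float score (8 bytes) and ignores array/object headers, so the real footprint is somewhat higher; the class and method names here are illustrative, not Lucene's API.

```java
// Rough heap estimate for eagerly allocated HNSW neighbor arrays.
// Assumption: each neighbor entry stores an int node id and a float
// score (8 bytes); array and object headers are not counted.
public class HnswHeapEstimate {
  static long estimateBottomLayerBytes(long numVectors, int maxConn) {
    int bottomLayerConns = 2 * maxConn; // layer 0 keeps 2 * max_conn links
    int bytesPerNeighbor = Integer.BYTES + Float.BYTES; // node id + score
    return numVectors * bottomLayerConns * bytesPerNeighbor;
  }

  public static void main(String[] args) {
    long bytes = estimateBottomLayerBytes(9_000_000L, 16);
    // 9M vectors * 32 links * 8 bytes ~= 2.3GB for the bottom layer alone
    System.out.printf("~%.2f GB%n", bytes / 1e9);
  }
}
```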

From what I can tell, merges don't expose any estimate of how much heap they will use.

I wonder if we can do one of the following to help this scenario:

  • Make HNSW merges cheaper in terms of on-heap memory (e.g. merge off-heap?!? make the in-memory structures cheaper??)
  • Don't eagerly allocate all the memory required (this complicates multi-threaded merging... and might not actually address the issue)
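To make the second bullet concrete, here is a minimal sketch of lazy neighbor-array allocation, where arrays are created on first access instead of up front, so heap grows with the nodes actually linked. The CAS dance hints at why this complicates multi-threaded merging. All names are hypothetical, not Lucene's actual API.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical lazily allocated neighbor storage: only the slot array is
// allocated eagerly; per-node neighbor arrays appear on first access.
public class LazyNeighbors {
  private final AtomicReferenceArray<int[]> neighbors;
  private final int maxNeighbors;

  public LazyNeighbors(int numNodes, int maxNeighbors) {
    this.neighbors = new AtomicReferenceArray<>(numNodes); // slots only
    this.maxNeighbors = maxNeighbors;
  }

  public int[] neighborsOf(int node) {
    int[] arr = neighbors.get(node);
    if (arr == null) {
      // Allocate lazily; compareAndSet keeps this safe if several merge
      // threads race to initialize the same node's array.
      arr = new int[maxNeighbors];
      if (!neighbors.compareAndSet(node, null, arr)) {
        arr = neighbors.get(node); // another thread won the race
      }
    }
    return arr;
  }
}
```

The trade-off is visible even in this toy version: every access pays a null check and a potential CAS, which is exactly the kind of overhead that can slow merging down.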

Note, this is tangential to this other HNSW merging issue, and may actually be in tension with it, since reducing memory allocations sometimes implies slower merging: #12440
