How to control the total number of merging threads when vector data merging easily leads to memory overflow and high CPU cost #14554

@weizijun

Description

When there are many shards to merge, vector data merging can easily lead to memory overflow and high CPU cost.
The index.merge.scheduler.max_thread_count parameter does not actually limit the number of merge threads; it only pauses writeByte through MergeRateLimiter when the number of merge threads exceeds max_thread_count.
But the OnHeapHnswGraph has already been built when the pause happens, and it takes up so much memory that the Java heap is not enough.
This problem is easy to trigger when a data node with a 32 GB heap holds 2-3 TB of vector documents (with bbq, a node can hold this much data).
PR #14527 reduces heap usage, but it doesn't solve the problem completely.
Is there any solution to this problem?
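
To make the behavior described above concrete, here is a toy Python model of the difference between throttling merges at write time and limiting how many merges may start. It is only a sketch and is not Lucene code: GraphTracker, run_merges, and gate_before_build are hypothetical names, a counter stands in for the heap held by each OnHeapHnswGraph, and a barrier is used to force the worst case in which every merge has built its graph before any write proceeds.

```python
import threading


class GraphTracker:
    """Counts live in-memory graphs; a stand-in for heap usage (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.live = 0
        self.peak = 0

    def build(self):
        with self._lock:
            self.live += 1
            self.peak = max(self.peak, self.live)

    def free(self):
        with self._lock:
            self.live -= 1


def run_merges(num_merges, max_thread_count, gate_before_build):
    """Run num_merges toy merges and return the peak number of live graphs."""
    graphs = GraphTracker()
    slots = threading.Semaphore(max_thread_count)
    # Assumption for determinism: in the throttled case, every merge finishes
    # building its graph before any merge is allowed to start writing.
    all_built = threading.Barrier(num_merges)

    def merge():
        if gate_before_build:
            # Limit applied before the graph exists: at most
            # max_thread_count graphs can be alive at the same time.
            with slots:
                graphs.build()
                graphs.free()
        else:
            # Limit applied only to the write step: the graph is already
            # on the heap while the merge waits for a write slot.
            graphs.build()
            all_built.wait()
            with slots:
                pass  # the throttled write would happen here
            graphs.free()

    threads = [threading.Thread(target=merge) for _ in range(num_merges)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return graphs.peak


if __name__ == "__main__":
    print("throttled at write:", run_merges(8, 2, gate_before_build=False))
    print("gated before build:", run_merges(8, 2, gate_before_build=True))
```

In this model, throttling only the write step lets the peak number of live graphs reach the number of merges, while gating before the build keeps it at or below max_thread_count. Whether a merge can be held back before its graph is built in the real scheduler is the open question of this issue.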
