Fix race and improve performance of HNSW graph building #15590

Merged
benwtrent merged 10 commits into apache:main from viliam-durina:ram-bytes-used-race-fix
Feb 26, 2026


Conversation

@viliam-durina
Contributor

Previously, a volatile field `graphRamBytesUsed` was used. However, it was updated with a non-atomic read-modify-write operation, which could lead to lost updates when multiple threads update the field concurrently.

The field was replaced with a `LongAdder`.

See also the discussion in https://lists.apache.org/thread/xj8j0hx7nggo25471mybky1h9m4rrm85

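The bug and the fix can be sketched as follows. This is only an illustration of the pattern described above, not Lucene's actual `OnHeapHnswGraph` code; the class and method names here are made up.

```java
import java.util.concurrent.atomic.LongAdder;

// Racy pattern: `volatile` guarantees visibility of reads and writes,
// but `+=` is still a read-modify-write of three separate steps, so two
// concurrent callers can read the same old value and one addition is lost.
class RacyAccounting {
  private volatile long graphRamBytesUsed;

  void addBytes(long bytes) {
    graphRamBytesUsed += bytes; // not atomic: updates can be lost
  }

  long ramBytesUsed() {
    return graphRamBytesUsed;
  }
}

// Fixed pattern: LongAdder stripes the counter across per-thread cells,
// so concurrent additions are atomic and rarely touch the same cache line.
class FixedAccounting {
  private final LongAdder graphRamBytesUsed = new LongAdder();

  void addBytes(long bytes) {
    graphRamBytesUsed.add(bytes);
  }

  long ramBytesUsed() {
    return graphRamBytesUsed.sum(); // sums the cells; adequate for accounting
  }
}
```

`LongAdder.sum()` is not a point-in-time snapshot under concurrent updates, but for memory accounting that tolerance is acceptable, which is why `LongAdder` is preferred over `AtomicLong` for write-heavy counters.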
@github-actions github-actions bot added this to the 11.0.0 milestone Jan 20, 2026
Contributor

@tteofili tteofili left a comment


LGTM

@viliam-durina
Contributor Author

viliam-durina commented Jan 21, 2026

The code fixed in this PR was introduced in 10.2.2 in #14527. We recently upgraded internally from 10.2.1 to 10.2.2 and noticed around a 15% performance degradation in vector indexing. Our test used 8 threads.

Benchmarks were done as part of that PR, but glancing through the discussion, it seems they were run before this particular code was introduced. With multiple threads, writes to `graphRamBytesUsed` are very frequent and thus heavily contended. This would be true even if `AtomicLong` were used instead of `LongAdder`. The degradation would likely be much worse than 15% on machines with more cores.

After applying this PR to our internal 10.2.2 fork, we see the original performance again. Given the simplicity of the fix, I suggest backporting it to 10.3, and perhaps even to 10.2.
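The contention effect described above can be demonstrated qualitatively with a rough sketch like the one below. This is not the benchmark used in the PR; the class, constants, and the per-node byte count are invented for illustration, and a real comparison should use JMH.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;

// Sketch of why a single shared counter is slow under parallel HNSW
// building: every indexing thread hammers the same cache line.
public class ContentionSketch {
  static final int THREADS = 8;
  static final long OPS_PER_THREAD = 1_000_000L;

  // Runs THREADS threads, each performing OPS_PER_THREAD additions,
  // and returns the elapsed wall-clock time in milliseconds.
  static long runMillis(LongConsumer addOp) throws InterruptedException {
    Thread[] workers = new Thread[THREADS];
    long start = System.nanoTime();
    for (int i = 0; i < THREADS; i++) {
      workers[i] = new Thread(() -> {
        for (long j = 0; j < OPS_PER_THREAD; j++) {
          addOp.accept(16); // pretend each graph update adds 16 bytes
        }
      });
      workers[i].start();
    }
    for (Thread w : workers) w.join();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicLong atomic = new AtomicLong();
    LongAdder adder = new LongAdder();
    // AtomicLong: all threads CAS-loop on one variable -> contended.
    System.out.println("AtomicLong: " + runMillis(atomic::addAndGet) + " ms");
    // LongAdder: threads update striped per-thread cells -> far less contention.
    System.out.println("LongAdder:  " + runMillis(adder::add) + " ms");
  }
}
```

On a multi-core machine the `LongAdder` run is typically markedly faster; the gap grows with core count, which matches the expectation above that the regression would be worse on larger machines.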

@viliam-durina viliam-durina changed the title Fix race in OnHeapHnswGraph memory accounting Fix race and improve performance of HNSW graph building Jan 21, 2026
Member

@benwtrent benwtrent left a comment


I think this is good. Also, I think backporting it to 10.3.2 is fine. At a minimum, I do think it should go into 10.4.

* GITHUB#12561: UAX29URLEmailTokenizer matched emails with commas and invalid
periods in the local part. (Eran Yarkon)

* GITHUB#15590: Fix race and improve performance of HNSW graph building (Viliam Durina)
Member


I would say move this all to 10.4 at least, so it can be backported.

@github-actions github-actions bot modified the milestones: 11.0.0, 10.4.0 Jan 21, 2026
@benwtrent
Member

@viliam-durina you good with this being merged and backported?

@viliam-durina
Contributor Author

@viliam-durina you good with this being merged and backported?

Sure, I can do the backport(s) too, but you tell me whether to 10.4 or 10.3 or even 10.2.

@benwtrent
Member

@viliam-durina I am fine with as far back as you want to go. But 10.4 for sure. I don't know if/when there will be other bugfix releases.

@viliam-durina
Contributor Author

TL;DR: I'm no longer claiming this PR gives a 15% improvement to parallel HNSW graph building. It's only a race fix, and perhaps a small performance improvement, so it's sufficient to backport it to 10.4.

I saw a 15% slowdown on one particular dataset when upgrading from 10.2.1 to 10.2.2. It was 15% on an AWS instance, but only 5% on my laptop (which has performance and efficiency cores and thermal throttling). However, I'm using a Lucene fork with other non-trivial changes involved. While investigating my own changes I couldn't find the cause, but I did find the issue addressed in this PR, and on my laptop I saw a few percent improvement, so I claimed a 15% improvement. However, when I later tried to reproduce the improvement on an AWS instance, I couldn't observe a measurable difference, as each benchmark run varies by a few percent.

@benwtrent
Member

I am starting the 10.4 release process; since this hasn't been merged, I assume it will not be part of 10.4.

@benwtrent benwtrent modified the milestones: 10.4.0, 10.5.0 Feb 6, 2026
@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 25, 2026
@github-actions github-actions bot removed the Stale label Feb 26, 2026
@benwtrent benwtrent merged commit 3b8a0cd into apache:main Feb 26, 2026
13 checks passed
benwtrent added a commit that referenced this pull request Feb 26, 2026
* Fix race in OnHeapHnswGraph memory accounting

Previously, a volatile field `graphRamBytesUsed` was used. However, it was updated with a non-atomic read-modify-write operation, which could lead to lost updates when multiple threads update the field concurrently.

The field was replaced with a `LongAdder`.

See also the discussion in https://lists.apache.org/thread/xj8j0hx7nggo25471mybky1h9m4rrm85

* Update CHANGES

* Update CHANGES.txt

* Update CHANGES.txt

* Apply suggestion from @benwtrent

* formatting

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
@viliam-durina viliam-durina deleted the ram-bytes-used-race-fix branch February 27, 2026 07:03
