Fix race and improve performance of HNSW graph building#15590
Fix race and improve performance of HNSW graph building#15590benwtrent merged 10 commits intoapache:mainfrom
Conversation
Previously, a volatile field `graphRamBytesUsed` was used. However, a non-atomic read-modify-write operation was used, which could lead to lost update in case multiple threads are updating the field. The field was replaced with a `LongAdderr`. See also the discussion in https://lists.apache.org/thread/xj8j0hx7nggo25471mybky1h9m4rrm85
|
The code fixed in this PR was introduced in 10.2.2 in #14527. We internally have been just recently upgrading from 10.2.1 to 10.2.2, and we noticed around 15% performance degradation of vector indexing. In our test we've been using 8 threads. Benchmarks were done as a part of the PR, but glancing through the discussion, it seems they were done before this particular code was introduced. With multiple threads, writes to After applying this PR to our internal 10.2.2 fork, we see the original performance. Because of the simplicity of the fix, I suggest backporting it to 10.3, and perhaps even to 10.2. |
benwtrent
left a comment
There was a problem hiding this comment.
I think this is good. Also, I think backporting it to 10.3, 2 is fine. At a minimum, I do think it should go into 10.4
lucene/CHANGES.txt
Outdated
| * GITHUB#12561: UAX29URLEmailTokenizer matched emails with commas and invalid | ||
| periods in the local part. (Eran Yarkon) | ||
|
|
||
| * GITHUB#15590: Fix race and improve performance of HNSW graph building (Viliam Durina) |
There was a problem hiding this comment.
move this all to 10.4 at least I would say so it can be backported
|
@viliam-durina you good for this being marged and backported? |
Sure, I can do the backport(s) too, but you tell me whether to 10.4 or 10.3 or even 10.2. |
|
@viliam-durina I am fine with as far back as you want to go. But 10.4 for sure. I don't know if/when there will be other bugfix releases. |
|
TL;DR: I'm no longer claiming this PR gives 15% improvement to parallel HNSW graph building. It's only a race fix and perhaps a small performance improvement. So it's sufficient to backport to 10.4. I saw 15% slowdown in one particular dataset when upgrading from 10.2.1 to 10.2.2. It was 15% on AWS instance, but only 5% on my laptop (with performance and efficient cores and thermal throttling). However I'm using a Lucene fork and there are other non-trivial changes involved. I was investigating my changes and couldn't find the cause, and I found the issue addressed in this PR, and on my laptop I saw a few % improvement, so I claimed 15% improvement. However, when I was later trying to reproduce the improvement on an AWS instance, I couldn't observe measurable improvement, as each benchmark run is a few % different. |
|
I am starting the 10.4 release process, since this hasn't been merged, I am assuming it will not be part of 10.4. |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
* Fix race in OnHeapHnswGraph memory accounting Previously, a volatile field `graphRamBytesUsed` was used. However, a non-atomic read-modify-write operation was used, which could lead to lost update in case multiple threads are updating the field. The field was replaced with a `LongAdderr`. See also the discussion in https://lists.apache.org/thread/xj8j0hx7nggo25471mybky1h9m4rrm85 * Update CHANGES * Update CHANGES.txt * Update CHANGES.txt * Apply suggestion from @benwtrent * formatting --------- Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
Previously, a volatile field
graphRamBytesUsedwas used. However, a non-atomic read-modify-write operation was used, which could lead to lost update in case multiple threads are updating the field.The field was replaced with a
LongAdderr.See also the discussion in https://lists.apache.org/thread/xj8j0hx7nggo25471mybky1h9m4rrm85