Ann-bench: fix unsafe lazy blobs #828
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

Coverage Diff
            branch-25.06    #828    +/-
Coverage    84.00%          84.00%
Files       18              18
Lines       125             125
Hits        105             105
Misses      20              20
Thanks for the update. It is a smart idea that we can put locks on the dataset layer ( |
tfeher
left a comment
Hi Artem, thank you for this fix! Overall it is fine, but there is one issue.
I actually thought this is a little bit of a simplification :) two variables |
tfeher
left a comment
I actually thought this is a little bit of a simplification
Ok, I accept your point above. Still I believe the helper function is doing more than what it should.
tfeher
left a comment
Thanks Artem for the updates, LGTM!
/merge
The ann-bench dataset uses lazy-loading blobs to move data between storage, host memory, and device memory. The data may be moved between memory spaces at the moment some properties/pointers are requested. In the search throughput mode, this sometimes causes a problem: two concurrent benchmark threads access the same property and concurrently modify the state of the blobs, which leads to various segfaults.

The fix is to guard the critical sections with a mutex lock. There shouldn't be any impact on benchmark QPS results.

Only one method, `dataset->dim()`, is accessed within the benchmark loop. To avoid locking the mutex in this method, this PR changes the way `dim()` is evaluated: the value is cached in the `dim_` variable, while still maintaining the behavior of loading it either from the query set or the base set, depending on what is available/accessed first.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tarang Jain (https://github.com/tarang-jain)
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#828
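The locking and caching scheme described in the message above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `Payload`, `LazyBlob`, `Dataset`, `remember_dim`, and `run_concurrent_readers` are hypothetical names invented for illustration, the loaders stand in for real file or device transfers, and falling back to the query set when nothing has been accessed yet is an assumption of this sketch. Only the idea is taken from the description: blob state changes happen under a mutex, while `dim()` reads a cached `dim_` value without locking once it is known.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical payload: a row-major matrix with `dim` columns.
struct Payload {
  std::size_t dim;
  std::vector<float> values;
};

// Hypothetical stand-in for a lazy-loading blob.
class LazyBlob {
 public:
  using Loader = std::function<Payload()>;

  explicit LazyBlob(Loader loader) : loader_(std::move(loader)) { assert(loader_); }

  // Critical section: the first caller runs the loader, later callers reuse
  // the stored payload. The payload is never replaced afterwards, so the
  // returned reference stays valid after the lock is released.
  const Payload& get() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (!payload_.has_value()) {
      payload_ = loader_();
      ++load_count_;
    }
    return *payload_;
  }

  int load_count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return load_count_;
  }

 private:
  std::mutex mutex_;
  Loader loader_;
  std::optional<Payload> payload_;
  int load_count_ = 0;
};

// Hypothetical dataset holding a base set and a query set.
class Dataset {
 public:
  Dataset(LazyBlob::Loader base, LazyBlob::Loader queries)
      : base_(std::move(base)), queries_(std::move(queries)) {}

  const Payload& base_set() { return remember_dim(base_.get()); }
  const Payload& query_set() { return remember_dim(queries_.get()); }

  // Hot path: a single atomic load once the value has been cached.
  std::size_t dim() {
    const std::size_t cached = dim_.load(std::memory_order_acquire);
    if (cached != 0) { return cached; }
    // Assumption of this sketch: use the query set if nothing was accessed yet.
    return query_set().dim;
  }

  int base_loads() { return base_.load_count(); }
  int query_loads() { return queries_.load_count(); }

 private:
  // Stores the dimension only if no other access has cached it already.
  const Payload& remember_dim(const Payload& payload) {
    std::size_t expected = 0;
    dim_.compare_exchange_strong(expected, payload.dim);
    return payload;
  }

  LazyBlob base_;
  LazyBlob queries_;
  std::atomic<std::size_t> dim_{0};
};

// Touches both sets and dim() from several threads at once and returns how
// many threads observed a dimension different from the first thread's.
std::size_t run_concurrent_readers(Dataset& dataset, std::size_t n_threads) {
  std::vector<std::size_t> seen(n_threads);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n_threads; ++i) {
    workers.emplace_back([&dataset, &seen, i] {
      dataset.base_set();
      dataset.query_set();
      seen[i] = dataset.dim();
    });
  }
  for (auto& worker : workers) { worker.join(); }
  std::size_t mismatches = 0;
  for (std::size_t d : seen) {
    if (d != seen[0]) { ++mismatches; }
  }
  return mismatches;
}
```

In this sketch the mutex is only contended while a blob is being loaded or queried for its payload; after the first access, `dim()` performs an atomic load and never touches the lock, which mirrors the reasoning for why the benchmark loop should not see a QPS impact.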