Add experimental FixedSizeRoundRobinReservoir#8305
Open
dashpole wants to merge 1 commit intoopen-telemetry:mainfrom
Open
Add experimental FixedSizeRoundRobinReservoir#8305dashpole wants to merge 1 commit intoopen-telemetry:mainfrom
dashpole wants to merge 1 commit intoopen-telemetry:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8305 +/- ##
=====================================
Coverage 82.9% 83.0%
=====================================
Files 314 315 +1
Lines 24985 25060 +75
=====================================
+ Hits 20730 20804 +74
Misses 3882 3882
- Partials 373 374 +1
🚀 New features to boost your workflow:
|
Contributor
Author
|
Link failure is because the module isn't released yet. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This moves the implementation of the round-robin reservoir from #8257 to an experimental module.
This is an alternative approach to #7447, which had to be reverted in #8249 because of #8238. That approach attempted to make a highly-performant parallel version of algorithm-L, but ultimately is is very difficult to do so correctly.
Instead, this PR tries to improve parallel performance by sharding. Instead of one algorithm-L implementation for a k-sized reservoir, just have k algorithm-L copies for 1-sized reservoirs. This solves #8236 by moving the algorithm-L logic into the storage layer, so that each bucket is individually time-weighted. It also improves the concurrent performance of the fixed-size reservoir by reducing contention on the algorithm-L implementations, since they are distributed across multiple instances.
This should have identical runtime. One k-sized reservoir that handles
nObserve calls has a runtime ofO(k(1 + log(n/k)). A 1-sized reservoirs that handlesn/kObserve calls (we have an even distribution because of our round-robin) would haveO(1(1 + log(n/k))runtime. k 1-sized reservoirs would have aO(k(1 + log(n/k))runtime.This has the following downsides:
This uses a simple round-robin strategy. This does introduce a bias. Algorithm-L by itself makes any combination of exemplars equally likely (e.g. the first k and last k are equally likely). Using round-robin makes some exemplar combinations impossible (e.g. the 1st and k+1 th). This would matter if there was periodic behavior that aligned with the reservoir size. I.e. there is something "special" about every
kthOffer call. You would be highly likely to get only one "special" exemplar, and less likely to get zero or more than one special events. This tend to makes the results more "average". If this is a concern, I think this could be addressed with some added complexity, and some performance overhead.TODO: Benchmarks.
Gemini helped me write this.