-
Notifications
You must be signed in to change notification settings - Fork 18
Remove implementation that works with non-negative values only #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // Creating an initially empty sketch, with low memory footprint | ||
| double relativeAccuracy = 0.01; | ||
| DDSketch sketch = DDSketch.memoryOptimal(relativeAccuracy); | ||
| DDSketch sketch = DDSketches.unboundedDense(relativeAccuracy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create a constructor so a user can just do:
DDSketch sketch = DDSketch(relativeAccuracy)
and make the default be a collapsing lowest dense (with some high bin limit)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid favoring any store for now in DDSketch, or maybe not until we have one implemented that unequivocally fits most if not all use cases more than any other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine if we favor one now and switch it later, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could still cause backward compatibility issues (e.g., if users rely on the behavior of the store or if they cast it to a specific implementation of the store), which is why I'd like to avoid it. I added a reference to DDSketches from the constructors in 477f8ea, as a way to further highlight the existence of those factory methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the initial README snippet to use the basic constructor instead?
| * millisecond and 1 minute, and about 6kB (802 bins) to cover values between 1 nanosecond and 1 day. The number of | ||
| * bins that are maintained can be upper-bounded using collapsing stores (see for example | ||
| * {@link #memoryOptimalCollapsingLowest} and {@link #memoryOptimalCollapsingHighest}). | ||
| * Note that negative values are inverted before being mapped to the store. That means that if you use a store that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the positive store is CollapsingLowest shouldn't we use CollapsingHighest for the negative store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to say without knowing more concrete use cases, but I'd tend to think that if that were the case, we would be interested in highest quantiles and we probably wouldn't want a collapsing store for positive values in the first place (which would cause inaccurate quantiles in two non-contiguous areas). Or we would want a mechanism that starts collapsing the positive-value store only when the negative-value store is fully collapsed.
I believe the idea of collapsing close to zero fits more use cases, where we would allow loosening the relative-accuracy guarantee with an (adaptive) absolute-accuracy guarantee. Said otherwise, that would be a relatively accurate sketch on a best effort basis, which would keep the absolute error as low as possible when the relative accuracy cannot be enforced. For instance, if we are plotting those quantile values (e.g., as a time-series over time), I believe that's what we want, as opposed to collapsing on the right sides (or left sides) of both stores.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the python version we're collapsing the negative store first (using highest since it's reversed) and then collapsing the positive store. so the left collapse of the positive store doesn't happen until the negative store is empty
|
@CharlesMasson what's the motivation for this change? In the tracer team we want to record strictly positive latencies and I have a couple of proofs of concept for more efficient |
We want to match what we do in other libs, and make it clearer that the sketch can handle negative values (we've seen confusion about it). I don't think having a sketch that only handles non-negative values brings much benefit. The dense stores don't allocate memory for the count array if they stay empty, and we could even avoid constructing them in Regarding |
|
@CharlesMasson that's fair enough - I had modified the store test to allow an Regarding what do you get from disallowing negative values - nothing if you restrict yourself to arrays with offsets or a map, but 2's complement complicates the prefix compressed |
Also, move preset sketches to
DDSketchesand deprecate factory methods inDDSketch.