Remove implementation that works with non-negative values only #28

CharlesMasson · 2020-11-13T10:43:36Z

Also, move preset sketches to DDSketches and deprecate factory methods in DDSketch.

githomin · 2020-11-13T13:27:09Z

README.md

 // Creating an initially empty sketch, with low memory footprint
 double relativeAccuracy = 0.01;
-DDSketch sketch = DDSketch.memoryOptimal(relativeAccuracy);
+DDSketch sketch = DDSketches.unboundedDense(relativeAccuracy);


Can we create a constructor so a user can just do:
DDSketch sketch = new DDSketch(relativeAccuracy);
and make the default a collapsing-lowest dense store (with some high bin limit)?

I'd like to avoid favoring any store in DDSketch for now, at least until we have implemented one that unequivocally fits most, if not all, use cases better than any other.

It's fine if we favor one now and switch it later, right?

It could still cause backward compatibility issues (e.g., if users rely on the behavior of the store or if they cast it to a specific implementation of the store), which is why I'd like to avoid it. I added a reference to DDSketches from the constructors in 477f8ea, as a way to further highlight the existence of those factory methods.

Can we update the initial README snippet to use the basic constructor instead?

githomin · 2020-11-13T13:34:39Z

src/main/java/com/datadoghq/sketch/ddsketch/DDSketch.java

- * millisecond and 1 minute, and about 6kB (802 bins) to cover values between 1 nanosecond and 1 day. The number of
- * bins that are maintained can be upper-bounded using collapsing stores (see for example
- * {@link #memoryOptimalCollapsingLowest} and {@link #memoryOptimalCollapsingHighest}).
+ * Note that negative values are inverted before being mapped to the store. That means that if you use a store that


If the positive store is CollapsingLowest shouldn't we use CollapsingHighest for the negative store?

It's hard to say without knowing more concrete use cases, but I'd tend to think that if that were the case, we would be interested in highest quantiles and we probably wouldn't want a collapsing store for positive values in the first place (which would cause inaccurate quantiles in two non-contiguous areas). Or we would want a mechanism that starts collapsing the positive-value store only when the negative-value store is fully collapsed.

I believe the idea of collapsing close to zero fits more use cases, where we would allow loosening the relative-accuracy guarantee in favor of an (adaptive) absolute-accuracy guarantee. Put differently, that would be a relatively accurate sketch on a best-effort basis, which would keep the absolute error as low as possible when the relative accuracy cannot be enforced. For instance, if we are plotting those quantile values (e.g., as a time series), I believe that's what we want, as opposed to collapsing on the right sides (or left sides) of both stores.

What do you think?

In the Python version we're collapsing the negative store first (using highest, since it's reversed) and then collapsing the positive store, so the left collapse of the positive store doesn't happen until the negative store is empty.
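To make the "collapsing close to zero" idea above concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `ToySignedSketch`, its `TreeMap`-backed stores, and the per-store bin limit are all hypothetical names and choices made for illustration. It assumes negative values are negated before indexing (as the Javadoc in the diff describes) and that both stores collapse their lowest bins, so precision is given up nearest zero on both sides.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy sketch for illustration only; not the library's DDSketch. */
final class ToySignedSketch {
    private final double gamma;        // ratio between consecutive bucket boundaries
    private final int maxBinsPerStore; // bin limit applied independently to each store

    // Both stores are keyed by the logarithmic index of the value's magnitude.
    final TreeMap<Integer, Long> negative = new TreeMap<>();
    final TreeMap<Integer, Long> positive = new TreeMap<>();
    long zeroCount = 0;

    ToySignedSketch(double relativeAccuracy, int maxBinsPerStore) {
        if (maxBinsPerStore < 1) {
            throw new IllegalArgumentException("maxBinsPerStore must be at least 1");
        }
        this.gamma = (1 + relativeAccuracy) / (1 - relativeAccuracy);
        this.maxBinsPerStore = maxBinsPerStore;
    }

    /** Logarithmic index of a strictly positive magnitude. */
    int index(double magnitude) {
        return (int) Math.ceil(Math.log(magnitude) / Math.log(gamma));
    }

    void accept(double value) {
        if (value > 0) {
            add(positive, index(value));
        } else if (value < 0) {
            add(negative, index(-value)); // negative values are inverted before indexing
        } else {
            zeroCount++;
        }
    }

    private void add(TreeMap<Integer, Long> store, int idx) {
        store.merge(idx, 1L, Long::sum);
        // Collapse lowest: fold the smallest-index bin into its neighbor.
        // Smallest index means smallest magnitude, i.e. the bin closest to zero.
        while (store.size() > maxBinsPerStore) {
            Map.Entry<Integer, Long> lowest = store.pollFirstEntry();
            store.merge(store.firstKey(), lowest.getValue(), Long::sum);
        }
    }
}
```

With a limit of two bins, adding -1000, -10 and -0.001 leaves the bin for -1000 untouched and folds -0.001 into the bin that holds -10. The Python behavior described above differs in that it exhausts the negative store before collapsing the positive one; this toy version limits each store independently.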

richardstartin · 2020-11-16T10:45:05Z

@CharlesMasson what's the motivation for this change? In the tracer team we want to record strictly positive latencies and I have a couple of proofs of concept for more efficient ~~IndexMappings~~ Stores which only work with positive values. It would be awkward to make these work with negative values.

CharlesMasson · 2020-11-16T11:16:26Z

> @CharlesMasson what's the motivation for this change? In the tracer team we want to record strictly positive latencies and I have a couple of proofs of concept for more efficient IndexMappings which only work with positive values. It would be awkward to make these work with negative values.

We want to match what we do in other libs, and make it clearer that the sketch can handle negative values (we've seen confusion about it).

I don't think having a sketch that only handles non-negative values brings much benefit. The dense stores don't allocate memory for the count array if they stay empty, and we could even avoid constructing them in DDSketch if they don't receive any values. If we explicitly want to reject negative values or any range of values, we can still easily do it upstream.
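As a rough illustration of rejecting values "upstream", the guard below wraps any `DoubleConsumer` and throws before a negative (or NaN) value reaches it. `NonNegativeGuard` and `rejectingNegatives` are hypothetical names, not part of the library; the sketch only assumes that the target exposes an `accept(double)`-style method that can be passed as a method reference.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical helper for illustration; not part of the library. */
final class NonNegativeGuard {
    private NonNegativeGuard() {}

    /** Wraps a sink so that negative or NaN inputs are rejected before reaching it. */
    static DoubleConsumer rejectingNegatives(DoubleConsumer sink) {
        return value -> {
            if (value < 0 || Double.isNaN(value)) {
                throw new IllegalArgumentException("expected a non-negative value, got " + value);
            }
            sink.accept(value);
        };
    }
}
```

A caller that only records latencies could then write something like `DoubleConsumer recorder = NonNegativeGuard.rejectingNegatives(sketch::accept);` and leave the sketch itself agnostic about sign.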

Regarding IndexMapping, this PR doesn't introduce any changes, any of its implementations is still expected to work on positive values only.

richardstartin · 2020-11-16T11:34:12Z

@CharlesMasson that's fair enough - I had modified the store test to allow a Store to opt out of testing for negative values. I won't pursue this strategy, since it sounds like I would be cutting against the grain, and will make at least one of the Stores take negative values.

Regarding what you get from disallowing negative values: nothing if you restrict yourself to arrays with offsets or a map, but two's complement greatly complicates the prefix-compressed Store I was planning to propose, because it breaks the ordering of mixed-sign sets.
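The ordering problem mentioned above can be illustrated in isolation. The design of the prefix-compressed Store is not shown in this thread, so the following is only a generic sketch of the underlying issue: when 32-bit two's complement keys are compared by their raw bit patterns (that is, as unsigned integers, which is what a bitwise prefix order amounts to), negative keys sort after positive ones. Flipping the sign bit is one common remedy; `SignedKeyOrder` and its methods are hypothetical names.

```java
/** Hypothetical helper for illustration; not taken from the proposed Store. */
final class SignedKeyOrder {
    private SignedKeyOrder() {}

    /** Compares two keys by raw bit pattern, as a bitwise prefix order would. */
    static int compareRawBits(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }

    /** Flips the sign bit so that raw-bit order matches signed numeric order. */
    static int toOrderPreservingKey(int signedKey) {
        return signedKey ^ Integer.MIN_VALUE;
    }

    /** Inverse of toOrderPreservingKey (the XOR is its own inverse). */
    static int fromOrderPreservingKey(int key) {
        return key ^ Integer.MIN_VALUE;
    }
}
```

Here `compareRawBits(-1, 1)` is positive even though -1 is numerically smaller, while comparing the remapped keys restores the expected order. Whether such a remapping would fit the proposed Store is an open question that this thread does not settle.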

CharlesMasson added 3 commits November 12, 2020 18:42

Remove implementation that works with non-negative values only

18dbafd

Rename SignedDDSketch to DDSketch

b94ec93

Move preset sketches to DDSketches

041294d

CharlesMasson requested a review from githomin November 13, 2020 10:43

githomin reviewed Nov 13, 2020

Add reference to DDSketches in the constructors of DDSketch

477f8ea

richardstartin mentioned this pull request Nov 17, 2020

Paginated store #30

Merged

Add default constructor for testing purposes

371388b

githomin approved these changes Nov 20, 2020

CharlesMasson merged commit 6976126 into master Nov 20, 2020

CharlesMasson deleted the cmasson/one_ddsketch branch November 20, 2020 14:11

4 participants
