@CharlesMasson
Contributor

Also, move preset sketches to DDSketches and deprecate factory methods in DDSketch.

// Creating an initially empty sketch, with low memory footprint
double relativeAccuracy = 0.01;
- DDSketch sketch = DDSketch.memoryOptimal(relativeAccuracy);
+ DDSketch sketch = DDSketches.unboundedDense(relativeAccuracy);

Can we create a constructor so that a user can just do:
DDSketch sketch = new DDSketch(relativeAccuracy);
and make the default a collapsing-lowest dense store (with some high bin limit)?

Contributor Author

I'd like to avoid favoring any store in DDSketch for now, at least until we have one implemented that unequivocally fits most, if not all, use cases better than any other.

It's fine if we favor one now and switch it later, right?

Contributor Author

It could still cause backward compatibility issues (e.g., if users rely on the behavior of the store or if they cast it to a specific implementation of the store), which is why I'd like to avoid it. I added a reference to DDSketches from the constructors in 477f8ea, as a way to further highlight the existence of those factory methods.
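To make the compatibility concern concrete, here is a minimal, self-contained sketch. Every type in it (`Store`, `DenseStore`, `CollapsingStore`, `ToySketch`) is a hypothetical stand-in, not one of the library's classes. It only illustrates how caller code that downcasts the store would break if a default store were swapped later.

```java
final class DefaultStoreCompat {

    // Hypothetical stand-ins; none of these are the library's own types.
    interface Store {}

    static final class DenseStore implements Store {}

    static final class CollapsingStore implements Store {}

    static final class ToySketch {
        private final Store store;

        ToySketch(Store store) {
            this.store = store;
        }

        Store getStore() {
            return store;
        }
    }

    // Caller code that assumes which store a "default" sketch uses.
    static boolean callerStillWorks(ToySketch sketch) {
        try {
            DenseStore dense = (DenseStore) sketch.getStore();
            return dense != null;
        } catch (ClassCastException e) {
            return false; // the caller's assumption no longer holds
        }
    }

    public static void main(String[] args) {
        ToySketch oldDefault = new ToySketch(new DenseStore());
        ToySketch newDefault = new ToySketch(new CollapsingStore());
        System.out.println("with the old default: " + callerStillWorks(oldDefault));
        System.out.println("with the new default: " + callerStillWorks(newDefault));
    }
}
```

The same caller compiles against both versions but fails at runtime once the default changes, which is the kind of silent break an explicit factory method avoids.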

Can we update the initial README snippet to use the basic constructor instead?

* millisecond and 1 minute, and about 6kB (802 bins) to cover values between 1 nanosecond and 1 day. The number of
* bins that are maintained can be upper-bounded using collapsing stores (see for example
* {@link #memoryOptimalCollapsingLowest} and {@link #memoryOptimalCollapsingHighest}).
* Note that negative values are inverted before being mapped to the store. That means that if you use a store that

If the positive store is CollapsingLowest, shouldn't we use CollapsingHighest for the negative store?

Contributor Author

It's hard to say without knowing more concrete use cases, but I'd tend to think that if that were the case, we would be interested in highest quantiles and we probably wouldn't want a collapsing store for positive values in the first place (which would cause inaccurate quantiles in two non-contiguous areas). Or we would want a mechanism that starts collapsing the positive-value store only when the negative-value store is fully collapsed.

I believe the idea of collapsing close to zero fits more use cases, where we would allow loosening the relative-accuracy guarantee in favor of an (adaptive) absolute-accuracy guarantee. Put differently, that would be a relatively accurate sketch on a best-effort basis, which keeps the absolute error as low as possible when the relative accuracy cannot be enforced. For instance, if we are plotting those quantile values (e.g., as a time series), I believe that's what we want, as opposed to collapsing on the right sides (or left sides) of both stores.

What do you think?

In the Python version, we collapse the negative store first (using highest, since it's reversed) and then the positive store, so the left collapse of the positive store doesn't happen until the negative store is empty.
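The two store pairings discussed in this thread can be contrasted with a toy model. This is only a sketch under stated assumptions: `ToyStore` and `ToySketch` are hypothetical, buckets are base-2 (via `Math.getExponent`) rather than the library's index mapping, each store has its own bin limit, and the ordering between the two stores described for the Python version is not modeled. What it does show is where resolution is lost when the negative-value store, which is indexed by absolute value, collapses its lowest versus its highest bins.

```java
import java.util.Map;
import java.util.TreeMap;

final class CollapseSideDemo {

    /** Toy bounded store (bucket index -> count); not the library's Store. */
    static final class ToyStore {
        private final TreeMap<Integer, Long> bins = new TreeMap<>();
        private final int maxBins;
        private final boolean collapseLowest;

        ToyStore(int maxBins, boolean collapseLowest) {
            this.maxBins = maxBins;
            this.collapseLowest = collapseLowest;
        }

        void add(int index) {
            bins.merge(index, 1L, Long::sum);
            while (bins.size() > maxBins) {
                // Fold the outermost bin on the collapsing side into its neighbour.
                Map.Entry<Integer, Long> dropped =
                        collapseLowest ? bins.pollFirstEntry() : bins.pollLastEntry();
                int neighbour = collapseLowest ? bins.firstKey() : bins.lastKey();
                bins.merge(neighbour, dropped.getValue(), Long::sum);
            }
        }

        @Override
        public String toString() {
            return bins.toString();
        }
    }

    /** Toy sketch: negative values are stored under the bucket of their absolute value. */
    static final class ToySketch {
        final ToyStore negative;
        final ToyStore positive;

        ToySketch(ToyStore negative, ToyStore positive) {
            this.negative = negative;
            this.positive = positive;
        }

        void accept(double value) {
            if (value == 0) {
                return; // zero is skipped in this toy
            }
            // Base-2 bucket of |value|, a stand-in for a real index mapping.
            int index = Math.getExponent(Math.abs(value));
            (value < 0 ? negative : positive).add(index);
        }
    }

    /** Returns "negativeBins | positiveBins" after feeding a fixed set of values. */
    static String run(boolean negativeCollapsesLowest) {
        ToySketch sketch = new ToySketch(
                new ToyStore(2, negativeCollapsesLowest),
                new ToyStore(2, true));
        for (double v : new double[] {-0.001, -1, -1000, 0.001, 1, 1000}) {
            sketch.accept(v);
        }
        return sketch.negative + " | " + sketch.positive;
    }

    public static void main(String[] args) {
        System.out.println("lowest/lowest:  " + run(true));
        System.out.println("highest/lowest: " + run(false));
    }
}
```

With collapsing-lowest on both stores, `run(true)` yields `{0=2, 9=1} | {0=2, 9=1}`: on both sides it is the values nearest zero (±0.001) that get folded into a neighbouring bucket. With collapsing-highest on the negative store, `run(false)` yields `{-10=1, 0=2} | {0=2, 9=1}`: the most negative value (-1000) and the smallest positive value (0.001) lose resolution, which are the two non-contiguous areas mentioned above.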

@richardstartin
Contributor

richardstartin commented Nov 16, 2020

@CharlesMasson what's the motivation for this change? In the tracer team we want to record strictly positive latencies and I have a couple of proofs of concept for more efficient IndexMappings/Stores which only work with positive values. It would be awkward to make these work with negative values.

@CharlesMasson
Contributor Author

> @CharlesMasson what's the motivation for this change? In the tracer team we want to record strictly positive latencies and I have a couple of proofs of concept for more efficient IndexMappings which only work with positive values. It would be awkward to make these work with negative values.

We want to match what we do in other libs, and make it clearer that the sketch can handle negative values (we've seen confusion about it).

I don't think having a sketch that only handles non-negative values brings much benefit. The dense stores don't allocate memory for the count array if they stay empty, and we could even avoid constructing them in DDSketch if they don't receive any values. If we explicitly want to reject negative values or any range of values, we can still easily do it upstream.
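Rejecting negative values upstream could look like the following sketch. `nonNegativeOnly` is a hypothetical helper, not part of the library, and the sink is modeled as a plain `DoubleConsumer`; with the real sketch one would presumably pass something like `sketch::accept`, assuming the sketch exposes an `accept(double)` method.

```java
import java.util.function.DoubleConsumer;

final class NonNegativeGuard {

    /** Wraps a sink so that negative or NaN inputs are rejected before reaching it. */
    static DoubleConsumer nonNegativeOnly(DoubleConsumer sink) {
        return value -> {
            if (!(value >= 0)) { // false for negative values and for NaN
                throw new IllegalArgumentException("expected a non-negative value, got " + value);
            }
            sink.accept(value);
        };
    }

    public static void main(String[] args) {
        double[] sum = {0};
        // The lambda stands in for a sketch; it just sums what it receives.
        DoubleConsumer guarded = nonNegativeOnly(v -> sum[0] += v);
        guarded.accept(1.5);
        guarded.accept(2.5);
        try {
            guarded.accept(-1.0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("sum of accepted values: " + sum[0]);
    }
}
```

Keeping the check outside the sketch leaves the sketch itself general while letting a caller such as a latency recorder enforce its own input range.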

Regarding IndexMapping, this PR doesn't introduce any changes; its implementations are still expected to work on positive values only.

@richardstartin
Contributor

richardstartin commented Nov 16, 2020

@CharlesMasson That's fair enough. I had modified the store test to allow a Store to opt out of testing for negative values. I won't pursue this strategy, since it sounds like I would be cutting against the grain, and will make at least one of the Stores take negative values.

Regarding what you get from disallowing negative values: nothing if you restrict yourself to arrays with offsets or a map, but two's complement considerably complicates the prefix-compressed Store I was planning to propose, because it breaks the ordering of mixed-sign sets.
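One reading of the two's-complement point can be shown in a few lines. This is only an illustration of the general ordering problem, not of the proposed Store: a structure that orders keys by their raw bits (as a bitwise prefix scheme would) sees negative `int` values as larger than all non-negative ones. The sign-bit flip at the end is a common workaround, not something proposed in this thread.

```java
import java.util.Arrays;

final class TwosComplementOrder {

    /** Sorts by raw bit pattern (unsigned order), as a bitwise prefix structure would. */
    static Integer[] sortByBits(Integer[] values) {
        Integer[] copy = values.clone();
        Arrays.sort(copy, Integer::compareUnsigned);
        return copy;
    }

    /** Flipping the sign bit makes unsigned order agree with signed order. */
    static int orderPreservingKey(int value) {
        return value ^ Integer.MIN_VALUE;
    }

    /** Sorts by the bit pattern of the sign-flipped key. */
    static Integer[] sortByFlippedBits(Integer[] values) {
        Integer[] copy = values.clone();
        Arrays.sort(copy, (a, b) ->
                Integer.compareUnsigned(orderPreservingKey(a), orderPreservingKey(b)));
        return copy;
    }

    public static void main(String[] args) {
        Integer[] mixed = {2, -1, 0, -2, 1};
        System.out.println(Arrays.toString(sortByBits(mixed)));        // [0, 1, 2, -2, -1]
        System.out.println(Arrays.toString(sortByFlippedBits(mixed))); // [-2, -1, 0, 1, 2]
    }
}
```

With only non-negative keys the raw-bit order and the numeric order coincide, which is why a positive-only design sidesteps the issue.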

@richardstartin richardstartin mentioned this pull request Nov 17, 2020
@CharlesMasson CharlesMasson merged commit 6976126 into master Nov 20, 2020
@CharlesMasson CharlesMasson deleted the cmasson/one_ddsketch branch November 20, 2020 14:11

4 participants