
srowen (Member) commented Jun 12, 2017

What changes were proposed in this pull request?

Use the Poisson analysis for the approximate count bound in all cases.

How was this patch tested?

Existing tests.
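Below is a rough sketch of the model behind the Poisson analysis, written in our own notation rather than the PR's: s is the sum observed after scanning a fraction p of the data, N is the total count, and c is the confidence level. If "(1-p) Poisson(sum / p)" is read as Poisson thinning, the unseen remainder R = N − s is Poisson with rate s(1−p)/p, and the point estimate for N comes out to s/p. The equal-tailed interval on the last line is our assumption about how the bound is formed. The PR does not state it.

$$
\hat{\lambda} = \frac{s}{p}, \qquad
R = N - s \sim \operatorname{Poisson}\big((1-p)\,\hat{\lambda}\big) = \operatorname{Poisson}\!\left(\frac{s(1-p)}{p}\right)
$$

$$
\mathbb{E}[N] = s + \frac{s(1-p)}{p} = \frac{s}{p}
$$

$$
N \in \left[\, s + F_R^{-1}\!\left(\frac{1-c}{2}\right),\; s + F_R^{-1}\!\left(\frac{1+c}{2}\right) \right],
\qquad F_R^{-1}(q) = \min\{k : \Pr(R \le k) \ge q\}
$$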

SparkQA commented Jun 12, 2017

Test build #77931 has finished for PR 18276 at commit 8a14713.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member)

Retest this please.

SparkQA commented Jun 12, 2017

Test build #77938 has finished for PR 18276 at commit 8a14713.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 13, 2017

Test build #77987 has finished for PR 18276 at commit f5311f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member, Author) commented Jun 14, 2017

Merged to master

@asfgit asfgit closed this in d6f76eb Jun 14, 2017
```scala
// p of the data. This suggests data is counted at a rate of sum / p across the whole data
// set. The total expected count from the rest is distributed as
// (1-p) Poisson(sum / p) = Poisson(sum*(1-p)/p)
val dist = new PoissonDistribution(sum * (1 - p) / p)
```

lovasoa commented Jun 14, 2017

@srowen I know it is a little late for a review, but now that we have a single distribution, the code would be clearer if it estimated the total count directly with the Poisson distribution, that is, by removing the 1-p here and the sum + in the final BoundedDouble.
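The sketch below is minimal and self-contained. It puts the two options side by side and assumes an equal-tailed interval built from Poisson quantiles. Every name in it (CountBoundSketch, Bound, poissonQuantile, residualBound, directBound) is hypothetical. The hand-rolled quantile replaces the PoissonDistribution used in the diff so the snippet runs without dependencies. residualBound follows the diff: it models the unseen remainder and then adds sum back. directBound follows the suggestion: it models the total as Poisson(sum / p).

```scala
// Hypothetical sketch (names are illustrative, not Spark's): two ways to bound the total
// count N after observing `sum` elements in a fraction `p` of the data (0 < p <= 1).
object CountBoundSketch {

  final case class Bound(mean: Double, low: Double, high: Double)

  /** Smallest k with P(X <= k) >= q for X ~ Poisson(lambda). Sums the pmf term by term
    * in log space, so it is only meant for moderate lambda. */
  def poissonQuantile(lambda: Double, q: Double): Long = {
    require(lambda >= 0 && q > 0 && q < 1, "need lambda >= 0 and 0 < q < 1")
    val maxK = (lambda + 50 * math.sqrt(lambda) + 100).toLong // guard against rounding stalls
    var k = 0L
    var logPmf = -lambda // log P(X = 0)
    var cdf = math.exp(logPmf)
    while (cdf < q && k < maxK) {
      k += 1
      logPmf += math.log(lambda) - math.log(k.toDouble) // log P(X = k) from log P(X = k - 1)
      cdf += math.exp(logPmf)
    }
    k
  }

  /** As in the diff: the unseen remainder is Poisson(sum * (1 - p) / p); add `sum` back. */
  def residualBound(confidence: Double, sum: Long, p: Double): Bound = {
    val lambda = sum * (1 - p) / p
    Bound(
      sum + lambda,
      (sum + poissonQuantile(lambda, (1 - confidence) / 2)).toDouble,
      (sum + poissonQuantile(lambda, (1 + confidence) / 2)).toDouble)
  }

  /** As suggested in review: model the total count directly as Poisson(sum / p). */
  def directBound(confidence: Double, sum: Long, p: Double): Bound = {
    val lambda = sum / p
    Bound(
      lambda,
      poissonQuantile(lambda, (1 - confidence) / 2).toDouble,
      poissonQuantile(lambda, (1 + confidence) / 2).toDouble)
  }
}
```

With sum = 100 and p = 0.5, both forms give a mean of 200. The intervals still differ. The direct form has variance sum / p = 200, while the residual form has variance sum * (1 - p) / p = 100. This is because the direct form also treats the 100 already-counted elements as uncertain, so its interval is wider and the two forms are not interchangeable.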

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017

Author: Sean Owen <[email protected]>

Closes apache#18276 from srowen/SPARK-21057.
@srowen srowen deleted the SPARK-21057 branch June 16, 2017 10:03