This repository was archived by the owner on Nov 15, 2024. It is now read-only.
Commit 0456b40
[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance generate negative result
## What changes were proposed in this pull request?
Because of numerical error, MultivariateOnlineSummarizer.variance is possible to generate negative variance.
**This is a serious bug because many algos in MLLib**
**use stddev computed from** `sqrt(variance)`
**it will generate NaN and crash the whole algorithm.**
we can reproduce this bug use the following code:
```
val summarizer1 = (new MultivariateOnlineSummarizer)
.add(Vectors.dense(3.0), 0.7)
val summarizer2 = (new MultivariateOnlineSummarizer)
.add(Vectors.dense(3.0), 0.4)
val summarizer3 = (new MultivariateOnlineSummarizer)
.add(Vectors.dense(3.0), 0.5)
val summarizer4 = (new MultivariateOnlineSummarizer)
.add(Vectors.dense(3.0), 0.4)
val summarizer = summarizer1
.merge(summarizer2)
.merge(summarizer3)
.merge(summarizer4)
println(summarizer.variance(0))
```
This PR fix the bugs in `mllib.stat.MultivariateOnlineSummarizer.variance` and `ml.stat.SummarizerBuffer.variance`, and several places in `WeightedLeastSquares`
## How was this patch tested?
test cases added.
Author: WeichenXu <WeichenXu123@outlook.com>
Closes apache#19029 from WeichenXu123/fix_summarizer_var_bug.1 parent 07142cf commit 0456b40
5 files changed
Lines changed: 51 additions & 7 deletions
File tree
- mllib/src
- main/scala/org/apache/spark
- mllib/stat
- ml
- optim
- stat
- test/scala/org/apache/spark
- mllib/stat
- ml/stat
Lines changed: 9 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
440 | 440 | | |
441 | 441 | | |
442 | 442 | | |
443 | | - | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
444 | 448 | | |
445 | 449 | | |
446 | 450 | | |
| |||
471 | 475 | | |
472 | 476 | | |
473 | 477 | | |
474 | | - | |
| 478 | + | |
| 479 | + | |
475 | 480 | | |
476 | 481 | | |
477 | 482 | | |
| |||
489 | 494 | | |
490 | 495 | | |
491 | 496 | | |
492 | | - | |
| 497 | + | |
| 498 | + | |
493 | 499 | | |
494 | 500 | | |
495 | 501 | | |
| |||
Lines changed: 3 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
436 | 436 | | |
437 | 437 | | |
438 | 438 | | |
439 | | - | |
440 | | - | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
441 | 442 | | |
442 | 443 | | |
443 | 444 | | |
| |||
Lines changed: 3 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
213 | 213 | | |
214 | 214 | | |
215 | 215 | | |
216 | | - | |
217 | | - | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
218 | 219 | | |
219 | 220 | | |
220 | 221 | | |
| |||
Lines changed: 18 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
402 | 402 | | |
403 | 403 | | |
404 | 404 | | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
405 | 423 | | |
406 | 424 | | |
407 | 425 | | |
| |||
Lines changed: 18 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
270 | 270 | | |
271 | 271 | | |
272 | 272 | | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
273 | 291 | | |
0 commit comments