Commit d128b6a
[SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm failing in edge case
[SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm failing in edge case where no children exist in updateAssignments
## What changes were proposed in this pull request?
Fix a bug in which BisectingKMeans fails with error:
java.util.NoSuchElementException: key not found: 166
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
at scala.collection.immutable.List.reduceLeft(List.scala:84)
at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231)
at scala.collection.AbstractTraversable.minBy(Traversable.scala:105)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
## How was this patch tested?
The dataset was run against the code change to verify that the code works. I will try to add unit tests to the code.
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Ilya Matiach <ilmat@microsoft.com>
Closes #16355 from imatiach-msft/ilmat/fix-kmeans.1 parent 9c04e42 commit d128b6a
3 files changed
Lines changed: 44 additions & 7 deletions
File tree
- mllib/src
- main/scala/org/apache/spark/mllib/clustering
- test/scala/org/apache/spark/ml/clustering
Lines changed: 12 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
339 | 339 | | |
340 | 340 | | |
341 | 341 | | |
342 | | - | |
343 | | - | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
344 | 350 | | |
345 | | - | |
346 | 351 | | |
347 | 352 | | |
348 | 353 | | |
| |||
372 | 377 | | |
373 | 378 | | |
374 | 379 | | |
375 | | - | |
| 380 | + | |
| 381 | + | |
376 | 382 | | |
377 | 383 | | |
378 | | - | |
379 | | - | |
380 | | - | |
| 384 | + | |
| 385 | + | |
381 | 386 | | |
382 | 387 | | |
383 | 388 | | |
| |||
Lines changed: 19 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| 32 | + | |
| 33 | + | |
32 | 34 | | |
33 | 35 | | |
34 | 36 | | |
| 37 | + | |
35 | 38 | | |
36 | 39 | | |
37 | 40 | | |
| |||
51 | 54 | | |
52 | 55 | | |
53 | 56 | | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
54 | 73 | | |
55 | 74 | | |
56 | 75 | | |
| |||
Lines changed: 13 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
| 21 | + | |
20 | 22 | | |
21 | 23 | | |
22 | 24 | | |
| |||
160 | 162 | | |
161 | 163 | | |
162 | 164 | | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
163 | 176 | | |
164 | 177 | | |
165 | 178 | | |
| |||
0 commit comments