This repository was archived by the owner on Nov 15, 2024. It is now read-only.

Commit ca3e892

yanboliang authored and MatthewRBruce committed
[SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.
## What changes were proposed in this pull request?

apache#19197 fixed double caching for MLlib algorithms, but missed PySpark `OneVsRest`, this PR fixed it.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#19220 from yanboliang/SPARK-18608.

(cherry picked from commit c76153c)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
1 parent 296223a commit ca3e892

1 file changed

Lines changed: 2 additions & 4 deletions


python/pyspark/ml/classification.py

@@ -1576,8 +1576,7 @@ def _fit(self, dataset):
         multiclassLabeled = dataset.select(labelCol, featuresCol)
 
         # persist if underlying dataset is not persistent.
-        handlePersistence = \
-            dataset.rdd.getStorageLevel() == StorageLevel(False, False, False, False)
+        handlePersistence = dataset.storageLevel == StorageLevel(False, False, False, False)
         if handlePersistence:
             multiclassLabeled.persist(StorageLevel.MEMORY_AND_DISK)

@@ -1690,8 +1689,7 @@ def _transform(self, dataset):
         newDataset = dataset.withColumn(accColName, initUDF(dataset[origCols[0]]))
 
         # persist if underlying dataset is not persistent.
-        handlePersistence = \
-            dataset.rdd.getStorageLevel() == StorageLevel(False, False, False, False)
+        handlePersistence = dataset.storageLevel == StorageLevel(False, False, False, False)
         if handlePersistence:
             newDataset.persist(StorageLevel.MEMORY_AND_DISK)
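
Both hunks apply the same guard: the derived dataset is persisted only when the caller has not already persisted the input. As far as I understand the linked issue, the old check read the storage level from `dataset.rdd`, which may not reflect a cache applied to the DataFrame itself, so an already cached input could be cached a second time. The sketch below illustrates the guard in plain Python so that it runs without Spark. `FakeDataset`, `fit_like`, and the string storage levels are hypothetical stand-ins, not PySpark APIs, and the `unpersist()` cleanup is an assumption about the surrounding code that this diff does not show.

```python
# Pure-Python sketch; FakeDataset is a hypothetical stand-in, not a PySpark class.
NONE = "NONE"
MEMORY_AND_DISK = "MEMORY_AND_DISK"


class FakeDataset:
    """Hypothetical stand-in that tracks a storage level the way a DataFrame might."""

    def __init__(self):
        self.storageLevel = NONE
        self.persist_calls = 0

    def persist(self, level=MEMORY_AND_DISK):
        self.storageLevel = level
        self.persist_calls += 1
        return self

    def unpersist(self):
        self.storageLevel = NONE
        return self

    def select(self):
        # A derived dataset starts out unpersisted, whatever its parent's level.
        return FakeDataset()


def fit_like(dataset):
    """Mirror the handlePersistence guard from the diff above."""
    working = dataset.select()
    # Cache the derived data only when the caller has not cached the input.
    handle_persistence = dataset.storageLevel == NONE
    if handle_persistence:
        working.persist(MEMORY_AND_DISK)
    try:
        # Placeholder for the real work: report how often we cached.
        return working.persist_calls
    finally:
        # Assumed cleanup: release only what this function itself cached.
        if handle_persistence:
            working.unpersist()


uncached = FakeDataset()
cached = FakeDataset().persist()

calls_for_uncached = fit_like(uncached)  # derived data is cached once
calls_for_cached = fit_like(cached)      # input already cached, no second cache
```

In real PySpark code the comparison would use `dataset.storageLevel` against `StorageLevel(False, False, False, False)`, exactly as in the added lines of the diff.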