Commit cfff397

viirya authored and jkbradley committed
[SPARK-6004][MLlib] Pick the best model when training GradientBoostedTrees with validation
Since the validation error does not change monotonically, in practice, it should be proper to pick the best model when training GradientBoostedTrees with validation instead of stopping it early.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#4763 from viirya/gbt_record_model and squashes the following commits:

452e049 [Liang-Chi Hsieh] Address comment.
ea2fae2 [Liang-Chi Hsieh] Pick the best model when training GradientBoostedTrees with validation.
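
The reasoning in the commit message can be illustrated outside of Spark. The following is a minimal sketch, not the code from this patch: the object `BestModelSketch`, the method `bestIteration`, and the sample error values are all hypothetical. Given per-iteration validation errors that rise and fall, it returns the number of boosting iterations with the lowest error instead of stopping at the first increase.

```scala
object BestModelSketch {
  /** Returns the 1-based iteration count with the lowest validation error.
    * Ties keep the earliest iteration. Assumes `errors` is non-empty.
    * (Hypothetical helper for illustration; not part of MLlib.)
    */
  def bestIteration(errors: Seq[Double]): Int = {
    require(errors.nonEmpty, "need at least one validation error")
    var bestM = 1
    var bestError = errors.head
    // Scan the remaining iterations and remember the best one seen so far.
    for ((err, i) <- errors.zipWithIndex.tail) {
      if (err < bestError) {
        bestError = err
        bestM = i + 1
      }
    }
    bestM
  }

  // Made-up errors: the error rises at iteration 3, then improves again at 4.
  val sampleErrors: Seq[Double] = Seq(0.50, 0.42, 0.45, 0.40, 0.41)
  val sampleBestM: Int = bestIteration(sampleErrors)
}
```

With these made-up values, stopping at the first increase would keep 2 trees, while scanning all iterations selects 4.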
1 parent 2358657 commit cfff397

1 file changed

Lines changed: 9 additions & 3 deletions

mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala

@@ -251,9 +251,15 @@ object GradientBoostedTrees extends Logging {
 
     logInfo("Internal timing for DecisionTree:")
     logInfo(s"$timer")
-
-    new GradientBoostedTreesModel(
-      boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    if (validate) {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo,
+        baseLearners.slice(0, bestM),
+        baseLearnerWeights.slice(0, bestM))
+    } else {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    }
   }
 
 }
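
To make the effect of `slice(0, bestM)` concrete: on a Scala `Array`, `slice(from, until)` keeps the elements at indices `from` up to, but not including, `until`. Assuming `bestM` is a count of trees to keep (which the call in the diff suggests), the slice retains exactly the first `bestM` learners and their weights. The sketch below uses placeholder values; strings stand in for trees, and the object name and all values are hypothetical.

```scala
object SliceSketch {
  // Placeholder learners and weights; real code would hold DecisionTreeModel instances.
  val baseLearners: Array[String] = Array("tree0", "tree1", "tree2", "tree3", "tree4")
  val baseLearnerWeights: Array[Double] = Array(1.0, 0.1, 0.1, 0.1, 0.1)

  // Assume validation picked the first 3 trees as the best ensemble.
  val bestM = 3

  // slice(0, bestM) keeps indices 0 until bestM, i.e. the first bestM elements.
  val keptLearners: Array[String] = baseLearners.slice(0, bestM)
  val keptWeights: Array[Double] = baseLearnerWeights.slice(0, bestM)
}
```

Slicing both arrays with the same bound keeps learners and weights aligned, which the model constructor relies on.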
