-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19234][MLLib] AFTSurvivalRegression should fail fast when any labels are zero #16652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Leave the blank)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still causing a style check failure @admackin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being copied in from MLUtils? why is it necessary?
|
Yes, the version in MLUtils had labels of zero in the test cases, so was causing test cases to fail after my patch. It didn't look like there was a way to fix this, so I thought it better to make a patch that didn't affect potentially dozens of other packages. Any other thoughts on how to achieve this? I could add a 'minLabel' param to the MLUtils methods but that seems overly specific for this one package. |
|
I imagine there's a better way to do this without copying code. Do you mean the common test code assumes 0 labels are permitted? then maybe it should just not do that, because it's just using any old value to test. If it doesn't actually assume 0 labels are permitted, then its logic should still work. It's just that this test would need some additional logic to verify that 0 labels cause an exception. |
|
Yes that’s right, in MLUtils it supplies zero-labels as test cases (ie assumes they’re allowed, which for other regression algorithms would be true). More than happy to patch that instead though if you think that’s OK – it’ll just affect presumably a lot of other test cases.
Verifying the 0-labels cause exceptions isn’t yet covered, but probably should be (separately to MLUtils). (I’ll just need to find out how to assert thrown errors in the Spark testing)
…On 22 Jan 2017, 01:07 +1100, Sean Owen ***@***.***>, wrote:
I imagine there's a better way to do this without copying code. Do you mean the common test code assumes 0 labels are permitted? then maybe it should just not do that, because it's just using any old value to test.
If it doesn't actually assume 0 labels are permitted, then its logic should still work. It's just that this test would need some additional logic to verify that 0 labels cause an exception.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.((
|
|
It looks like there is no particular reason that |
|
Test build #3544 has finished for PR 16652 at commit
|
|
I've addressed all the problems I think – code style now fixed, MLTestingUtils patched (and verified all MLLib test cases still pass), and added a test case for zero-valued labels |
|
This is looking OK to me, but it needs a (squash, optionally, and) rebase now before it can test again. |
…ed in test cases as they now throw errors
|
Test build #3551 has finished for PR 16652 at commit
|
| (0.000, 0.0, Vectors.dense(0.346, 2.158)), // ← generates error; zero labels invalid | ||
| (4.199, 0.0, Vectors.dense(0.795, -0.226)))).toDF("label", "censor", "features") | ||
| val aft = new AFTSurvivalRegression() | ||
| intercept[SparkException] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's recommended to verify the error message using withClue, eg:
withClue("label of AFTPoint must be positive") {
intercept[SparkException] {
aft.fit(dataset)
}
}
|
looks good to me too. I just added a small suggestion. Thanks! |
|
Test build #3552 has finished for PR 16652 at commit
|
|
Test build #3554 has finished for PR 16652 at commit
|
|
The core change is looking OK @admackin but seems like it fails Python tests? if you have a moment to look at that, could be all that's needed to get this over the line. |
|
gentle ping @admackin |
## What changes were proposed in this pull request? This PR proposes to close PRs ... - inactive to the review comments more than a month - WIP and inactive more than a month - with Jenkins build failure but inactive more than a month - suggested to be closed and no comment against that - obviously looking inappropriate (e.g., Branch 0.5) To make sure, I left a comment for each PR about a week ago and I could not have a response back from the author in these PRs below: Closes apache#11129 Closes apache#12085 Closes apache#12162 Closes apache#12419 Closes apache#12420 Closes apache#12491 Closes apache#13762 Closes apache#13837 Closes apache#13851 Closes apache#13881 Closes apache#13891 Closes apache#13959 Closes apache#14091 Closes apache#14481 Closes apache#14547 Closes apache#14557 Closes apache#14686 Closes apache#15594 Closes apache#15652 Closes apache#15850 Closes apache#15914 Closes apache#15918 Closes apache#16285 Closes apache#16389 Closes apache#16652 Closes apache#16743 Closes apache#16893 Closes apache#16975 Closes apache#17001 Closes apache#17088 Closes apache#17119 Closes apache#17272 Closes apache#17971 Added: Closes apache#17778 Closes apache#17303 Closes apache#17872 ## How was this patch tested? N/A Author: hyukjinkwon <[email protected]> Closes apache#18017 from HyukjinKwon/close-inactive-prs.
What changes were proposed in this pull request?
If any labels of 0.0 (which are invalid) are supplied, AFTSurvivalRegression gives an error straight away rather than hard-to-interpret warnings and zero-valued coefficients in the output.
How was this patch tested?
Verified against current test suite. (One test needed to be updated as it was providing values of zero for labels so was failing after this patch)
Please review http://spark.apache.org/contributing.html before opening a pull request.