[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication #10741
Changes from 3 commits
7c3d60f
1474b6f
ca9a852
37af391
49fd362
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -89,4 +89,22 @@ class StopWordsRemoverSuite | |
| .setCaseSensitive(true) | ||
| testDefaultReadWrite(t) | ||
| } | ||
|
|
||
| test("StopWordsRemover output column already exists") { | ||
| val outpuCol = "expected" | ||
| val remover = new StopWordsRemover() | ||
| .setInputCol("raw") | ||
| .setOutputCol(outpuCol) | ||
| .setCaseSensitive(true) | ||
|
Member
not needed for this test |
||
| val dataSet = sqlContext.createDataFrame(Seq( | ||
|
Member
Just copy one of the datasets from a test above. That should fix the error.
Contributor
Author
I missed that the second column in dataSet was completely empty, and that was the problem. |
||
| (Seq("A"), Seq("A")), | ||
| (Seq("The", "the"), Seq("The")) | ||
| )).toDF("raw", outpuCol) | ||
|
|
||
| val thrown = intercept[IllegalArgumentException] { | ||
| testStopWordsRemover(remover, dataSet) | ||
| } | ||
| assert(thrown.getClass === classOf[IllegalArgumentException]) | ||
|
Member
This is already checked in line 104.
Contributor
Author
I tried to follow the pattern from: I think this check could theoretically be useful: But of course, PatternSyntaxException is very unlikely to have the message "requirement failed: Column ${outpuCol} already exists." Am I correct?
Member
I see; I bet you're correct. Don't worry about it for now, though. |
||
| assert(thrown.getMessage == s"requirement failed: Column ${outpuCol} already exists.") | ||
| } | ||
| } | ||
typo: "outputCol"
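For context on the message assertion discussed in the review: the expected text starts with "requirement failed: " because Scala's `Predef.require` prepends that prefix when it throws `IllegalArgumentException`. The sketch below reproduces that behavior in plain Scala, without Spark or ScalaTest. The object `RequireMessageSketch` and the helper `checkOutputCol` are hypothetical stand-ins for illustration, not the actual Spark schema check.

```scala
object RequireMessageSketch {
  // Hypothetical stand-in for an "output column already exists" check.
  // This is NOT Spark's implementation; it only shows how require builds its message.
  def checkOutputCol(existingCols: Seq[String], outputCol: String): Unit =
    require(!existingCols.contains(outputCol), s"Column $outputCol already exists.")

  def main(args: Array[String]): Unit = {
    val outputCol = "expected"

    // Capture the exception by hand; in the test suite, ScalaTest's intercept does this.
    val thrown: Option[IllegalArgumentException] =
      try {
        checkOutputCol(Seq("raw", outputCol), outputCol)
        None
      } catch {
        case e: IllegalArgumentException => Some(e)
      }

    // The exception was caught as IllegalArgumentException, so only the message needs checking.
    assert(thrown.isDefined)
    assert(thrown.get.getMessage == s"requirement failed: Column $outputCol already exists.")
  }
}
```

Since catching by type already fixes the exception class, a separate class assertion adds nothing here; the message comparison is what distinguishes this failure from other `IllegalArgumentException` subclasses.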