[SPARK-22479][SQL] Exclude credentials from SaveintoDataSourceCommand.simpleString #19708
|
Jenkins, this is ok to test |
|
Test build #83652 has finished for PR 19708 at commit
|
    Seq.empty[Row]
  }

  override def simpleString: String = s"SaveIntoDataSourceCommand ${dataSource}, ${mode}"
I can use that, but I would need to expand the default for spark.redaction.regex, because the user and url fields might also contain sensitive data (some drivers allow credentials to be passed in the connection URL). We should also change JDBCRelation::toString to use the redaction regex, to be consistent.
I would argue for not showing the JDBC properties at all: they provide little value, and a wrong redaction regex configuration could leak credentials to downstream log collection systems.
Let me know if that makes sense, and I can modify this accordingly.
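To make the concern above concrete, here is a minimal sketch of key-based option redaction in plain Scala. It is not Spark's implementation: `RedactionSketch`, its `redact` helper, the replacement text, and the sample options are all hypothetical, and the narrow pattern `(?i)secret|password` is only an assumed stand-in for a default that does not cover the `url` key.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not Spark's implementation: masks the value of every
// option whose key matches the redaction pattern.
object RedactionSketch {
  // Placeholder replacement text (an assumption for this sketch).
  val Replacement = "*********(redacted)"

  def redact(pattern: Regex, options: Map[String, String]): Map[String, String] =
    options.map { case (key, value) =>
      if (pattern.findFirstIn(key).isDefined) key -> Replacement else key -> value
    }

  // Made-up JDBC-style options; the URL embeds credentials, as some drivers allow.
  val options: Map[String, String] = Map(
    "url" -> "jdbc:postgresql://localhost/test?user=fred&password=secret",
    "password" -> "secret",
    "dbtable" -> "people"
  )

  // Assumed narrow pattern: matches only keys containing "secret" or "password".
  val narrow: Map[String, String] = redact("(?i)secret|password".r, options)

  // Broader pattern that also matches the "url" key.
  val broad: Map[String, String] = redact("(?i)password|url".r, options)
}
```

With the narrow pattern, the `password` option is masked but the `url` value, including its embedded credentials, passes through unchanged. With the broader pattern both are masked, which is why the choice of regex matters if the options are logged at all.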
SaveIntoDataSourceCommand is not being used for JDBC only.
Wasn't JDBCRelation::toString already fixed in #15975?
|
Test build #83799 has finished for PR 19708 at commit
|
|
Test build #83810 has finished for PR 19708 at commit
|
Force-pushed: ae091ec to 56f48f3
|
Just want to confirm: are the examples in the PR description based on the latest updates? |
|
Generally, this looks good to me. |
|
This looks good. I was wondering whether we should also take a look at other data source operations. Could you also add a test? |
|
Test build #83811 has finished for PR 19708 at commit
|
|
Yes, the PR description reflects the latest changes. |
|
Test build #83864 has finished for PR 19708 at commit
|
  override protected def sparkConf: SparkConf = super.sparkConf
    .set("spark.redaction.string.regex", "(?i)password|url")

  test("treeString is redacted") {
Old test name? We're not modifying the treeString anymore, it's just the SaveIntoDataSourceCommand.
I followed the naming convention here:
spark/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
Line 33 in e9f983d
| test("treeString is redacted") { |
We are essentially redacting SaveIntoDataSourceCommand::simpleString, which is called in SaveIntoDataSourceCommand::treeString.
That is not really a convention. Can you just call it simpleString is redacted?
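The relationship discussed in this thread (a tree rendering built from each node's one-line summary) can be sketched as follows. This is a hypothetical miniature in plain Scala, not Spark's classes: `Node`, `Leaf`, `SaveCommand`, and the sample option value are invented for illustration, and the sketch shows only the approach of leaving options out of the summary.

```scala
// Hypothetical miniature of a plan tree; the names are illustrative, not Spark's classes.
object TreeStringSketch {
  trait Node {
    def children: Seq[Node]
    def simpleString: String

    // treeString is assembled from each node's simpleString, one line per node.
    def treeString: String = lines(0).mkString("\n")

    private def lines(depth: Int): Seq[String] =
      (("  " * depth) + simpleString) +: children.flatMap(_.lines(depth + 1))
  }

  final case class Leaf(name: String) extends Node {
    val children: Seq[Node] = Nil
    def simpleString: String = s"Leaf $name"
  }

  final case class SaveCommand(
      provider: String,
      options: Map[String, String],
      mode: String,
      child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    // Options are deliberately left out, so they cannot reach treeString either.
    def simpleString: String = s"SaveCommand $provider, $mode"
  }

  // Made-up plan whose options hold a credential.
  val plan: Node =
    SaveCommand("jdbc", Map("password" -> "hunter2"), "Overwrite", Leaf("source"))
}
```

Because `treeString` never looks at the options directly, whatever `simpleString` omits or masks is also absent from the tree output. In that sense a test on either string exercises the same change, although naming the test after `simpleString` describes what was actually modified.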
class SaveIntoDataSourceCommandSuite extends SharedSQLContext {

  override protected def sparkConf: SparkConf = super.sparkConf
    .set("spark.redaction.string.regex", "(?i)password|url")
Shouldn't this be spark.redaction.regex instead of spark.redaction.string.regex?
jiangxb1987 left a comment
LGTM pending jenkins
class SaveIntoDataSourceCommandSuite extends SharedSQLContext {

  override protected def sparkConf: SparkConf = super.sparkConf
    .set("spark.redaction.regex", "(?i)password|url")
Nit: indents.
|
Test build #83895 has finished for PR 19708 at commit
|
|
LGTM |
|
Test build #83896 has finished for PR 19708 at commit
|
ash211 left a comment
LGTM
|
Test build #83901 has finished for PR 19708 at commit
|
|
Thanks! Merged to master. @onursatici Could you submit a separate PR for 2.2? |
What changes were proposed in this pull request?
Do not include JDBC properties, which may contain credentials, when logging a logical plan with SaveIntoDataSourceCommand in it.
How was this patch tested?
Building locally and trying to reproduce (per the steps in https://issues.apache.org/jira/browse/SPARK-22479):