[SPARK-20466][CORE] HadoopRDD#addLocalConfiguration throws NPE #19413
sahilTakiar wants to merge 1 commit into
ok to test
Test build #82397 has finished for PR 19413 at commit
Fixed the style check.
Test build #82399 has finished for PR 19413 at commit
jiangxb1987 left a comment
The fix looks good; I have only one minor issue. Since the repro involves GC, there may be no straightforward way to construct a test case here. Also cc @cloud-fan
You are using two `val`s with the same name here, which we normally don't recommend. How about following vanzin's suggestion in the JIRA issue, like this:
    Option(HadoopRDD.getCachedMetadata(jobConfCacheKey)).getOrElse(
      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
        ......
      })
I could do that, but then I would have to remove the `logDebug("Re-using cached JobConf")` statement, unless there is some way to get around that in Scala.
Otherwise, I can rename the val to `cachedJobConf`.
You could do that without the val:
    Option(HadoopRDD.getCachedMetadata(jobConfCacheKey))
      .map { conf =>
        logDebug(...)
        conf.asInstanceOf[...]
      }
      .getOrElse {
        // code in the "else" block
      }
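For readers following along, here is a minimal, self-contained sketch of the `Option(...).map { ... }.getOrElse { ... }` shape suggested above. It is not the patch itself: `CacheSketch`, its `getConf` method, the string-valued "conf", and the in-memory `log` buffer (standing in for `logDebug`) are all hypothetical names made up for illustration. The only thing taken from the discussion is the idea that the cache lookup returns `null` on a miss, so a single read wrapped in `Option` covers both the hit and the miss.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the metadata cache discussed above.
// The lookup returns null on a miss, which is what makes Option(...) useful.
object CacheSketch {
  private val cache = mutable.Map.empty[String, AnyRef]

  // Stand-in for logDebug so the behaviour can be observed without a logger.
  val log = mutable.ListBuffer.empty[String]

  def getCachedMetadata(key: String): AnyRef = cache.getOrElse(key, null)

  def putCachedMetadata(key: String, value: AnyRef): Unit = cache.update(key, value)

  def getConf(key: String): String =
    Option(getCachedMetadata(key))
      .map { conf =>
        // Cache hit: the log statement survives because it lives inside map.
        log += "Re-using cached conf"
        conf.asInstanceOf[String]
      }
      .getOrElse {
        // Cache miss: build the value, store it, and return it.
        val created = s"conf-for-$key"
        putCachedMetadata(key, created)
        created
      }
}
```

The point of the shape is that the cached value is read exactly once, and the log statement stays on the hit path without needing a second `val`.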
Thanks for the tip. Updated the patch.
Test build #82429 has finished for PR 19413 at commit
LGTM. Merging to master and back to 2.0 unless I hit a conflict.
## What changes were proposed in this pull request?

Fix for SPARK-20466, full description of the issue in the JIRA. To summarize, `HadoopRDD` uses a metadata cache to cache `JobConf` objects. The cache uses soft-references, which means the JVM can delete entries from the cache whenever there is GC pressure. `HadoopRDD#getJobConf` had a bug where it would check if the cache contained the `JobConf`, if it did it would get the `JobConf` from the cache and return it. This doesn't work when soft-references are used as the JVM can delete the entry between the existence check and the get call.

## How was this patch tested?

Haven't thought of a good way to test this yet given the issue only occurs sometimes, and happens during high GC pressure. Was thinking of using mocks to verify `#getJobConf` is doing the right thing. I deleted the method `HadoopRDD#containsCachedMetadata` so that we don't hit this issue again.

Author: Sahil Takiar <[email protected]>

Closes #19413 from sahilTakiar/master.

(cherry picked from commit e36ec38)
Signed-off-by: Marcelo Vanzin <[email protected]>
(FYI, didn't merge to 2.0.)
## What changes were proposed in this pull request?

Fix for SPARK-20466, full description of the issue in the JIRA. To summarize, `HadoopRDD` uses a metadata cache to cache `JobConf` objects. The cache uses soft-references, which means the JVM can delete entries from the cache whenever there is GC pressure. `HadoopRDD#getJobConf` had a bug where it would check whether the cache contained the `JobConf` and, if it did, get the `JobConf` from the cache and return it. This doesn't work when soft-references are used, as the JVM can delete the entry between the existence check and the get call.

## How was this patch tested?

Haven't thought of a good way to test this yet, given the issue only occurs sometimes and happens during high GC pressure. Was thinking of using mocks to verify `#getJobConf` is doing the right thing. I deleted the method `HadoopRDD#containsCachedMetadata` so that we don't hit this issue again.
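To make the check-then-get problem concrete, here is a rough sketch under stated assumptions. `SoftCache`, `simulateCollection`, and `SoftCacheDemo.getOrCreate` are hypothetical names invented for this example and are not Spark's cache or the code in this patch. `simulateCollection` clears the referent by hand because a real GC cannot be triggered deterministically, and the sketch leaves out the locking that a real implementation would need around creation.

```scala
import java.lang.ref.SoftReference
import java.util.concurrent.ConcurrentHashMap

// Hypothetical soft-value cache; not Spark's actual implementation.
final class SoftCache[V <: AnyRef] {
  private val entries = new ConcurrentHashMap[String, SoftReference[V]]()

  def put(key: String, value: V): Unit =
    entries.put(key, new SoftReference(value))

  // Returns null when the key is absent or the referent has been cleared.
  def get(key: String): V = {
    val ref = entries.get(key)
    if (ref == null) null.asInstanceOf[V] else ref.get()
  }

  // Simulates the GC clearing an entry; a real collector does this on its own schedule.
  def simulateCollection(key: String): Unit = {
    val ref = entries.get(key)
    if (ref != null) ref.clear()
  }
}

object SoftCacheDemo {
  // Single read: the value is either observed once or rebuilt.
  // There is no separate existence check that a later get could contradict.
  // Note: this sketch is not atomic across threads; it omits any creation lock.
  def getOrCreate(cache: SoftCache[String], key: String)(create: => String): String =
    Option(cache.get(key)).getOrElse {
      val fresh = create
      cache.put(key, fresh)
      fresh
    }
}
```

A short usage example: after `cache.put("k", "original")`, calling `SoftCacheDemo.getOrCreate(cache, "k")("rebuilt")` returns the cached value; after `cache.simulateCollection("k")`, the same call falls through to the creation branch instead of handing back `null`.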