[SPARK-20087] Attach accumulators / metrics to 'TaskKilled' end reason #17422
|
Hi @noodle-fb, this does not seem to be a trivial change that can skip a JIRA. Could we create a JIRA and put it in the title (see http://spark.apache.org/contributing.html)? |
|
/cc @ericl as FYI. |
|
@HyukjinKwon edited with Jira tag, didn't realize that was the naming convention |
|
@JoshRosen ping? not sure how to github correctly |
|
add to whitelist |
|
ok to test |
|
@noodle-fb could you rebase this so we can review it? Thanks! |
|
@noodle-fb are you still working on this? If not, I may work on it based on your current impl. I am facing the same issue here: the accumulator updates are lost for killed tasks. |
|
@advancedxy this has been quiet for a long time, so I suggest you just take it over. I actually think this is so close to complete that very little would need to be done, and credit would most likely go to @noodle-fb . That said, we may need a little more input on whether or not this is desirable, as it will change the meaning of the aggregated metrics. |
| override def toErrorString: String = s"TaskKilled ($reason)" | ||
| override def countTowardsTaskFailures: Boolean = false | ||
|
|
||
| private[spark] def withAccums(accums: Seq[AccumulatorV2[_, _]]): TaskKilled = { |
I don't think this method is really necessary at all; you could just pass it in the constructor in the places it's used.
| } else { | ||
| Seq.empty | ||
| } | ||
| val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None)) |
this should be refactored, and not repeated 3 times.
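One way to read the refactoring request above is a single helper that owns the "collect accumulators, then map them to info objects" sequence. The sketch below is only an illustration under stated assumptions: `LongAcc`, `AccInfo`, and `collectAccumUpdates` are hypothetical stand-ins, not Spark's `AccumulatorV2`, `AccumulableInfo`, or any method from this PR.

```scala
object AccumRefactorSketch {
  // Hypothetical stand-in for an accumulator info record (not Spark's AccumulableInfo).
  final case class AccInfo(id: Long, update: Option[Any], value: Option[Any])

  // Hypothetical stand-in for an accumulator (not Spark's AccumulatorV2).
  final class LongAcc(val id: Long) {
    private var sum = 0L
    def add(v: Long): Unit = sum += v
    def value: Long = sum
    def toInfo(update: Option[Any], value: Option[Any]): AccInfo =
      AccInfo(id, update, value)
  }

  // Single helper replacing the repeated if/else plus map sequence.
  // `taskAccums` is None when there is nothing to collect.
  def collectAccumUpdates(
      taskAccums: Option[Seq[LongAcc]]): (Seq[LongAcc], Seq[AccInfo]) = {
    val accums = taskAccums.getOrElse(Seq.empty)
    val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None))
    (accums, accUpdates)
  }
}
```

Each of the three call sites would then call the helper once and build its own end reason from the returned pair.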
|
@advancedxy, feel free to take this over! @squito, as I remember this, it seemed inconsistent to count metrics for tasks that fail but not for tasks that were killed, since machines are doing work in either case. But others might interpret the metrics differently. |
|
All right then, I will take it over. Of course the credit should go to @noodle-fb. We can discuss whether this behaviour is desirable or not in the JIRA or the new PR. |
… reason
## What changes were proposed in this pull request?
The ultimate goal is for listeners to onTaskEnd to receive metrics when a task is killed intentionally, since the data is currently just thrown away. This is already done for ExceptionFailure, so this just copies the same approach.
## How was this patch tested?
Updated existing tests.
This is a rework of apache#17422, all credits should go to noodle-fb
Author: Xianjin YE <[email protected]>
Author: Charles Lewis <[email protected]>
Closes apache#21165 from advancedxy/SPARK-20087.
|
Can one of the admins verify this patch? |
Closes apache#17422 Closes apache#17619 Closes apache#18034 Closes apache#18229 Closes apache#18268 Closes apache#17973 Closes apache#18125 Closes apache#18918 Closes apache#19274 Closes apache#19456 Closes apache#19510 Closes apache#19420 Closes apache#20090 Closes apache#20177 Closes apache#20304 Closes apache#20319 Closes apache#20543 Closes apache#20437 Closes apache#21261 Closes apache#21726 Closes apache#14653 Closes apache#13143 Closes apache#17894 Closes apache#19758 Closes apache#12951 Closes apache#17092 Closes apache#21240 Closes apache#16910 Closes apache#12904 Closes apache#21731 Closes apache#21095 Added: Closes apache#19233 Closes apache#20100 Closes apache#21453 Closes apache#21455 Closes apache#18477 Added: Closes apache#21812 Closes apache#21787 Author: hyukjinkwon <[email protected]> Closes apache#21781 from HyukjinKwon/closing-prs.
What changes were proposed in this pull request?
The ultimate goal is for listeners to onTaskEnd to receive metrics when a task is killed intentionally, since the data is currently just thrown away. This is already done for ExceptionFailure, so this just copies the same approach.
How was this patch tested?
The unit test in DAGSchedulerSuite that tests this for ExceptionFailure was modified to test the same thing for TaskKilled. I also re-tested all the unit tests modified by the last change to TaskKilled, and made sure they all still pass.
For integration tests, I ran a query that caused a speculative task retry on our deployment, and verified that the metrics showed up in our logging for that retry when it was killed.
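To make the proposed change concrete, here is a minimal sketch of the idea: a killed task's end reason carries accumulator updates the same way a failure's does, so a listener can read them instead of losing them. All types below are simplified, hypothetical stand-ins written for illustration; they are not Spark's actual `TaskKilled`, `ExceptionFailure`, or listener API, and the field names are assumptions.

```scala
object TaskEndSketch {
  // Hypothetical stand-in for one accumulator update (not Spark's AccumulableInfo).
  final case class AccumInfo(name: String, update: Long)

  sealed trait EndReason
  case object Success extends EndReason

  // Failures already carry accumulator updates in this model.
  final case class ExceptionFailure(
      description: String,
      accumUpdates: Seq[AccumInfo] = Seq.empty) extends EndReason

  // The idea under discussion: killed tasks carry their partial updates too.
  final case class TaskKilled(
      reason: String,
      accumUpdates: Seq[AccumInfo] = Seq.empty) extends EndReason {
    def toErrorString: String = s"TaskKilled ($reason)"
  }

  // Hypothetical listener-side helper: return whatever updates the end reason carries.
  def updatesFor(reason: EndReason): Seq[AccumInfo] = reason match {
    case ExceptionFailure(_, updates) => updates
    case TaskKilled(_, updates)       => updates
    case Success                      => Seq.empty
  }
}
```

Passing the updates through the constructor, as suggested in the review, keeps the end reason immutable and avoids a separate `withAccums`-style setter.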