[SPARK-28302] [CORE] Make sure to generate unique output file for SparkLauncher on Windows #25076
ping @jiangxb1987 @vanzin @srowen
srowen left a comment
Looks reasonable to me. That's the only usage of %RANDOM%. Is there any 'do-while' loop construct in Windows scripts? If not, this seems OK to me.
I still didn't find one after searching a lot.
Test build #107346 has finished for PR 25076 at commit
cc @HyukjinKwon, too.
Let me retrigger AppVeyor. Possibly related.
It seems to be a different issue. I made a fix: #25081.
dongjoon-hyun left a comment
For CMD %RANDOM%, I also checked that the seed is based on the clock time when the CMD session started. The current workaround looks okay to me, too.
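To make the seeding point concrete, here is a minimal Python model, not cmd's real generator. It assumes (hypothetically) that each cmd session seeds its generator once from a coarse start time, whole seconds here, and uses Python's `random.Random` as a stand-in; only the property "same seed gives the same first value" matters. The helper names `first_random` and `launcher_output_name` are illustrative only.

```python
import random


def first_random(session_start_seconds: int) -> int:
    # Assumption: the session's generator is seeded once from its start time,
    # so the first draw depends only on that seed. random.Random is a stand-in
    # for cmd's actual generator.
    rng = random.Random(session_start_seconds)
    return rng.randint(0, 32767)  # %RANDOM% yields values in 0..32767


def launcher_output_name(session_start_seconds: int) -> str:
    # Hypothetical helper mirroring spark-class-launcher-output-%RANDOM%.txt
    return f"spark-class-launcher-output-{first_random(session_start_seconds)}.txt"


# Two launcher threads whose cmd sessions start within the same second
# end up with identical output file names.
thread_a = launcher_output_name(1562572800)
thread_b = launcher_output_name(1562572800)
```

Under this model the collision is deterministic rather than unlucky, which is why regenerating the name until it is unused is needed at all.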
bin/spark-class2.cmd
:gen
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
rem SPARK-28302: %RANDOM% would return the same number if we call it instantly after last call,
rem so we should make sure to generate unique file to avoid process collision of writing into a
nit: "make sure" -> "make it sure"; "into a same file" -> "into the same file".
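The `:gen` label in the spark-class2.cmd hunk serves as cmd's stand-in for a do-while loop. The hunk is cut off, so this sketch assumes the label is paired with a check along the lines of `if exist %LAUNCHER_OUTPUT% goto :gen`. The Python below mirrors that assumed retry loop; `gen_launcher_output` and its injectable `rand` parameter are hypothetical names used only for illustration.

```python
import os
import random
import tempfile
from typing import Callable


def gen_launcher_output(
    temp_dir: str,
    rand: Callable[[], int] = lambda: random.randint(0, 32767),
) -> str:
    # Hypothetical mirror of an assumed ":gen ... if exist ... goto :gen" loop.
    while True:
        # Like: set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
        path = os.path.join(temp_dir, f"spark-class-launcher-output-{rand()}.txt")
        # Like the assumed: if exist %LAUNCHER_OUTPUT% goto :gen
        if not os.path.exists(path):
            return path


# Demo: name 7 is already taken, so the loop keeps drawing until it finds a free name.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "spark-class-launcher-output-7.txt"), "w").close()
    draws = iter([7, 7, 42])
    chosen = os.path.basename(gen_launcher_output(d, lambda: next(draws)))
```

One caveat of this shape: the existence check and the later write are separate steps, so two sessions that both pass the check before either creates the file could, in principle, still pick the same name.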
Thank you @HyukjinKwon for your fix.
Retest this please.
Since the R fix is merged, I retriggered this.
Oh, there is a better way (close and reopen). :)
Test build #107383 has finished for PR 25076 at commit
retest this please
Test build #107385 has finished for PR 25076 at commit
Test build #107389 has finished for PR 25076 at commit
Merged to master, branch-2.4, and branch-2.3.
…kLauncher on Windows
## What changes were proposed in this pull request?
When using SparkLauncher to submit applications **concurrently** with multiple threads under **Windows**, some apps would show that "The process cannot access the file because it is being used by another process" and remains in LOST state at the end. The issue can be reproduced by this [demo](https://issues.apache.org/jira/secure/attachment/12973920/Main.scala).
After digging into the code, I find that, Windows cmd `%RANDOM%` would return the same number if we call it instantly(e.g. < 500ms) after last call. As a result, SparkLauncher would get same output file(spark-class-launcher-output-%RANDOM%.txt) for apps. Then, the following app would hit the issue when it tries to write the same file which has already been opened for writing by another app.
We should make sure to generate unique output file for SparkLauncher on Windows to avoid this issue.
## How was this patch tested?
Tested manually on Windows.
Closes #25076 from Ngone51/SPARK-28302.
Authored-by: wuyi
Signed-off-by: HyukjinKwon
(cherry picked from commit 925f620)
Signed-off-by: HyukjinKwon
Thanks! @srowen @dongjoon-hyun @HyukjinKwon
The correct way is to modify spark-class2.cmd with: … and the system works fine.
What changes were proposed in this pull request?
When using SparkLauncher to submit applications concurrently from multiple threads on Windows, some apps fail with "The process cannot access the file because it is being used by another process" and end up in the LOST state. The issue can be reproduced with this demo.
After digging into the code, I found that Windows cmd %RANDOM% returns the same number if it is called again almost immediately (e.g. < 500 ms) after a previous call. As a result, SparkLauncher can get the same output file (spark-class-launcher-output-%RANDOM%.txt) for different apps, and the later app hits the error above when it tries to write to a file that another app has already opened for writing.
We should make sure to generate a unique output file for SparkLauncher on Windows to avoid this issue.
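For comparison only, and not the technique this patch uses: a common way to get a collision-free output file is to make "check" and "create" a single atomic step with `O_CREAT | O_EXCL`, which fails if the file already exists. The Python sketch below shows that idea; `create_launcher_output` and its `rand` parameter are hypothetical, and the two calls simulate two launchers that draw the same first %RANDOM% value.

```python
import os
import random
import tempfile
from typing import Callable, Tuple


def create_launcher_output(
    temp_dir: str,
    rand: Callable[[], int] = lambda: random.randint(0, 32767),
) -> Tuple[int, str]:
    while True:
        path = os.path.join(temp_dir, f"spark-class-launcher-output-{rand()}.txt")
        try:
            # O_CREAT | O_EXCL raises if the file already exists, so checking and
            # creating happen atomically and only one caller can win a given name.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # name already taken by another launcher: draw again
        return fd, path


with tempfile.TemporaryDirectory() as d:
    draws = iter([5, 5, 9])  # both "launchers" first draw the same value
    fd1, p1 = create_launcher_output(d, lambda: next(draws))
    fd2, p2 = create_launcher_output(d, lambda: next(draws))
    os.close(fd1)
    os.close(fd2)
    names = (os.path.basename(p1), os.path.basename(p2))
```

In Python itself, `tempfile.mkstemp(prefix=..., suffix=".txt", dir=...)` already performs this exclusive creation; cmd has no direct equivalent, which is presumably why a regenerate-and-check approach fits a batch script better.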
How was this patch tested?
Tested manually on Windows.