@Ngone51
Member

@Ngone51 Ngone51 commented Jul 8, 2019

What changes were proposed in this pull request?

When SparkLauncher is used to submit applications concurrently from multiple threads on Windows, some apps fail with "The process cannot access the file because it is being used by another process" and end up in the LOST state. The issue can be reproduced by this demo (https://issues.apache.org/jira/secure/attachment/12973920/Main.scala).

After digging into the code, I found that Windows cmd's %RANDOM% returns the same number if it is called again very soon (e.g. < 500 ms) after the previous call. As a result, SparkLauncher can pick the same output file (spark-class-launcher-output-%RANDOM%.txt) for different apps, and a later app then fails when it tries to write to a file that another app already has open for writing.
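A minimal Python model of the collision, assuming (as a later review comment notes) that each cmd session seeds its %RANDOM% generator from the clock time at which the session started. Python's `random.Random` stands in for cmd's real generator, whose algorithm is not described here, and the function `launcher_output_name` and the seed values are hypothetical. The one real detail carried over is that %RANDOM% expands to an integer in 0..32767.

```python
import random

# %RANDOM% expands to an integer in the range 0..32767.
RANDOM_MAX = 32767


def launcher_output_name(session_start_seconds: int) -> str:
    """Hypothetical model of how the launcher output file gets its name.

    Assumption for illustration: each cmd session seeds its generator from
    the wall-clock second at which the session started, so the first
    %RANDOM% value depends only on that start time. Python's random.Random
    stands in for cmd's actual generator.
    """
    rng = random.Random(session_start_seconds)
    value = rng.randint(0, RANDOM_MAX)
    return f"spark-class-launcher-output-{value}.txt"


if __name__ == "__main__":
    # Two launcher threads whose cmd sessions start in the same second.
    first = launcher_output_name(1562572800)
    second = launcher_output_name(1562572800)
    print(first == second)  # both apps would write to the same file
```

Under this model the file name is a pure function of the session start time, so any two launches that start close enough together are guaranteed to clash, regardless of how many distinct values %RANDOM% can produce.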

We should make sure to generate a unique output file for SparkLauncher on Windows to avoid this issue.
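One way to guarantee a unique name is to keep drawing until an unused one turns up. The sketch below is in Python rather than cmd, and the helper `claim_launcher_output` is hypothetical. It creates the file with `os.O_CREAT | os.O_EXCL`, so checking for an existing file and creating the new one happen in a single atomic step; a separate "does it exist?" check followed by a later write narrows the window but can still race between two processes that draw the same number.

```python
import os
import random
import tempfile

RANDOM_MAX = 32767  # same range as cmd's %RANDOM%


def claim_launcher_output(directory: str, rng: random.Random) -> str:
    """Hypothetical helper: reserve a unique launcher output file.

    Draws random suffixes until os.open with O_CREAT | O_EXCL succeeds,
    which fails atomically if the file already exists. The created file
    is left in place so no other caller can claim the same name.
    Assumes fewer than 32768 names are already taken (otherwise it
    would loop forever).
    """
    while True:
        n = rng.randint(0, RANDOM_MAX)
        path = os.path.join(directory, f"spark-class-launcher-output-{n}.txt")
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # name already claimed by another launcher: draw again
        os.close(fd)
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Identically seeded generators reproduce the %RANDOM% collision,
        # but the second caller skips the taken name and gets a fresh one.
        p1 = claim_launcher_output(tmp, random.Random(1562572800))
        p2 = claim_launcher_output(tmp, random.Random(1562572800))
        print(p1 != p2)
```

Leaving the reserved file in place is the key design choice: the name stays claimed until the owning launcher is finished with it.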

How was this patch tested?

Tested manually on Windows.

@Ngone51
Member Author

Ngone51 commented Jul 8, 2019

ping @jiangxb1987 @vanzin @srowen

Member

@srowen srowen left a comment

Looks reasonable to me. That's the only usage of %RANDOM%. Is there any 'do-while' loop in Windows scripts? If not, this seems OK to me.

@Ngone51
Member Author

Ngone51 commented Jul 8, 2019

Is there any 'do-while' loop in Windows scripts?

I still haven't found one after searching a lot.

@SparkQA
SparkQA commented Jul 8, 2019

Test build #107346 has finished for PR 25076 at commit b55dfc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 8, 2019

cc @HyukjinKwon , too.

@HyukjinKwon HyukjinKwon closed this Jul 9, 2019
@HyukjinKwon HyukjinKwon reopened this Jul 9, 2019
@HyukjinKwon
Member

Let me retrigger AppVeyor. Possibly related.

@HyukjinKwon
Member

Seems to be a different issue. I made a fix: #25081.

Member

@dongjoon-hyun dongjoon-hyun left a comment

For CMD %RANDOM%, I also checked that the seed is based on the clock time when the CMD session started. The current workaround looks okay to me, too.

:gen
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
rem SPARK-28302: %RANDOM% would return the same number if we call it instantly after last call,
rem so we should make sure to generate unique file to avoid process collision of writing into a
Member

nit.

  • make sure -> make it sure.
  • into a same file -> into the same file.

@Ngone51
Member Author

Ngone51 commented Jul 9, 2019

Thank you @HyukjinKwon for your fix.

@dongjoon-hyun
Member

Retest this please.

@HyukjinKwon HyukjinKwon closed this Jul 9, 2019
@HyukjinKwon HyukjinKwon reopened this Jul 9, 2019
@dongjoon-hyun
Member

Since the R fix is merged, I retriggered this.

@dongjoon-hyun
Member

Oh, there is a better way (close and reopen). :)

@SparkQA
SparkQA commented Jul 9, 2019

Test build #107383 has finished for PR 25076 at commit 9273f33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA
SparkQA commented Jul 9, 2019

Test build #107385 has finished for PR 25076 at commit 9273f33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Jul 9, 2019

Test build #107389 has finished for PR 25076 at commit 9273f33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

HyukjinKwon commented Jul 9, 2019

Merged to master, branch-2.4 and branch-2.3.

HyukjinKwon pushed a commit that referenced this pull request Jul 9, 2019
…kLauncher on Windows

## What changes were proposed in this pull request?

When SparkLauncher is used to submit applications **concurrently** from multiple threads on **Windows**, some apps fail with "The process cannot access the file because it is being used by another process" and end up in the LOST state. The issue can be reproduced by this [demo](https://issues.apache.org/jira/secure/attachment/12973920/Main.scala).

After digging into the code, I found that Windows cmd's `%RANDOM%` returns the same number if it is called again very soon (e.g. < 500 ms) after the previous call. As a result, SparkLauncher can pick the same output file (spark-class-launcher-output-%RANDOM%.txt) for different apps, and a later app then fails when it tries to write to a file that another app already has open for writing.

We should make sure to generate a unique output file for SparkLauncher on Windows to avoid this issue.

## How was this patch tested?

Tested manually on Windows.

Closes #25076 from Ngone51/SPARK-28302.

Authored-by: wuyi <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 925f620)
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jul 9, 2019
@Ngone51
Member Author

Ngone51 commented Jul 9, 2019

Thanks! @srowen @dongjoon-hyun @HyukjinKwon

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
@casagrande-stefano
casagrande-stefano commented May 26, 2020

The correct way is to modify spark-class2.cmd.
I changed the line:
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt

with:
rem set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
SET RND=
for /f "delims=" %%F in ('powershell Get-Random') do (SET "RND=%%F")
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RND%.txt

and the system works fine.
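For context on this PowerShell variant: without parameters, `Get-Random` returns an integer between 0 and Int32.MaxValue, a far wider range than %RANDOM%'s 0..32767. The sketch below uses the standard birthday approximation p ≈ 1 - exp(-k(k-1)/(2N)) to compare the chance that k concurrent launches share a name, assuming every draw is independent and uniform. That assumption is exactly what broke down for %RANDOM% (correlated seeds), and the line as posted has no existence check, so the wider range makes a clash less likely rather than impossible. The launch count of 10 is hypothetical.

```python
import math


def collision_probability(k: int, n: int) -> float:
    """Birthday approximation: chance that at least two of k independent,
    uniform draws from n equally likely values coincide.

    Uses -expm1(-x) == 1 - exp(-x) to keep precision when x is tiny.
    """
    return -math.expm1(-k * (k - 1) / (2.0 * n))


if __name__ == "__main__":
    launches = 10  # hypothetical number of concurrent SparkLauncher calls
    cmd_range = 32768          # %RANDOM%: 0..32767
    ps_range = 2**31 - 1       # roughly Get-Random's default 0..Int32.MaxValue
    print(f"%RANDOM%:   {collision_probability(launches, cmd_range):.2e}")
    print(f"Get-Random: {collision_probability(launches, ps_range):.2e}")
```

Even under ideal independence, the 15-bit range of %RANDOM% gives a non-negligible clash rate once many launches overlap, which is why an explicit uniqueness check is the more robust fix.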


6 participants