Description
Description of the problem / feature request:
Executing a batch script with ctx.actions.run fails when running via RBE on Windows, while the exact same build succeeds locally. It appears Bazel may be passing a Unix-style path rather than a Windows path when executing the rule action in RBE. From the linked minimal repro, we see:
ERROR: C:/source/BUILD:7:1: Couldn't build file output_regular_path.txt: BatchExecuteWithRegularPath output_regular_path.txt failed (Exit 1): script_regular_path.bat failed: error executing command
cd C:/_eb/execroot/ctx_actions_run_rbe
SET SOME_ENV_VAR=some_value
bazel-out/x64_windows-fastbuild/bin/script_regular_path.bat
Execution platform: @rbe_windows_msvc_cl//config:platform
'bazel-out' is not recognized as an internal or external command,
operable program or batch file.
ERROR: C:/source/BUILD:3:1: Couldn't build file output_windows_path.txt: BatchExecuteWithWindowsPath output_windows_path.txt failed (Exit 1): script_windows_path.bat failed: error executing command
cd C:/_eb/execroot/ctx_actions_run_rbe
SET SOME_ENV_VAR=some_value
bazel-out/x64_windows-fastbuild/bin/script_windows_path.bat
Execution platform: @rbe_windows_msvc_cl//config:platform
'bazel-out' is not recognized as an internal or external command,
operable program or batch file.
It appears that cmd.exe is interpreting bazel-out as the command to run rather than the full path. This does not occur when running the rule locally. Purposefully using a Windows path string for the executable field of the rule action makes no difference: we get the same result as when we use a File type.
This issue could possibly be fixed by applying the appropriate quoting, passing the correct flags to cmd.exe, or ensuring a Windows-style path is always used.
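To make the two suggested fixes concrete, here is a minimal Python sketch of the string handling involved. It is not Bazel's implementation, and the helper names `to_windows_path` and `wrap_for_cmd` are hypothetical. It assumes the failure comes from cmd.exe treating `/` as a switch delimiter, which would be consistent with `'bazel-out' is not recognized` in the output above. The `/S /C` flags mirror the ones in the successful action from the repro. Whether either form actually resolves the RBE failure is untested.

```python
from pathlib import PureWindowsPath
from typing import List


def to_windows_path(exec_path: str) -> str:
    """Render an execroot-relative path with backslash separators."""
    # PureWindowsPath accepts forward slashes and prints backslashes,
    # and works on any host OS because it never touches the filesystem.
    return str(PureWindowsPath(exec_path))


def wrap_for_cmd(exec_path: str) -> List[str]:
    """Build an argv that hands the batch script to cmd.exe explicitly."""
    return ["cmd.exe", "/S", "/C", to_windows_path(exec_path)]


if __name__ == "__main__":
    script = "bazel-out/x64_windows-fastbuild/bin/script_regular_path.bat"
    print(to_windows_path(script))
    print(wrap_for_cmd(script))
```

In a rule, the equivalent would be to apply the same separator replacement to the executable's path before it reaches the command line, or to make cmd.exe the executable and pass the script as an argument.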
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
See: https://github.com/greenhouse-org/bazel-issue-repro/tree/master/ctx_actions_run_rbe
The README in this repo should have details for reproducing the problem and example failure output.
What operating system are you running Bazel on?
Windows
What's the output of bazel info release?
release 3.1.0
Any other information, logs, or outputs that you want to share?
We ran into this in the Envoy project as a result of bumping to the latest rules_go which switched from running go tool commands via bash shell to the native ctx.actions.run on Windows.
Using the --experimental_remote_grpc_log flag to generate the grpc log for a failing RBE build, together with the remote client tool, we are able to see the failed and successful actions and a representation of the commands that are run. It appears from the output below that Bazel is sending a command to the RBE service that uses an invalid path for a batch script.
I believe the invocation ID from this run was 8d806e2e-c3cb-4a69-becb-05571c76f75c (but I may be mixing up attempts).
One of the failed actions from the issue repro (digest 89a618e0b38a6c9967729316db828b66f4a5e7d7764c514bcd83e87b9b32c8d5/141):
$ ./bazel-bin/remote_client $REMOTE_CLIENT_FLAGS --grpc_log=$PWD/grpc.log show_action --digest=89a618e0b38a6c9967729316db828b66f4a5e7d7764c514bcd83e87b9b32c8d5/141
Command [digest: e51efdab6899ffd275023f75f791bd380375d55cdab095f1ee81cb01c7305615/312]:
SOME_ENV_VAR=some_value \
bazel-out/x64_windows-fastbuild/bin/script_regular_path.bat
Input files [total: 1, root Directory digest: 47f9f37722730278c0a8568b885c47f0a0fbb36926d0e2712bf1fefe1420c259/83]:
bazel-out [Directory digest: 65b4e6b8655301599641f88388ca0cdc2a06b7b1c91efd2984ef1b1039889e6e/95]
bazel-out/x64_windows-fastbuild [Directory digest: f20d65b30f1cca9b088733264a48e41dfd8884e251b3896a34b98a64ca935617/77]
bazel-out/x64_windows-fastbuild/bin [Directory digest: 357fe978c27bd53d468f1e982c8567691a8cc46a3010647af5597f3d568d6617/99]
bazel-out/x64_windows-fastbuild/bin/script_regular_path.bat [File content digest: 621fd3ac7095509c3d715705c884365ac4aa812e47cd10c5de77576dc5bc9461/70]
Output files:
bazel-out/x64_windows-fastbuild/bin/output_regular_path.txt
Output directories:
(none)
Platform:
properties {
name: "OSFamily"
value: "Windows"
}
properties {
name: "container-image"
value: "docker://gcr.io/envoy-ci/envoy-build-windows@sha256:02d4ff5c2e4c703944e4ec3770c5fa51cdfc6781f95607e91648e19c14b38346"
}
Another failed action looks very similar (digest 27e66ea85ab4978fb639e57e30206b8e5a598bade43746e55c46250e42c6d58d/141).
The successful action from the issue repro (digest 4306f162f55bdc0549d4d986d63ce2985e7a647b3c92e8f9fdef3b4b9ef903ad/139):
$ ./bazel-bin/remote_client $REMOTE_CLIENT_FLAGS --grpc_log=$PWD/grpc.log show_action --digest=4306f162f55bdc0549d4d986d63ce2985e7a647b3c92e8f9fdef3b4b9ef903ad/139
Command [digest: 95f4252117f582c920e0f5eb2fb9cc916468ae79d27d30facf011cef9c245da1/417]:
SOME_ENV_VAR=some_value \
cmd.exe /S /C '(echo %cd% & echo --- & dir & echo --- & dir C:\ & echo --- & set) > bazel-out\x64_windows-fastbuild\bin\output_execution_environment.txt'
Input files [total: 0, root Directory digest: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/0]:
Output files:
bazel-out/x64_windows-fastbuild/bin/output_execution_environment.txt
Output directories:
(none)
Platform:
properties {
name: "OSFamily"
value: "Windows"
}
properties {
name: "container-image"
value: "docker://gcr.io/envoy-ci/envoy-build-windows@sha256:02d4ff5c2e4c703944e4ec3770c5fa51cdfc6781f95607e91648e19c14b38346"
}
With the inputs field of the //:batch_execute_rule_windows_path rule commented out, the local build fails with:
> bazel --output_base=C:/_eb build --verbose_failures --keep_going //:batch_execute_rule_windows_path
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:batch_execute_rule_windows_path (4 packages loaded, 6 targets configured).
INFO: Found 1 target...
ERROR: C:/source/BUILD:3:1: Couldn't build file output_windows_path.txt: BatchExecuteWithWindowsPath output_windows_path.txt failed (Exit -1): script_windows_path.bat failed: error executing command
cd C:/_eb/execroot/ctx_actions_run_rbe
SET SOME_ENV_VAR=some_value
bazel-out/x64_windows-fastbuild/bin/script_windows_path.bat
Execution platform: @local_config_platform//:host. Note: Remote connection/protocol failed with: execution failed
Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(199): CreateProcessW("C:\_eb\execroot\ctx_actions_run_rbe\bazel-out\x64_windows-fastbuild\bin\script_windows_path.bat"): The system cannot find the file specified.
(error: 2)
Target //:batch_execute_rule_windows_path failed to build
This somewhat makes sense: there is no file input to the rule, and the path is converted properly. However, the fact that Bazel is trying to run a .bat file with CreateProcessW is a bit odd. The exact same error is output when we use an unaltered path string instead of a File type (without replacing / with \\) for the executable.
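For reference, the path in the local CreateProcessW message is what you get by joining the execroot with the relative executable path and normalizing the separators. The Python sketch below reproduces that string using the values from the log above. It only illustrates the conversion and is not Bazel's code; `resolve_in_execroot` is a hypothetical name.

```python
import ntpath


def resolve_in_execroot(execroot: str, exec_path: str) -> str:
    """Join a relative exec path onto the execroot using Windows path rules."""
    # ntpath applies Windows semantics on any host OS; normpath rewrites
    # every forward slash as a backslash.
    return ntpath.normpath(ntpath.join(execroot, exec_path))


if __name__ == "__main__":
    print(resolve_in_execroot(
        "C:/_eb/execroot/ctx_actions_run_rbe",
        "bazel-out/x64_windows-fastbuild/bin/script_windows_path.bat",
    ))
```

The printed path matches the one CreateProcessW reports locally, so the local executor appears to absolutize and convert the path before launching it.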
Running the same thing remotely, the build fails with the stack trace and error:
ERROR: C:/source/BUILD:3:1: Couldn't build file output_windows_path.txt: BatchExecuteWithWindowsPath output_windows_path.txt failed (Exit 34): java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: docker: Error response from daemon: container ebc98b3709257313a06f0c966af348bb1fdeba4baefaede90810ee34fd62bf31 encountered an error during CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2) extra info: {"CommandLine":"bazel-out/x64_windows-fastbuild/bin/script_windows_path.bat","WorkingDirectory":"C:\\botcode\\w","Environment":{"HOST_CONTAINER_NAME":"rbe-container-34d0f566-c928-4b66-b765-361a4a5bd7c6","MSYS2_ARG_CONV_EXCL":"*","SOME_ENV_VAR":"some_value","TEMP":"C:\\Windows\\Temp","TMP":"C:\\Windows\\Temp","TMPDIR":"C:\\Windows\\Temp"},"CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}.
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:192)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$0(RemoteSpawnRunner.java:324)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:237)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:304)
at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:238)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:126)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:96)
at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:39)
at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:327)
at com.google.devtools.build.lib.actions.Action.execute(Action.java:124)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:961)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1109)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1080)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:137)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:80)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:601)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:907)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:297)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:438)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: docker: Error response from daemon: container ebc98b3709257313a06f0c966af348bb1fdeba4baefaede90810ee34fd62bf31 encountered an error during CreateProcess: failure in a Windows system cal$
ONV_EXCL":"*","SOME_ENV_VAR":"some_value","TEMP":"C:\\Windows\\Temp","TMP":"C:\\Windows\\Temp","TMPDIR":"C:\\Windows\\Temp"},"CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}.
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.handleStatus(GrpcRemoteExecutor.java:69)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:81)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$0(GrpcRemoteExecutor.java:155)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:237)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:134)
... 24 more
It seems something along the way isn’t converting the executable path to a Windows path when we’re executing remotely, though the actual working directory seems to be set appropriately for the remote container environment.
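The mismatch is easy to see by comparing two fields from the "extra info" JSON in the remote error. The Python sketch below uses a trimmed copy of that JSON, keeping only the CommandLine and WorkingDirectory values shown in the log, and reports which separator each one uses. The helper name `separator_styles` is hypothetical and only for illustration.

```python
import json

# Trimmed copy of the "extra info" payload from the remote error above;
# only the two fields being compared are kept.
EXTRA_INFO = r'''
{"CommandLine": "bazel-out/x64_windows-fastbuild/bin/script_windows_path.bat",
 "WorkingDirectory": "C:\\botcode\\w"}
'''


def separator_styles(extra_info_json: str) -> dict:
    """Report which path separator each field of the payload uses."""
    info = json.loads(extra_info_json)
    styles = {}
    for key in ("CommandLine", "WorkingDirectory"):
        value = info[key]
        if "\\" in value:
            styles[key] = "backslash"
        elif "/" in value:
            styles[key] = "forward slash"
        else:
            styles[key] = "none"
    return styles


if __name__ == "__main__":
    print(separator_styles(EXTRA_INFO))
```

Running this classifies CommandLine as using forward slashes and WorkingDirectory as using backslashes, which is the inconsistency described above.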