Commit f4a85d8

ruisearch42 authored and lk-chen committed
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout (vllm-project#15301)
Signed-off-by: Rui Qiao <[email protected]>
1 parent 55497dc commit f4a85d8

1 file changed: +9 -0 lines changed


vllm/executor/ray_distributed_executor.py

Lines changed: 9 additions & 0 deletions
@@ -561,6 +561,15 @@ def _compiled_ray_dag(self, enable_asyncio: bool):
                     envs.VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL)
         logger.info("VLLM_USE_RAY_COMPILED_DAG_OVERLAP_COMM = %s",
                     envs.VLLM_USE_RAY_COMPILED_DAG_OVERLAP_COMM)
+        # Enlarge the default value of "RAY_CGRAPH_get_timeout" to 300 seconds
+        # (it is 10 seconds by default). This is a Ray environment variable to
+        # control the timeout of getting result from a compiled graph execution,
+        # i.e., the distributed execution that includes model forward runs and
+        # intermediate tensor communications, in the case of vllm.
+        os.environ.setdefault("RAY_CGRAPH_get_timeout", "300")  # noqa: SIM112
+        logger.info("RAY_CGRAPH_get_timeout is set to %s",
+                    os.environ["RAY_CGRAPH_get_timeout"])  # noqa: SIM112
+
         with InputNode() as input_data:
             # Example DAG: PP=2, TP=4
             #
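
Note on the `setdefault` call in this change: because the variable is set with `setdefault` rather than by plain assignment, a value the user has already exported should be left untouched, and 300 is applied only when the variable is unset. The sketch below illustrates that behavior on plain dictionaries so it does not modify the real process environment. The helper name `apply_default_timeout` is hypothetical and is not part of vLLM or Ray.

```python
def apply_default_timeout(env, default="300"):
    """Hypothetical helper: set RAY_CGRAPH_get_timeout only if it is absent.

    `env` is any mutable mapping; os.environ offers the same setdefault method.
    """
    # setdefault keeps an existing value and returns the effective one.
    return env.setdefault("RAY_CGRAPH_get_timeout", default)


# Case 1: the variable is unset, so the default is applied.
env_unset = {}
effective_unset = apply_default_timeout(env_unset)

# Case 2: the user has already chosen a value, so it is preserved.
env_user = {"RAY_CGRAPH_get_timeout": "60"}
effective_user = apply_default_timeout(env_user)

print(effective_unset, effective_user)
```

In practice this means the timeout can still be tuned by exporting `RAY_CGRAPH_get_timeout` before the executor builds the compiled DAG.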
