Update num_proc handling for vllm and Ray mode #973
Open
ArdalanM wants to merge 1 commit into
Adjust num_proc initialization based on vllm and Ray mode.
Code Review
This pull request modifies the initialization of the TextTaggingByPromptMapper to conditionally set the number of processes based on whether vLLM and Ray mode are enabled. The goal is to allow Ray to manage data parallelism by scheduling actors per GPU. A review comment suggests that the current implementation unnecessarily restricts parallelism for the HuggingFace backend when running on Ray and provides a suggestion to simplify the logic to check only for Ray mode.
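The diff itself is not reproduced here, so the following is only a sketch of the two conditions being compared, inferred from the review summary above. The function and parameter names (`forces_single_proc_as_submitted`, `forces_single_proc_as_suggested`, `enable_vllm`, `ray_mode`) are hypothetical and are not taken from the PR.

```python
from itertools import product


def forces_single_proc_as_submitted(enable_vllm: bool, ray_mode: bool) -> bool:
    """Assumed PR behaviour: lift the num_proc = 1 cap only for vLLM on Ray."""
    return not (enable_vllm and ray_mode)


def forces_single_proc_as_suggested(enable_vllm: bool, ray_mode: bool) -> bool:
    """Assumed reviewer suggestion: lift the cap whenever Ray mode is on."""
    return not ray_mode


# Collect the (enable_vllm, ray_mode) combinations where the two rules disagree.
disagreements = [
    (enable_vllm, ray_mode)
    for enable_vllm, ray_mode in product([False, True], repeat=2)
    if forces_single_proc_as_submitted(enable_vllm, ray_mode)
    != forces_single_proc_as_suggested(enable_vllm, ray_mode)
]
print(disagreements)  # [(False, True)]
```

Under these assumptions the two rules differ in exactly one case: a non-vLLM backend (for example HuggingFace) running in Ray mode, which is the case the review comment calls out.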
Summary
`TextTaggingByPromptMapper` was unconditionally setting `self.num_proc = 1`, which in a Ray + vLLM run capped the actor pool to a single GPU. The only available multi-GPU strategy was tensor parallelism (`tensor_parallel_size=N`); data parallelism was silently broken.

This fix makes `num_proc = 1` conditional on the execution backend, so Ray can now schedule one vLLM actor per GPU, enabling data parallelism, tensor parallelism, or a combination of both.
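To make the combination of data and tensor parallelism concrete, here is a rough sketch of the GPU arithmetic, assuming each vLLM actor occupies `tensor_parallel_size` GPUs. The helper `plan_actors` is hypothetical; it only illustrates the division and is not the operator's code or Ray's scheduler.

```python
def plan_actors(total_gpus: int, tensor_parallel_size: int = 1) -> int:
    """Return how many data-parallel actors fit on the available GPUs.

    Assumption: every actor needs exactly `tensor_parallel_size` GPUs, so
    the number of replicas is the integer quotient of the two values.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    if total_gpus < tensor_parallel_size:
        raise ValueError("not enough GPUs for a single actor")
    return total_gpus // tensor_parallel_size


print(plan_actors(8))     # pure data parallelism: one actor per GPU
print(plan_actors(8, 2))  # mixed: each actor spans 2 GPUs
print(plan_actors(8, 8))  # pure tensor parallelism: a single actor
```

With `num_proc` pinned to 1, only the last configuration was reachable on a multi-GPU node; the first two are the cases this change is meant to unlock.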