Update num_proc handling for vllm and Ray mode by ArdalanM · Pull Request #973 · datajuicer/data-juicer

ArdalanM · 2026-05-01T22:46:03Z

Summary

TextTaggingByPromptMapper was unconditionally setting self.num_proc = 1, which in a
Ray+vLLM run capped the actor pool to a single GPU. The only available multi-GPU strategy
was tensor parallelism (tensor_parallel_size=N); data parallelism was silently broken.

This fix makes num_proc = 1 conditional on the execution backend, so Ray can now schedule
one vLLM actor per GPU. This enables data parallelism, tensor parallelism, or a
combination of both.

Adjust num_proc initialization based on vllm and Ray mode.
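
The following is a minimal sketch of the conditional described in the summary, not the actual diff. It assumes, based on the review summary, that the cap of 1 is lifted only when vLLM and Ray mode are both enabled. The helpers `resolve_num_proc` and `max_vllm_replicas`, and the parameter names `enable_vllm` and `ray_mode`, are hypothetical names chosen for illustration; the real attributes in text_tagging_by_prompt_mapper.py may differ.

```python
from typing import Optional


def resolve_num_proc(requested: Optional[int], enable_vllm: bool, ray_mode: bool) -> Optional[int]:
    """Hypothetical helper: pick the num_proc value the operator should keep.

    Assumption: the single-process cap is skipped only for vLLM on Ray,
    so Ray is free to size the actor pool itself.
    """
    if enable_vllm and ray_mode:
        # Leave the requested value untouched (None means "let the backend decide").
        return requested
    # Every other case keeps the previous behaviour of a single process.
    return 1


def max_vllm_replicas(total_gpus: int, tensor_parallel_size: int) -> int:
    """Upper bound on data-parallel replicas when each replica spans
    tensor_parallel_size GPUs. Actual scheduling depends on the Ray
    cluster's resource configuration."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    return total_gpus // tensor_parallel_size
```

Under these assumptions, an 8-GPU node could host eight single-GPU replicas (pure data parallelism), one replica spanning all eight GPUs (pure tensor parallelism), or four replicas of two GPUs each (a combination).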

gemini-code-assist

Code Review

This pull request modifies the initialization of the TextTaggingByPromptMapper to conditionally set the number of processes based on whether vLLM and Ray mode are enabled. The goal is to allow Ray to manage data parallelism by scheduling actors per GPU. A review comment suggests that the current implementation unnecessarily restricts parallelism for the HuggingFace backend when running on Ray and provides a suggestion to simplify the logic to check only for Ray mode.
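
The reviewer's suggested code is not reproduced on this page, so the sketch below is only one reading of the simplification it describes: checking Ray mode alone, so that the HuggingFace backend is no longer capped when running on Ray. The function names and the `enable_vllm` / `ray_mode` parameters are hypothetical and used purely for comparison.

```python
from typing import Optional


def num_proc_as_submitted(requested: Optional[int], enable_vllm: bool, ray_mode: bool) -> Optional[int]:
    # Assumed reading of the PR: the cap is lifted only for vLLM on Ray.
    return requested if (enable_vllm and ray_mode) else 1


def num_proc_ray_only(requested: Optional[int], enable_vllm: bool, ray_mode: bool) -> Optional[int]:
    # Assumed reading of the review suggestion: the backend no longer matters,
    # only whether Ray is managing the workers. enable_vllm is kept in the
    # signature so both variants can be called the same way.
    return requested if ray_mode else 1


# Compare the two variants for every backend/executor combination.
for enable_vllm in (True, False):
    for ray_mode in (True, False):
        submitted = num_proc_as_submitted(4, enable_vllm, ray_mode)
        suggested = num_proc_ray_only(4, enable_vllm, ray_mode)
        print(f"vllm={enable_vllm!s:5} ray={ray_mode!s:5} submitted={submitted} suggested={suggested}")
```

The two variants differ only for the non-vLLM backend on Ray, which is the case the review comment singles out.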

Update num_proc handling for vllm and Ray mode

c6a5dbb

Adjust num_proc initialization based on vllm and Ray mode.

ArdalanM requested a deployment to Testing with GitHub Actions on May 1, 2026, 22:46 (status: Waiting)

gemini-code-assist Bot reviewed May 1, 2026

Comment thread data_juicer/ops/mapper/text_tagging_by_prompt_mapper.py

ArdalanM wants to merge 1 commit into datajuicer:main from ArdalanM:patch-1
