Conversation

@HsiaoTsan HsiaoTsan commented Nov 28, 2025

Description

Fixes issue: OverflowError in vLLM server port allocation when using high data parallelism (e.g., allocation_mode=vllm:d12t1).

This update corrects port-range calculations that failed in multi-node environments because global GPU indices were used instead of node-local indices. The resulting overflow produced port values above 65535.

This implementation:

  • Computes server_idx_offset with a modulo operation to ensure node-local indexing
  • Prevents port overflows when data parallelism exceeds the number of GPUs on a single node
  • Adds comprehensive test coverage for multi-node and multi-parallelism setups
  • Validates edge cases, including partial GPU visibility
  • Updates the port-allocation logic in vllm_server.py with minimal, backward-compatible changes

The key difference from the prior behavior is that port ranges are now computed per-node rather than globally, eliminating invalid port assignments such as (65000, 70000).

Related Issue

Fixes #652
Addresses overflow errors observed when running allocation_mode=vllm:d12p1t1+d4p1t1 on 2×8-GPU clusters.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature
  • Breaking change
  • Documentation update (inline comments + docstrings)
  • Code refactoring
  • Test coverage improvement

Implementation Details

Core Change

  • File: areal/launcher/vllm_server.py
  • Modification: Added % n_servers_per_node to server_idx_offset so that indices wrap correctly on multi-node setups
  • Diff Size: +3 / −1 lines

Example

Before (incorrect global indexing):
Produced invalid port ranges such as (65000, 70000).

After (node-local indexing):
Ports stay within the valid 0–65535 range.
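
For illustration, here is a minimal sketch of the indexing change. The function name port_range and the constants are assumptions chosen so that the before-case reproduces the (65000, 70000) range reported above; they are not the actual names or values in vllm_server.py:

```python
# Illustrative sketch only; the constants are picked to reproduce the
# reported (65000, 70000) range and are NOT the real values in
# areal/launcher/vllm_server.py.
PORT_BASE = 10000
PORTS_PER_SERVER = 5000

def port_range(server_idx: int, n_servers_per_node: int) -> tuple[int, int]:
    # The fix: wrap the global server index so the offset is node-local.
    server_idx_offset = server_idx % n_servers_per_node
    start = PORT_BASE + server_idx_offset * PORTS_PER_SERVER
    return start, start + PORTS_PER_SERVER

# Before the fix, server 11 of vllm:d12t1 used its global index directly:
#   10000 + 11 * 5000 = 65000 -> (65000, 70000), exceeding 65535.
# After the fix, on an 8-GPU node: 11 % 8 == 3 -> (25000, 30000).
```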

Tests

  • File: areal/tests/test_vllm_server_launcher.py
  • 20 unit tests added, including:
    • Port allocation across d1–d16, t1–t8
    • Multi-node configurations (up to 5 nodes)
    • Boundary and overflow scenarios
    • Key test: test_high_data_parallelism_d12_no_overflow
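
A sketch of what the key regression test checks, reusing the hypothetical port_range helper from the example above (the actual test exercises the real launcher code, not this sketch):

```python
def test_high_data_parallelism_d12_no_overflow():
    # vllm:d12t1 on 2x8-GPU nodes: 12 single-GPU servers with global
    # indices 0-11, at most 8 servers per node.
    n_servers_per_node = 8
    for server_idx in range(12):
        lo, hi = port_range(server_idx, n_servers_per_node)
        assert 0 <= lo < hi <= 65535
```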

All tests pass:

pytest areal/tests/test_vllm_server_launcher.py -v
======================== 20 passed in 0.19s ========================

Validated configurations:

  • vllm:d12t1 (previously failing)
  • vllm:d16t1
  • Various tensor-parallelism modes (t2, t4, t8)
  • Partial GPU visibility (CUDA_VISIBLE_DEVICES)

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools
  • All relevant unit tests pass
  • Tests added for new behavior
  • Documentation updated where appropriate
  • Branch up to date with main
  • No breaking changes introduced
  • No critical issues raised by AI reviewers

Breaking Change Details

Not applicable — this change is fully backward compatible.

Additional Context

This fix ensures stable port allocation for high data-parallel configurations in multi-node environments, resolving prior failures caused by exceeding valid port ranges.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @HsiaoTsan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical OverflowError in the vLLM server's port allocation logic, which previously manifested under high data parallelism in multi-node setups. The issue was traced to an incorrect global indexing scheme for server offsets, causing calculated port ranges to exceed system limits. The solution involves a precise adjustment to the server index calculation, ensuring it remains node-local and within valid port boundaries. The change is thoroughly validated by a new suite of unit tests, confirming robust and correct port assignment across diverse configurations.

Highlights

  • Bug Fix: Port Overflow: Addresses and resolves an OverflowError that occurred in the vLLM server's port allocation when operating with high data parallelism (e.g., allocation_mode=vllm:d12t1) in multi-node environments. Previously, port ranges could exceed the valid 0–65535 range.
  • Root Cause Identified: The problem stemmed from the server_idx_offset calculation incorrectly using global GPU indices instead of node-local indices, leading to port range calculations that went beyond the maximum allowed port number.
  • Solution Implemented: A modulo operation (% n_servers_per_node) has been added to the server_idx_offset calculation in areal/launcher/vllm_server.py. This ensures that the server index offset is always node-local, preventing port overflows.
  • Comprehensive Testing: 20 new unit tests have been introduced in areal/tests/test_vllm_server_launcher.py. These tests cover various port allocation configurations, multi-node scenarios (up to 5 nodes), edge cases, and specifically validate the fix for the d12 data parallelism overflow.
  • Backward Compatibility: The implemented fix is fully backward compatible, ensuring that existing configurations continue to function without any changes or regressions.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a critical port overflow issue in multi-node configurations by correctly calculating node-local server indices. The fix is concise and well-commented. The addition of a comprehensive suite of 20 unit tests is particularly commendable, as it thoroughly validates the port allocation logic across various scenarios, including the specific bug case, multi-node setups, and different parallelism configurations. My review includes a few minor suggestions to improve code style and consistency.

- Refactor multi-line assertions to if-raise pattern to resolve black/ruff conflicts
- Remove unused variables in test file (gpus_per_server, ports_per_server)
- Apply pre-commit formatting fixes (trailing whitespace, markdown formatting)
- All files now pass both black and ruff format checks
@HsiaoTsan (Author)

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a critical port overflow issue that occurred in multi-node environments with high data parallelism. The fix, which correctly calculates a node-local server index using the modulo operator, is sound and well-targeted. The addition of a comprehensive test suite is excellent, as it covers numerous scenarios and ensures the stability of the port allocation logic, preventing future regressions. I've included a few minor suggestions to improve code style in the implementation and tests, and to correct a small typo in the documentation.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a critical port overflow issue in multi-node, high data parallelism scenarios by correctly calculating node-local server indices. The fix is simple, well-commented, and robustly supported by an extensive new test suite that covers the specific bug, various configurations, and edge cases. The addition of these tests is a significant improvement. I've included a couple of minor suggestions to improve test code style and fix a documentation typo. Overall, this is an excellent contribution.

@HsiaoTsan (Author)

@garrett4wade @xssstory Hi maintainers, could you please check the workflow approval? Thanks.

@garrett4wade (Collaborator) left a comment

Hi @HsiaoTsan, sorry for the late reply. LGTM, but with a minor comment.
