Simplify vllm batch #345

Merged
sahel-sh merged 6 commits into main from fix_vllm_batch
Feb 24, 2026

@sahel-sh (Member) commented Feb 24, 2026

Mostly agentic coding. Both vLLM inference modes are tested via demos, and there is no impact on the Gemini and GPT rankers.
I will add tests for sglang and tensorrt in a follow-up PR if we decide to keep them.

Before this change, we would create the prompts for all requests, then process the sliding window for all requests in parallel. This meant every request had to finish ranking its current window before the next batch could be sent. For thinking models, a single harder query is enough to keep every other request waiting, which leads to a lot of idle time.

This CL does the prompt creation, LLM inference, permutation update, and enqueueing of the next window for each request individually. The batch_size controls the number of in-flight requests to the vLLM inference handlers.
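
For illustration only, here is a minimal sketch of the per-request flow described above. It is not the code in this PR: `build_prompt`, `rerank_one`, `rerank_all`, the `infer` callable, and the `window_size`/`stride` defaults are all hypothetical names and values, and the concurrency cap is modeled with a plain `asyncio.Semaphore` rather than any vLLM API.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

# Hypothetical inference handler: takes a prompt and returns a permutation
# of the window's indices, best first. Stands in for a real LLM call.
InferFn = Callable[[str], Awaitable[List[int]]]


def build_prompt(query: str, window: Sequence[str]) -> str:
    """Format one window of candidates as a numbered list under the query."""
    lines = [f"Query: {query}"] + [f"[{i}] {doc}" for i, doc in enumerate(window)]
    return "\n".join(lines)


async def rerank_one(
    query: str, docs: Sequence[str], infer: InferFn, window_size: int, stride: int
) -> List[str]:
    """Slide a window from the tail to the head of one request's candidates.

    Each step builds the prompt, awaits inference, applies the returned
    permutation, and only then moves on to this request's next window.
    """
    ranked = list(docs)
    if not ranked:
        return ranked
    end = len(ranked)
    start = max(0, end - window_size)
    while True:
        window = ranked[start:end]
        order = await infer(build_prompt(query, window))
        ranked[start:end] = [window[i] for i in order]  # permutation update
        if start == 0:
            break
        end -= stride
        start = max(0, end - window_size)
    return ranked


async def rerank_all(
    requests: Sequence[Tuple[str, Sequence[str]]],
    infer: InferFn,
    batch_size: int,
    window_size: int = 4,
    stride: int = 2,
) -> List[List[str]]:
    """Rerank every request independently, at most batch_size at a time."""
    slots = asyncio.Semaphore(batch_size)

    async def worker(query: str, docs: Sequence[str]) -> List[str]:
        async with slots:  # a slow request holds only its own slot
            return await rerank_one(query, docs, infer, window_size, stride)

    # gather returns results in the same order as the input requests.
    return await asyncio.gather(*(worker(q, d) for q, d in requests))
```

In this sketch a request that needs a long inference call occupies only its own slot, so the other requests keep advancing through their windows instead of waiting at a shared barrier.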

Pull Request Checklist

Reference Issue

Please provide the reference to issue this PR is addressing (# followed by the issue number). If there is no associated issue, write "N/A".

ref:

Checklist Items

Before submitting your pull request, please review these items:

  • Have you followed the contributing guidelines?
  • Have you verified that there are no existing Pull Requests for the same update/change?
  • Have you updated any relevant documentation or added new tests where needed?

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no API changes)
  • Documentation content changes
  • Reproduction logs
  • Other...
    • Description:

@sahel-sh sahel-sh requested a review from clides February 24, 2026 02:00
@sahel-sh sahel-sh requested review from lilyjge February 24, 2026 16:53
@sahel-sh sahel-sh merged commit 94d8c51 into main Feb 24, 2026
5 checks passed
Labels

None yet

Projects

None yet

2 participants