
Conversation

@imkero (Contributor) commented Nov 17, 2024

mrope_position_delta is a sequence-level constant, calculated from the prompt only, regardless of whether prefilling is complete.

However, the modification I made in #10388 causes mrope_position_delta to change while prefilling is only partially complete (possible when chunked prefill is enabled). This PR fixes that.

Why the previous PR's tests still passed

This side effect does no real harm because mrope_position_delta is only used in the decoding stage. The incorrect mrope_position_delta (produced while chunked prefilling is incomplete) therefore has no effect, and the tests pass.

I still think we should correct it, to avoid confusion about what mrope_position_delta means.

Explanation

Qwen2-VL's official implementation does not include chunked prefill, so I will try to justify my conclusion (that mrope_position_delta is a sequence-level constant) here.

What is mrope_position_delta

  1. According to Qwen2-VL's Hugging Face implementation, mrope_position_delta (or rope_delta) is defined as "The rope index difference between sequence length and multimodal rope."

https://github.com/huggingface/transformers/blob/913330ca9f80b0a308d7490a02274b01b51e6051/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L95-L96

  2. mrope_position_delta is used to calculate the mrope_position of generated tokens (in the decoding stage)

How mrope_position_delta works

Let's borrow an example from Qwen2-VL's technical report:

(figure: M-RoPE example from the Qwen2-VL technical report)

  • Assume a 38-token prompt: 36 image tokens + 2 text tokens (This, video)
  • And treat the following text tokens (features, a, ...) as generated tokens

The mrope_position_ids of this sequence is:

(figure: the mrope_position_ids of this sequence)

When calculating the mrope_position of generated tokens, there is a shorthand:

```python
mrope_position_delta = torch.max(position_ids_of_prompt).item() - len(prompt_tokens) + 1
# mrope_position_delta = 5 - 38 + 1 = -32 in this example

# for the first generated token "features" (token_offset_in_seq = 38), its mrope_position_id is
position_scalar = token_offset_in_seq + mrope_position_delta
mrope_position_id = (position_scalar, position_scalar, position_scalar)
# (6, 6, 6)
```
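To make the arithmetic concrete, here is a minimal, self-contained sketch that reproduces the numbers above. The vision grid shape (4 temporal × 3 height × 3 width = 36 tokens) is an assumption chosen to match the example's max position of 5; the helper function is hypothetical, not vLLM's actual API:

```python
def mrope_positions_for_prompt(t, h, w, num_text_tokens):
    # Vision tokens get 3D (temporal, height, width) grid indices.
    # NOTE: grid dims (t, h, w) are an assumed stand-in for the report's example.
    positions = [(ti, hi, wi)
                 for ti in range(t) for hi in range(h) for wi in range(w)]
    # Trailing text tokens continue from max(position) + 1,
    # with the same scalar on all three axes.
    next_pos = max(max(p) for p in positions) + 1
    positions += [(next_pos + i,) * 3 for i in range(num_text_tokens)]
    return positions

positions = mrope_positions_for_prompt(4, 3, 3, 2)
prompt_len = len(positions)               # 38 tokens in total
max_pos = max(max(p) for p in positions)  # 5
mrope_position_delta = max_pos - prompt_len + 1  # 5 - 38 + 1 = -32

# first generated token ("features") sits at offset 38 in the sequence
scalar = prompt_len + mrope_position_delta  # 38 + (-32) = 6
print(mrope_position_delta, scalar)         # -32 6
```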

Conclusion

mrope_position_delta should always be calculated from the whole prompt (taking the full prompt length into account) and the whole position list (taking the max position scalar over all of it).
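The conclusion can be illustrated with a hedged sketch of the failure mode this PR fixes: computing the delta from only the tokens seen so far in a chunked prefill gives a different value than computing it from the whole prompt. The grid shape (4, 3, 3) and the helpers are assumptions for illustration, not vLLM code:

```python
def build_positions(t, h, w, num_text_tokens):
    # Hypothetical prompt: a (t, h, w) vision grid followed by text tokens.
    pos = [(ti, hi, wi)
           for ti in range(t) for hi in range(h) for wi in range(w)]
    nxt = max(max(p) for p in pos) + 1
    pos += [(nxt + i,) * 3 for i in range(num_text_tokens)]
    return pos

def delta(positions, seen_len):
    # max position scalar among the tokens seen so far, minus seen length, + 1
    return max(max(p) for p in positions[:seen_len]) - seen_len + 1

full = build_positions(4, 3, 3, 2)

delta_full = delta(full, len(full))  # computed over the whole 38-token prompt
delta_partial = delta(full, 20)      # computed after a 20-token prefill chunk
print(delta_full, delta_partial)     # -32 -17 -- the partial value is wrong
```

Only delta_full is the sequence-level constant; any value derived from a partial prefill must be discarded, which is what this fix ensures.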

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member)

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 17, 2024 06:55
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 17, 2024
@DarkLight1337 DarkLight1337 merged commit 80d85c5 into vllm-project:main Nov 17, 2024
68 checks passed
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 18, 2024
prashantgupta24 pushed a commit to opendatahub-io/vllm that referenced this pull request Dec 3, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024