
Conversation

@imkero (Contributor) commented Nov 17, 2024

mrope_position_delta is a sequence-level constant, calculated from the prompt only, regardless of whether prefilling is complete.

However, the modification I made in #10388 causes mrope_position_delta to change while prefilling is only partially complete (possible when chunked prefill is enabled). This PR fixes that.

Why the previous PR's tests still passed

This side effect does no real harm because mrope_position_delta is only used in the decoding stage. The incorrect mrope_position_delta (produced while chunked prefilling is incomplete) therefore has no effect, and the tests pass.

I still think we should correct it, to avoid confusion about what mrope_position_delta means.

Explanation

Qwen2-VL's official implementation does not include chunked prefill, so I will try to justify my conclusion (that mrope_position_delta is a sequence-level constant) here.

What is mrope_position_delta

  1. According to Qwen2-VL's Hugging Face implementation, mrope_position_delta (or rope_delta) is defined as "The rope index difference between sequence length and multimodal rope."

https://github.com/huggingface/transformers/blob/913330ca9f80b0a308d7490a02274b01b51e6051/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L95-L96

  2. mrope_position_delta is used to calculate the mrope_position of generated tokens (in the decoding stage)

How mrope_position_delta works

Let's borrow an example from Qwen2-VL's technical report:

(figure: M-RoPE example from the Qwen2-VL technical report)

  • Assume a 38-token prompt: 36 image tokens + 2 text tokens (This, video)
  • And treat the following text tokens (features, a, ...) as generated tokens

The mrope_position_ids of this sequence is:

(figure: the mrope_position_ids of this sequence)

When calculating the mrope_position of generated tokens, there is a shorthand:

```python
mrope_position_delta = torch.max(position_ids_of_prompt).item() - len(prompt_tokens) + 1
# mrope_position_delta = 5 - 38 + 1 = -32 in this example

# for the first generated token "features" (token_offset_in_seq = 38), its mrope_position_id is
position_scalar = token_offset_in_seq + mrope_position_delta
mrope_position_id = (position_scalar, position_scalar, position_scalar)
# (6, 6, 6)
```
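To make the arithmetic concrete, here is a minimal, self-contained sketch that reproduces the numbers above. The vision grid shape (4 temporal × 3 height × 3 width = 36 tokens) is an assumption chosen to match the example's max position of 5; the helper function is hypothetical, not vLLM's actual API:

```python
def mrope_positions_for_prompt(t, h, w, num_text_tokens):
    # Vision tokens get 3D (temporal, height, width) grid indices.
    # NOTE: grid dims (t, h, w) are an assumed stand-in for the report's example.
    positions = [(ti, hi, wi)
                 for ti in range(t) for hi in range(h) for wi in range(w)]
    # Trailing text tokens continue from max(position) + 1,
    # with the same scalar on all three axes.
    next_pos = max(max(p) for p in positions) + 1
    positions += [(next_pos + i,) * 3 for i in range(num_text_tokens)]
    return positions

positions = mrope_positions_for_prompt(4, 3, 3, 2)
prompt_len = len(positions)               # 38 tokens in total
max_pos = max(max(p) for p in positions)  # 5
mrope_position_delta = max_pos - prompt_len + 1  # 5 - 38 + 1 = -32

# first generated token ("features") sits at offset 38 in the sequence
scalar = prompt_len + mrope_position_delta  # 38 + (-32) = 6
print(mrope_position_delta, scalar)         # -32 6
```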

Conclusion

mrope_position_delta should always be calculated from the whole prompt (taking the full prompt length into account) and the whole position list (taking the max position scalar over all of it).
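The conclusion can be illustrated with a hedged sketch of the failure mode this PR fixes: computing the delta from only the tokens seen so far in a chunked prefill gives a different value than computing it from the whole prompt. The grid shape (4, 3, 3) and the helpers are assumptions for illustration, not vLLM code:

```python
def build_positions(t, h, w, num_text_tokens):
    # Hypothetical prompt: a (t, h, w) vision grid followed by text tokens.
    pos = [(ti, hi, wi)
           for ti in range(t) for hi in range(h) for wi in range(w)]
    nxt = max(max(p) for p in pos) + 1
    pos += [(nxt + i,) * 3 for i in range(num_text_tokens)]
    return pos

def delta(positions, seen_len):
    # max position scalar among the tokens seen so far, minus seen length, + 1
    return max(max(p) for p in positions[:seen_len]) - seen_len + 1

full = build_positions(4, 3, 3, 2)

delta_full = delta(full, len(full))  # computed over the whole 38-token prompt
delta_partial = delta(full, 20)      # computed after a 20-token prefill chunk
print(delta_full, delta_partial)     # -32 -17 -- the partial value is wrong
```

Only delta_full is the sequence-level constant; any value derived from a partial prefill must be discarded, which is what this fix ensures.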

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member)

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 17, 2024 06:55
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 17, 2024
@DarkLight1337 DarkLight1337 merged commit 80d85c5 into vllm-project:main Nov 17, 2024
68 checks passed
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 18, 2024
prashantgupta24 pushed a commit to opendatahub-io/vllm that referenced this pull request Dec 3, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024