[Bugfix] Fix mrope_position_delta in non-last prefill chunk #10403
`mrope_position_delta` is a sequence-level constant. It is calculated from the prompt only, regardless of whether prefilling has completed.

However, the modification I made in #10388 causes `mrope_position_delta` to change while prefilling is only partially completed (possible when chunked prefill is enabled). This PR fixes that.

Why the previous PR's tests still passed

This side effect does no harm because `mrope_position_delta` is actually used only in the decoding stage. The incorrect `mrope_position_delta` (produced before chunked prefilling completes) therefore has no effect, and the tests still pass.

I think we should still correct it so that `mrope_position_delta` does not confuse us.

Explanation
What is `mrope_position_delta`

`mrope_position_delta` (or `rope_delta`) is defined as "The rope index difference between sequence length and multimodal rope."

https://github.com/huggingface/transformers/blob/913330ca9f80b0a308d7490a02274b01b51e6051/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L95-L96

`mrope_position_delta` is used to calculate generated tokens' mrope_position (in the decoding stage).

How `mrope_position_delta` works

Let's borrow an example from Qwen2-VL's technical report.
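Take (..., `This`, `video`) as the prompt tokens and (`features`, `a`, ...) as generated tokens.

Given the `mrope_position_ids` of this sequence, there is a shorthand for calculating a generated token's mrope_position: add `mrope_position_delta` to the token's index in the sequence.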
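The shorthand can be illustrated with a small, self-contained sketch. It is not vLLM's or Transformers' actual code: the function names, the toy 3x2x2 vision grid, and the two trailing text tokens are assumptions made for illustration, and the layout (vision tokens indexed by temporal/height/width, text tokens continuing from the maximum vision position plus one) follows the M-RoPE scheme as described in the Qwen2-VL report.

```python
def toy_mrope_positions(t, h, w, num_text):
    """Build (temporal, height, width) position lists for a toy prompt.

    Hypothetical helper: t*h*w vision tokens followed by num_text text tokens.
    """
    temporal, height, width = [], [], []
    for ti in range(t):
        for hi in range(h):
            for wi in range(w):
                temporal.append(ti)
                height.append(hi)
                width.append(wi)
    # Text tokens share one index on all three axes, starting right after
    # the largest position used by the vision tokens.
    start = max(t, h, w)
    for i in range(num_text):
        temporal.append(start + i)
        height.append(start + i)
        width.append(start + i)
    return [temporal, height, width]


def position_delta(positions, prompt_len):
    """Difference between the next rope index and the prompt length."""
    max_position = max(max(axis) for axis in positions)
    return max_position + 1 - prompt_len


def next_positions(delta, context_len, seq_len):
    """Shorthand for generated tokens: sequence index + delta on every axis."""
    return [list(range(context_len + delta, seq_len + delta)) for _ in range(3)]


prompt_positions = toy_mrope_positions(t=3, h=2, w=2, num_text=2)
prompt_len = len(prompt_positions[0])                  # 12 vision + 2 text = 14
delta = position_delta(prompt_positions, prompt_len)   # 4 + 1 - 14 = -9
generated = next_positions(delta, context_len=prompt_len, seq_len=prompt_len + 2)
print(delta, generated)  # -9 [[5, 6], [5, 6], [5, 6]]
```

In this toy case the last prompt token sits at position 4, so the first two generated tokens land on positions 5 and 6 without recomputing the full position list.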
Conclusion
`mrope_position_delta` should always be calculated with the whole prompt (taking the prompt length into consideration) and the whole position list (to get the max position scalar).
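A minimal sketch of why this matters under chunked prefill, using a hard-coded toy position list (12 vision tokens on a 3x2x2 grid followed by 2 text tokens). The helper names and the chunk boundaries are hypothetical, and the "naive" variant is just one plausible way to get the value wrong; it is not claimed to be the exact code path changed in #10388.

```python
# Toy mrope positions for a 14-token prompt (assumed layout, for illustration).
POSITIONS = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4],  # temporal
    [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 3, 4],  # height
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 3, 4],  # width
]
PROMPT_LEN = len(POSITIONS[0])


def whole_prompt_delta(positions, prompt_len):
    """Delta from the whole position list and the whole prompt length."""
    max_position = max(max(axis) for axis in positions)
    return max_position + 1 - prompt_len


def naive_chunk_delta(positions, start, end):
    """Delta computed from a single chunk only (gives a chunk-dependent value)."""
    chunk = [axis[start:end] for axis in positions]
    max_position = max(max(axis) for axis in chunk)
    return max_position + 1 - (end - start)


def prefill_chunk(positions, prompt_len, start, end):
    """Slice positions for one chunk, but keep the sequence-level delta."""
    delta = whole_prompt_delta(positions, prompt_len)  # computed before slicing
    return [axis[start:end] for axis in positions], delta


chunks = [(0, 8), (8, 14)]
correct = [prefill_chunk(POSITIONS, PROMPT_LEN, s, e)[1] for s, e in chunks]
naive = [naive_chunk_delta(POSITIONS, s, e) for s, e in chunks]
print(correct)  # [-9, -9]
print(naive)    # [-6, -1]
```

Computing the delta before slicing keeps it identical for every chunk, which matches its role as a sequence-level constant.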