[Bug]: MM embedding getting recomputed for each slice of tokens for each chunk #895

@gkumbhat

Describe the bug

Description

In the _prepare_chunked_prefill function, we slice the input embeds based on chunk size, chunk id, etc. For multimodal (MM) input, however, we recompute the embeddings from mm_features and only the slice of the input embeds that corresponds to the current chunk, not the full sequence. As a result, image features get scattered into the wrong positions, and the last chunks are affected the most.
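
To illustrate the failure mode, here is a minimal toy sketch in plain Python. It is not the actual code: the placeholder id IMAGE, the merge and chunks helpers, and the string "embeddings" are all hypothetical, and it assumes the merge step fills placeholder positions with image features in order, restarting from the first feature every time it is called. Under that assumption, merging per chunk gives the right result for the first chunk and the wrong one for later chunks.

```python
IMAGE = -1  # hypothetical placeholder token id marking image positions


def merge(token_ids, image_feats):
    """Replace each placeholder with the next image feature, in order.

    Stand-in for the MM embedding merge; text tokens become "txt<id>".
    """
    feats = iter(image_feats)
    return [next(feats) if t == IMAGE else f"txt{t}" for t in token_ids]


def chunks(seq, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


token_ids = [1, IMAGE, IMAGE, IMAGE, IMAGE, 2]
image_feats = ["img0", "img1", "img2", "img3"]
chunk_size = 3

# Expected behavior: merge once over the full sequence, then slice per chunk.
correct = chunks(merge(token_ids, image_feats), chunk_size)

# Suspected behavior: slice first, then merge each chunk against the full
# feature list, so every chunk starts again from the first image feature.
buggy = [merge(c, image_feats) for c in chunks(token_ids, chunk_size)]

print("correct:", correct)
print("buggy:  ", buggy)
```

In this toy setup the first chunk is identical in both variants, while the second chunk receives img0 and img1 instead of img2 and img3, which matches the observation that the last chunks are the ones that get messed up.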

How to reproduce

This does not cause a crash, but it degrades output quality.

I noticed quality issues with the ministral3 model, which led to the discovery of this issue.

Additional context

No response

Checklist

  • I have searched for similar issues.

Labels: bug