@hiyouga hiyouga commented Sep 13, 2025

The PR huggingface/transformers#40490 breaks training when using standard 3D position ids with shape (3, batch_size, seq_len). That PR discards the text position ids, so the boundaries between packed samples are lost and attention is computed across samples.
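To illustrate why the text position ids matter, here is a minimal sketch (not the actual transformers or LLaMA-Factory code) of how packed-sample boundaries can be recovered from position ids: in packed training each sample's ids restart at 0, so a non-increasing position id marks a new sample, and the block-diagonal attention mask follows from those boundaries. The helper name and the choice of taking component 0 of a 3D mrope-style tensor as the text position ids are assumptions for illustration.

```python
import torch

def block_diag_mask_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: derive a per-sample attention mask from position ids.

    position_ids: (batch_size, seq_len) or mrope-style (3, batch_size, seq_len);
    for the 3D case we assume component 0 carries the text position ids.
    Returns a boolean mask of shape (batch_size, seq_len, seq_len) that is True
    only within each packed sample. If the text position ids are discarded,
    these boundaries cannot be recovered and attention leaks across samples.
    """
    if position_ids.dim() == 3:
        position_ids = position_ids[0]  # assumed text component of 3D ids
    # A token whose position id does not increase starts a new packed sample.
    starts = torch.ones_like(position_ids, dtype=torch.bool)
    starts[:, 1:] = position_ids[:, 1:] <= position_ids[:, :-1]
    seg_id = torch.cumsum(starts.long(), dim=-1)  # segment index per token
    # Tokens may attend to each other only within the same segment.
    return seg_id.unsqueeze(-1) == seg_id.unsqueeze(-2)

# Two samples of lengths 3 and 2 plus one of length 1, packed into one row:
pos = torch.tensor([[0, 1, 2, 0, 1, 0]])
mask = block_diag_mask_from_position_ids(pos)
```

For the packed row above, tokens 0-2 attend only to each other, tokens 3-4 form a second block, and token 5 a third, which is exactly the structure that is destroyed when the text position ids are dropped.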

@hiyouga hiyouga force-pushed the yaowei/fix_position_id branch 2 times, most recently from ecb0493 to 552f763 on September 13, 2025 17:34
@hiyouga hiyouga force-pushed the yaowei/fix_position_id branch from 552f763 to e9065e7 on September 13, 2025 17:37
@hiyouga hiyouga merged commit 15aefea into main Sep 13, 2025
1 check passed
@hiyouga hiyouga deleted the yaowei/fix_position_id branch September 13, 2025 17:39