Conversation

@jdchang1

What does this PR do?

Transformers recently added a `mean_resizing` argument to `resize_token_embeddings`. Its new default breaks mixed initialization in downstream training tasks that require adding tokens to Composer HuggingFace models. For now, this PR sets the value to `False` rather than relying on the default of `True`.

@jdchang1 jdchang1 requested a review from a team as a code owner November 20, 2024 19:50
@jdchang1 jdchang1 requested a review from mvpatel2000 November 20, 2024 19:50

```diff
     f' Resizing the model embeddings to {len(self.tokenizer)} from {self.config.vocab_size}.',
 )
-self.model.resize_token_embeddings(len(self.tokenizer))
+self.model.resize_token_embeddings(len(self.tokenizer), mean_resizing=False)
```
Contributor

You'll need to gate on the transformers version, or inspect the function's args before passing this in, I think.
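
A minimal sketch of that suggestion, inspecting the function's signature rather than pinning a transformers version (the `resize_embeddings` helper and its arguments are hypothetical, not from this PR):

```python
import inspect

def resize_embeddings(model, tokenizer):
    """Resize token embeddings, disabling mean resizing when supported."""
    kwargs = {}
    # Only pass mean_resizing when the installed transformers release
    # accepts the argument; older releases would raise a TypeError.
    params = inspect.signature(model.resize_token_embeddings).parameters
    if 'mean_resizing' in params:
        kwargs['mean_resizing'] = False
    model.resize_token_embeddings(len(tokenizer), **kwargs)
```

Signature inspection sidesteps having to know the exact transformers release that introduced the argument.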

Contributor

@dakinggg dakinggg left a comment

For posterity, and in case we want to fix this properly: could you explain why mean_resizing doesn't work with meta initialization but the old behavior does?
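
An illustrative guess at the answer, not confirmed in this thread: mean resizing has to read and reduce the existing embedding values, and tensors initialized on the meta device have no storage, so any data-dependent step in the resize fails.

```python
import torch

# Meta tensors carry shape and dtype but no storage: shape-propagating
# ops succeed, while anything that must read concrete values fails.
weight = torch.empty(10, 4, device='meta')  # stand-in for an old embedding matrix
mean = weight.mean(dim=0)  # fine: the result is itself a meta tensor, no data read
print(mean.device)  # meta
try:
    bool((weight == 0).all())  # needs a concrete value, so it raises
except Exception as err:
    print(f'Data-dependent step fails on meta tensors: {err}')
```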

Contributor

@mvpatel2000 mvpatel2000 left a comment

LGTM besides Daniel's comment, which will fix the tests.
