Merged

OPT-fix #17229

4 changes: 2 additions & 2 deletions tests/models/opt/test_modeling_opt.py
@@ -281,7 +281,7 @@ def test_inference_no_head(self):
attention_mask = input_ids.ne(model.config.pad_token_id)
with torch.no_grad():
output = model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
- expected_shape = torch.Size((1, 11, 1024))
+ expected_shape = torch.Size((1, 11, 512))
@ydshieh (Collaborator) May 13, 2022

I don't quite understand why it is expected to be 512.

From the config, I see

 "hidden_size": 1024,
 "word_embed_proj_dim": 512,

So is there any special reason for last_hidden_state to be 512? It looks unusual if it is intended to be so, and a comment would be very helpful in this case.
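
For reference, a minimal sketch of how to check those two values yourself (assuming the public facebook/opt-350m checkpoint, which is the 1024/512 case discussed here; the repo id is an assumption, not part of this PR):

 from transformers import OPTConfig

 # Loads only the configuration file, no model weights.
 config = OPTConfig.from_pretrained("facebook/opt-350m")  # assumed checkpoint id
 print(config.hidden_size)          # 1024
 print(config.word_embed_proj_dim)  # 512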

@younesbelkada (Contributor, Author) May 13, 2022

If my understanding is correct, in some OPT variants the word embeddings have a different size from the internal hidden states, which is why the config file has the word_embed_proj_dim argument.
So the hidden states computed here, including the last hidden state, have a dimension of word_embed_proj_dim. The conversion between hidden_size and word_embed_proj_dim is handled by the self.project_in and self.project_out modules; this only happens when hidden_size differs from word_embed_proj_dim, which is the case for OPT-350m.
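
A minimal sketch of the mechanism described above (the names project_in, project_out and word_embed_proj_dim follow modeling_opt.py; everything else is simplified and purely illustrative, not the actual implementation):

 import torch.nn as nn

 class TinyOPTDecoder(nn.Module):
     # Simplified sketch: only the embedding/projection path, no attention layers.
     def __init__(self, vocab_size, hidden_size=1024, word_embed_proj_dim=512):
         super().__init__()
         self.embed_tokens = nn.Embedding(vocab_size, word_embed_proj_dim)
         if word_embed_proj_dim != hidden_size:
             # Only needed when the embedding width differs from the hidden size (e.g. OPT-350m).
             self.project_in = nn.Linear(word_embed_proj_dim, hidden_size, bias=False)
             self.project_out = nn.Linear(hidden_size, word_embed_proj_dim, bias=False)
         else:
             self.project_in = None
             self.project_out = None

     def forward(self, input_ids):
         hidden_states = self.embed_tokens(input_ids)         # (..., word_embed_proj_dim)
         if self.project_in is not None:
             hidden_states = self.project_in(hidden_states)   # (..., hidden_size)
         # ... the decoder layers would run here at hidden_size ...
         if self.project_out is not None:
             hidden_states = self.project_out(hidden_states)  # back to (..., word_embed_proj_dim)
         return hidden_states

With the OPT-350m values (hidden_size=1024, word_embed_proj_dim=512), the returned tensor has a trailing dimension of 512, which is what the new expected_shape in the test asserts.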

@ydshieh (Collaborator) May 13, 2022

Thank you @younesbelkada, now I understand 😄. I still think it would be a very good idea to add a (short) comment about what you explained above.

Regarding this, I personally think it would be much better to have something like:

from dataclasses import dataclass
from typing import Optional

import torch
from transformers.utils import ModelOutput

@dataclass
class OPTModelOutput(ModelOutput):
    last_hidden_state: Optional[torch.FloatTensor] = None
    projected_word_embedding: Optional[torch.FloatTensor] = None  # or whatever better name
    ...  # remaining fields

and use this for OPTDecoder/OPTModel.

I will leave it to @patrickvonplaten and @LysandreJik to judge, though; they have much more experience than me.

Contributor

@younesbelkada is right here I think :-) But always good to double check!

Collaborator

Well, I still expect a comment to be included in the code 😢

self.assertEqual(output.shape, expected_shape)
expected_slice = torch.tensor(
[[0.7144, 0.8143, -1.2813], [0.7144, 0.8143, -1.2813], [-0.0467, 2.5911, -2.1845]], device=torch_device
@@ -306,7 +306,7 @@ def test_load_model(self):
def test_logits(self):
model = OPTForCausalLM.from_pretrained(self.path_model)
model = model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
tokenizer = GPT2Tokenizer.from_pretrained(self.path_model)
tokenizer.add_special_tokens({"pad_token": "<pad>"})

prompts = [