Extracting the discussion from #8716
Summary of the issue: `prediction_step()` has a `model` argument which is the plain model when `n_gpu < 2`, but a `DataParallel`-wrapped model when `n_gpu > 1`, so the API is ambiguous.
The user really has to use `self.model` to be able to call methods and attributes like `model.generate()` or `model.config`, which aren't available on the wrapped model. But it's very likely they will use `model` instead, since it behaves exactly like `self.model` unless running under multi-GPU. And why do we even have that `model` argument then?
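To make the ambiguity concrete, here is a minimal, self-contained sketch (with a toy module standing in for a transformers model) showing that the `DataParallel` wrapper hides the underlying model's extra methods and attributes, which then are only reachable via `.module`:

```python
import torch.nn as nn

class ToyModel(nn.Module):
    """Stand-in for a transformers model that exposes extra API beyond forward()."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.config = {"vocab_size": 10}  # plays the role of model.config

    def generate(self, x):
        # plays the role of model.generate()
        return self.linear(x).argmax(dim=-1)

model = ToyModel()
wrapped = nn.DataParallel(model)  # what the Trainer does when n_gpu > 1

print(hasattr(model, "generate"))           # True
print(hasattr(wrapped, "generate"))         # False: hidden by the wrapper
print(hasattr(wrapped.module, "generate"))  # True: only reachable via .module
```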
Possible solutions discussed:
1. monkeypatch `torch.nn.DataParallel` to expand its API to support all the methods of the original model transparently, by installing a catch-all `__getattr__` that delegates any failed attribute lookup to `self.module` (see the sketch after this list).
2. stop calling the function argument `model`, since under multi-GPU it isn't the model but something else.
3. remove the `model` argument completely and document that `self.model` should always be used. Currently in `seq2seq_trainer.py`, once we switch to `self.model`, `prediction_step()` no longer needs `model` as an argument (but is that always the case?).
4. pass `self.model` as the `model` argument, and make the wrapped model available via `self.wrapped_model` if the user needs it.
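For the first option, a minimal sketch of what such a monkeypatch could look like (an illustration, not an agreed-upon implementation): any attribute lookup that fails on the `DataParallel` wrapper itself falls back to the wrapped model.

```python
import torch.nn as nn

_original_getattr = nn.DataParallel.__getattr__

def _delegating_getattr(self, name):
    try:
        # normal nn.Module lookup (parameters, buffers, submodules, incl. .module)
        return _original_getattr(self, name)
    except AttributeError:
        # fall back to the wrapped model for anything DataParallel itself
        # doesn't define, e.g. generate() or config
        return getattr(self.module, name)

nn.DataParallel.__getattr__ = _delegating_getattr
```

This is the part later called "too magical": user code would silently work on the wrapper, blurring the distinction between the wrapped and unwrapped model.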
Summary of the discussion around the proposed solutions (numbers match the list above):
1. too magical.
2. calling the argument `wrapped_model` was proposed, but that is just as confusing, since most of the time it isn't wrapped.
3. need to check whether the wrapped model is ever needed inside user functions.
4. not discussed yet.
@sgugger, @LysandreJik