[trainer] a few fixes #9993
Conversation
This is breaking sadly: with this change someone using [...]. Now for deepspeed I understand why this would be necessary, so we can move the [...].
oh, that's no problem for now. Let's do it just for deepspeed then. Fairscale might join down the road. Actually Deepspeed doesn't even need the `model.to(device)` call. So basically this skipping [...].
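A minimal sketch of what "just for deepspeed" could look like (hedged: `self.deepspeed` is the attribute used later in this thread; the rest is illustrative, not the actual trainer.py change):

```python
# Illustrative only (not the actual trainer.py change): skip device
# placement solely for DeepSpeed, leaving fairscale and plain setups unchanged.
if not self.deepspeed:                    # `self.deepspeed` as used later in this thread
    model = model.to(self.args.device)    # existing behavior for everyone else
```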
sgugger left a comment
Better this way, thanks for adapting!
```python
    model = ShardedDDP(model, self.optimizer)
elif is_sagemaker_distributed_available():
    model = DDP(model, device_ids=[dist.get_local_rank()], broadcast_buffers=False)
if self.deepspeed:
```
FYI this breaks most integrations: it should be an `elif` so that we don't fall into the branches after it when TPU or SageMaker is used here.
Will fix in a commit on master.
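For illustration, a minimal sketch of the corrected control flow (hedged: only the branches visible in the hunk above are shown, and the first guard is hypothetical; the real `trainer.py` chain has more cases):

```python
# Sketch only - names come from the diff hunk above, not the full source.
if self.sharded_ddp:                      # hypothetical guard for the fairscale branch
    model = ShardedDDP(model, self.optimizer)
elif is_sagemaker_distributed_available():
    model = DDP(model, device_ids=[dist.get_local_rank()], broadcast_buffers=False)
elif self.deepspeed:
    # elif, not if: the point of the fix is that a model already handled by an
    # earlier branch (TPU/SageMaker/...) must not fall into this one as well.
    pass
```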
oh boy, my apologies, my branching skills went haywire yesterday.
Just the fact that one puts an `if foo` really close to an existing set of conditionals doesn't make it part of them. I need a different programming language that will be more do-what-I-mean-when-I-am-tired.
No worries, just wanted to alert you :-) Thankfully we found this just before cutting the release candidate!
Oh my!
As I said above, this literally happened to me several times yesterday: something went haywire and I started adding new branches with bare `if`s adjacent to an existing if/elif/else pile - my brain decided that if they are together, they must be part of the other if/else. So odd. Some new programming language must be percolating through my neurons, or a rogue AI took over and is using my brain for its experiments.
This PR:

- skips `model.to(device)` - it's not needed for DeepSpeed, but primarily this allows loading models that otherwise won't load, e.g. loading a 45GB (fp32) model onto a 40GB GPU when using DeepSpeed with fp16, as it then loads only 22GB of it. But currently we load all 45GB right away and, well, nothing works.

Related: This PR enables t5-11b training on 1x 40GB gpu w/ Deepspeed (#9996)
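As a back-of-the-envelope check of those numbers (the parameter count below is an approximation, not taken from this PR):

```python
# Rough memory math for t5-11b weights (parameter count is an assumption).
params = 11.3e9                 # ~t5-11b parameters
fp32_gb = params * 4 / 1e9      # 4 bytes/param -> ~45 GB: too big for a 40GB GPU
fp16_gb = params * 2 / 1e9      # 2 bytes/param -> ~22 GB: fits once DeepSpeed loads fp16
print(f"fp32: {fp32_gb:.0f} GB, fp16: {fp16_gb:.0f} GB")
```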
@sgugger