Replies: 13 comments 1 reply
-
Unsloth doesn't support full finetune though.
-
@adamo1139 May I know a bit more? It seems I have been using unsloth for full finetuning many times.
-
I've tried to see whether Daniel commented on it recently, and it looks like it officially isn't supported but might kind of work anyway. I just assumed it didn't work, since that was the communication I saw.
-
Ah! I never knew that, thanks for pointing it out!
-
We had a discussion on Discord - it's possible to do CPU offloading, but in a smart way: i.e. offload 1/2 of the layers and bring the other 1/2 back in dynamically. This can hide all the communication and cut VRAM usage by 50% - it's more of an engineering challenge to make it work, sadly.
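For context, a minimal sketch of what such a scheme could look like in plain PyTorch (this is not Unsloth's implementation): the offloaded blocks keep their master weights in pinned CPU memory, and while block i computes on the GPU, block i+1's weights are copied in on a side stream so the transfer hides behind the compute. It is forward-only and glosses over allocator/stream-safety details (e.g. Tensor.record_stream) that a real implementation would need.

```python
import torch
import torch.nn as nn

copy_stream = torch.cuda.Stream()  # dedicated stream for CPU->GPU weight copies

class OffloadedBlocks:
    def __init__(self, blocks: nn.ModuleList):
        self.blocks = blocks
        # Master copies live in pinned CPU memory so H2D copies can be async.
        self.cpu_params = [
            {n: p.detach().cpu().pin_memory() for n, p in b.named_parameters()}
            for b in blocks
        ]

    def _load(self, i: int):
        # Queue async CPU->GPU copies of block i's weights on the side stream.
        with torch.cuda.stream(copy_stream):
            for n, p in self.blocks[i].named_parameters():
                p.data = self.cpu_params[i][n].to("cuda", non_blocking=True)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        self._load(0)
        for i, block in enumerate(self.blocks):
            # Compute must wait until block i's weights have actually arrived.
            torch.cuda.current_stream().wait_stream(copy_stream)
            if i + 1 < len(self.blocks):
                self._load(i + 1)  # overlap the next copy with this compute
            hidden = block(hidden)
            for p in block.parameters():  # drop the GPU copy to free VRAM
                p.data = torch.empty(0, device="cuda")
        return hidden
```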
-
Looks reasonable! Sad to hear it requires so much engineering...
-
Ye, unfortunately the goal is to not make it slower - the dumbest solution is dynamic offloading, i.e. offload everything if it doesn't fit, then bring it in slowly - this is not a good idea, sadly.
-
@danielhanchen I thought about it a bit more, and here is my brainstorm:

Firstly, DeepSpeed does support finetuning in such a scenario (7B params). According to its estimation, we need 15GB of GPU memory (good) and 158GB of CPU memory (well...), if we only offload the optimizer to CPU (and do not offload the params to CPU).

Now the problem becomes whether we can reduce the CPU memory requirement. My naive thought is that there is adamw_8bit (it does not support CPU yet, but this seems to be an engineering problem rather than a research problem), so maybe we can reduce the 8 bytes/param to 2 bytes/param for the optimizer states. There may be other optimizations available, since the scenario is single-GPU, which removes some of the overhead caused by a multi-GPU setup.

And theoretically speaking, since a 1.5B model needs 14GB of memory (I tested), it seems 7B should need about 65GB of memory in total, which is acceptable.
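As a rough sanity check of the per-parameter byte counts above (this ignores activations, fp32 master weights, fragmentation and framework overhead, so real usage will be higher):

```python
# Back-of-envelope memory math for a 7B model, assuming bf16 weights/grads
# and AdamW's two moment tensors per parameter.
params = 7e9
gib = 1024 ** 3

weights_bf16 = params * 2   # 2 bytes/param
grads_bf16   = params * 2   # 2 bytes/param
adamw_fp32   = params * 8   # two fp32 moments -> 8 bytes/param
adamw_8bit   = params * 2   # two int8 moments -> 2 bytes/param

print(f"weights + grads    : {(weights_bf16 + grads_bf16) / gib:5.1f} GiB")
print(f"AdamW states (fp32): {adamw_fp32 / gib:5.1f} GiB")
print(f"AdamW states (8bit): {adamw_8bit / gib:5.1f} GiB")
```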
-
@fzyzcjy Sounds reasonable - actually, on optimizers: Torch AO has a fully CPU-offloaded 8-bit AdamW optimizer https://github.com/pytorch/ao?tab=readme-ov-file#memory-efficient-optimizers which might be interesting.
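A minimal sketch of what using it could look like; the module path and keyword names below are taken from recent torchao releases and may differ in the version you install, so check the linked README:

```python
import torch
import torch.nn as nn
# Assumed import path for recent torchao versions; it may move out of `prototype`.
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

model = nn.Linear(4096, 4096, device="cuda")

# Optimizer states (and optionally gradients) live in CPU RAM; the step runs
# against CPU copies and the updated parameters are copied back to the GPU.
opt = CPUOffloadOptimizer(
    model.parameters(),
    torch.optim.AdamW,       # underlying optimizer class
    offload_gradients=True,  # also move gradients off the GPU after backward
    lr=1e-5,
)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).square().mean()
loss.backward()
opt.step()
opt.zero_grad()
```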
-
@danielhanchen That looks interesting, thank you!
-
@danielhanchen Hi, are there any updates? I am now interested in finetuning a Llama 3.2 90B vision model using LoRA on a 24GB card (with 64GB of CPU memory). I'm wondering whether the vanilla layer-by-layer copy would fit this.
-
@danielhanchen I am happy to send a PR if this is not too much engineering work. (The most naive version may be to just move tensors between CPU and GPU here and there, possibly asynchronously to allow overlapping communication and computation - does that look good to you?)
-
We now partially offload some of the layers from the GPU.
-
Hi! I wonder whether unsloth will support some kind of CPU offload?
For example, I would like to finetune a 7-8B model on a 24GB GPU. Since LoRA usually results in reduced performance, it would be great if I could do a full finetune.
There seem to be some techniques for CPU offloading during training (e.g. DeepSpeed has some), not to mention the commonly seen CPU offloading for inference. However, searching unsloth's docs does not turn up anything about configuring CPU offloading.
Thus I wonder: is it impossible, does it have severe drawbacks (e.g. it would be 100x slower), or is it just not yet implemented / on the roadmap? Thanks!
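For reference, this is the kind of DeepSpeed setup the question has in mind: ZeRO stage 2 with the optimizer states offloaded to CPU, sketched here as a config dict passed to Hugging Face's Trainer (values are illustrative, not a tuned recipe):

```python
from transformers import TrainingArguments

# "auto" entries are filled in from TrainingArguments by the HF integration.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
```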