Replies: 3 comments 3 replies
-
This is very likely the cause of your issue: sequence_len: 128000. Is this how long your sequences are? If so, could you look into Sequence Parallelism? https://docs.axolotl.ai/docs/sequence_parallelism.html
Side note: were you also the one who asked in Discord?
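For reference, enabling it is mostly a config change. A minimal sketch, assuming a recent Axolotl version configured as in the docs linked above; the degree and surrounding values are illustrative, not taken from the original config:

```yaml
# Minimal sketch per the Axolotl sequence parallelism docs; values are illustrative.
sequence_len: 128000
flash_attention: true        # sequence parallelism in Axolotl builds on flash attention
sequence_parallel_degree: 2  # split each sequence across the 2 GPUs instead of replicating it
micro_batch_size: 1          # assumption: one long sequence per step
```

With a degree of 2, each GPU only holds the activations for roughly half of each 128k-token sequence, which is usually the dominant memory cost at that length.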
-
Hey there, thanks for replying! I do indeed need that sequence length to avoid dropping sequences that are too long; however, I also tried reducing it to 32k, and I am still facing crashes at around 40% of the training run. The spikes in VRAM usage appear to happen quite randomly in the middle of training. Do you have any idea why that might occur mid-run? Regarding your question, no, that wasn't me...
Edit: I am also now loading the model in 8-bit only, but it still happens.
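For concreteness, the two mitigations described here amount to a small config delta (a sketch of only the changes mentioned in this comment, everything else unchanged):

```yaml
# Sketch of only the changes described above; all other keys as in the original config.
sequence_len: 32768   # reduced from 128000; crashes still occur ~40% into the run
load_in_8bit: true    # base model weights in 8-bit, LoRA adapter still trained on top
```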
-
I’m trying to fine-tune the model unsloth/Mistral-Small-3.2-24B-Instruct-2506 for tool calling, but I haven’t been able to find any documentation or guides explaining how to do this. This will be my first time fine-tuning a model, so I’d really appreciate any guidance, resources, or examples to help me get started.
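One hedged starting point (not an official recipe): tool-calling fine-tunes are usually plain supervised fine-tuning on conversations whose assistant turns contain the tool calls, with the tool results included as additional messages, all rendered through the model's chat template. In Axolotl that could look roughly like the sketch below; the dataset path and field names are assumptions, and the exact chat-template handling depends on the Axolotl version:

```yaml
# Rough sketch of a LoRA fine-tune on chat-style tool-calling data in Axolotl.
# The dataset file and its field names are hypothetical; check the Axolotl docs
# for the chat_template dataset options supported by your version.
base_model: unsloth/Mistral-Small-3.2-24B-Instruct-2506

adapter: lora
lora_r: 32                 # illustrative values, not a recommendation
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: tool_calling_conversations.jsonl   # hypothetical file: one JSON object per line
    type: chat_template                      # renders rows via the tokenizer's chat template
    field_messages: messages                 # each row: {"messages": [{"role": ..., "content": ...}, ...]}

sequence_len: 8192         # illustrative
val_set_size: 0.05
micro_batch_size: 1
gradient_accumulation_steps: 8
num_epochs: 2
learning_rate: 2e-4
bf16: true
gradient_checkpointing: true
flash_attention: true
output_dir: ./outputs/tool-calling-lora
```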
-
Hello to y'all,
I am currently trying to fine-tune Mistral Small 3.2 Instruct with LoRA (r = 32, alpha = 32) on a small 3k-row dataset in BF16. However, I frequently get OOM errors when I increase the context length too much, even with multi-GPU training (2x H200 SXM, 280 GB VRAM total). The errors come right after the first evaluation run finishes and training starts. I have roughly 180M trainable parameters...
Am I missing something obvious?
Here is my config:
I'd greatly appreciate any help! Thanks in advance. :)
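The config itself is not reproduced above, so as a stand-in, here is a sketch of what a LoRA setup matching the details in the post looks like in Axolotl (r = 32, alpha = 32, BF16, long sequence_len, an evaluation run before training); every key not mentioned in the post is an assumption rather than the poster's actual value:

```yaml
# Illustrative reconstruction from the details in the post above; keys not mentioned
# there (model repo id, dropout, batch sizes, eval settings, paths, ...) are assumptions.
base_model: mistralai/Mistral-Small-3.2-24B-Instruct-2506   # assumed repo id

adapter: lora
lora_r: 32
lora_alpha: 32
lora_dropout: 0.05            # assumption
lora_target_linear: true      # assumption; consistent with ~180M trainable parameters

sequence_len: 128000          # the long-context setting that triggers the OOMs
bf16: true
gradient_checkpointing: true  # assumption, but usually necessary at this context length
flash_attention: true         # assumption

micro_batch_size: 1             # assumption
gradient_accumulation_steps: 8  # assumption

val_set_size: 0.05              # an evaluation run happens before training, as described
eval_steps: 100                 # assumption
output_dir: ./outputs/mistral-small-3.2-lora
```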