Description
Problem
My goal is to save the merged model as a GGUF file, but I'm getting various errors.
The deeper problem seems to be that merging the LoRA with the base model isn't saving a merged file.
I think I successfully merged the LoRA with the base model around 7-14 days ago, so maybe something has broken recently.
Details
My Google Colab notebook is based on unsloth/llama-3-8b-bnb-4bit and was trained using the Unsloth Colab notebook.
My model neoneye/base64-decode-v2-attempt12 contains the adapter_model.safetensors file. It does not contain the full merged model.
I can continue training my model, and it loads the adapter + base model, so loading the LoRA and base model works. Training works. push_to_hub works. The loading looks roughly like the sketch below.
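For reference, a minimal sketch of how the adapter + base model is loaded (the max_seq_length value is an assumption, not the exact cell from my notebook):

from unsloth import FastLanguageModel

# Loading the adapter repo; Unsloth resolves the base model
# (unsloth/llama-3-8b-bnb-4bit) from adapter_config.json.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "neoneye/base64-decode-v2-attempt12",  # LoRA adapter repo
    max_seq_length = 2048,  # assumption; use the value from training
    dtype = None,           # auto-detect
    load_in_4bit = True,
)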
However, merging the LoRA with the base model isn't working.
if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)

This is the output from save_pretrained_merged. There are no errors.
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.17 out of 12.67 RAM for saving.
41%|████ | 13/32 [00:01<00:01, 13.39it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:42<00:00, 1.33s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py:399: FutureWarning: The `active_adapter` method is deprecated and will be removed in a future version.
warnings.warn(
config.json: 100%
1.20k/1.20k [00:00<00:00, 68.7kB/s]
Unsloth: Saving model/adapter_model.bin...
Done.
The biggest file is the LoRA file, 167 MB. It seems there is no merged file. I would expect save_pretrained_merged to generate a file around the same size as the base model or bigger, between 5-10 GB. But there is no such file, and no error about the file not being generated. (A quick programmatic check is sketched after the listing below.)
/content/model# ls -la
total 172948
drwxr-xr-x 2 root root 4096 Jun 9 15:20 .
drwxr-xr-x 1 root root 4096 Jun 9 15:24 ..
-rw-r--r-- 1 root root 732 Jun 9 15:23 adapter_config.json
-rw-r--r-- 1 root root 167934026 Jun 9 15:23 adapter_model.bin
-rw-r--r-- 1 root root 172 Jun 9 15:23 generation_config.json
-rw-r--r-- 1 root root 464 Jun 9 15:23 special_tokens_map.json
-rw-r--r-- 1 root root 50614 Jun 9 15:23 tokenizer_config.json
-rw-r--r-- 1 root root 9085698 Jun 9 15:23 tokenizer.json
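A quick way to confirm programmatically that the merged weights are missing (a minimal sketch; the 5 GB threshold is my assumption for a 16-bit 8B model):

import os

out_dir = "model"
weights = [f for f in os.listdir(out_dir) if f.endswith((".bin", ".safetensors"))]
total_gb = sum(os.path.getsize(os.path.join(out_dir, f)) for f in weights) / 1e9
print(weights, f"{total_gb:.2f} GB")
# A merged 16-bit Llama-3-8B should be well above 5 GB;
# the adapter alone is only ~0.17 GB.
if total_gb < 5:
    print("Only the adapter appears to have been saved; no merged weights found.")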
I'm on Google Colab with plenty of disk space.
Connected to
Python 3 Google Compute Engine backend (GPU)
RAM: 2.91 GB/12.67 GB
Disk: 29.36 GB/201.23 GB
Solution ideas
Am I correct that save_pretrained_merged should output a big merged file?
Inside save_pretrained_merged, check whether the output file was generated; if there is no file, print an error. See the sketch below.
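Something along these lines could run at the end of save_pretrained_merged (a hypothetical sketch of the idea; the function name, threshold, and message are mine, not Unsloth internals):

import glob
import os

def verify_merged_output(save_directory, min_expected_bytes = 4e9):
    # Hypothetical post-save check: merged 16-bit weights for an 8B model
    # should total several GB; if only ~167 MB was written, the merge
    # failed silently and only the adapter was saved.
    weights = glob.glob(os.path.join(save_directory, "*.safetensors"))
    weights += glob.glob(os.path.join(save_directory, "*.bin"))
    total = sum(os.path.getsize(w) for w in weights)
    if total < min_expected_bytes:
        raise RuntimeError(
            f"Unsloth: expected merged weights of several GB in {save_directory}, "
            f"but only {total / 1e9:.2f} GB was written."
        )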