save_pretrained_merged doesn't merge the model #611

@neoneye

Description

Problem

My goal is to save the merged model as a GGUF file, but I'm getting various errors.
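
For reference, the GGUF export call I'm attempting looks roughly like this (a sketch based on the Unsloth notebook; the quantization method shown is just an example, not necessarily the one I used):

```python
# Sketch of the GGUF export step from the Unsloth notebook.
# "q4_k_m" is an example quantization method, not necessarily the one I used.
if True:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```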

The deeper problem seems to be that merging the LoRA + base model isn't producing a merged file.

I think I successfully merged the LoRA + base model around 7 to 14 days ago. Maybe something has broken recently.

Details

My Google Colab notebook is based on unsloth/llama-3-8b-bnb-4bit and was trained using the Unsloth Colab notebook.

My model neoneye/base64-decode-v2-attempt12 contains the adapter_model.safetensors file. It does not contain the full merged model.

I can continue training my model, and it loads the adapter + base model. So loading the LoRA and the base model works, training works, and push_to_hub works.
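
For context, the model is loaded roughly like this (a sketch of the standard Unsloth flow; the exact arguments in my notebook may differ):

```python
from unsloth import FastLanguageModel

# Sketch of the standard Unsloth loading step. max_seq_length is an
# assumption here; model_name points at my adapter repo, from which
# Unsloth resolves the base model automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "neoneye/base64-decode-v2-attempt12",
    max_seq_length = 2048,
    load_in_4bit = True,
)
```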

However, merging the LoRA with the base model isn't working.

if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)

This is the output from save_pretrained_merged. There are no errors.

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.17 out of 12.67 RAM for saving.

 41%|████      | 13/32 [00:01<00:01, 13.39it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:42<00:00,  1.33s/it]

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...

/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py:399: FutureWarning: The `active_adapter` method is deprecated and will be removed in a future version.
  warnings.warn(

config.json: 100% 1.20k/1.20k [00:00<00:00, 68.7kB/s]

Unsloth: Saving model/adapter_model.bin...
Done.

The biggest file is the LoRA file, at 167 MB. There seems to be no merged file. I would expect a file around the size of the base model or bigger, between 5 and 10 GB, but no such file is generated, and no error is raised about the missing file.

/content/model# ls -la
total 172948
drwxr-xr-x 2 root root      4096 Jun  9 15:20 .
drwxr-xr-x 1 root root      4096 Jun  9 15:24 ..
-rw-r--r-- 1 root root       732 Jun  9 15:23 adapter_config.json
-rw-r--r-- 1 root root 167934026 Jun  9 15:23 adapter_model.bin
-rw-r--r-- 1 root root       172 Jun  9 15:23 generation_config.json
-rw-r--r-- 1 root root       464 Jun  9 15:23 special_tokens_map.json
-rw-r--r-- 1 root root     50614 Jun  9 15:23 tokenizer_config.json
-rw-r--r-- 1 root root   9085698 Jun  9 15:23 tokenizer.json

I'm on Google Colab with plenty of disk space.

Connected to
Python 3 Google Compute Engine backend (GPU)
RAM: 2.91 GB/12.67 GB
Disk: 29.36 GB/201.23 GB

Solution ideas

Am I correct that save_pretrained_merged should output a big merged file?
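
If it should, a possible workaround might be to merge manually with plain PEFT and save the result. This is an untested sketch on my part: it loads the base model in 16-bit (the base repo name and dtype are assumptions), applies the adapter, and merges:

```python
# Untested workaround sketch: merge the LoRA into a 16-bit base with PEFT
# instead of save_pretrained_merged. Base repo name and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype = torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "neoneye/base64-decode-v2-attempt12")
model = model.merge_and_unload()        # fold the LoRA weights into the base
model.save_pretrained("merged_model")   # should write ~16 GB of weights

tokenizer = AutoTokenizer.from_pretrained("neoneye/base64-decode-v2-attempt12")
tokenizer.save_pretrained("merged_model")
```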

Inside save_pretrained_merged, check whether the output file was actually generated; if there is no file, print an error.
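
Something along these lines, as a rough sketch (check_merged_output is a hypothetical helper, not existing Unsloth code):

```python
import os

def check_merged_output(save_directory):
    # Hypothetical post-save check, not existing Unsloth code: fail loudly
    # when only adapter files were written instead of merged weights.
    weight_files = [
        f for f in os.listdir(save_directory)
        if f.endswith((".bin", ".safetensors")) and not f.startswith("adapter")
    ]
    if not weight_files:
        raise RuntimeError(
            f"Unsloth: no merged weight file found in {save_directory!r}; "
            "only adapter files were saved."
        )
```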
