Description
Problem
My goal is to save the merged model as a GGUF file, but I'm getting various errors.
The deeper problem seems to be that merging the LoRA with the base model isn't saving a merged file.
I think I successfully merged the LoRA with the base model around 7-14 days ago, so maybe something has broken recently.
Details
My Google Colab notebook is based on unsloth/llama-3-8b-bnb-4bit and was trained using the Unsloth Colab notebook.
My model neoneye/base64-decode-v2-attempt12 contains the adapter_model.safetensors file. It does not contain the full merged model.
I can continue training my model, and it loads the adapter + base model, so loading the LoRA and base model works. Training works. push_to_hub works. The loading looks roughly like the sketch below.
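For reference, a minimal sketch of how the adapter + base model is loaded (the max_seq_length value is an assumption, not the exact cell from my notebook):

from unsloth import FastLanguageModel

# Loading the adapter repo; Unsloth resolves the base model
# (unsloth/llama-3-8b-bnb-4bit) from adapter_config.json.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "neoneye/base64-decode-v2-attempt12",  # LoRA adapter repo
    max_seq_length = 2048,  # assumption; use the value from training
    dtype = None,           # auto-detect
    load_in_4bit = True,
)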
However, merging the LoRA with the base model isn't working.
if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)

This is the output from save_pretrained_merged. There are no errors.
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.17 out of 12.67 RAM for saving.
41%|████ | 13/32 [00:01<00:01, 13.39it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:42<00:00, 1.33s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py:399: FutureWarning: The `active_adapter` method is deprecated and will be removed in a future version.
warnings.warn(
config.json: 100%
1.20k/1.20k [00:00<00:00, 68.7kB/s]
Unsloth: Saving model/adapter_model.bin...
Done.
The biggest file is the LoRA file, 167 MB. It seems there is no merged file. I would expect save_pretrained_merged to generate a file around the same size as the base model or bigger, between 5-10 GB. But there is no such file, and no error about the file not being generated. (A quick programmatic check is sketched after the listing below.)
/content/model# ls -la
total 172948
drwxr-xr-x 2 root root 4096 Jun 9 15:20 .
drwxr-xr-x 1 root root 4096 Jun 9 15:24 ..
-rw-r--r-- 1 root root 732 Jun 9 15:23 adapter_config.json
-rw-r--r-- 1 root root 167934026 Jun 9 15:23 adapter_model.bin
-rw-r--r-- 1 root root 172 Jun 9 15:23 generation_config.json
-rw-r--r-- 1 root root 464 Jun 9 15:23 special_tokens_map.json
-rw-r--r-- 1 root root 50614 Jun 9 15:23 tokenizer_config.json
-rw-r--r-- 1 root root 9085698 Jun 9 15:23 tokenizer.json
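A quick way to confirm programmatically that the merged weights are missing (a minimal sketch; the 5 GB threshold is my assumption for a 16-bit 8B model):

import os

out_dir = "model"
weights = [f for f in os.listdir(out_dir) if f.endswith((".bin", ".safetensors"))]
total_gb = sum(os.path.getsize(os.path.join(out_dir, f)) for f in weights) / 1e9
print(weights, f"{total_gb:.2f} GB")
# A merged 16-bit Llama-3-8B should be well above 5 GB;
# the adapter alone is only ~0.17 GB.
if total_gb < 5:
    print("Only the adapter appears to have been saved; no merged weights found.")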
I'm on Google Colab with plenty of disk space.
Connected to
Python 3 Google Compute Engine backend (GPU)
RAM: 2.91 GB/12.67 GB
Disk: 29.36 GB/201.23 GB
Solution ideas
Am I correct that save_pretrained_merged should output a big merged file?
Inside save_pretrained_merged, check whether the output file was generated; if there is no file, print an error. See the sketch below.
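Something along these lines could run at the end of save_pretrained_merged (a hypothetical sketch of the idea; the function name, threshold, and message are mine, not Unsloth internals):

import glob
import os

def verify_merged_output(save_directory, min_expected_bytes = 4e9):
    # Hypothetical post-save check: merged 16-bit weights for an 8B model
    # should total several GB; if only ~167 MB was written, the merge
    # failed silently and only the adapter was saved.
    weights = glob.glob(os.path.join(save_directory, "*.safetensors"))
    weights += glob.glob(os.path.join(save_directory, "*.bin"))
    total = sum(os.path.getsize(w) for w in weights)
    if total < min_expected_bytes:
        raise RuntimeError(
            f"Unsloth: expected merged weights of several GB in {save_directory}, "
            f"but only {total / 1e9:.2f} GB was written."
        )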