@rolandtannous
Solves

Problem Description

Language models were experiencing multiple critical issues with the save and merge functionality:

  1. Merging and saving operations failing for language models
  2. Missing index files for sharded language models causing loading failures (see the check sketched after this list)
  3. Push to hub merged functionality not working properly for language models
  4. Performance degradation in merged language models
  5. Inconsistent behavior between vision models and language models in merge operations
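
Issue 2 above is easy to spot on disk: a sharded safetensors checkpoint needs a `model.safetensors.index.json` mapping each weight to its shard file, and loading fails without it. A minimal, hypothetical check (the directory name is a placeholder):

```python
import json
import os

save_dir = "merged-model"  # placeholder output directory
index_path = os.path.join(save_dir, "model.safetensors.index.json")

if os.path.exists(index_path):
    with open(index_path) as f:
        index = json.load(f)
    # The index maps each weight name to the shard file that stores it.
    shards = set(index["weight_map"].values())
    missing = [s for s in shards
               if not os.path.exists(os.path.join(save_dir, s))]
    print(f"{len(shards)} shards referenced, {len(missing)} missing")
else:
    print("No index file: loading this sharded checkpoint will fail")
```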

Solution

Extended the save and merge logic recently implemented for vision models (unsloth-zoo PR #158) to language models, ensuring consistent behavior across all model types.
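
For reference, a minimal sketch of the merged-save workflow this PR fixes, using unsloth's `save_pretrained_merged` and `push_to_hub_merged` helpers; the model, repo, and directory names are placeholders:

```python
from unsloth import FastLanguageModel

# Load a base model and attach LoRA adapters (names are placeholders).
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-v0.3", load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... fine-tune ...

# Merge the LoRA weights into the base weights and save locally.
model.save_pretrained_merged("merged-model", tokenizer,
                             save_method="merged_16bit")

# Or merge and upload directly to the Hugging Face Hub.
model.push_to_hub_merged("your-username/merged-model", tokenizer,
                         save_method="merged_16bit")
```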

Performance Benchmarking

Perplexity:

We used perplexity (lower is better) to compare model quality across the save/merge lifecycle. With the old logic, the merged 4-bit perplexities regress well above the PEFT values (e.g. 4.07 vs 2.76 for Mistral); the new logic keeps merged perplexities close to the PEFT baseline:

Old Merge Logic:

| Model | Base model | PEFT | Merged (4-bit load) | Merged (8-bit load) | Merged (16-bit load) |
|---|---|---|---|---|---|
| mistral-v0.3-7b | 5.228433 | 2.759816 | 4.065556 | 2.761464 | 2.759856 |
| Phi-4 | 4.660052 | 3.523139 | 5.448963 | 3.521446 | 4.872157 |
| Qwen 2.5 7B Instruct | 8.489007 | 3.380023 | 4.562289 | 3.701273 | 3.380541 |
| Llama-3.2-1B-Instruct | 15.281656 | 11.005575 | 14.249228 | 11.0179 | 11.007604 |
| Llama-3.1-8B-Instruct | 10.789843 | 7.301686 | 9.339204 | 7.299874 | 7.300654 |

New Merge Logic:

| Model | Base model | PEFT | Merged (4-bit load) | Merged (8-bit load) | Merged (16-bit load) |
|---|---|---|---|---|---|
| mistral-v0.3-7b | 5.228433 | 2.761002 | 2.763541 | 2.737924 | 2.736141 |
| Phi-4 | 4.660052 | 3.519993 | 3.645571 | 3.435464 | 4.754354 |
| Qwen 2.5 7B Instruct | 8.489007 | 3.380995 | 3.394698 | 3.705793 | 3.345016 |
| Llama-3.2-1B-Instruct | 15.281656 | 10.965628 | 11.067917 | 10.539675 | 10.550692 |
| Llama-3.1-8B-Instruct | 10.789843 | 7.256952 | 7.435827 | 7.234497 | 7.240123 |
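
For context, perplexity scores like those above can be computed with a loop like the following; this is a hypothetical sketch, not necessarily the evaluation module added in this PR:

```python
import math

import torch

def perplexity(model, tokenizer, texts, max_length=2048):
    """Corpus-level perplexity: exp(total NLL / total tokens)."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_length).to(model.device)
        with torch.no_grad():
            # labels=input_ids makes the model return the mean
            # next-token cross-entropy over the sequence.
            out = model(**enc, labels=enc["input_ids"])
        n = enc["input_ids"].numel()
        total_nll += out.loss.item() * n
        total_tokens += n
    return math.exp(total_nll / total_tokens)
```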

AIME eval for GRPO models

AIME 2024+2025 Evaluation Results:

  • Base model: 8.3% accuracy
  • PEFT model: 11.7% accuracy
  • Merged model (16-bit): 11.7% accuracy
  • Merged model (4-bit): 10.8% accuracy

These results confirm that merged models maintain performance equivalent to the PEFT model.
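
AIME answers are integers from 0 to 999, so scoring reduces to extracting the model's final number and comparing it to the reference. A hypothetical scoring helper (the actual evaluation module may differ):

```python
import re

def aime_accuracy(completions, answers):
    """Fraction of completions whose last 0-999 integer matches the answer."""
    correct = 0
    for text, answer in zip(completions, answers):
        nums = re.findall(r"\b\d{1,3}\b", text)
        if nums and int(nums[-1]) == int(answer):
            correct += 1
    return correct / len(answers)
```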

Testing

  • Perplexity Tests: Llama-3.1-8B, Llama-3.2, Phi-4, Mistral-7B, Qwen2.5-7B ✅
  • Push to Hub Test: Verified successful upload and retrieval of merged models ✅
  • Model Index Test: Confirmed proper index file generation for sharded models ✅
  • GRPO Performance Test: Validated using AIME evaluation benchmark ✅
  • OCR Evaluation: Extended vision model testing for comprehensive coverage ✅

Evaluation Modules Added

Created reusable evaluation modules for:

  • Perplexity testing across model architectures
  • OCR evaluation for vision capabilities
  • AIME mathematical reasoning evaluation

Final notes

When not performing merge operations, users should use the standard calls (the paths shown are placeholders):

```python
model.save_pretrained("output-dir")      # non-merged model saving
model.push_to_hub("your-username/repo")  # non-merged model hub upload
```

@danielhanchen merged commit c6b6208 into unslothai:main on Jun 3, 2025.
Successfully merging this pull request may close these issues:

Applying LoRA Doesn't Change Model Output
