After fine-tuning Llama-3 8B in 4-bit, the results in Colab are fine (the LLM output clearly draws on my custom dataset). This is before merging, inferencing directly with the adapters still attached, per the step below from the Unsloth Llama-3 template.
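
For context, this is roughly the pre-merge inference step I mean (a sketch following the Unsloth Llama-3 notebook; the adapter path and prompt are placeholders for my actual ones):

```python
# Pre-merge inference: Unsloth loads the base model plus the saved LoRA adapters
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",   # local dir where the trained adapters were saved
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

inputs = tokenizer(
    ["<test prompt taken from my fine-tuning dataset>"],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs))
```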

But when I push the model to the Hub merged to 16-bit using the step below, the results vary widely. I have tried merging to 4-bit as well.
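
The push step is roughly this (repo name and token are placeholders):

```python
# Merge the LoRA adapters into the base weights and upload to the Hub
model.push_to_hub_merged(
    "my-username/llama3-8b-finetuned",  # hypothetical Hub repo
    tokenizer,
    save_method="merged_16bit",         # also tried the 4-bit merged variant
    token="hf_...",
)
```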

I don't think it's a temperature or sampling issue. I suspect I'm doing something wrong, or the merge isn't happening properly, because the results are completely different and nothing from the fine-tuning dataset is reflected. Has anyone run into this, and are there any potential solutions? How can I validate that the adapters were actually merged? Thanks in advance.
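
The only check I could come up with myself is diffing a LoRA-targeted weight (e.g. `q_proj`) between the base model and the uploaded merged checkpoint; if the merge worked, those weights should differ. Something like the sketch below (repo names are placeholders). Is that a sound way to verify?

```python
# Hypothetical sanity check: a LoRA-targeted weight should differ between
# the base model and a properly merged checkpoint.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float16)
merged = AutoModelForCausalLM.from_pretrained(
    "my-username/llama3-8b-finetuned", torch_dtype=torch.float16)  # placeholder repo

w_base = base.model.layers[0].self_attn.q_proj.weight
w_merged = merged.model.layers[0].self_attn.q_proj.weight
# True here would mean the adapters were never merged into this layer
print("identical to base:", torch.allclose(w_base, w_merged))
```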