
Conversation

@wangxunx (Contributor) commented Oct 24, 2025

Summary

This PR fixes a cross-entropy loss failure when fine-tuning models with small vocab sizes, such as Mistral v0.3 7B, on AMD MI300 GPUs.

Context / Motivation

  • AMD GPUs with the CDNA3 architecture have different hardware resources than NVIDIA GPUs, so the default settings of the cross_entropy_loss kernel produce illegal launch parameters and crash training with: RuntimeError: Triton Error [HIP]: Code: 1, Message: invalid argument (a back-of-the-envelope sketch of why follows this list).
  • For models with small vocab sizes, such as Mistral, we can halve num_warps to reduce resource usage while leaving the returned logits unchanged.
  • This PR reduces num_warps to reasonable values so that Mistral SFT runs on AMD MI300, keeping the RETURN_LOGITS logic unchanged.
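
A plausible mechanism (my reading, not something stated in this PR): Triton sizes a thread block as num_warps × warp_size, and CDNA hardware uses 64-wide wavefronts where NVIDIA uses 32-wide warps, so the same num_warps requests twice as many threads per block on AMD and can exceed the per-block limit. The 1024-thread limit and the num_warps value below are illustrative assumptions:

```python
# Back-of-the-envelope sketch of the launch failure. The 1024-thread
# block limit and the wavefront/warp sizes are assumptions about
# MI300 (CDNA3) and typical NVIDIA parts, not values taken from this PR.
MAX_THREADS_PER_BLOCK = 1024

def threads_per_block(num_warps: int, warp_size: int) -> int:
    # Triton launches num_warps warps (wavefronts on AMD) per block.
    return num_warps * warp_size

num_warps = 32  # e.g. a choice tuned on NVIDIA hardware
print(threads_per_block(num_warps, warp_size=32))       # 1024 on NVIDIA: OK
print(threads_per_block(num_warps, warp_size=64))       # 2048 on CDNA3: exceeds limit
print(threads_per_block(num_warps // 2, warp_size=64))  # 1024 after halving: OK
```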

Changes

  • Halve num_warps in the single-chunk case on AMD CDNA architectures (a hedged sketch of the shape of this change follows).
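
A minimal sketch of what such a change might look like; the helper is_hip_cdna, the function pick_num_warps, and the baseline heuristic are hypothetical, since the actual diff is not shown in this conversation. Checking torch.version.hip is a common way to detect a ROCm build:

```python
import torch

def is_hip_cdna() -> bool:
    # torch.version.hip is None on CUDA builds and a version string on ROCm.
    # Treating any ROCm build as CDNA is a simplification; the real change
    # may inspect the arch name (e.g. gfx942 for MI300) instead.
    return torch.version.hip is not None

def pick_num_warps(vocab_size: int) -> int:
    # Hypothetical baseline heuristic: more warps for larger vocabs.
    num_warps = 32 if vocab_size >= 64 * 1024 else 16
    if is_hip_cdna():
        # CDNA wavefronts are 64 threads wide (vs. 32-wide NVIDIA warps),
        # so the same num_warps requests twice as many threads per block.
        # Halving keeps the launch within hardware limits.
        num_warps = max(num_warps // 2, 1)
    return num_warps
```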

Testing

  • Qwen3-14B / Mistral-v0.3-7B / Llama-3.2-1B and -3B SFT on MI300
  • Qwen3-14B / Mistral-v0.3-7B / Llama-3.2-1B and -3B SFT on RTX 4090

@danielhanchen (Contributor)

Ok thanks!

@danielhanchen merged commit fe9210d into unslothai:main on Oct 27, 2025