Skip to content

Conversation

@kashif
Copy link
Contributor

@kashif kashif commented Oct 16, 2025

Summary

Returns the mean token accuracy metric when minimizing the cross-entropy loss without materializing the logits

https://x.com/jeremyphoward/status/1703246293802586155

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@kashif
Copy link
Contributor Author

kashif commented Oct 20, 2025

@vaibhavjindal would you be able to kindly review?

@kashif
Copy link
Contributor Author

kashif commented Oct 28, 2025

@shimizust this will be a breaking change i believe BTW

@kashif kashif changed the title [Cross-entropy-loss] add return_token_accuracy flag to fused_linear_cross_entropy [Cross-entropy-loss] return mean token accuracy metric with CE loss Nov 1, 2025
@vaibhavjindal
Copy link
Collaborator

@kashif could you please elaborate on how it will be a breaking change? Will it break the intergration with transformers or trl?

@kashif
Copy link
Contributor Author

kashif commented Nov 3, 2025

yes if someone is using the raw functions in their lib. then now that functions returns one more thing... but on the HF side this PR takes care of this

@kashif
Copy link
Contributor Author

kashif commented Nov 3, 2025

@vaibhavjindal
Copy link
Collaborator

@kashif got it. So if i understand correctly, it will make sure that liger remains compatible with newer versions from HF. However, just want to confirm it will break liger support with older transformers/trl versions?

@kashif
Copy link
Contributor Author

kashif commented Nov 3, 2025

no i believe my changes here will work with older version of HF.. i just meant non-HF frameworks

@kashif
Copy link
Contributor Author

kashif commented Nov 3, 2025

TRL relies on HF integration for the CE loss so in TRL I will just pin to the liger version that has these changes

@kashif
Copy link
Contributor Author

kashif commented Nov 5, 2025

@vaibhavjindal let me fix up the new qwen3-vl model to update its API

@kashif
Copy link
Contributor Author

kashif commented Nov 5, 2025

@vaibhavjindal all good from my side

@vaibhavjindal
Copy link
Collaborator

@vaibhavjindal all good from my side

Thanks a lot! I will do some final checks on correctness and benchmarks and will try to get it merged soon.

@kashif
Copy link
Contributor Author

kashif commented Nov 5, 2025

thank you so much.. also see here: huggingface/trl#4302 (comment)

@vaibhavjindal vaibhavjindal merged commit 7dd8ecc into linkedin:main Nov 5, 2025
3 of 7 checks passed
@kashif
Copy link
Contributor Author

kashif commented Nov 6, 2025

thanks @vaibhavjindal for the typo fix and making it more robust!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants