[Feature Request] Add Liger CE Loss #2692

Add a new loss class to the cross_entropy_loss.py file that inherits from the SFT loss but calls the Liger [fused_linear_cross_entropy](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/ops/fused_linear_cross_entropy.py) loss. It will need to detect when an input is a DTensor and convert it to a regular tensor before calling the Liger loss.
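A rough, dependency-free sketch of the conversion step, to make the request concrete. Everything named here is an assumption for illustration: `LigerStyleCELoss`, `set_output_weight`, and `to_plain_tensor` are hypothetical names, `fused_loss_fn` is a stand-in for the Liger call (its real signature may differ), and DTensors are detected by duck typing on `full_tensor()` instead of an `isinstance` check. The real class would inherit from the SFT loss.

```python
from typing import Any, Callable


def to_plain_tensor(x: Any) -> Any:
    """Return a regular tensor for x, gathering shards if x looks like a DTensor.

    Assumption: a DTensor is recognized by its ``full_tensor()`` method; a real
    implementation would more likely use ``isinstance(x, DTensor)``.
    """
    full_tensor = getattr(x, "full_tensor", None)
    return full_tensor() if callable(full_tensor) else x


class LigerStyleCELoss:
    """Hypothetical wrapper; in torchtune it would inherit from the SFT loss."""

    def __init__(self, fused_loss_fn: Callable[[Any, Any, Any], Any]):
        # fused_loss_fn(weight, hidden, targets) stands in for the Liger fused
        # linear cross entropy call. This argument order is an assumption.
        self.fused_loss_fn = fused_loss_fn
        self.weight = None

    def set_output_weight(self, weight: Any) -> None:
        # Hypothetical hook: keep a reference to the output projection weight.
        self.weight = weight

    def __call__(self, hidden: Any, targets: Any) -> Any:
        if self.weight is None:
            raise RuntimeError("call set_output_weight() before computing the loss")
        # Convert the inputs before handing them to the fused kernel.
        # The weight is passed through unchanged in this sketch.
        return self.fused_loss_fn(
            self.weight,
            to_plain_tensor(hidden),
            to_plain_tensor(targets),
        )
```

Passing the kernel in as a callable keeps the sketch importable without Liger or a GPU. An actual implementation would call the Liger function directly.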

Edge case: if the model's output layer is a tied embedding and is TP-sharded (a DTensor), then we'll either have to unshard and then reshard the weight every step, or throw an error for that case. (This assumes that Liger losses don't work with sharded weights.)
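The two options for this edge case could be sketched as below. This is only an illustration under assumptions: `resolve_tied_weight` and `allow_unshard` are hypothetical names, and a sharded weight is detected by duck typing on `full_tensor()` instead of checking for `DTensor` directly.

```python
from typing import Any


def resolve_tied_weight(weight: Any, allow_unshard: bool = False) -> Any:
    """Return a weight the fused loss can consume, or raise if that is not allowed.

    Assumption: a sharded (DTensor) weight is recognized by its
    ``full_tensor()`` method.
    """
    full_tensor = getattr(weight, "full_tensor", None)
    if not callable(full_tensor):
        # Plain tensor: nothing to do.
        return weight
    if not allow_unshard:
        # Option 2: refuse the tied + TP-sharded combination outright.
        raise ValueError(
            "tied, TP-sharded output weight is not supported by the fused loss; "
            "pass allow_unshard=True to gather it on every step"
        )
    # Option 1: gather the full weight for this step. The gathered copy is
    # temporary, so this cost is paid again on every call.
    return full_tensor()
```

Raising by default makes the per-step gather cost opt-in, which seems like the safer starting point until the sharded-weight assumption is checked.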

A good validation of this feature would be to check whether this loss improves the numbers [here](https://github.com/pytorch/torchtune/blob/main/README.md#optimization-flags) even further than the compiled linear cross entropy loss does.
