@alekseyfa
Contributor

@alekseyfa alekseyfa commented Jul 10, 2025

What does this PR do?

This PR migrates optimum-habana from TRL 0.9.6 to TRL 0.17.0, which requires updating the algorithms already implemented in optimum-habana: SFT, DPO, and PPO.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@yafshar
Contributor

yafshar commented Jul 10, 2025

@alekseyfa please check the other PR, #2088, as well.

@imangohari1
Contributor

Hi @alekseyfa
Please rebase this PR onto OH main and test it with the latest diffusers update. Thanks.

@yafshar
Contributor

yafshar commented Jul 28, 2025

@alekseyfa what else is missing from this draft to finish it?

@alekseyfa
Contributor Author

> @alekseyfa what else is missing from this draft to finish it?

PPO still needs to be finished; it is not functioning properly at the moment. However, review of SFT and DPO can start now to speed up the process.

@pbielak
Collaborator

pbielak commented Sep 8, 2025

@alekseyfa Could you please provide an update? Do you plan to continue working on this PR, and do you have an ETA for when it will be finished?

4 participants