-
Notifications
You must be signed in to change notification settings - Fork 267
Enable trl GRPO trainer #2088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable trl GRPO trainer #2088
Conversation
Co-authored-by: regisss <[email protected]>
882b81b to
a65a9d6
Compare
regisss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments. Can you also add some tests for this new trainer in https://github.com/huggingface/optimum-habana/blob/main/tests/test_trl.py please?
4c1597f to
df5f215
Compare
|
The code quality check failed, please run |
df5f215 to
6b347df
Compare
|
The code quality check failed, please run |
6b347df to
86dcc6a
Compare
@regisss thanks for the review! i will add some tests soon. |
Right now it is added at the level of the makefile: Line 20 in ee0dcd6
Since CI tests are executed through makefile commands, that's everything we need. However, if you want to run it with pytest, you'll have to pass the env variable explicitly. Most tests require lazy mode so I think you can just add yours and that should be fine. |
961d0ac to
3c0a6a6
Compare
|
The code quality check failed, please run |
d6c11d6 to
7962be0
Compare
regisss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Wang, Yi A <[email protected]> Signed-off-by: Daniel Socek <[email protected]> Signed-off-by: U. Artie Eoff <[email protected]> Signed-off-by: Vivek Kumar <[email protected]> Signed-off-by: Urszula <[email protected]> Co-authored-by: regisss <[email protected]> Co-authored-by: Jimin Ha <[email protected]> Co-authored-by: Shiv Kaul <[email protected]> Co-authored-by: Yeonsil Yoon <[email protected]> Co-authored-by: Harish Subramony <[email protected]> Co-authored-by: Vidya Galli <[email protected]> Co-authored-by: Iman Gohari <[email protected]> Co-authored-by: Bhargav <[email protected]> Co-authored-by: Chetan Kumar Verma <[email protected]> Co-authored-by: Sheng Yang <[email protected]> Co-authored-by: Yaser Afshar <[email protected]> Co-authored-by: Wang, Yi <[email protected]> Co-authored-by: Akihiro Takahashi <[email protected]> Co-authored-by: Edward Mascarenhas <[email protected]> Co-authored-by: Libin Tang <[email protected]> Co-authored-by: Sayantan Sarkar <[email protected]> Co-authored-by: Daniel Socek <[email protected]> Co-authored-by: Silvia Colabrese <[email protected]> Co-authored-by: Mieszko Dziadowiec <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Urszula Golowicz <[email protected]> Co-authored-by: Nikolay Protasov <[email protected]> Co-authored-by: U. Artie Eoff <[email protected]> Co-authored-by: Luca Calabria <[email protected]> Co-authored-by: Harshvardhan Chauhan <[email protected]> Co-authored-by: Shifani Rajabose <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Mounika Mandava <[email protected]> Co-authored-by: ZhengHongming888 <[email protected]> Co-authored-by: Alexey Fadeev <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Ilyas Moutawwakil <[email protected]> Co-authored-by: Rafal Bogdanowicz <[email protected]> Co-authored-by: Rafal <[email protected]> Co-authored-by: Jan Kamiński <[email protected]> Co-authored-by: Karol Brejna <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: karol-brejna-i <[email protected]> Co-authored-by: IlyasMoutawwakil <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]> Signed-off-by: Daniel Socek <[email protected]> Signed-off-by: U. Artie Eoff <[email protected]> Signed-off-by: Vivek Kumar <[email protected]> Signed-off-by: Urszula <[email protected]> Co-authored-by: regisss <[email protected]> Co-authored-by: Jimin Ha <[email protected]> Co-authored-by: Shiv Kaul <[email protected]> Co-authored-by: Yeonsil Yoon <[email protected]> Co-authored-by: Harish Subramony <[email protected]> Co-authored-by: Vidya Galli <[email protected]> Co-authored-by: Iman Gohari <[email protected]> Co-authored-by: Bhargav <[email protected]> Co-authored-by: Chetan Kumar Verma <[email protected]> Co-authored-by: Sheng Yang <[email protected]> Co-authored-by: Yaser Afshar <[email protected]> Co-authored-by: Wang, Yi <[email protected]> Co-authored-by: Akihiro Takahashi <[email protected]> Co-authored-by: Edward Mascarenhas <[email protected]> Co-authored-by: Libin Tang <[email protected]> Co-authored-by: Sayantan Sarkar <[email protected]> Co-authored-by: Daniel Socek <[email protected]> Co-authored-by: Silvia Colabrese <[email protected]> Co-authored-by: Mieszko Dziadowiec <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Urszula Golowicz <[email protected]> Co-authored-by: Nikolay Protasov <[email protected]> Co-authored-by: U. Artie Eoff <[email protected]> Co-authored-by: Luca Calabria <[email protected]> Co-authored-by: Harshvardhan Chauhan <[email protected]> Co-authored-by: Shifani Rajabose <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Mounika Mandava <[email protected]> Co-authored-by: ZhengHongming888 <[email protected]> Co-authored-by: Alexey Fadeev <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Ilyas Moutawwakil <[email protected]> Co-authored-by: Rafal Bogdanowicz <[email protected]> Co-authored-by: Rafal <[email protected]> Co-authored-by: Jan Kamiński <[email protected]> Co-authored-by: Karol Brejna <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: karol-brejna-i <[email protected]> Co-authored-by: IlyasMoutawwakil <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]> Signed-off-by: Daniel Socek <[email protected]> Signed-off-by: U. Artie Eoff <[email protected]> Signed-off-by: Vivek Kumar <[email protected]> Signed-off-by: Urszula <[email protected]> Co-authored-by: regisss <[email protected]> Co-authored-by: Jimin Ha <[email protected]> Co-authored-by: Shiv Kaul <[email protected]> Co-authored-by: Yeonsil Yoon <[email protected]> Co-authored-by: Harish Subramony <[email protected]> Co-authored-by: Vidya Galli <[email protected]> Co-authored-by: Iman Gohari <[email protected]> Co-authored-by: Bhargav <[email protected]> Co-authored-by: Chetan Kumar Verma <[email protected]> Co-authored-by: Sheng Yang <[email protected]> Co-authored-by: Yaser Afshar <[email protected]> Co-authored-by: Wang, Yi <[email protected]> Co-authored-by: Akihiro Takahashi <[email protected]> Co-authored-by: Edward Mascarenhas <[email protected]> Co-authored-by: Libin Tang <[email protected]> Co-authored-by: Sayantan Sarkar <[email protected]> Co-authored-by: Daniel Socek <[email protected]> Co-authored-by: Silvia Colabrese <[email protected]> Co-authored-by: Mieszko Dziadowiec <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Urszula Golowicz <[email protected]> Co-authored-by: Nikolay Protasov <[email protected]> Co-authored-by: U. Artie Eoff <[email protected]> Co-authored-by: Luca Calabria <[email protected]> Co-authored-by: Harshvardhan Chauhan <[email protected]> Co-authored-by: Shifani Rajabose <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Mounika Mandava <[email protected]> Co-authored-by: ZhengHongming888 <[email protected]> Co-authored-by: Alexey Fadeev <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Ilyas Moutawwakil <[email protected]> Co-authored-by: Rafal Bogdanowicz <[email protected]> Co-authored-by: Rafal <[email protected]> Co-authored-by: Jan Kamiński <[email protected]> Co-authored-by: Karol Brejna <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: karol-brejna-i <[email protected]> Co-authored-by: IlyasMoutawwakil <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]> Signed-off-by: Daniel Socek <[email protected]> Signed-off-by: U. Artie Eoff <[email protected]> Signed-off-by: Vivek Kumar <[email protected]> Signed-off-by: Urszula <[email protected]> Co-authored-by: Sun Choi <[email protected]> Co-authored-by: regisss <[email protected]> Co-authored-by: Jimin Ha <[email protected]> Co-authored-by: Shiv Kaul <[email protected]> Co-authored-by: Yeonsil Yoon <[email protected]> Co-authored-by: Harish Subramony <[email protected]> Co-authored-by: Vidya Galli <[email protected]> Co-authored-by: Iman Gohari <[email protected]> Co-authored-by: Bhargav <[email protected]> Co-authored-by: Chetan Kumar Verma <[email protected]> Co-authored-by: Sheng Yang <[email protected]> Co-authored-by: Yaser Afshar <[email protected]> Co-authored-by: Wang, Yi <[email protected]> Co-authored-by: Akihiro Takahashi <[email protected]> Co-authored-by: Edward Mascarenhas <[email protected]> Co-authored-by: Libin Tang <[email protected]> Co-authored-by: Sayantan Sarkar <[email protected]> Co-authored-by: Daniel Socek <[email protected]> Co-authored-by: Silvia Colabrese <[email protected]> Co-authored-by: Mieszko Dziadowiec <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Urszula Golowicz <[email protected]> Co-authored-by: Nikolay Protasov <[email protected]> Co-authored-by: U. Artie Eoff <[email protected]> Co-authored-by: Luca Calabria <[email protected]> Co-authored-by: Harshvardhan Chauhan <[email protected]> Co-authored-by: Shifani Rajabose <[email protected]> Co-authored-by: Dmitry <[email protected]> Co-authored-by: Mounika Mandava <[email protected]> Co-authored-by: ZhengHongming888 <[email protected]> Co-authored-by: Alexey Fadeev <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Vivek Kumar <[email protected]> Co-authored-by: Ilyas Moutawwakil <[email protected]> Co-authored-by: Rafal Bogdanowicz <[email protected]> Co-authored-by: Rafal <[email protected]> Co-authored-by: Jan Kamiński <[email protected]> Co-authored-by: Karol Brejna <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: Piotr Bielak <[email protected]> Co-authored-by: karol-brejna-i <[email protected]> Co-authored-by: IlyasMoutawwakil <[email protected]>
This enables GRPO trainer.
The baseline is #1898
This reproduce the example from HF https://huggingface.co/learn/cookbook/en/fine_tuning_llm_grpo_trl on gaudi
Before submitting