
[llm]support dpo pp#9039

Merged
ZHUI merged 10 commits into PaddlePaddle:develop from lugimzzz:dpo
Sep 20, 2024

Conversation

@lugimzzz (Contributor) commented Aug 28, 2024

PR types

New features

PR changes

APIs

Description

  1. Refactor DPOTrainer so that loss and metrics match the original implementation bit-for-bit
  2. Refactor DPOTrainer to support pipeline parallelism (pp) and virtual pipeline parallelism (vpp)
  3. Support LoRA and multiple DPO variants (including KTO, DPO, ORPO, and SimPO)
  4. Add support for several new open-source models
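For context, the pairwise DPO objective that the refactored trainer aligns its loss and metrics against can be sketched in plain Python. This is a minimal illustration of the standard DPO formula, not PaddleNLP's actual implementation; the function and argument names are ours:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    logits = beta * (
        (policy_chosen_logp - policy_rejected_logp)
        - (ref_chosen_logp - ref_rejected_logp)
    )
    # numerically stable form of -log(sigmoid(x)): log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

Variants such as SimPO and ORPO change how the margin term is built (e.g. dropping the reference model), but the sigmoid-of-margin shape stays the same.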

@paddle-bot (Bot) commented Aug 28, 2024

Thanks for your contribution!

@lugimzzz changed the title from "[llm]support dpo/kto pp" to "WIP [llm]support dpo/kto pp" on Aug 28, 2024
@codecov (Bot) commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 9.36281% with 697 lines in your changes missing coverage. Please review.

Project coverage is 53.07%. Comparing base (90cef20) to head (69ad7cf).
Report is 244 commits behind head on develop.

Files with missing lines | Patch % | Missing lines
--- | --- | ---
paddlenlp/transformers/tensor_parallel_utils.py | 5.55% | 255 ⚠️
paddlenlp/trl/dpo_trainer.py | 5.72% | 214 ⚠️
paddlenlp/trl/dpo_criterion.py | 9.09% | 140 ⚠️
paddlenlp/transformers/sequence_parallel_utils.py | 17.50% | 66 ⚠️
paddlenlp/utils/infohub.py | 26.66% | 22 ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9039      +/-   ##
===========================================
- Coverage    53.26%   53.07%   -0.20%     
===========================================
  Files          652      656       +4     
  Lines       105607   106095     +488     
===========================================
+ Hits         56254    56309      +55     
- Misses       49353    49786     +433     


Comment thread paddlenlp/transformers/sequence_parallel_utils.py
@lugimzzz changed the title from "WIP [llm]support dpo/kto pp" to "WIP [llm]support dpo pp" on Aug 30, 2024
Comment thread paddlenlp/trl/dpo_criterion.py Outdated
Comment thread paddlenlp/trl/dpo_criterion.py
Comment thread paddlenlp/trl/dpo_criterion.py Outdated
Comment thread paddlenlp/trl/dpo_criterion.py
Comment thread paddlenlp/trl/dpo_criterion.py
Comment thread paddlenlp/trl/dpo_trainer.py
Comment thread paddlenlp/trl/dpo_trainer.py
Comment on lines +279 to +301

```python
for key in batch.keys():
    if key not in "response_indexs":
        concatenated_inputs[key] = [
            batch[key][i * per_device_train_batch_size : (i + 1) * per_device_train_batch_size]
            for i in range(gradient_accumulation_steps)
        ]
    else:
        concatenated_inputs["response_indexs"] = [[] for _ in range(gradient_accumulation_steps)]
        for i in range(gradient_accumulation_steps):
            for response_index in batch[key]:
                if response_index[0] in list(
                    range(i * per_device_train_batch_size, (i + 1) * per_device_train_batch_size)
                ):
                    response_index[0] -= i * per_device_train_batch_size
                    concatenated_inputs["response_indexs"][i].append(response_index)
            concatenated_inputs["response_indexs"][i] = paddle.stack(concatenated_inputs["response_indexs"][i])
            if model._layers.config.use_sparse_head_and_loss_fn:
                last_batch_response_length = concatenated_inputs["response_indexs"][i][0, 1]
                concatenated_inputs["response_indexs"][i][:, 1:] -= last_batch_response_length

concatenated_inputs["reference_chosen_logps"] = None
concatenated_inputs["reference_rejected_logps"] = None
```
Contributor comment: Suggest encapsulating this whole block into a function.
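Following that suggestion, the index-bucketing part could be factored into a standalone helper. The sketch below uses plain Python lists in place of paddle tensors, and replaces the `in list(range(...))` membership test with equivalent floor division (assuming non-negative sample indices); the function name is hypothetical, not the PR's code:

```python
def split_response_indexs(response_indexs, per_device_train_batch_size,
                          gradient_accumulation_steps):
    """Bucket global response-index rows into one list per micro-batch,
    rebasing each row's leading sample index to its micro-batch."""
    buckets = [[] for _ in range(gradient_accumulation_steps)]
    for row in response_indexs:
        # micro-batch that this sample index falls into
        micro_batch = row[0] // per_device_train_batch_size
        if micro_batch < gradient_accumulation_steps:
            # copy the row and rebase the sample index to the micro-batch
            rebased = [row[0] - micro_batch * per_device_train_batch_size] + list(row[1:])
            buckets[micro_batch].append(rebased)
    return buckets
```

The caller would then `paddle.stack` each bucket and apply the sparse-head offset, keeping the main loop in `concatenated_inputs` construction short.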

Comment thread paddlenlp/trl/dpo_trainer.py Outdated
Comment thread paddlenlp/trl/dpo_trainer.py
@lugimzzz changed the title from "WIP [llm]support dpo pp" to "[llm]support dpo pp" on Sep 13, 2024
Comment thread llm/config/qwen/AdvertiseGen/w8a8_ptq_argument.json
Comment thread llm/run_finetune.py
Comment thread paddlenlp/transformers/gemma/modeling.py
Comment thread paddlenlp/transformers/yuan/configuration.py
@CLAassistant commented Sep 20, 2024

CLA assistant check
All committers have signed the CLA.

```python
    get_last_checkpoint,
    set_seed,
from dpo_argument import (
    DPOConfig,
```
Contributor comment: Should DPOConfig, DPOTrainingArguments, and the like be added to the main repo?

@ZHUI (Contributor) left a comment

LGTM

@ZHUI ZHUI merged commit bc55104 into PaddlePaddle:develop Sep 20, 2024
@lugimzzz lugimzzz deleted the dpo branch December 16, 2024 11:59

Labels: none · Projects: none · 3 participants