[Embedding] Add inf-cl in embedding trainer#9673

Merged
ZHUI merged 7 commits into PaddlePaddle:develop from jie-z-0607:add_inf-cl_in_embedding on Dec 25, 2024
Conversation

@jie-z-0607
Contributor

PR types

Function optimization

PR changes

Others

Description

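Adds inf_cl_loss to embedding training, which effectively reduces GPU memory consumption at very large batch sizes.

In testing, the inf-cl operator aligns well with the original loss function:

  • With the data type set to bf16, group_size set to 1, and gradient_accumulation_steps set to 4, the convergence curves of inf_cl_loss and the original contrastive_loss are as follows:
    [image: convergence curves of inf_cl_loss and contrastive_loss]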
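One way to see why a memory-saving loss can match the original numerically: the cross-entropy over a row of similarities only needs a log-sum-exp, and that can be accumulated tile by tile instead of from a fully materialized matrix. The pure-Python sketch below illustrates that idea under my own assumptions. It is not the Triton kernel from paddlenlp_kernel, and the function names and the `temperature` and `tile` parameters are hypothetical. Labels follow the `i * group_size` convention that appears in this PR's diff.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_contrastive_loss(q, p, group_size, temperature=1.0):
    """Reference: compute each full row of logits, then cross-entropy."""
    total = 0.0
    for i, qi in enumerate(q):
        logits = [_dot(qi, pj) / temperature for pj in p]
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - logits[i * group_size]  # positive passage for query i
    return total / len(q)


def tiled_contrastive_loss(q, p, group_size, temperature=1.0, tile=2):
    """Same loss, but each row is visited in tiles with a running log-sum-exp."""
    total = 0.0
    for i, qi in enumerate(q):
        label = i * group_size
        running_max = -math.inf
        running_sum = 0.0
        pos_logit = None
        for start in range(0, len(p), tile):
            block = [_dot(qi, pj) / temperature for pj in p[start:start + tile]]
            new_max = max(running_max, max(block))
            # Rescale the old partial sum to the new maximum before adding.
            running_sum = running_sum * math.exp(running_max - new_max)
            running_sum += sum(math.exp(x - new_max) for x in block)
            running_max = new_max
            if start <= label < start + len(block):
                pos_logit = block[label - start]
        total += running_max + math.log(running_sum) - pos_logit
    return total / len(q)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    passages = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.3, -0.2]]  # group_size = 2
    ref = full_contrastive_loss(queries, passages, group_size=2)
    tiled = tiled_contrastive_loss(queries, passages, group_size=2, tile=3)
    print(abs(ref - tiled))
```

In this toy version only one tile of logits is alive at a time, which is the property that matters once the batch gets large.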

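In testing, the inf-cl operator effectively reduces GPU memory consumption during embedding training at very large batch sizes:

  • On 8 A100 (80G) GPUs, with the data type set to bf16, group_size set to 4, and gradient_accumulation_steps set to 4096, the GPU memory usage of inf_cl_loss and the original contrastive_loss compares as follows:

| Setting | GPU memory usage (per GPU) | Time to complete the first step |
|---|---|---|
| Without inf-cl; embedding_negatives_cross_device=True | 42238MiB; 42526MiB; 42526MiB; 42470MiB; 42470MiB; 42526MiB; 42526MiB; 42182MiB | 48min42s |
| With inf-cl; embedding_negatives_cross_device=False | 29630MiB; 28392MiB; 28372MiB; 28308MiB; 28320MiB; 28384MiB; 28316MiB; 28070MiB | 49min56s |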
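For a quick read of the numbers above, the snippet below averages the per-GPU figures and compares the first-step times. The values are copied from the table; the variable names are mine.

```python
# Per-GPU memory in MiB, copied from the table above.
without_inf_cl = [42238, 42526, 42526, 42470, 42470, 42526, 42526, 42182]
with_inf_cl = [29630, 28392, 28372, 28308, 28320, 28384, 28316, 28070]


def mean(values):
    return sum(values) / len(values)


mean_without = mean(without_inf_cl)
mean_with = mean(with_inf_cl)
memory_reduction = 1 - mean_with / mean_without

# First-step wall time in seconds: 48min42s vs 49min56s.
time_without = 48 * 60 + 42
time_with = 49 * 60 + 56
time_increase = time_with / time_without - 1

print(f"mean memory: {mean_without:.0f} MiB -> {mean_with:.0f} MiB")
print(f"memory reduction: {memory_reduction:.1%}")
print(f"first-step time increase: {time_increase:.1%}")
```

On these figures the average per-GPU memory drops by roughly a third, while the first step takes about 2.5% longer.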


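  • On 8 A100 (80G) GPUs, with the data type set to bf16, group_size set to 1, and gradient_accumulation_steps set to 16384 (a total batch_size of 128K), the GPU memory usage of inf_cl_loss and the original contrastive_loss compares as follows:

| Setting | GPU memory usage (per GPU) | Time to complete the first step |
|---|---|---|
| Without inf-cl; embedding_negatives_cross_device=True | Out of memory | |
| With inf-cl; embedding_negatives_cross_device=False | 46324MiB; 45192MiB; 44926MiB; 45180MiB; 44674MiB; 45022MiB; 45032MiB; 44904MiB | 2h23min46s |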
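The 128K figure is consistent with 8 GPUs times 16384 accumulation steps if the per-device batch size is 1. That value is my inference, since the PR does not state it. The second number is a rough illustration only: if a full 128K x 128K similarity matrix were ever held at once in bf16, it alone would take 32 GiB, which may help explain why the baseline run goes out of memory. How the trainer actually stages that matrix is not described in this PR.

```python
num_gpus = 8
gradient_accumulation_steps = 16384
per_device_batch_size = 1  # assumption: inferred from the stated 128K total

total_batch_size = num_gpus * gradient_accumulation_steps * per_device_batch_size

# Hypothetical size of one dense query-by-passage similarity matrix
# (group_size = 1, so passages == queries), stored in bf16 (2 bytes each).
bytes_per_bf16 = 2
matrix_bytes = total_batch_size * total_batch_size * bytes_per_bf16
matrix_gib = matrix_bytes / 2**30

print(f"total batch size: {total_batch_size} ({total_batch_size // 1024}K)")
print(f"dense similarity matrix in bf16: {matrix_gib:.0f} GiB")
```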



@paddle-bot

paddle-bot Bot commented Dec 23, 2024

Thanks for your contribution!

@codecov

codecov Bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 13.95349% with 37 lines in your changes missing coverage. Please review.

Project coverage is 52.76%. Comparing base (1842d6d) to head (8f55e52).
Report is 268 commits behind head on develop.

Files with missing lines Patch % Lines
paddlenlp/transformers/contrastive_loss.py 18.18% 27 Missing ⚠️
paddlenlp/trl/embedding_trainer.py 0.00% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9673      +/-   ##
===========================================
- Coverage    53.18%   52.76%   -0.43%     
===========================================
  Files          718      718              
  Lines       113340   112338    -1002     
===========================================
- Hits         60282    59276    -1006     
- Misses       53058    53062       +4     


__all__ = ["Simple_Inf_cl_loss", "Matryoshka_Inf_cl_loss"]


class Simple_Inf_cl_loss(nn.Layer):

Please add some comments.

Comment thread paddlenlp/trl/embedding_trainer.py Outdated
from paddle.base import core
from paddle.distributed import fleet

from ops.src.paddlenlp_kernel.triton.inf_cl.inf_cl_loss import (

Suggested change
- from ops.src.paddlenlp_kernel.triton.inf_cl.inf_cl_loss import (
+ from paddlenlp_kernel.triton.inf_cl.inf_cl_loss import (

Comment thread paddlenlp/trl/embedding_trainer.py Outdated
from paddle.base import core
from paddle.distributed import fleet

from ops.src.paddlenlp_kernel.triton.inf_cl.inf_cl_loss import (

This package is not installed by default, so the import needs to be wrapped in a try/except.

group_size = p_reps.shape[0] // q_reps.shape[0] # Number of keys per query
labels = paddle.arange(q_reps.shape[0], dtype="int64") # Generate labels for queries
labels = labels * group_size # Adjust labels based on group size
loss = cal_inf_loss(q_reps, p_reps, labels=labels, scale=None, head_dim=self.head_dim)

Move the import code here, and if the package is missing, just raise an error.

try:
    from paddlenlp_kernel.triton.inf_cl import cal_inf_loss
except ImportError:
    logger.warning(
        "Paddlenlp_kernels are not available, which means the inf_cl loss cannot be used. If you wish to use the inf_cl loss, please follow the instructions in the README.md on the `ops`."
    )
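For comparison, the "import at the point of use and fail loudly" variant suggested in the review could look roughly like the sketch below. `require_attr` and `get_cal_inf_loss` are hypothetical helpers, not part of PaddleNLP; only the module path `paddlenlp_kernel.triton.inf_cl` and the name `cal_inf_loss` come from this PR.

```python
import importlib


def require_attr(module_name, attr_name, hint):
    """Import module_name lazily and return one attribute, or raise with a hint."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"{module_name} is not installed. {hint}") from exc
    return getattr(module, attr_name)


def get_cal_inf_loss():
    # Called only when the inf_cl loss is actually requested, so users who
    # never enable it do not need the optional kernels installed.
    return require_attr(
        "paddlenlp_kernel.triton.inf_cl",
        "cal_inf_loss",
        "Follow the instructions in the README.md under `ops` to install it.",
    )
```

The trade-off against a module-level warning is that a missing package surfaces as an error at the call site rather than as a log line at import time.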

@jie-z-0607 jie-z-0607 requested a review from ZHUI December 24, 2024 09:00
@ZHUI ZHUI changed the title add inf-cl in embedding trainer [Embedding] Add inf-cl in embedding trainer Dec 25, 2024
@ZHUI ZHUI merged commit 40fa402 into PaddlePaddle:develop Dec 25, 2024