[INFER][LLM] Add the AutoModel for inference mode #9416
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9416      +/-   ##
===========================================
- Coverage    53.01%   52.81%    -0.20%
===========================================
  Files          678      676        -2
  Lines       108787   107910      -877
===========================================
- Hits         57668    56997      -671
+ Misses       51119    50913      -206

☔ View full report in Codecov by Sentry.
return model_class.get_cache_kvs_shape(
    config, predictor_args.batch_size, predictor_args.total_max_length
)
Don't call it this way here. The semantics of from_pretrained is to return a Model, so move get_cache_kvs_shape outside and call it after from_pretrained returns.
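Roughly the call order being suggested, as a minimal sketch (argument names simply mirror the snippets in this PR; the surrounding predictor code is assumed to already have config and predictor_args in scope):

# Sketch only: from_pretrained returns the model; the cache shape is queried afterwards.
model = AutoModelForCausalLM.from_pretrained(
    predictor_args.model_name_or_path,
    inference_mode=True,
    config=config,
    dtype=predictor_args.dtype,
)
cache_kvs_shape = model.get_cache_kvs_shape(
    config, predictor_args.batch_size, predictor_args.total_max_length
)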
model = AutoModelForCausalLM.from_pretrained(
    predictor_args.model_name_or_path,
    inference_mode=True,
    config=config,
    predictor_args=predictor_args,
    model_args=model_args,
    dtype=predictor_args.dtype,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)
Let's not reuse AutoModelForCausalLM. Add a new AutoInferenceModelForCausalLM instead, so there is no need to pass inference_mode=True, and it stays more independent going forward.
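A sketch of what the call site could look like under that proposal; AutoInferenceModelForCausalLM is the class this PR is asked to add, and the keyword arguments simply mirror the snippet above with inference_mode dropped:

# Hypothetical call site for the proposed AutoInferenceModelForCausalLM.
model = AutoInferenceModelForCausalLM.from_pretrained(
    predictor_args.model_name_or_path,
    config=config,
    predictor_args=predictor_args,
    model_args=model_args,
    dtype=predictor_args.dtype,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)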
tensor_parallel_degree = kwargs.pop("tensor_parallel_degree", 1)
tensor_parallel_rank = kwargs.pop("tensor_parallel_rank", 0)
model_arg = kwargs.pop("model_args", None)
static_mode = predictor_args.mode == "static"
Use predictor_args.mode == "static" directly as the condition instead of declaring a new variable; otherwise it reads a bit unclear.
The same applies to dynamic_mode = predictor_args.mode == "dynamic" below.
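For instance, a small sketch of the inline form (assuming the surrounding branching inside from_pretrained):

# Compare inline instead of binding the result to a separate name.
if predictor_args.mode == "static":
    ...  # static-graph inference path
elif predictor_args.mode == "dynamic":
    ...  # dynamic-graph inference path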
new_model_class = model_class.set_inference_config(
    config=config,
    predictor_args=predictor_args,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)
# detect the cpu avx or xpu
if new_model_class is not None:
    model_class = getattr(import_class, f"{new_model_class}InferenceModel")
    model_class.set_inference_config(
        config=config,
        predictor_args=predictor_args,
        tensor_parallel_degree=tensor_parallel_degree,
        tensor_parallel_rank=tensor_parallel_rank,
    )
This part feels a bit odd: set_inference_config may return a new model_class, and then set_inference_config is called a second time, which is not very reasonable behavior. Please check whether there is a smoother way to do this.
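One possible smoother shape, sketched under the assumption that confirm_inference_model (the hook mentioned in the PR description) is what resolves the device-specific class; the exact signature is an assumption, not the merged code:

# Resolve the concrete inference class first (CPU AVX / XPU / default) ...
model_class = model_class.confirm_inference_model(predictor_args=predictor_args)
# ... then configure that single class exactly once.
model_class.set_inference_config(
    config=config,
    predictor_args=predictor_args,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)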
PR types
New features
PR changes
Others
Description
Currently, loading an Inference Model inside PaddleNLP relies on if-else branches for model selection. This PR implements AutoInferenceModelForCausalLM, modeled after the existing AutoModelForCausalLM, so that Inference Models can be loaded through it.
The flow of loading an Inference Model through AutoInferenceModelForCausalLM is shown in the figure below:
If different Inference Models need different Inference Config settings, simply override the set_inference_config classmethod in the corresponding Inference Model class. For the same model, different execution devices map to different Inference Models; simply override confirm_inference_model and add the replacement logic there (see the sketch after the example below).
For example, as shown in the figure below, the three InferenceModels LlamaForCausalLMInferenceModel, LlamaForCausalLMBlockInferenceModel, and LlamaForCausalLMAvxInferenceModel take different Inference parameters, and only the corresponding class's method needs to be overridden.
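A hedged illustration of the two extension points described above; the class names, config fields, and signatures below are illustrative assumptions, not the merged code:

# Hypothetical inference model showing the two override points.
class MyForCausalLMInferenceModel:
    @classmethod
    def set_inference_config(cls, config, predictor_args, **kwargs):
        # Per-model Inference Config tweaks go here.
        config.max_seq_len = predictor_args.total_max_length
        return config

    @classmethod
    def confirm_inference_model(cls, predictor_args, **kwargs):
        # Device-specific replacement logic, e.g. switch to an AVX variant on CPU.
        if getattr(predictor_args, "device", "gpu") == "cpu":
            return MyForCausalLMAvxInferenceModel
        return cls


class MyForCausalLMAvxInferenceModel(MyForCausalLMInferenceModel):
    @classmethod
    def set_inference_config(cls, config, predictor_args, **kwargs):
        # The AVX variant can configure itself differently from the default model.
        config.use_avx = True  # illustrative field
        return config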