
[INFER][LLM] Add the AutoModel for inference mode #9416

Merged
yuanlehome merged 8 commits into PaddlePaddle:develop from zeroRains:auto_model on Nov 14, 2024

Conversation

@zeroRains (Contributor) commented Nov 12, 2024

PR types

New features

PR changes

Others

Description

PaddleNLP currently selects which Inference Model to load through a chain of if-else branches. Modeled on the existing AutoModelForCausalLM, this PR implements AutoInferenceModelForCausalLM to support loading Inference Models automatically.

The flow for loading an Inference Model through AutoInferenceModelForCausalLM is shown below:

[Flowchart]

If a particular Inference Model needs its own Inference Config settings, simply override the set_inference_config classmethod in that Inference Model's class. When the same model maps to different Inference Models on different execution devices, simply override confirm_inference_model and add the substitution logic there.

For example, in the figure below, the three Inference Models LlamaForCausalLMInferenceModel, LlamaForCausalLMBlockInferenceModel, and LlamaForCausalLMAvxInferenceModel take different inference parameters, so each only needs to override the corresponding method in its own class (see the sketch after the figure).

[Inheritance diagram]
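
For illustration, a minimal sketch of the override pattern; the field names used here (block_size, total_max_length, max_seq_len) are placeholder assumptions, not the PR's actual settings:

```python
class LlamaForCausalLMInferenceModel:
    """Stub standing in for the real PaddleNLP base class."""

    @classmethod
    def set_inference_config(cls, config, predictor_args, **kwargs):
        pass  # default: nothing model-specific to configure


class LlamaForCausalLMBlockInferenceModel(LlamaForCausalLMInferenceModel):
    @classmethod
    def set_inference_config(cls, config, predictor_args, **kwargs):
        # Apply the block-attention settings this variant needs before
        # the weights are loaded (field names are assumptions).
        config.block_size = predictor_args.block_size
        config.max_seq_len = predictor_args.total_max_length
```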

@paddle-bot (Bot) commented Nov 12, 2024

Thanks for your contribution!

@codecov (Bot) commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 13.09524% with 73 lines in your changes missing coverage. Please review.

Project coverage is 52.81%. Comparing base (b5e3f0c) to head (daa2b44).
Report is 5 commits behind head on develop.

Current head daa2b44 differs from pull request most recent head 92e2dc7

Please upload reports for the commit 92e2dc7 to get more accurate results.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| paddlenlp/transformers/model_utils.py | 10.81% | 33 Missing ⚠️ |
| paddlenlp/transformers/auto/modeling.py | 19.23% | 21 Missing ⚠️ |
| ...dlenlp/experimental/transformers/llama/modeling.py | 0.00% | 16 Missing ⚠️ |
| paddlenlp/transformers/chatglm_v2/modeling.py | 50.00% | 2 Missing ⚠️ |
| ...dlenlp/experimental/transformers/bloom/modeling.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9416      +/-   ##
===========================================
- Coverage    53.01%   52.81%   -0.20%     
===========================================
  Files          678      676       -2     
  Lines       108787   107910     -877     
===========================================
- Hits         57668    56997     -671     
+ Misses       51119    50913     -206     


@yuanlehome self-requested a review on November 13, 2024 01:37
Comment thread: paddlenlp/transformers/auto/modeling.py (outdated), lines +837 to +839
```python
return model_class.get_cache_kvs_shape(
    config, predictor_args.batch_size, predictor_args.total_max_length
)
```
Collaborator:

Don't call it this way here. The semantics of from_pretrained are to return a Model, so move get_cache_kvs_shape out and call it after from_pretrained returns.
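
A sketch of the suggested refactor (illustrative only; the names follow the snippet above):

```python
# from_pretrained keeps its usual semantics and just returns the model.
model = model_class.from_pretrained(
    predictor_args.model_name_or_path, config=config
)

# The caller derives the KV-cache shapes after from_pretrained returns.
cache_kvs_shape = model.get_cache_kvs_shape(
    config, predictor_args.batch_size, predictor_args.total_max_length
)
```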

Contributor (Author):

done

Comment thread: llm/predict/predictor.py (outdated), lines +1246 to +1255
```python
model = AutoModelForCausalLM.from_pretrained(
    predictor_args.model_name_or_path,
    inference_mode=True,
    config=config,
    predictor_args=predictor_args,
    model_args=model_args,
    dtype=predictor_args.dtype,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)
```
Collaborator:

Let's not reuse AutoModelForCausalLM here; add a new AutoInferenceModelForCausalLM instead. That way there's no need to pass inference_mode=True, and the inference path stays more independent going forward.
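
A sketch of what the call site would look like after that change (the argument list mirrors the original snippet, minus inference_mode=True):

```python
model = AutoInferenceModelForCausalLM.from_pretrained(
    predictor_args.model_name_or_path,
    config=config,
    predictor_args=predictor_args,
    model_args=model_args,
    dtype=predictor_args.dtype,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)
```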

Contributor (Author):

done

Comment thread: paddlenlp/transformers/auto/modeling.py (outdated)
```python
tensor_parallel_degree = kwargs.pop("tensor_parallel_degree", 1)
tensor_parallel_rank = kwargs.pop("tensor_parallel_rank", 0)
model_arg = kwargs.pop("model_args", None)
static_mode = predictor_args.mode == "static"
```
Collaborator:

Use predictor_args.mode == "static" directly as the condition instead of declaring another variable; otherwise it feels a bit unclear.

Collaborator:

The same applies to dynamic_mode = predictor_args.mode == "dynamic" below.
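
In concrete terms, the suggestion amounts to:

```python
# Before: a one-off flag that only renames the comparison.
static_mode = predictor_args.mode == "static"
if static_mode:
    ...

# After: use the comparison directly as the condition.
if predictor_args.mode == "static":
    ...
```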

Contributor (Author):

done

Comment thread: paddlenlp/transformers/auto/modeling.py (outdated), lines +821 to +835
```python
new_model_class = model_class.set_inference_config(
    config=config,
    predictor_args=predictor_args,
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
)
# detect the cpu avx or xpu
if new_model_class is not None:
    model_class = getattr(import_class, f"{new_model_class}InferenceModel")
    model_class.set_inference_config(
        config=config,
        predictor_args=predictor_args,
        tensor_parallel_degree=tensor_parallel_degree,
        tensor_parallel_rank=tensor_parallel_rank,
    )
```
Collaborator:

This part feels a bit odd: set_inference_config may return a new model_class, and then set_inference_config is called again on it, which isn't quite reasonable behavior. Please see whether there's a smoother way to handle this.

Contributor (Author):

Added a confirm_inference_model method. Like set_inference_config, it is a method on PretrainedModel, and it returns the original model class by default. A model like llama that has an AVX Inference Model just overrides confirm_inference_model in llama to add the detection logic.
[Screenshot]
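
A minimal sketch of that pattern; the cpu check below is a stand-in for the actual AVX/XPU detection:

```python
class PretrainedModel:
    @classmethod
    def confirm_inference_model(cls, predictor_args=None, **kwargs):
        # Default: keep the model class that was originally selected.
        return cls


class LlamaForCausalLMInferenceModel(PretrainedModel):
    @classmethod
    def confirm_inference_model(cls, predictor_args=None, **kwargs):
        # llama overrides this to swap in a device-specific variant;
        # the condition here is a placeholder for the real detection.
        if predictor_args is not None and predictor_args.device == "cpu":
            return LlamaForCausalLMAvxInferenceModel
        return cls


class LlamaForCausalLMAvxInferenceModel(LlamaForCausalLMInferenceModel):
    pass
```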

@yuanlehome (Collaborator) left a comment:

LGTM

@yuanlehome merged commit 018b530 into PaddlePaddle:develop on Nov 14, 2024
@zeroRains deleted the auto_model branch on November 14, 2024 07:15