27 changes: 14 additions & 13 deletions README.md
@@ -90,6 +90,7 @@
| [ChatGLM2](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/config/chatglm2) | THUDM/chatglm2-6b |
| [ChatGLM3](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/config/chatglm2) | THUDM/chatglm3-6b |
| [DeepSeekV2](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/config/deepseek-v2) | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, deepseek-ai/DeepSeek-V2-Lite, deepseek-ai/DeepSeek-V2-Lite-Chat, deepseek-ai/DeepSeek-Coder-V2-Base, deepseek-ai/DeepSeek-Coder-V2-Instruct, deepseek-ai/DeepSeek-Coder-V2-Lite-Base, deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
| [DeepSeekV3](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/config/deepseek-v2) | deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-V3-Base |
| [Gemma](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/config/gemma) | google/gemma-7b, google/gemma-7b-it, google/gemma-2b, google/gemma-2b-it |
| [Mistral](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/config/mistral) | mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-7B-v0.1 |
| [Mixtral](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/config/mixtral) | mistralai/Mixtral-8x7B-Instruct-v0.1 |
@@ -130,19 +131,19 @@


| Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO/KTO | RLHF | Mergekit | Quantization |
|--------------------------------------------|:--------:|:---:|:----:|:---------:|:-------------:|:--------------:|:----:|:-----:|:------------:|
| [Llama](./llm/config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ |
| [Qwen](./llm/config/qwen) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | | 🚧 |
| [Mixtral](./llm/config/mixtral) | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | | 🚧 |
| [Mistral](./llm/config/mistral) | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | 🚧 | | 🚧 |
| [Baichuan/Baichuan2](./llm/config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | | ✅ |
| [ChatGLM-6B](./llm/config/chatglm) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | | ✅ |
| [ChatGLM2/ChatGLM3](./llm/config/chatglm2) | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | 🚧 | | ✅ |
| [Bloom](./llm/config/bloom) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | | ✅ |
| [GPT-3](./llm/config/gpt-3) | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | | 🚧 |
| [OPT](./llm/config/opt) | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | | 🚧 |
| [Gemma](./llm/config/gemma) | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | | 🚧 |
| [Yuan](./llm/config/yuan) | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | | 🚧 |
|--------------------------------------------|:--------:|:---:|:----:|:---------:|:-------------:|:------------------:|:----:|:--------:|:------------:|
| [Llama](./llm/config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | | ✅ |
| [Qwen](./llm/config/qwen) | ✅ | ✅ | ✅ | ✅ | ✅ | | 🚧 | | 🚧 |
| [Mixtral](./llm/config/mixtral) | ✅ | ✅ | ✅ | 🚧 | 🚧 | | 🚧 | | 🚧 |
| [Mistral](./llm/config/mistral) | ✅ | ✅ | ✅ | 🚧 | ✅ | | 🚧 | | 🚧 |
| [Baichuan/Baichuan2](./llm/config/llama) | ✅ | ✅ | ✅ | ✅ | ✅ | | 🚧 | | ✅ |
| [ChatGLM-6B](./llm/config/chatglm) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | | ✅ |
| [ChatGLM2/ChatGLM3](./llm/config/chatglm2) | ✅ | ✅ | ✅ | 🚧 | ✅ | | 🚧 | | ✅ |
| [Bloom](./llm/config/bloom) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | | ✅ |
| [GPT-3](./llm/config/gpt-3) | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | | 🚧 |
| [OPT](./llm/config/opt) | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | | 🚧 |
| [Gemma](./llm/config/gemma) | ✅ | ✅ | ✅ | 🚧 | 🚧 | | 🚧 | | 🚧 |
| [Yuan](./llm/config/yuan) | ✅ | ✅ | ✅ | 🚧 | 🚧 | | 🚧 | | 🚧 |
* [LLM inference](./llm/docs/predict/inference.md) already supports the LLaMA, Qwen, Mistral, ChatGLM, Bloom, and Baichuan model families. It supports Weight-Only INT8 and INT4 inference, as well as INT8 and FP8 quantized inference over WAC (weights, activations, and the KV cache). The LLM inference support matrix is as follows:

| Model / supported quantization type | FP16/BF16 | WINT8 | WINT4 | INT8-A8W8 | FP8-A8W8 | INT8-A8W8C8 |
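The "WINT8" column above refers to weight-only INT8 inference. As a rough illustration of what that scheme does (a hedged sketch, not PaddleNLP's implementation), per-channel abs-max quantization stores weights as INT8 plus one float scale per output channel:

```python
import numpy as np

def quantize_wint8(w):
    """Per-channel abs-max weight-only INT8 quantization (illustrative).

    w: (out_features, in_features) float32 weight matrix.
    Returns int8 weights and a per-row float scale.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_wint8(q, scale):
    # Weights are dequantized back to float at matmul time;
    # activations stay in the original precision (hence "weight-only").
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_wint8(w)
w_hat = dequantize_wint8(q, s)
# Per-element error is bounded by half a quantization step per channel.
```

WINT4 follows the same idea with a 15-level range; A8W8 variants additionally quantize activations (and C8 the KV cache).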
2 changes: 1 addition & 1 deletion paddlenlp/generation/utils.py
@@ -742,7 +742,7 @@ def generate(
# ['是的', '嗯嗯']
"""
if generation_config is None:
if self.generation_config._from_model_config:
if self.generation_config is None or self.generation_config._from_model_config:
new_generation_config = GenerationConfig.from_model_config(self.config)
if new_generation_config != self.generation_config:
logger.warning(
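The one-line change above guards against `self.generation_config` being `None` before its `_from_model_config` attribute is read. A minimal, self-contained sketch of the fallback logic (simplified names, not the actual PaddleNLP classes):

```python
class GenerationConfig:
    """Toy stand-in for paddlenlp's GenerationConfig."""

    def __init__(self, _from_model_config=False):
        self._from_model_config = _from_model_config

    @classmethod
    def from_model_config(cls, model_config):
        # Build a generation config derived from the model config.
        return cls(_from_model_config=True)

def resolve_generation_config(generation_config, model_generation_config, model_config):
    """Mirror the patched branch: fall back to a config built from the
    model config when no usable generation config is set."""
    if generation_config is None:
        # Without the `is None` guard, a model that never set a
        # generation config would raise AttributeError here.
        if model_generation_config is None or model_generation_config._from_model_config:
            return GenerationConfig.from_model_config(model_config)
        return model_generation_config
    return generation_config
```

With the guard, `resolve_generation_config(None, None, cfg)` builds a fresh config instead of crashing, while an explicitly user-set config is still returned untouched.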
1 change: 1 addition & 0 deletions paddlenlp/quantization/quantization_config.py
@@ -52,6 +52,7 @@ def __init__(
weight_double_quant_block_size=256,
weight_quant_method="abs_max_channel_wise",
act_quant_method="abs_max",
**kwargs,
):
if weight_quantize_algo is not None and weight_quantize_algo not in [
"weight_only_int8",
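Adding `**kwargs` to the constructor makes the quantization config tolerant of unrecognized keys (for example, fields written by a newer version of the library) instead of raising `TypeError`. A hypothetical sketch of the pattern, not the real `QuantizationConfig` class:

```python
class QuantConfig:
    """Illustrative config that absorbs unknown keyword arguments."""

    def __init__(self, weight_quant_method="abs_max_channel_wise",
                 act_quant_method="abs_max", **kwargs):
        self.weight_quant_method = weight_quant_method
        self.act_quant_method = act_quant_method
        # Unknown keys are kept rather than rejected, so configs
        # serialized by a newer release still deserialize cleanly.
        self.extra = kwargs

# Without **kwargs this call would raise:
#   TypeError: __init__() got an unexpected keyword argument 'future_flag'
cfg = QuantConfig(act_quant_method="abs_max", future_flag=True)
```

The trade-off is that typos in argument names are silently swallowed, which is why such classes often log or validate the leftover keys.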