When using the qwen-1.5-0.5b model, is demo_qwen_npu.cpp still the program to use? That program requires an INT8 prefill model and a q4_k decoding model, but the models provided at https://huggingface.co/mllmTeam/qwen-1.5-0.5b-mllm/tree/main include only qwen-1.5-0.5b-fp32.mllm and qwen-1.5-0.5b-q4_k.mllm; no INT8 model is provided. Do we need to perform the quantization ourselves?
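For reference, mllm ships a `quantize` tool that converts an fp32 `.mllm` checkpoint to other formats. A minimal sketch of producing an INT8-style model from the published fp32 file, assuming the tool's `<input> <output> <type>` argument order and that `Q8_0` is the accepted INT8 type name (both the type name and the output filename here are assumptions; the NPU prefill path may require its own quantization flow):

```bash
# After building mllm, run its quantize tool on the published fp32 checkpoint.
# Assumption: the third argument selects the quantization type; Q8_0 as the
# INT8-style target is a guess -- check the tool's usage output for the exact name.
./bin/quantize qwen-1.5-0.5b-fp32.mllm qwen-1.5-0.5b-q8_0.mllm Q8_0
```

Whether the NPU prefill path accepts a model quantized this way, or needs a dedicated QNN-specific INT8 conversion, is exactly what this issue is asking the maintainers to clarify.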