When using the qwen-1.5-0.5b model, is demo_qwen_npu.cpp still the program to use? That program requires an INT8 prefill model and a q4_k decoding model, but the models provided at https://huggingface.co/mllmTeam/qwen-1.5-0.5b-mllm/tree/main include only qwen-1.5-0.5b-fp32.mllm and qwen-1.5-0.5b-q4_k.mllm; no INT8 model is provided. Do we need to perform the quantization ourselves?
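For reference, mllm ships a `quantize` tool that converts an fp32 `.mllm` checkpoint to other formats. A minimal sketch of producing an INT8-style model from the published fp32 file, assuming the tool's `<input> <output> <type>` argument order and that `Q8_0` is the accepted INT8 type name (both the type name and the output filename here are assumptions; the NPU prefill path may require its own quantization flow):

```bash
# After building mllm, run its quantize tool on the published fp32 checkpoint.
# Assumption: the third argument selects the quantization type; Q8_0 as the
# INT8-style target is a guess -- check the tool's usage output for the exact name.
./bin/quantize qwen-1.5-0.5b-fp32.mllm qwen-1.5-0.5b-q8_0.mllm Q8_0
```

Whether the NPU prefill path accepts a model quantized this way, or needs a dedicated QNN-specific INT8 conversion, is exactly what this issue is asking the maintainers to clarify.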