
How to use the qwen-1.5-0.5b model? #635

@zjj6084bty

When using the qwen-1.5-0.5b model, should we still use the demo_qwen_npu.cpp program? That program requires the prefill model to be INT8 and the decoding model to be Q4_K, but the models provided at https://huggingface.co/mllmTeam/qwen-1.5-0.5b-mllm/tree/main include only qwen-1.5-0.5b-fp32.mllm and qwen-1.5-0.5b-q4_k.mllm; no INT8 model is provided. Do we need to perform the quantization ourselves?
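
For reference, mllm provides a quantize tool that converts an fp32 .mllm model into other quantization formats. A minimal sketch of what self-quantization might look like, assuming the tool is built under bin/ and that Q8_0 is the INT8 target demo_qwen_npu.cpp expects for prefill (both are assumptions; check the mllm docs for the exact target name):

```
# Sketch only: assumes the quantize tool is built in bin/ and that
# Q8_0 is the INT8 format demo_qwen_npu.cpp expects for the prefill model.
cd bin
# Produce the INT8 prefill model (assumed Q8_0 target)
./quantize qwen-1.5-0.5b-fp32.mllm qwen-1.5-0.5b-q8_0.mllm Q8_0
# The Q4_K decoding model is already provided on the Hugging Face page,
# but it could also be regenerated the same way:
./quantize qwen-1.5-0.5b-fp32.mllm qwen-1.5-0.5b-q4_k.mllm Q4_K
```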
