adding 2 bit quantized model to mlx/local_web.py #58
Goekdeniz-Guelmez wants to merge 1 commit into kyutai-labs:main from
Conversation
Cool stuff! Would you give your agreement for us to make an official version of your 2-bit checkpoint (released as CC BY, copyright Kyutai)? We prefer to point only to official checkpoints in the code, while leaving the freedom to manually override with the repo flag. I'm also worried about the experience one would get, given that we already notice some deterioration with the 4-bit version. What's your experience?
Thanks for the quick reply! Well, it was really bad: the EOS token didn't get generated, so the model just kept talking, and sometimes there were gibberish outputs too. But you can still make an official version if you want it. I, Goekdeniz-Guelmez, give my agreement for Kyutai to make an official version of my 2-bit checkpoint. Here is its Hugging Face link
Any work on improving the latency with MLX? I have tried with my M1 16GB; it's extremely slow even with q4.
Given the limited quality you mention, I don't think it makes sense to merge this PR. Thanks a lot for your investigation nonetheless!
No problem. It turned out that this is really useful for research and analysis purposes.
Checklist
- cargo check, cargo clippy, cargo test

PR Description
Added the 2-bit quantized MLX model, because the 4-bit one doesn't work on my 8GB Mac.
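The maintainer's suggested alternative to merging (keeping only official checkpoints in the code while letting users manually override with the repo flag) can be sketched roughly as below. This is an illustrative sketch only: the flag name `--hf-repo`, the default repo string, and the parse function are placeholders, not the actual interface of `local_web.py`.

```python
import argparse

# Placeholder default; the real script would point at an official
# Kyutai checkpoint. Not a real repo name.
DEFAULT_HF_REPO = "kyutai/example-mlx-q4"

def parse_args(argv=None):
    """Parse CLI args, allowing an unofficial checkpoint override."""
    parser = argparse.ArgumentParser(description="Local web demo (sketch)")
    parser.add_argument(
        "--hf-repo",
        default=DEFAULT_HF_REPO,
        help="Hugging Face repo to load; override to try an unofficial "
             "checkpoint such as a community 2-bit quantization",
    )
    return parser.parse_args(argv)

# Example: pointing the demo at a hypothetical community 2-bit checkpoint.
args = parse_args(["--hf-repo", "Goekdeniz-Guelmez/example-2bit"])
```

With this shape, the default path always uses an official checkpoint, while an experimental 2-bit model stays reachable without being endorsed in the code.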