adding 2 bit quantized model to mlx/local_web.py #58

Closed

Goekdeniz-Guelmez wants to merge 1 commit into kyutai-labs:main from Goekdeniz-Guelmez:main
Conversation

@Goekdeniz-Guelmez

Checklist

  • [x] Read CONTRIBUTING.md, and accept the CLA by including the provided snippet. We will not accept PRs without this.
  • [ ] Run the pre-commit hook.
  • [ ] If you changed Rust code, run cargo check, cargo clippy, cargo test.

PR Description

Added the 2-bit quantized MLX model because the 4-bit one doesn't work on my 8GB Mac.

I, Goekdeniz-Guelmez, confirm that I have read and understood the terms of the CLA of Kyutai-labs, as outlined in the repository's CONTRIBUTING.md, and I agree to be bound by these terms.

@adefossez
Collaborator

Cool stuff! Would you give your agreement for us to make an official version of your 2-bit checkpoint (released as CC BY, copyright Kyutai)? We prefer to point only to official checkpoints in the code, while leaving users the freedom to manually override with the repo flag.

I'm also worried about the experience one would get, given that we already notice some deterioration with the 4-bit version. What's your experience?

@Goekdeniz-Guelmez
Author

Goekdeniz-Guelmez commented Sep 18, 2024

Thanks for the quick reply! Well, it was really bad: the EOS token didn't get generated, so the model just kept talking, and sometimes there were gibberish outputs too. But you can still make an official version if you want. I, Goekdeniz-Guelmez, give my agreement for Kyutai to make an official version of my 2-bit checkpoint. Here is its Hugging Face link
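The degradation described above is expected at 2-bit precision. As a rough illustration (plain Python, not MLX's actual grouped quantization kernels, which use per-group scales and biases), a minimal sketch of uniform affine quantization shows how coarse 2-bit codes are compared to 4-bit ones:

```python
# Illustration only: simulate uniform affine fake-quantization.
# 2 bits leave only 4 representable levels per range, versus 16 at 4 bits,
# which is one intuition for why a 2-bit checkpoint degrades much more.

def fake_quantize(weights, bits):
    """Round weights to 2**bits uniform levels over their range, then dequantize."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1)
    out = []
    for w in weights:
        q = round((w - lo) / scale)   # integer code in [0, levels - 1]
        out.append(lo + q * scale)    # dequantized approximation of w
    return out

weights = [-0.9, -0.31, -0.05, 0.02, 0.4, 0.77, 1.1]
for bits in (4, 2):
    deq = fake_quantize(weights, bits)
    err = max(abs(w - d) for w, d in zip(weights, deq))
    print(f"{bits}-bit: max abs error = {err:.3f}")
```

Real quantization schemes (including MLX's) quantize small groups of weights with their own scale/bias, which helps, but the gap between 4 and 16 levels per group remains large.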

@Goekdeniz-Guelmez marked this pull request as ready for review on September 18, 2024, 21:25
@thhung

thhung commented Sep 19, 2024

Any work on improving the latency with MLX? I have tried on my M1 16GB, and it is extremely slow even with q4.

@adefossez
Collaborator

Due to the limited quality you mention, I don't think it makes sense to merge this PR.
I've added a link to it in our FAQ, though, in case some people are interested:
https://github.com/kyutai-labs/moshi/blob/main/FAQ.md

Thanks a lot for your investigation nonetheless!

@adefossez closed this on Sep 19, 2024
@Goekdeniz-Guelmez
Author

No problem, it turned out that this is really useful for research and analysis purposes.
