adding 2 bit quantized model to mlx/local_web.py #58
Goekdeniz-Guelmez wants to merge 1 commit into kyutai-labs:main from
Conversation
Cool stuff! Would you give your agreement for us to make an official version of your 2-bit checkpoint (released as CC BY, copyright Kyutai)? We prefer to point only to official checkpoints in the code, while leaving the freedom to manually override with the repo flag. I'm also worried about the experience one would get, given that we already notice some deterioration with the 4-bit version. What's your experience?
Thanks for the quick reply! Well, it was really bad: the EOS token didn't get generated, so the model just kept talking, and sometimes there were gibberish outputs too. But you can still make an official version if you want it. I, Goekdeniz-Guelmez, give my agreement for Kyutai to make an official version of my 2-bit checkpoint. Here is its Hugging Face link
Any work on improving the latency with MLX? I have tried with my M1 16GB; it's extremely slow even with q4.
Given the limited quality you mention, I don't think it makes sense to merge this PR. Thanks a lot for your investigation nonetheless!
No problem. It turned out that this is really useful for research and analysis purposes.
Checklist
- cargo check, cargo clippy, cargo test

PR Description
Added the 2-bit quantized MLX model, because the 4-bit one doesn't work on my 8GB Mac.
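The maintainer's suggested alternative to merging (keeping only official checkpoints in the code while letting users manually override with the repo flag) can be sketched roughly as below. This is an illustrative sketch only: the flag name `--hf-repo`, the default repo string, and the parse function are placeholders, not the actual interface of `local_web.py`.

```python
import argparse

# Placeholder default; the real script would point at an official
# Kyutai checkpoint. Not a real repo name.
DEFAULT_HF_REPO = "kyutai/example-mlx-q4"

def parse_args(argv=None):
    """Parse CLI args, allowing an unofficial checkpoint override."""
    parser = argparse.ArgumentParser(description="Local web demo (sketch)")
    parser.add_argument(
        "--hf-repo",
        default=DEFAULT_HF_REPO,
        help="Hugging Face repo to load; override to try an unofficial "
             "checkpoint such as a community 2-bit quantization",
    )
    return parser.parse_args(argv)

# Example: pointing the demo at a hypothetical community 2-bit checkpoint.
args = parse_args(["--hf-repo", "Goekdeniz-Guelmez/example-2bit"])
```

With this shape, the default path always uses an official checkpoint, while an experimental 2-bit model stays reachable without being endorsed in the code.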