feat: add TurboQuant support #300
I have read the CLA Document and I hereby sign the CLA
recheck
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18c7284191
I've just seen that I'm going to leave the PR open for the moment, but it may be better to simply wait until that PR lands and refactor this to use
PR addresses #296, adding TurboQuant support to mlx-engine's KV cache via turboquant-mlx.

Note
I am not all that familiar with mlx-engine's internals, and as such am not 100% confident that this implementation is the best or "correct" way to do it. I am more than happy to amend or refactor if any maintainer has input on a better approach.

Important
This PR only implements support for TurboQuant; it does not enable it by default, nor does it make it available or active in the LM Studio app. As far as I can tell, changes would need to be made in the LM Studio app codebase to "close the loop" here.