feat: add TurboQuant support #300

Closed
the-wondersmith wants to merge 2 commits into lmstudio-ai:main from the-wondersmith:feat/turboquant-cache

Conversation

the-wondersmith commented Apr 2, 2026

This PR addresses #296 by adding TurboQuant support to mlx-engine's KV cache via turboquant-mlx.

Note

I am not very familiar with mlx-engine's internals, so I am not fully confident that this implementation is the best or "correct" approach. I am happy to amend or refactor it if any maintainer can suggest a better one.

Important

This PR only implements support for TurboQuant; it does not enable it by default, nor does it make it available or active in the LM Studio app. As far as I can tell, changes would need to be made in the LM Studio app codebase to "close the loop" here.

github-actions bot commented Apr 2, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@the-wondersmith
Author


I have read the CLA Document and I hereby sign the CLA

recheck

github-actions bot added the CLA signed label ("Indicates that all contributors have signed") on Apr 2, 2026

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18c7284191

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@the-wondersmith
Author

I've just seen that mlx-lm/#1067 adds TurboQuant (TQ) support to mlx-lm itself, which mlx-engine uses under the hood.

I'm going to leave the PR open for the moment, but it may be better to simply wait until that PR lands and refactor this to use mlx_lm.generate.make_turboquant_cache (which will exist at that point, but does not currently).
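
A minimal sketch of how that refactor could stay forward-compatible, assuming the upstream helper really does land under the name mentioned above: look the factory up at runtime and fall back to a local implementation when it is missing. `find_optional_factory` and `pick_cache_factory` are hypothetical names used only for illustration, and `make_turboquant_cache` does not exist in mlx_lm at the time of writing, so its eventual signature is unknown.

```python
import importlib
from typing import Any, Callable, Optional


def find_optional_factory(module_name: str, attr_name: str) -> Optional[Callable[..., Any]]:
    """Return module_name.attr_name if it can be imported and is callable, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its dependencies) is not installed.
        return None
    factory = getattr(module, attr_name, None)
    return factory if callable(factory) else None


def pick_cache_factory(fallback: Callable[..., Any]) -> Callable[..., Any]:
    """Prefer the upstream factory when present; otherwise use the local fallback."""
    # Assumption: the attribute name is taken from this thread and may change
    # before the upstream PR is merged.
    upstream = find_optional_factory("mlx_lm.generate", "make_turboquant_cache")
    return upstream if upstream is not None else fallback
```

With this shape, the local turboquant-mlx based cache constructor would be passed as `fallback`, and it could be deleted once the upstream function is available in every supported mlx-lm version.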

github-actions bot locked and limited conversation to collaborators Apr 2, 2026

Labels

CLA signed Indicates that all contributors have signed
