
Conversation

@ChiaraBoretti

What does this PR do?

Summary
This PR introduces an integration of the SINQ quantization method into the Hugging Face Transformers library. It follows the pattern of existing quantization integrations such as HQQ, AWQ, and others.
The goal is to enable users to configure and apply SINQ quantization directly through model configuration parameters, which can then be passed via the from_pretrained() function within the Transformers framework.

Example Usage
The resulting pipeline is the following:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, SinqConfig

cfg = SinqConfig(
    nbits=4,
    group_size=64,
    tiling_mode="1D",
    method="sinq",
    modules_to_not_convert=["lm_head"],
    device="cuda:1"
)

model_name = "Qwen/Qwen3-1.7B"
tok = AutoTokenizer.from_pretrained(model_name)

qmodel = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=cfg,
)

Model saving and loading
Once the model has been quantized, it can be saved locally or pushed to the Hugging Face Hub.

from sinq.hf_io import patch_hf_pretrained_io
patch_hf_pretrained_io()

hf_path = "HF_hub/path"
qmodel.save_pretrained(local_path)
qmodel.push_to_hub(hf_path, safe_serialization=True)

tok  = AutoTokenizer.from_pretrained(hf_path)
qmodel = AutoModelForCausalLM.from_pretrained(hf_path, device_map="cuda:0")

Installation
All required dependencies are installed when installing SINQ from the official GitHub repository.

This pull request is a follow-up to #42151. I opened a new PR to improve clarity and to keep this new integration adaptation separate. Please refer to this pull request and disregard the previous one.
Fixes #42116

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @MekkCyber

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: sinq

@SunMarc (Member) left a comment
Thanks for the work, I really appreciate seeing that you adapted your work to fit v5! I left a bunch of comments. I see that the SINQ API is quite similar to HQQ, and it would be great if we could improve the API a bit so that the integration is simpler and easier to maintain in general! We had a lot of issues with our HQQ integration and I'd prefer that we don't go through that again ;) The best would be to take inspiration from other quant methods like bnb, fp8, mxfp4 or torchao. Also, it would be great to reduce the complexity of this PR by simplifying as much as possible. You will see that the other integrations are much easier to go through, which makes future maintenance and usability much better. Feel free to ask any questions, I'm really eager to merge this PR!

| `--tiling_mode` | Weight matrix tiling strategy | str | 1D, 2D | 1D |
| `--group_size` | Weights per quantization group | int | 64, 128 | 64 |
| `--method` | Quantization method | str | sinq, asinq | sinq |
| `--dtype` | Data type of the original model | str | auto, float16, float32 | auto (bfloat16) |
Member:
It should be fine to remove this as we should already have this information when calling from_pretrained

| `--method` | Quantization method | str | sinq, asinq | sinq |
| `--dtype` | Data type of the original model | str | auto, float16, float32 | auto (bfloat16) |
| `--modules_to_not_convert` | List of the layers that are NOT quantize | List of str | [lm_head, ...] | [lm_head] |
| `--device` | Device on which the model is loaded | str | cpu, cuda:0, cuda:1, etc | cuda:0 |
Member:
same for this one, the user can just pass device_map in from_pretrained.
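
For illustration, a minimal sketch of this (the model name and device below are just placeholder examples, not the exact docs snippet):

from transformers import AutoModelForCausalLM, SinqConfig

cfg = SinqConfig(nbits=4, group_size=64)  # no device field in the config
qmodel = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    quantization_config=cfg,
    device_map="cuda:0",  # device placement handled by from_pretrained
)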

qmodel = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=cfg,
    torch_dtype=torch.bfloat16
Member:
torch_dtype is deprecated in favor of dtype

Suggested change
torch_dtype=torch.bfloat16
dtype=torch.bfloat16

patch_hf_pretrained_io()
# Save sinq quantized model
model.save_pretrained("/path/to/save/qwen3-1.7B-sinq-4bit")
model.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit", safe_serialization=True, private=True)
Member:
Suggested change
model.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit", safe_serialization=True, private=True)
model.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit")

Comment on lines 118 to 119
# Push to the Hub
qmodel.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit", safe_serialization=True)
Member:
Suggested change
# Push to the Hub
qmodel.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit", safe_serialization=True)
# Push to the Hub
qmodel.push_to_hub("HF_Hub_username/qwen3-1.7B-sinq-4bit")

Comment on lines 2160 to 2161
self.modules_to_not_convert = list(modules_to_not_convert) if modules_to_not_convert is not None else []

Member:
let's leave it as None
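
For clarity, a minimal sketch of the suggested default (assuming this sits in SinqConfig.__init__):

# keep None instead of coercing it into an empty list
self.modules_to_not_convert = modules_to_not_convert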

Comment on lines 2194 to 2218
def to_dict(self) -> dict[str, Any]:
    """
    Convert configuration to a plain dict that will be stored in `config.json`
    under `quantization_config`.
    """
    base = super().to_dict()

    base.update(
        {
            "nbits": self.nbits,
            "group_size": self.group_size,
            "tiling_mode": self.tiling_mode,
            "method": self.method,
            "per_channel": self.per_channel,
            "symmetric": self.symmetric,
            "use_nf4": self.use_nf4,
            "modules_to_not_convert": list(self.modules_to_not_convert or []),
            "dtype": self.dtype,
            "device": self.device,
        }
    )

    base.update(self._extra_kwargs)

    return base
Member:
to_dict should work fine no ?
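
In other words, a minimal sketch of relying on the inherited serialization (assuming every SINQ option is stored as an instance attribute, so QuantizationConfigMixin.to_dict() already picks them up):

# no override needed; the base to_dict() serializes the instance attributes
config_dict = cfg.to_dict()  # `cfg` is a SinqConfig instance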

Comment on lines 2186 to 2192
@property
def is_trainable(self) -> bool:
    return True

@property
def is_serializable(self) -> bool:
    return True
Member:
not needed

Comment on lines 2220 to 2234
@classmethod
def from_dict(cls, config_dict: Dict[str, Any], return_unused_kwargs: bool = False, **kwargs):
    """
    Recreate SinqConfig from a dictionary (e.g. loaded from config.json).
    """
    cfg = dict(config_dict)

    cfg.pop("quant_method", None)

    cfg.update(kwargs)

    if return_unused_kwargs:
        return cls(**cfg), {}

    return cls(**cfg)
Member:
not sure what we are doing here

@@ -0,0 +1,375 @@
tests/models/aimv2/test_modeling_aimv2.py
Member:
remove

@ChiaraBoretti
Author

Hi @SunMarc,

Thank you very much for taking the time to review my code and for all your helpful comments. I’ve just pushed an updated version that incorporates several of the changes you suggested.
That said, I still have a few questions, and I ran into some difficulties while trying to implement a couple of your recommendations. I was hoping you might be able to help me with those points.
I’ll reply directly to your specific comments with my questions. I’m keen to further improve the code and, with your guidance, get this pull request ready to be merged.

Thanks again for your support, and I really appreciate your help!

@SunMarc
Member

SunMarc commented Jan 13, 2026

Thanks for taking the time to go through the comments, I answered a couple of them!

@ChiaraBoretti
Author

Hi @SunMarc,

I’ve just pushed an updated version of the code with the following changes:

  • Removal of asinq. I removed the asinq implementation and added an explicit error message when a user tries to set method='asinq'. The message directs users to the official SINQ repository for running the calibrated version of SINQ.

  • Architecture-specific handling. I answered your comment regarding handling particular architectures with an all-in-one approach. For now, I’ve moved away from the all-in-one strategy. As mentioned in my answer, I believe the issue is model-specific rather than a general multimodal architecture problem. Based on this, I opted to explicitly specify the vision components in modules_to_not_convert for models that raise errors due to their init_weights implementation.

  • Updated quantization flow. I reworked the quantization flow by modifying how SINQLinear is implemented, following your suggestion (I hope I understood it correctly). The new flow is: create an empty SINQLinear -> load the weights -> explicitly quantize the weights via a dedicated quantize() function, instead of performing the quantization when the SINQLinear is created (in the __init__ of the class). A rough sketch of this flow is shown right after this list.
    As a result, the SinqQuantize and SinqDeserialize classes are also much simpler now.
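
A rough sketch of the reworked flow, just to make the steps concrete (only SINQLinear and quantize() come from the description above; the other names and signatures are illustrative assumptions):

# 1) create an empty SINQLinear with the same shape as the original nn.Linear
sinq_layer = SINQLinear(
    in_features=module.in_features,
    out_features=module.out_features,
    bias=module.bias is not None,
)
# 2) load the original floating-point weights into it
sinq_layer.load_state_dict(module.state_dict())
# 3) quantize explicitly, instead of quantizing inside __init__
sinq_layer.quantize()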

Please let me know if these changes are moving the PR in the right direction for approval, and if there’s anything else I should address.

Thanks!

@SunMarc
Member

SunMarc commented Jan 15, 2026

Thanks a lot for this !

> Architecture-specific handling. I answered your comment regarding handling particular architectures with an all-in-one approach. For now, I’ve moved away from the all-in-one strategy. As mentioned in my answer, I believe the issue is model-specific rather than a general multimodal architecture problem. Based on this, I opted to explicitly specify the vision components in modules_to_not_convert for models that raise errors due to their init_weights implementation.

Answered in the thread ! But this is a separate issue from this PR and it should be quite simple to fix.

> Updated quantization flow. I reworked the quantization flow by modifying how SINQLinear is implemented, following your suggestion (I hope to understand them correctly). The new flow is: Create an empty SINQLinear -> Load the weights -> Explicitly call the weights quantization via a dedicated quantize() function instead of performing quantization when the SINQLinear is created (in the init of the class).
> As a result, also the SinqQuantize and SinqDeserialize classes are now much simpler.

I had a quick look and it looks much better. The flow also makes more sense now.

Let me know if you want a new review, but the simpler the code looks, the better!

Comment on lines 264 to 283
if not self.pre_quantized and self.quantization_config.method == "asinq":
    raise ValueError("A-SINQ is not supported in HuggingFace integration")

sinq_quant_dict = None if self.pre_quantized else self._build_sinq_quant_dict(self.quantization_config)
device_str = _normalize_cuda_device(getattr(self.quantization_config, "device", "auto"))

for full_name, module in list(model.named_modules()):
    if not isinstance(module, nn.Linear):
        continue
    if not should_convert_module(full_name, self.modules_to_not_convert):
        continue

    parent_path, _, child_name = full_name.rpartition(".")
    parent = model.get_submodule(parent_path) if parent_path else model

    # Create empty SINQLinear (no weights yet)
    sinq_layer = SINQLinear(
        in_features=module.in_features if not self.pre_quantized else None,
        out_features=module.out_features if not self.pre_quantized else None,
        bias=(module.bias is not None) if not self.pre_quantized else False,
Member:
put that in a replace function like the other methods

Author:
Did it! I added a replace_with_sinq_linear() function in sinq.py
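
For reference, a hypothetical skeleton of such a helper (the signature mirrors helpers like replace_with_bnb_linear and is an assumption, not the exact code in the PR):

def replace_with_sinq_linear(model, quantization_config=None, modules_to_not_convert=None):
    """Recursively swap nn.Linear modules for empty SINQLinear modules."""
    for name, module in list(model.named_modules()):
        if isinstance(module, nn.Linear) and should_convert_module(name, modules_to_not_convert):
            parent_path, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_path) if parent_path else model
            setattr(
                parent,
                child_name,
                SINQLinear(module.in_features, module.out_features, bias=module.bias is not None),
            )
    return model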

Comment on lines 306 to 312
if self.quantization_config.method == "asinq" and not self.pre_quantized:
    raise ValueError(
        "You are using `method='asinq'` in the quantization config. Right now the calibrated version of SINQ"
        " is not supported in Hugging Face, please refer to and use the official SINQ repository"
        " to quantize a model with this method."
    )

Member:
put that in validate_environment

Author:
moved to validate_environment()
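
For completeness, a sketch of what the relocated check could look like (validate_environment is the standard HfQuantizer hook; the body below is an assumption based on the snippet quoted above):

def validate_environment(self, *args, **kwargs):
    if self.quantization_config.method == "asinq" and not self.pre_quantized:
        raise ValueError(
            "`method='asinq'` (the calibrated version of SINQ) is not supported in the "
            "Hugging Face integration; please use the official SINQ repository to quantize "
            "a model with this method."
        )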

Comment on lines 315 to 329
def _resolve_tokenizer_and_model_id(self, model, kwargs):
    tok = kwargs.get("tokenizer", None)
    model_id = None
    cache_dir = kwargs.get("cache_dir", None)

    try:
        if hasattr(model, "config") and hasattr(model.config, "_name_or_path"):
            model_id = model.config._name_or_path
        if model_id is None:
            model_id = kwargs.get("pretrained_model_name_or_path", None)
        if model_id is None and "config" in kwargs and hasattr(kwargs["config"], "_name_or_path"):
            model_id = getattr(kwargs["config"], "_name_or_path", None)

        logger.info(f"[SinqHfQuantizer] Detected model_id = {model_id}")

Member:
remove

Author:
Thanks! I removed this function since it is no longer used in the code.

Comment on lines 1 to 7
tests/models/albert/test_modeling_albert.py
tests/models/align/test_modeling_align.py
tests/models/altclip/test_modeling_altclip.py
tests/models/apertus/test_modeling_apertus.py
tests/models/arcee/test_modeling_arcee.py
tests/models/aria/test_modeling_aria.py
tests/models/audio_spectrogram_transformer/test_modeling_audio_spectrogram_transformer.py
Member:
lots of txt files to delete

Author:
I deleted all these txt files

@ChiaraBoretti
Author

Thanks, @SunMarc, for your comments!

I’ve submitted a new version of the code; could you please take a look and let me know if there’s anything else I should change? I think your suggestions helped me make it simpler and cleaner.

Thanks again!
