Enable gptqmodel #35012
@@ -22,7 +22,7 @@
 if TYPE_CHECKING:
     from ..modeling_utils import PreTrainedModel

-from ..utils import is_auto_gptq_available, is_optimum_available, is_torch_available, logging
+from ..utils import is_auto_gptq_available, is_gptqmodel_available, is_optimum_available, is_torch_available, logging
 from ..utils.quantization_config import GPTQConfig, QuantizationConfigMixin

@@ -35,11 +35,11 @@
 class GptqHfQuantizer(HfQuantizer):
     """
     Quantizer of the GPTQ method - for GPTQ the quantizer support calibration of the model through
-    `auto_gptq` package. Quantization is done under the hood for users if they load a non-prequantized model.
+    `auto_gptq` or `gptqmodel` package. Quantization is done under the hood for users if they load a non-prequantized model.
     """

     requires_calibration = False
-    required_packages = ["optimum", "auto_gptq"]
+    required_packages = ["optimum", "gptqmodel"]
     optimum_quantizer = None

     def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):

@@ -49,16 +49,21 @@ def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):
         self.optimum_quantizer = GPTQQuantizer.from_dict(self.quantization_config.to_dict_optimum())

     def validate_environment(self, *args, **kwargs):
-        gptq_supports_cpu = version.parse(importlib.metadata.version("auto-gptq")) > version.parse("0.4.2")
+        gptq_supports_cpu = (
+            is_auto_gptq_available()
+            and version.parse(importlib.metadata.version("auto-gptq")) > version.parse("0.4.2")
+        ) or is_gptqmodel_available()
         if not gptq_supports_cpu and not torch.cuda.is_available():
             raise RuntimeError("GPU is required to quantize or run quantize model.")
-        elif not (is_optimum_available() and is_auto_gptq_available()):
+        elif not (is_optimum_available() and (is_auto_gptq_available() or is_gptqmodel_available())):
             raise ImportError(
-                "Loading a GPTQ quantized model requires optimum (`pip install optimum`) and auto-gptq library (`pip install auto-gptq`)"
+                "Loading a GPTQ quantized model requires optimum (`pip install optimum`) and auto-gptq or gptqmodel library (`pip install auto-gptq` or `pip install gptqmodel`)"
             )
-        elif version.parse(importlib.metadata.version("auto_gptq")) < version.parse("0.4.2"):
+        elif is_auto_gptq_available() and version.parse(importlib.metadata.version("auto_gptq")) < version.parse(
+            "0.4.2"
+        ):
             raise ImportError(
-                "You need a version of auto_gptq >= 0.4.2 to use GPTQ: `pip install --upgrade auto-gptq`"
+                "You need a version of auto_gptq >= 0.4.2 to use GPTQ: `pip install --upgrade auto-gptq` or use gptqmodel by `pip install gptqmodel`"
             )
Member: Can you add a message mentioning that auto-gptq will be deprecated? I think we can do it two versions of transformers from now. For optimum, maybe we can deprecate it a bit later than transformers, to make sure that we can still revert if there is a big issue.

Contributor (author): Done.
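(For context, the kind of deprecation notice being discussed might look roughly like the sketch below; the wording, placement, and the `logger` setup are assumptions, not the exact message merged in this PR.)

```python
# Hypothetical sketch of the requested deprecation notice; wording and
# placement are assumptions, not the exact message merged in this PR.
from transformers.utils import is_auto_gptq_available, is_gptqmodel_available, logging

logger = logging.get_logger(__name__)

# Warn only when the environment still relies on auto-gptq alone.
if is_auto_gptq_available() and not is_gptqmodel_available():
    logger.warning(
        "auto-gptq support is deprecated and will be removed in a future release "
        "of transformers. Please migrate to gptqmodel: `pip install gptqmodel`."
    )
```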
Member: Don't forget that users need to use the latest version of optimum with gptqmodel.

Contributor (author): I have limited the optimum and gptqmodel versions. The version limits can be changed after gptqmodel and optimum publish their releases.
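(As a rough illustration of that kind of version gating; the `MIN_*` values below are placeholders, not the minimums this PR actually pins.)

```python
# Illustrative version gating; the placeholder MIN_* values are NOT the
# versions this PR actually requires.
import importlib.metadata

from packaging import version

MIN_OPTIMUM_VERSION = version.parse("1.0.0")    # placeholder, see the PR for the real pin
MIN_GPTQMODEL_VERSION = version.parse("1.0.0")  # placeholder, see the PR for the real pin


def validate_gptqmodel_stack() -> None:
    """Raise if the installed optimum or gptqmodel is older than the pinned minimum."""
    if version.parse(importlib.metadata.version("optimum")) < MIN_OPTIMUM_VERSION:
        raise ImportError("gptqmodel support requires a newer optimum: `pip install --upgrade optimum`")
    if version.parse(importlib.metadata.version("gptqmodel")) < MIN_GPTQMODEL_VERSION:
        raise ImportError("please upgrade gptqmodel: `pip install --upgrade gptqmodel`")
```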

     def update_torch_dtype(self, torch_dtype: "torch.dtype") -> "torch.dtype":
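For readers following along, here is a minimal sketch of exercising this code path from the user side; the checkpoint id and config values are illustrative only, and after this change either auto-gptq or gptqmodel can back the quantization:

```python
# Minimal end-to-end sketch of the GPTQ path this PR touches; the model id
# and config values are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GptqHfQuantizer drives optimum's GPTQQuantizer from this config; with this
# PR, gptqmodel is picked up when auto-gptq is absent.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# For a non-prequantized model, quantization happens during from_pretrained.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
```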