Load INC GPTQ checkpoint & rename params #1364
Changes from 4 commits

Commits in this pull request: 0c3da37, 24c0b1f, af49001, 4304d2d, 1590e29, 75c68d4, 13469b3
```diff
@@ -103,6 +103,7 @@ slow_tests_diffusers: test_installs

 # Run text-generation non-regression tests
 slow_tests_text_generation_example: test_installs
 	BUILD_CUDA_EXT=0 python -m pip install -vvv --no-build-isolation git+https://github.com/HabanaAI/AutoGPTQ.git
+	python -m pip install git+https://github.com/HabanaAI/[email protected]
 	python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s --token $(TOKEN)
```
```diff
@@ -289,21 +289,11 @@ def setup_parser(parser):
         type=str,
         help="Path to serialize const params. Const params will be held on disk memory instead of being allocated on host memory.",
     )
-    parser.add_argument(
-        "--disk_offload",
-        action="store_true",
-        help="Whether to enable device map auto. In case no space left on cpu, weights will be offloaded to disk.",
-    )
     parser.add_argument(
         "--trust_remote_code",
         action="store_true",
         help="Whether to trust the execution of code from datasets/models defined on the Hub. This option should only be set to `True` for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.",
     )
-    parser.add_argument(
-        "--load_quantized_model",
-        action="store_true",
-        help="Whether to load model from hugging face checkpoint.",
-    )
     parser.add_argument(
         "--parallel_strategy",
         type=str,
@@ -312,6 +302,35 @@ def setup_parser(parser):
         help="Run multi card with the specified parallel strategy. Choices are 'tp' for Tensor Parallel Strategy or 'none'.",
     )
+    parser.add_argument(
+        "--run_partial_dataset",
+        action="store_true",
+        help="Run the inference with dataset for specified --n_iterations(default:5)",
+    )
```
|
Comment on lines
+305
to
+309
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where is this used?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not used in this PR so I don't think it should be part of it. Please open a new PR to add this new argument to the text-generation example. |
```diff
+    quant_parser_group = parser.add_mutually_exclusive_group()
+    quant_parser_group.add_argument(
+        "--load_quantized_model_with_autogptq",
+        action="store_true",
+        help="Load an AutoGPTQ quantized checkpoint using AutoGPTQ.",
+    )
+    quant_parser_group.add_argument(
+        "--disk_offload",
+        action="store_true",
+        help="Whether to enable device map auto. In case no space left on cpu, weights will be offloaded to disk.",
+    )
+    quant_parser_group.add_argument(
+        "--load_quantized_model_with_inc",
+        action="store_true",
+        help="Load a Huggingface quantized checkpoint using INC.",
+    )
+    quant_parser_group.add_argument(
+        "--quantized_inc_model_path",
+        type=str,
+        default=None,
+        help="Path to neural-compressor quantized model, if set, the checkpoint will be loaded.",
+    )

     args = parser.parse_args()

     if args.torch_compile:
```

regisss marked this conversation as resolved.
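The new `quant_parser_group` relies on argparse's mutually exclusive groups: the parser itself rejects any command line that combines two of the quantization options, so no manual conflict checks are needed for these four flags. A minimal standalone sketch of the same pattern (the flag names are taken from the diff above; the bare parser is illustrative, not the example's real `setup_parser`):

```python
import argparse

# Rebuild just the mutually exclusive quantization flags from the diff.
parser = argparse.ArgumentParser(prog="run_generation")
quant_group = parser.add_mutually_exclusive_group()
quant_group.add_argument("--load_quantized_model_with_autogptq", action="store_true")
quant_group.add_argument("--disk_offload", action="store_true")
quant_group.add_argument("--load_quantized_model_with_inc", action="store_true")
quant_group.add_argument("--quantized_inc_model_path", type=str, default=None)

# One flag from the group at a time is accepted.
args = parser.parse_args(["--load_quantized_model_with_inc"])
assert args.load_quantized_model_with_inc

# Two flags from the same group make argparse print an error and exit with code 2.
try:
    parser.parse_args(["--disk_offload", "--load_quantized_model_with_inc"])
except SystemExit as exc:
    print("conflicting flags rejected, exit code", exc.code)
```

Note that mutual exclusion only covers options registered in the group; the `QUANT_CONFIG` environment variable used for fp8 still needs the separate runtime check added later in this diff.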
|
```diff
@@ -324,6 +343,9 @@ def setup_parser(parser):
         args.flash_attention_fast_softmax = True

     args.quant_config = os.getenv("QUANT_CONFIG", "")
+    if args.quant_config and args.load_quantized_model_with_autogptq:
+        raise RuntimeError("Setting both quant_config and load_quantized_model_with_autogptq is unsupported. ")
+
     if args.quant_config == "" and args.disk_offload:
         logger.warning(
             "`--disk_offload` was tested only with fp8, it may not work with full precision. If error raises try to remove the --disk_offload flag."
```
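This guard is needed because `QUANT_CONFIG` comes from the environment rather than the command line, so the mutually exclusive argparse group cannot catch the combination of fp8 quantization with AutoGPTQ loading. A standalone sketch of the same check (the helper name and the config filename are illustrative):

```python
import os

def check_quant_flags(load_quantized_model_with_autogptq: bool) -> str:
    """Mirror the guard from the diff: the fp8 QUANT_CONFIG env var and the
    AutoGPTQ loading flag cannot be combined."""
    quant_config = os.getenv("QUANT_CONFIG", "")
    if quant_config and load_quantized_model_with_autogptq:
        raise RuntimeError(
            "Setting both quant_config and load_quantized_model_with_autogptq is unsupported."
        )
    return quant_config

# With a quantization config set, the AutoGPTQ flag is rejected.
os.environ["QUANT_CONFIG"] = "maxabs_quant.json"  # illustrative filename
try:
    check_quant_flags(load_quantized_model_with_autogptq=True)
except RuntimeError as err:
    print("rejected:", err)
```

Either option alone remains valid; only the combination raises.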