
Conversation

@rolandtannous (Collaborator)

Problem

numel(), parameters(), and named_parameters() return a lower count on 4-bit quantized models, making it difficult to count parameters accurately. These methods work fine on 8-bit quantized models.

Solution

4-bit data is packed into torch.int8 storage, hence the reported number of params is halved when we quantize in 4 bits.
The parameters of 4-bit quantized layers of class Linear4Bit are of class Params4bit, so we use that class to filter for 4-bit quantized parameters and double the count for those parameters (see the sketch below).
When 4-bit quantization is not used, the regular parameter.numel() count is applied.

This results in a more accurate parameter count.
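A minimal sketch of this counting logic, assuming bitsandbytes exposes the Params4bit class under bitsandbytes.nn (the function name count_params and its trainable_only flag are illustrative, not necessarily the exact signature used in this PR):

```python
import torch
from bitsandbytes.nn import Params4bit  # class bnb uses for 4-bit quantized weights

def count_params(model: torch.nn.Module, trainable_only: bool = False) -> int:
    """Count parameters, compensating for 4-bit weights packed into int8 storage."""
    total = 0
    for p in model.parameters():
        if trainable_only and not p.requires_grad:
            continue
        if isinstance(p, Params4bit):
            # Two 4-bit weights share each int8 element, so the true
            # parameter count is twice the storage numel().
            total += p.numel() * 2
        else:
            total += p.numel()
    return total
```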

Tests

Tested against Gemma3-4B and TinyLlama-1.1B with load_in_4bit=True and with load_in_8bit=True.
Checked that the number of params returned matches the parameter count obtained by loading the unquantized models with HF Transformers and counting the parameters (a rough verification sketch follows below).
(Two screenshots of the test output are attached.)
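A rough sketch of that verification, assuming a TinyLlama checkpoint id and reusing the count_params sketch above (running it needs a GPU with bitsandbytes installed):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint id

# Reference: unquantized model counted with plain numel()
full = AutoModelForCausalLM.from_pretrained(model_id)
reference = sum(p.numel() for p in full.parameters())

# 4-bit model: plain numel() undercounts, count_params() should match the reference
quant = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
assert count_params(quant) == reference
```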

if (not trainable_only) and \
        hasattr(model, "config") and \
        hasattr(model.config, "quantization_config"):
    approx = extract_approx_params_from_config(model.config)
Collaborator
Instead of this, should we just do s *= 2 if hasattr(model.config, quantization)...?
We sum over the params above in L247 anyway.

@rolandtannous (Collaborator Author) Jul 1, 2025

You only multiply the count of parameters that are of type Params4bit by two, not the count of all the model parameters. Not all model parameters are of type Params4bit in a quantized model; you can see which ones are by printing the model and looking for layers of type Linear4Bit.

So in Gemma3-4B the exact number of parameters is 4,338,577,264, of which 1,360,527,360 are 4-bit quantized (of type Params4bit) and should be multiplied by 2, while the remaining non-quantized parameters, 1,617,522,544, should be counted only once.
Adding those up: 1,617,522,544 + 2 * 1,360,527,360 = 4,338,577,264, which is exactly the number of parameters in Gemma3-4B. This function/method applies that equation.
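For concreteness, the same arithmetic as a quick sanity check (numbers copied from this comment):

```python
packed_4bit = 1_360_527_360   # numel() of the Params4bit storage tensors
other       = 1_617_522_544   # numel() of the non-quantized parameters
assert other + 2 * packed_4bit == 4_338_577_264  # Gemma3-4B's unquantized count
```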

I mentioned the reason you need to do this and can't just use numel(): it's because of how 4-bit data is packed (see the illustration below).
For more info, refer to our earlier conversation on Discord.
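A toy illustration of why packing halves the storage count (the real bitsandbytes layout may differ; this only shows two 4-bit values sharing one byte):

```python
import torch

vals = torch.tensor([3, 7, 1, 12], dtype=torch.uint8)  # four logical 4-bit weights
packed = (vals[0::2] << 4) | vals[1::2]                 # two weights per byte
print(vals.numel(), packed.numel())                     # 4 2
```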

@rolandtannous (Collaborator Author) Jul 1, 2025

As for line 247, you're only counting the trainable parameters, labeled as "Trainable Parameters" in the console printout, not the full count of model parameters. numel() and model.parameters() work for that case because you're counting the PEFT params that require grad, i.e. are trainable, which aren't 4-bit quantized.
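In other words, the trainable-count path can stay on plain numel(), along these lines (sketch, not the PR's exact code):

```python
# LoRA/PEFT adapter weights are regular float tensors with requires_grad=True,
# so no 4-bit correction is needed for the "Trainable Parameters" count.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```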
