Llama-quantize: Add force-requant feature to partial-requant (ikawrakow#1313 follow-up)
Preliminary steps:
- Add --force-requant / -frq argument to force regeneration of split files whose tensor ggml_types differ from the specified quantization type
- Add -prq as a short form of --partial-requant
- Combined with --partial-requant / -prq: skips existing matching splits, deletes and regenerates splits with mismatched tensor types
```cpp
printf("  --allow-requantize: Allows requantizing tensors that have already been quantized. Warning: This can severely reduce quality compared to quantizing from 16bit or 32bit\n");
printf("  --leave-output-tensor: Will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing\n");
printf("  --pure: Disable k-quant mixtures and quantize all tensors to the same type\n");
```