
Conversation


@rolandtannous (Collaborator) commented Jul 12, 2025

Problem

The current implementation of the fast_dequantize and fast_gemv kernels assumes that quantization statistics (absmax values) always need to be dequantized at inference time. However, recent versions of vLLM have introduced a _dequantize_dq optimization that pre-processes double quantization during model loading rather than at inference time, trading memory for compute performance by dequantizing the scaling statistics ahead of time. After _dequantize_dq runs on a layer's quant_state:

  • quant_state.nested becomes False
  • quant_state.state2 becomes None
  • quant_state.offset becomes None
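
For illustration, here is a minimal sketch of what this load-time pre-processing amounts to, assuming bitsandbytes' QuantState layout and its dequantize_blockwise API (the actual vLLM implementation may differ in detail):

import bitsandbytes.functional as F

def dequantize_dq_sketch(quant_state):
    # Sketch only: resolve double quantization once, at model-load time.
    # Assumes `quant_state.absmax` is itself blockwise-quantized via `state2`
    # and shifted by `offset`, as in bitsandbytes nested (double) quantization.
    if getattr(quant_state, "nested", False):
        # Recover the real fp32 absmax from the quantized statistics.
        absmax = F.dequantize_blockwise(quant_state.absmax, quant_state.state2)
        absmax += quant_state.offset
        # Store the pre-dequantized statistics and drop the nested state,
        # matching the field values listed above.
        quant_state.absmax = absmax
        quant_state.nested = False
        quant_state.state2 = None
        quant_state.offset = None
    return quant_state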

As a consequence, when a model is loaded with Unsloth using fast_inference=True, _dequantize_dq is applied. During training, once the GRPOTrainer is called on the model, the existing Unsloth fast_dequantize and fast_gemv kernels, which were not originally written to handle this edge case, still attempt to access state2.absmax, state2.code, etc., leading to:

AttributeError: 'NoneType' object has no attribute 'absmax'

Solution

Modified both fast_dequantize and fast_gemv kernels across all device types (XPU, CUDA, fallback) to:

  1. Check for a pre-dequantized state: Added logic to detect when double quantization has already been resolved
  • For object-based quant_state: check hasattr(quant_state, 'nested') and quant_state.nested and state2 is not None
  • For list-based quant_state: check state2 is not None
  2. Conditional statistics dequantization: Only perform cdequantize_blockwise_fp32 when needed
  • When has_nested_quant=True: dequantize the statistics using the state2 parameters
  • When has_nested_quant=False: use the pre-dequantized absmax directly
  3. Consistent buffer handling: Ensure the out_absmax buffer is properly populated in both cases for fast_dequantize
  4. Safe pointer management: Define ptr_out_absmax before the conditional blocks to avoid scope issues

A sketch of the detection and conditional-dequantization logic follows below.
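
For illustration only, a Python-level sketch of this check and the conditional path, with bitsandbytes' dequantize_blockwise standing in for the ctypes call to cdequantize_blockwise_fp32 made by the real kernels; the function name resolve_absmax and its argument layout are assumptions made for readability:

import bitsandbytes.functional as F

def resolve_absmax(quant_state, absmax, state2, offset, out_absmax):
    # Sketch only: `absmax`, `state2`, and `offset` are assumed to have been
    # pulled from either the object-based or list-based quant_state beforehand,
    # and `out_absmax` is the fp32 buffer whose pointer (ptr_out_absmax) was
    # defined before this branch.
    if isinstance(quant_state, list):
        # List-based quant_state: nested statistics are present only if state2 exists.
        has_nested_quant = state2 is not None
    else:
        # Object-based quant_state: require the nested flag and a populated state2.
        has_nested_quant = (
            hasattr(quant_state, "nested") and quant_state.nested and state2 is not None
        )

    if has_nested_quant:
        # Statistics are still double-quantized: dequantize them using state2
        # (the real kernels call cdequantize_blockwise_fp32 here) and add the offset.
        out_absmax.copy_(F.dequantize_blockwise(absmax, state2))
        out_absmax += offset
    else:
        # vLLM's _dequantize_dq already resolved the statistics at load time:
        # use the pre-dequantized absmax directly, but still populate out_absmax
        # so downstream buffer and pointer handling stays consistent.
        out_absmax.copy_(absmax)
    return out_absmax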

This maintains backward compatibility while supporting the performance optimization provided by _dequantize_dq.

Solves

Reproducible code

from unsloth import FastLanguageModel
import torch
max_seq_length = 1024 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "meta-llama/meta-Llama-3.1-8B-Instruct",
    model_name = "unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length = max_seq_length,
    #load_in_4bit = True, # False for LoRA 16bit
    load_in_4bit=True,
    #use_gradient_checkpointing="unsloth",
    #load_in_8bit=True,
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.6, # Reduce if out of memory
)

quant_state = getattr(model.model.layers[0].self_attn.q_proj.weight, "quant_state", None)
print(type(quant_state))
print(quant_state.nested)
print(type(quant_state.state2))
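
With vLLM's pre-dequantization in effect, the three print statements above are expected to show a quant_state whose nested flag is False and whose state2 is None, roughly as follows (the exact QuantState class path is an assumption and may vary with the bitsandbytes version):

<class 'bitsandbytes.functional.QuantState'>
False
<class 'NoneType'>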

Tests

We tested the following GRPO notebooks end-to-end to ensure that both training and inference work correctly.
After applying the fixes in both #2944 and this PR, all notebooks complete successfully without errors:

Notebook | Training | Inference
Advanced Llama-3.1-(3B)-GRPO-Lora | ✅ | ✅
Advanced_Llama3_2_(3B)_GRPO_LoRA | ✅ | ✅
Phi-14B-GRPO | ✅ | ✅
Mistral_v0.3_(7B)-GRPO | ✅ | ✅
qwen3_4b-GRPO | ✅ | ✅

Additional notes:

After resolving this issue, we faced another ValueError related to dataloader_num_workers. We issued a fix for that in PR #2944.

@rolandtannous rolandtannous changed the title Support pre-dequantized quantization states in fast_dequantize kernel GRPO Fix - Support pre-dequantized quantization states in fast_dequantize kernel Jul 12, 2025
@rolandtannous rolandtannous changed the title GRPO Fix - Support pre-dequantized quantization states in fast_dequantize kernel GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel Jul 12, 2025
@danielhanchen danielhanchen merged commit 0eb61fb into unslothai:main Jul 14, 2025
danielhanchen added a commit that referenced this pull request Jul 17, 2025
