📚 The doc issue
Two parts of the documentation appear to contradict each other, at least at first glance.
Here, it is explicitly stated that LoRA inference with a quantized model is not supported:
vllm/docs/source/models/supported_models.md, lines 59 to 61 in 4c0d93f:

> ##### LORA and quantization
> Both are not supported yet! Make sure to open an issue and we'll work on this together with the `transformers` team!
However, here, an example is provided for running offline inference with a quantized model and a LoRA adapter:
> This example shows how to use LoRA with different quantization techniques
> for offline inference.
To resolve this confusion, it would be very helpful to clarify the following points directly (please correct me if I am mistaken):
- QLoRA is supported, but only for offline inference. This means you cannot dynamically load LoRA adapters after the quantized base model has been loaded (see the sketch after this list).
- QLoRA is not supported with the OpenAI-compatible server, even for a single base model + LoRA adapter pair.
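
For reference, this is the kind of usage I mean by the first point: a minimal sketch of offline inference with a quantized base model plus a LoRA adapter, using vLLM's `LLM` and `LoRARequest`. The checkpoint name, adapter path, and the choice of AWQ quantization are placeholders, not taken from the docs:

```python
# Minimal sketch (not from the vLLM docs): offline inference with a quantized
# base model plus a LoRA adapter. The model name, adapter path, and quantization
# method below are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load an AWQ-quantized base model with LoRA support enabled.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # placeholder quantized checkpoint
    quantization="awq",               # must match the checkpoint's quantization
    enable_lora=True,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# The adapter is passed per request at generation time; it is not
# dynamically registered on a running server.
outputs = llm.generate(
    ["Explain what a LoRA adapter is in one sentence."],
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)

print(outputs[0].outputs[0].text)
```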
Edit:
It's easy to miss on the docs site that `##### LORA and quantization` is a subsection of `### Transformers fallback`; that's why I was confused.
vllm/docs/source/models/supported_models.md, lines 57 to 59 in 4c0d93f:

> ### Transformers fallback
> #### Supported features
> ##### LORA and quantization