[Bugfix] Add int8 torch dtype for KVCache #15260

shen-shanshan · 2025-03-21T02:07:47Z

Some attention backends require an int8 KV cache dtype (e.g., for quantization). The cache dtype is looked up from CacheConfig during initialization:

if cache_config.cache_dtype == "auto":
    self.dtype = model_config.dtype
else:
    self.dtype = STR_DTYPE_TO_TORCH_DTYPE[cache_config.cache_dtype]

But there is no int8 entry in STR_DTYPE_TO_TORCH_DTYPE:

STR_DTYPE_TO_TORCH_DTYPE = {
    "half": torch.half,
    "bfloat16": torch.bfloat16,
    "float": torch.float,
    "fp8": torch.uint8,
    "fp8_e4m3": torch.uint8,
    "fp8_e5m2": torch.uint8,
}

So I think it would be better to add int8 to STR_DTYPE_TO_TORCH_DTYPE.
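The lookup problem described above can be illustrated with a minimal sketch. To keep it runnable without PyTorch, the dtype values are plain strings standing in for the torch dtype objects used in vLLM. The helper name `resolve_cache_dtype`, the `BEFORE`/`AFTER` tables, and the `torch.int8` value for the new entry are assumptions made for illustration, not code taken from this PR.

```python
# Stand-in sketch: values are strings so the example runs without PyTorch.
# In vLLM the values are torch dtype objects (torch.half, torch.uint8, ...).
BEFORE = {
    "half": "torch.half",
    "bfloat16": "torch.bfloat16",
    "float": "torch.float",
    "fp8": "torch.uint8",
    "fp8_e4m3": "torch.uint8",
    "fp8_e5m2": "torch.uint8",
}

# Proposed change: one extra entry. The "torch.int8" value is an assumption.
AFTER = {**BEFORE, "int8": "torch.int8"}


def resolve_cache_dtype(cache_dtype, model_dtype, table):
    """Hypothetical helper mirroring the branch quoted in the description."""
    if cache_dtype == "auto":
        # "auto" falls back to the model's own dtype.
        return model_dtype
    # Any other string must be a key of the table, else KeyError is raised.
    return table[cache_dtype]


if __name__ == "__main__":
    try:
        resolve_cache_dtype("int8", "torch.bfloat16", BEFORE)
    except KeyError as exc:
        print("lookup fails without the entry:", exc)
    print(resolve_cache_dtype("int8", "torch.bfloat16", AFTER))
    print(resolve_cache_dtype("auto", "torch.bfloat16", AFTER))
```

With the original table, a cache dtype of "int8" raises KeyError before any backend gets a chance to handle it. With the extra entry the lookup succeeds, and whether int8 is actually usable is left to the backend.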

Signed-off-by: shen-shanshan <[email protected]>

github-actions · 2025-03-21T02:07:55Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

houseroad

Have we ever tested anything with int8 KV cache?

Is adding an item to a map enough? I am wondering how int8 KV works here.

Isotr0py · 2025-03-21T07:28:14Z

> Have we ever tested anything with int8 KV cache?
> Is adding an item to a map enough? I am wondering how int8 KV works here.

I think no quantization method in the main repo currently supports an int8 KV cache, but some out-of-tree (OOT) hardware plugins such as vllm-ascend can indeed support an int8 kv_cache: https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/quantization/quant_config.py#L60-L96

shen-shanshan · 2025-03-21T07:30:55Z

> > Have we ever tested anything with int8 KV cache?
> > Is adding an item to a map enough? I am wondering how int8 KV works here.
>
> I think no quantization method in the main repo currently supports an int8 KV cache, but some out-of-tree (OOT) hardware plugins such as vllm-ascend can indeed support an int8 kv_cache: https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/quantization/quant_config.py#L60-L96

Yes, thanks for your explanation~

add new torch dtype for kv_cache

a02337e

Signed-off-by: shen-shanshan <[email protected]>

Isotr0py approved these changes Mar 21, 2025

Isotr0py changed the title ~~[Bugfix] Add new torch dtype for KVCache~~ [Bugfix] Add int8 torch dtype for KVCache Mar 21, 2025

Isotr0py enabled auto-merge (squash) March 21, 2025 07:12

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 21, 2025

houseroad reviewed Mar 21, 2025

Isotr0py merged commit a989ca2 into vllm-project:main Mar 21, 2025
43 checks passed

shen-shanshan mentioned this pull request Mar 21, 2025

[Bugfix][Worker] Add Custom Cache Engine for NPU Worker to avoid patch vllm-project/vllm-ascend#356

Closed

erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

698f488

Signed-off-by: shen-shanshan <[email protected]>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

3c5cf68

Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

e5463a3

Signed-off-by: shen-shanshan <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

c4dc6fc

Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: Mu Huai <[email protected]>
