Deduce the number of KV heads and head_size from the model #24400
Conversation
(Three resolved review threads on ...mon/transformations/src/transformations/sdpa_to_paged_attention/state_management_pattern.cpp, now outdated.)
@CuriousPanCake, please provide the status for the same set of models that was mentioned in #24336.

Blocks ilya-lavrenov/vllm#33.
Deduce the number of KV heads and head_size from the model without relying on the HF config, and set the deduced values as the KV cache input dimensions. Applied HW-specific layout rearrangement based on the current expectations from CPU and GPU, preserving those deduced dimensions.
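As a rough illustration of the idea only (not the actual pattern code in state_management_pattern.cpp): the helper name and the assumed [batch, num_kv_heads, seq_len, head_size] Key layout below are hypothetical, and the sketch uses the OpenVINO Python API rather than the C++ transformation API.

```python
from openvino.runtime import Node

def deduce_kv_dims(key_input: Node):
    """Hypothetical helper: read num_kv_heads and head_size directly from the
    partial shape of the Key tensor feeding SDPA, instead of the HF config.
    Assumes the conventional [batch, num_kv_heads, seq_len, head_size] layout."""
    pshape = key_input.get_output_partial_shape(0)
    rank = pshape.rank
    assert rank.is_static and rank.get_length() == 4, "expected a 4D Key input"
    # Dimensions may still be dynamic; the transformation would propagate them
    # into the KV cache input shape as-is.
    return pshape[1], pshape[3]  # (num_kv_heads, head_size)
```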
Yep, I've run the tests for all the models available in the testing script and that's what I've got. Some models have given a gibberish answer to the prompt.
I'll attach the logs in a second.
RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386:
Concat Parameter axis 2 out of the tensor rank range [0, 0].
ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (8192). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

ValueError: Loading baichuan-inc/Baichuan2-7B-Base requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

(Both are vLLM engine-init issues; see the sketch after the status list below.)

The same config issue
The same config issue
There's a good answer, but then some other config issue appears.
The same config issue
The same config issue
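For reference, both ValueErrors above are addressed by options passed when constructing the vLLM engine. A minimal sketch; the model name is taken from the log, and the numeric values are assumptions to tune per GPU:

```python
from vllm import LLM

llm = LLM(
    model="baichuan-inc/Baichuan2-7B-Base",
    trust_remote_code=True,       # required by the second ValueError
    max_model_len=8192,           # stay within what the KV cache can hold
    gpu_memory_utilization=0.95,  # or raise this instead of lowering max_model_len
)
```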
Tickets: