Paged attention transformation #24177
we typically name files with snake_case
should we create paged_attention subfolder to contain files related to this new meta-transformation?
src/common/transformations/src/transformations/common_optimizations/StateManagementPattern.cpp
64d760d to 5efd778
8dd42df to 9a665e7
9a665e7 to d6730d9
.../transformations/include/transformations/common_optimizations/TotalSequenceLengthPattern.hpp
...n/transformations/include/transformations/common_optimizations/PrevSequenceLengthPattern.hpp
ilya-lavrenov
left a comment
opt125m and llama2 work
This list of models works. I will re-open this PR and resolve some comments.
A new PR has been opened: #24336
### Details:
Ported SDPA to PagedAttention transformation from python to C++ code.
The related PRs: #24127 #24177

Tested model scope:
- [x] "hf-internal-testing/tiny-random-BloomForCausalLM",
- [x] "hf-internal-testing/tiny-random-FalconForCausalLM",
- [x] "hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
- [x] "hf-internal-testing/tiny-random-GPTJForCausalLM",
- [x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
- [x] "hf-internal-testing/tiny-random-LlamaForCausalLM",
- [x] "hf-internal-testing/tiny-random-MistralForCausalLM",
- [x] "hf-internal-testing/tiny-random-OPTForCausalLM",
- [x] "hf-internal-testing/tiny-random-PhiForCausalLM",
- [x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
- [x] "facebook/opt-125m",
- [x] "llama2",
- [x] "bigcode/starcoder2-7b"
- [ ] "mosaicml/mpt-7b-chat" (FAILED both py/c++) - acceptable for this PR
  Issue: RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386: Concat Parameter axis 2 out of the tensor rank range [0, 0].

- [x] _means, that the response to the dedicated prompt is the same for the py and c++ transformations._

### Tickets:
- *CVS-138664*

---------

Co-authored-by: Sergey Lyalin <[email protected]>
Co-authored-by: Andrii Staikov <[email protected]>
wip: added the setup part and half of the first matcher