@itikhono itikhono commented May 1, 2024

Details:

Ported the SDPA to PagedAttention transformation from Python to C++.

Related PRs:
#24127
#24177

Tested model scope:

  • "hf-internal-testing/tiny-random-BloomForCausalLM",
  • "hf-internal-testing/tiny-random-FalconForCausalLM",
  • "hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
  • "hf-internal-testing/tiny-random-GPTJForCausalLM",
  • "hf-internal-testing/tiny-random-StableLmForCausalLM",
  • "hf-internal-testing/tiny-random-LlamaForCausalLM",
  • "hf-internal-testing/tiny-random-MistralForCausalLM",
  • "hf-internal-testing/tiny-random-OPTForCausalLM",
  • "hf-internal-testing/tiny-random-PhiForCausalLM",
  • "hf-internal-testing/tiny-random-StableLmForCausalLM",
  • "facebook/opt-125m",
  • "llama2",
  • "bigcode/starcoder2-7b"
  • "mosaicml/mpt-7b-chat" (FAILED both py/c++) - acceptable for this PR
    Issue: RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386:
    Concat Parameter axis 2 out of the tensor rank range [0, 0].
A model listed without a failure note means that the response to the dedicated prompt is the same for the Python and C++ transformations.
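
The Concat failure reported for "mosaicml/mpt-7b-chat" is an axis range check. The sketch below is not OpenVINO code; it only illustrates the conventional rule, assumed here, that a tensor of rank `r` accepts axes in `[-r, r - 1]` and that a rank-0 tensor yields the degenerate range `[0, 0]`. The function name `normalize_axis` is used for illustration only.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Validate an axis against a tensor rank and return its non-negative form.

    Assumed rule (illustrative, not taken from OpenVINO sources):
    valid axes are [-rank, rank - 1]; for rank 0 the range collapses to [0, 0].
    """
    lo = -rank
    hi = rank - 1 if rank > 0 else 0
    if not (lo <= axis <= hi):
        raise ValueError(
            f"axis {axis} out of the tensor rank range [{lo}, {hi}]"
        )
    # Negative axes count from the end of the shape.
    return axis + rank if axis < 0 else axis
```

Under this assumed rule, a range of `[0, 0]` suggests that the Concat input had no known dimensions at validation time, so axis 2 could not be accepted.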

Tickets:

@itikhono itikhono added the category: transformations OpenVINO Runtime library - Transformations label May 1, 2024
@itikhono itikhono added this to the 2024.2 milestone May 1, 2024
@itikhono itikhono requested review from a team as code owners May 1, 2024 20:54
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: Python API OpenVINO Python bindings category: CPP API OpenVINO CPP API bindings labels May 1, 2024
@itikhono itikhono requested a review from ilya-lavrenov May 1, 2024 21:06

itikhono commented May 2, 2024

@ilya-lavrenov @slyalin please take a look

slyalin commented May 2, 2024

@itikhono, have you compared IRs produced by Python and C++ paths for all models from the list?

itikhono commented May 2, 2024

> Have you compared IRs produced by Python and C++ paths for all models from the list?

We agreed to run this model list:
"hf-internal-testing/tiny-random-BloomForCausalLM",
"hf-internal-testing/tiny-random-FalconForCausalLM",
"hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
"hf-internal-testing/tiny-random-GPTJForCausalLM",
"hf-internal-testing/tiny-random-StableLmForCausalLM",
"hf-internal-testing/tiny-random-LlamaForCausalLM",
"hf-internal-testing/tiny-random-MistralForCausalLM",
"hf-internal-testing/tiny-random-OPTForCausalLM",
"hf-internal-testing/tiny-random-PhiForCausalLM",
"hf-internal-testing/tiny-random-StableLmForCausalLM",
"facebook/opt-125m",
"llama2",
"bigcode/starcoder2-7b"

We then compared the responses generated for the dedicated prompt.
No differences were found between the responses of the Python and C++ implementations.
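
A response comparison of this kind can be sketched as follows. This is a minimal sketch, not the script used for this PR: `compare_responses` and the two generator callables are hypothetical names standing in for whatever runs a model with the Python or the C++ transformation applied.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (model_id, prompt) -> generated text.
Generator = Callable[[str, str], str]


def compare_responses(
    models: List[str],
    prompt: str,
    generate_py: Generator,
    generate_cpp: Generator,
) -> Dict[str, bool]:
    """Return, for each model, whether both paths gave the same response."""
    results: Dict[str, bool] = {}
    for model_id in models:
        py_text = generate_py(model_id, prompt)
        cpp_text = generate_cpp(model_id, prompt)
        results[model_id] = py_text == cpp_text
    return results


def mismatched(results: Dict[str, bool]) -> List[str]:
    """List the models whose responses differ between the two paths."""
    return [model_id for model_id, same in results.items() if not same]
```

An exact string match only makes sense when generation is deterministic, for example with greedy decoding.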

Comparing IRs is a separate task.
We can do it, but it will require more time.
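
For reference, a coarse first pass at an IR comparison could count operation types in the two IR XML files. This is only a sketch under the assumption that each operation is stored as a `<layer>` element with a `type` attribute; it does not compare connections, attributes, or weights, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Tuple


def op_histogram(ir_xml: str) -> Counter:
    """Count layers per operation type in an IR XML document given as a string."""
    root = ET.fromstring(ir_xml)
    return Counter(layer.get("type") for layer in root.iter("layer"))


def diff_histograms(a: Counter, b: Counter) -> Dict[str, Tuple[int, int]]:
    """Return {op_type: (count_in_a, count_in_b)} for types whose counts differ."""
    # Counter returns 0 for missing keys, so types present on one side only
    # are reported as well.
    return {
        op_type: (a[op_type], b[op_type])
        for op_type in sorted(set(a) | set(b))
        if a[op_type] != b[op_type]
    }
```

An empty result means the two graphs contain the same number of each operation type, which is a necessary but not sufficient condition for the IRs to be equivalent.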

itikhono commented May 2, 2024

As far as I can see, the Jenkins and ARM jobs also fail in other PRs and in post-commit.
The failures are not related to these changes.

@ilya-lavrenov ilya-lavrenov merged commit 55c11c6 into openvinotoolkit:master May 2, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 9, 2024
Deduce the number of KV heads and head_size from the model without
relying on HF config, and set the deduced values as KV cache input
dimension. Applied HW-specific layout rearrangement based on the current
expectations from CPU and GPU, preserving those deduced dimensions.
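
The idea of reading these two values off the model can be illustrated with a small sketch. It is not the transformation's code: it assumes a key tensor laid out as `[batch, num_kv_heads, seq_len, head_size]`, uses `-1` for dynamic dimensions, and the resulting cache shape is a hypothetical layout that does not model the CPU- or GPU-specific rearrangement.

```python
from typing import Sequence, Tuple

DYNAMIC = -1  # marker for a dimension that is not known statically


def deduce_kv_dims(
    key_shape: Sequence[int],
    heads_axis: int = 1,       # assumed position of num_kv_heads
    head_size_axis: int = 3,   # assumed position of head_size
) -> Tuple[int, int]:
    """Read num_kv_heads and head_size from a partially dynamic key shape."""
    num_kv_heads = key_shape[heads_axis]
    head_size = key_shape[head_size_axis]
    if DYNAMIC in (num_kv_heads, head_size):
        raise ValueError("num_kv_heads and head_size must be static to be deduced")
    return num_kv_heads, head_size


def kv_cache_input_shape(num_kv_heads: int, head_size: int) -> Tuple[int, int, int]:
    """Build an illustrative KV cache input shape from the deduced values."""
    # Hypothetical layout: a dynamic leading dimension, then the deduced values.
    return (DYNAMIC, num_kv_heads, head_size)
```

For example, a key shape of `[-1, 8, -1, 64]` gives 8 KV heads and a head size of 64 without consulting the HF config.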

> @CuriousPanCake, Please provide the status for the same set of models
that was mentioned in #24336.

Yep, I've run the tests for all the models available in the testing
script and that's what I've got. Some models have given a gibberish
answer to the prompt.
I'll attach the logs in a second.

- [x] hf-internal-testing/tiny-random-BloomForCausalLM
- [x] hf-internal-testing/tiny-random-FalconForCausalLM
- [x] hf-internal-testing/tiny-random-Starcoder2ForCausalLM
- [x] hf-internal-testing/tiny-random-GPTJForCausalLM
- [x] hf-internal-testing/tiny-random-StableLmForCausalLM
- [x] hf-internal-testing/tiny-random-LlamaForCausalLM
- [x] hf-internal-testing/tiny-random-MistralForCausalLM
- [ ] hf-internal-testing/tiny-random-MptForCausalLM 
_RuntimeError: Check '(axis_range_min <= axis) && (axis <=
axis_range_max)' failed at src/core/src/validation_util.cpp:386:
Concat Parameter axis 2 out of the tensor rank range [0, 0]._
- [x] hf-internal-testing/tiny-random-OPTForCausalLM
- [x] hf-internal-testing/tiny-random-PhiForCausalLM
- [x] hf-internal-testing/tiny-random-StableLmForCausalLM
- [x] facebook/opt-125m (Not a gibberish answer)
- [ ] Qwen/Qwen1.5-7B
_ValueError: The model's max seq len (32768) is larger than the maximum
number of tokens that can be stored in KV cache (8192). Try increasing
`gpu_memory_utilization` or decreasing `max_model_len` when initializing
the engine._
- [x] bigcode/starcoder2-7b (Not a gibberish answer)
- [ ] baichuan-inc/Baichuan2-7B-Base
_ValueError: Loading baichuan-inc/Baichuan2-7B-Base requires you to
execute the configuration file in that repo on your local machine. Make
sure you have read the code there to avoid malicious use, then set the
option `trust_remote_code=True` to remove this error._
- [ ] allenai/OLMo-7B
_The same config issue_
- [ ] internlm/internlm2-7b
_The same config issue_
- [x] stabilityai/stablelm-tuned-alpha-7b
- [x] EleutherAI/gpt-j-6b (Not a gibberish answer)
- [x] openai-community/gpt2 (Not a gibberish answer)
- [ ] google/gemma-7b
_There's a good answer, but then some other config issue appears._
- [ ] Deci/DeciLM-7B
_The same config issue_
- [ ] THUDM/chatglm3-6b
_The same config issue_
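
The Qwen/Qwen1.5-7B failure in the list above is a capacity check rather than a transformation problem. A minimal sketch of such a check, assuming the KV cache capacity in tokens is the number of blocks times the block size (the function and parameter names are hypothetical, not the engine's API):

```python
def check_kv_cache_capacity(max_model_len: int, num_blocks: int, block_size: int) -> int:
    """Return the KV cache capacity in tokens, or raise if the model needs more."""
    capacity = num_blocks * block_size
    if max_model_len > capacity:
        raise ValueError(
            f"The model's max seq len ({max_model_len}) is larger than the "
            f"maximum number of tokens that can be stored in KV cache ({capacity})."
        )
    return capacity
```

With 512 blocks of 16 tokens the capacity is 8192, so a 32768-token model is rejected unless the cache is enlarged or `max_model_len` is lowered, as the error message suggests.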
 

### Tickets:
 - CVS-140707