
Conversation


@CuriousPanCake CuriousPanCake commented May 7, 2024

Deduce the number of KV heads and head_size from the model without relying on the HF config, and set the deduced values as KV cache input dimensions. Applied HW-specific layout rearrangement based on the current expectations from CPU and GPU, preserving those deduced dimensions.
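The deduction described above can be sketched roughly as follows. All names and the shape layout here are illustrative assumptions, not the PR's actual code: in many decoder graphs the key/value tensor is reshaped to [batch, seq_len, num_kv_heads, head_size] before attention, so the trailing static dims of that reshape target carry both values without consulting the HF config.

```python
# Illustrative sketch only: deduce num_kv_heads and head_size from a shape
# found in the graph instead of reading them from the HF config.
# The function name and the [-1, -1, H, S] layout are assumptions.

def deduce_kv_dims(kv_reshape_target):
    # Typical pre-attention reshape target: [batch, seq_len, num_kv_heads, head_size];
    # the trailing two dims are static even when batch/seq_len are dynamic (-1).
    *_, num_kv_heads, head_size = kv_reshape_target
    if num_kv_heads <= 0 or head_size <= 0:
        raise ValueError("expected static trailing dims for num_kv_heads/head_size")
    return num_kv_heads, head_size

# e.g. a GQA-style model with 8 KV heads of size 128:
print(deduce_kv_dims([-1, -1, 8, 128]))  # (8, 128)
```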


Tickets:

@CuriousPanCake CuriousPanCake requested a review from slyalin May 7, 2024 09:45
@github-actions github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label May 7, 2024

slyalin commented May 7, 2024

@CuriousPanCake, Please provide the status for the same set of models that was mentioned in #24336.

@slyalin slyalin requested a review from itikhono May 8, 2024 09:31

slyalin commented May 8, 2024

Blocks ilya-lavrenov/vllm#33.


CuriousPanCake commented May 8, 2024

@CuriousPanCake, Please provide the status for the same set of models that was mentioned in #24336.

Yep, I've run the tests for all the models available in the testing script and that's what I've got. Some models have given a gibberish answer to the prompt.
I'll attach the logs in a second.

  • hf-internal-testing/tiny-random-BloomForCausalLM
  • hf-internal-testing/tiny-random-FalconForCausalLM
  • hf-internal-testing/tiny-random-Starcoder2ForCausalLM
  • hf-internal-testing/tiny-random-GPTJForCausalLM
  • hf-internal-testing/tiny-random-StableLmForCausalLM
  • hf-internal-testing/tiny-random-LlamaForCausalLM
  • hf-internal-testing/tiny-random-MistralForCausalLM
  • hf-internal-testing/tiny-random-MptForCausalLM
    RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386:
    Concat Parameter axis 2 out of the tensor rank range [0, 0].
  • hf-internal-testing/tiny-random-OPTForCausalLM
  • hf-internal-testing/tiny-random-PhiForCausalLM
  • hf-internal-testing/tiny-random-StableLmForCausalLM
  • facebook/opt-125m (Not a gibberish answer)
  • Qwen/Qwen1.5-7B
    ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (8192). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
  • bigcode/starcoder2-7b (Not a gibberish answer)
  • baichuan-inc/Baichuan2-7B-Base
    ValueError: Loading baichuan-inc/Baichuan2-7B-Base requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
  • allenai/OLMo-7B
    The same config issue
  • internlm/internlm2-7b
    The same config issue
  • stabilityai/stablelm-tuned-alpha-7b
  • EleutherAI/gpt-j-6b (Not a gibberish answer)
  • openai-community/gpt2 (Not a gibberish answer)
  • google/gemma-7b
    There's a good answer, but then some other config issue appears.
  • Deci/DeciLM-7B
    The same config issue
  • THUDM/chatglm3-6b
    The same config issue
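The tiny-random-Mpt failure above is OpenVINO's axis validation for Concat: when the valid axis range is [0, 0] (effectively a rank-1 tensor), axis 2 is rejected. numpy applies the same axis-range rule, which makes for a quick illustration of the check (this only mirrors the validation, it is not the OpenVINO code path):

```python
import numpy as np

a = np.zeros(3)  # rank-1: the only valid concatenation axis is 0

try:
    np.concatenate([a, a], axis=2)  # analogous to "axis 2 out of range [0, 0]"
except IndexError as e:  # numpy's AxisError subclasses IndexError
    print("rejected:", e)

print(np.concatenate([a, a], axis=0).shape)  # (6,) -- axis 0 is fine
```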

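The Qwen/Qwen1.5-7B error above is a capacity check rather than a conversion bug: vLLM multiplies the number of KV cache blocks it managed to allocate by the block size, and refuses to start if the model's max sequence length exceeds that token budget. A rough sketch of the arithmetic (the block count is illustrative, not a measured value):

```python
# Sketch of the vLLM-style capacity check behind the Qwen1.5-7B error.
block_size = 16          # tokens per KV cache block (vLLM's default)
num_gpu_blocks = 512     # illustrative: whatever fits the GPU memory budget
max_kv_tokens = num_gpu_blocks * block_size
print(max_kv_tokens)     # 8192, matching the number in the error message

max_model_len = 32768    # Qwen1.5-7B's maximum sequence length

# The engine raises when the model's max seq len cannot fit in the cache;
# lowering max_model_len or raising gpu_memory_utilization avoids it.
if max_model_len > max_kv_tokens:
    print(f"would raise: max seq len ({max_model_len}) > "
          f"KV cache capacity ({max_kv_tokens})")
```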

I'll attach the logs in a second.
page_attention_test_full.txt

Deduce the number of KV heads and head_size from the model without relying on the HF config,
and set the deduced values as KV cache input dimensions. Applied HW-specific layout
rearrangement based on the current expectations from CPU and GPU, preserving those deduced dimensions.
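The HW-specific layout rearrangement mentioned in the commit message amounts to permuting the deduced KV cache dimensions into whatever order each device plugin expects. A hedged sketch of the idea follows; the concrete permutations below are hypothetical placeholders, not OpenVINO's actual CPU/GPU cache layouts:

```python
# Illustrative only: per-device KV cache layout selection.
# The permutation tuples are hypothetical, not OpenVINO's real expectations.
DEVICE_LAYOUTS = {
    # base dims: (num_blocks, num_kv_heads, block_size, head_size)
    "CPU": (0, 1, 2, 3),  # assumed: keep the base order
    "GPU": (0, 2, 1, 3),  # assumed: swap heads and block_size
}

def kv_cache_shape(num_blocks, num_kv_heads, block_size, head_size, device):
    # Reorder the deduced dims according to the device's expected layout,
    # preserving the deduced num_kv_heads and head_size values themselves.
    base = (num_blocks, num_kv_heads, block_size, head_size)
    return tuple(base[i] for i in DEVICE_LAYOUTS[device])

print(kv_cache_shape(512, 8, 16, 128, "CPU"))  # (512, 8, 16, 128)
print(kv_cache_shape(512, 8, 16, 128, "GPU"))  # (512, 16, 8, 128)
```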
@CuriousPanCake CuriousPanCake marked this pull request as ready for review May 8, 2024 12:32
@CuriousPanCake CuriousPanCake requested review from a team as code owners May 8, 2024 12:32
@itikhono itikhono enabled auto-merge May 9, 2024 10:34
@itikhono itikhono added this pull request to the merge queue May 9, 2024
Merged via the queue into openvinotoolkit:master with commit 7695a3b May 9, 2024

Labels

category: Core OpenVINO Core (aka ngraph) category: transformations OpenVINO Runtime library - Transformations
