[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2
#14609
Conversation
…Qwen2` Signed-off-by: DarkLight1337 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Hello, after running my tests, I found that the results from vLLM align with those from HF only when `is_causal` is set to `False`.

vLLM:

```python
import torch
import torch.nn.functional as F

from vllm import LLM
from vllm.config import PoolerConfig

input_texts = ["Hello, my name is"]

model = LLM(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    task="embed",
    trust_remote_code=True,
    hf_overrides={"is_causal": False},
    override_pooler_config=PoolerConfig.from_json(
        '{"pooling_type": "LAST", "normalize": false}'
    ),
)

outputs = model.embed(input_texts, use_tqdm=False)
embeddings = torch.tensor(outputs[0].outputs.embedding).unsqueeze(0)

# Normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.tolist())
# [[-0.01903870701789856, -0.013255146332085133, -0.007144127041101456, -0.003470597555860877, 0.00997133832424879, ...]]
```

HF:

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[
            torch.arange(batch_size, device=last_hidden_states.device),
            sequence_lengths,
        ]


input_texts = ["Hello, my name is"]

tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# Normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.tolist())
# [[-0.019039466977119446, -0.013219481334090233, -0.00710709486156702, -0.003489241236820817, 0.009977828711271286, ...]]
```

If I made any mistakes, please let me know. @DarkLight1337
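As a quick sanity check on the numbers above, the printed prefixes of the two vectors can be compared directly. This is just a sketch over the first five printed dimensions (not the full embeddings); the variable names `vllm_emb` and `hf_emb` are mine, not from either script:

```python
import math

# First five printed dimensions from the vLLM and HF runs above.
vllm_emb = [-0.01903870701789856, -0.013255146332085133, -0.007144127041101456,
            -0.003470597555860877, 0.00997133832424879]
hf_emb = [-0.019039466977119446, -0.013219481334090233, -0.00710709486156702,
          -0.003489241236820817, 0.009977828711271286]

# Cosine similarity of the two prefixes; values near 1.0 mean the
# vLLM and HF outputs agree up to small numerical differences.
dot = sum(a * b for a, b in zip(vllm_emb, hf_emb))
cos = dot / (math.hypot(*vllm_emb) * math.hypot(*hf_emb))
print(cos)  # close to 1.0
```

On the full vectors the same check would confirm (or refute) agreement beyond the printed precision.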
The comment about
Got it, thank you.
The `is_causal` config has been flipped to `False` on the HF repo, so we need to update our docs and tests accordingly.
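For reference, a minimal sketch of applying the same override from the command line. `--hf-overrides` takes a JSON dict whose keys are set on the loaded HF config (note the lowercase JSON boolean); now that the HF repo itself ships `is_causal: false`, this override should no longer be necessary:

```shell
# Hypothetical invocation: serve the model for embeddings while forcing
# is_causal=False on the loaded HF config.
vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct \
    --task embed \
    --hf-overrides '{"is_causal": false}'
```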