Conversation

@robertgshaw2-redhat (Collaborator) commented Mar 14, 2025

SUMMARY:

  • enable tests

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

```python
tokenizer = tokenizer_group.get_lora_tokenizer(None)
self.vocab_size = tokenizer.max_token_id + 1
self.mask_size = max(tokenizer.max_token_id,
                     self.vllm_config.model_config.get_vocab_size())
```
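
The subtlety being debated here is that a tokenizer's `max_token_id` is not necessarily `len(vocab) - 1`: vocabularies can have gaps or out-of-range added tokens, so `max_token_id + 1` and the model's `get_vocab_size()` can legitimately disagree. A minimal sketch with toy vocab dicts (names hypothetical, not vLLM code):

```python
# Toy vocabularies illustrating why max(token ids) + 1 and len(vocab) differ.
dense_vocab = {"a": 0, "b": 1, "c": 2}          # contiguous token ids
sparse_vocab = {"a": 0, "b": 1, "<extra>": 10}  # a "hole" in the id space

def max_token_id(vocab: dict) -> int:
    # Highest token id actually present in the vocabulary.
    return max(vocab.values())

# For a dense vocab the two notions agree.
assert max_token_id(dense_vocab) + 1 == len(dense_vocab) == 3

# For a sparse vocab they diverge: a bitmask sized by len(vocab) (3)
# would be too small to cover token id 10.
assert max_token_id(sparse_vocab) + 1 == 11
assert len(sparse_vocab) == 3
```

This is why the `max(...)` above hedges between the tokenizer-derived size and the model config's vocab size: either one alone was too small for one of the tested models.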
Member:

@aarnphm @njhill FYI -- This is where I finally landed that makes our tests pass for both Qwen and Mistral

Member:

but I'm definitely not confident in this ... I don't have a complete enough understanding of the factors involved here to know we're always getting the right value ...

Collaborator Author:

Do we need the +1?

Member:

In the case that broke for me, `get_vocab_size()` returned the same value as `max_token_id + 1`.

There is another case, tested in CI, that needed the value `max_token_id` to work.

I'll probably run out of time today before I can get to the bottom of this properly...

Collaborator:

Hmm, I think we should use `len(tokenizer.get_vocab())` to calculate it eagerly.

#14851

Member:

I rebased on that and it's still failing.

@russellb russellb added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 14, 2025
```diff
 self._grammar_bitmask = xgr.allocate_token_bitmask(
     self.vllm_config.scheduler_config.max_num_seqs,
-    self.vocab_size,
+    self.mask_size,
 )
```
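
For context on what this allocation sizes: xgrammar packs the per-token allow/deny bits into int32 words, so the bitmask's second dimension scales with the vocab (or mask) size divided by 32. A rough sketch of the resulting shape, assuming that packed layout (this is not vLLM or xgrammar code):

```python
import math

def bitmask_shape(max_num_seqs: int, mask_size: int) -> tuple[int, int]:
    # One int32 word holds allow/deny bits for 32 token ids, so the
    # bitmask is (num sequences) x ceil(mask_size / 32) words.
    return (max_num_seqs, math.ceil(mask_size / 32))

assert bitmask_shape(1, 32) == (1, 1)
assert bitmask_shape(8, 100) == (8, 4)
```

The practical consequence: if `mask_size` is smaller than the largest token id the grammar can emit, the mask cannot represent that token at all, which is the failure mode this PR kept hitting.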
Collaborator Author:

Is the bitmask applied to the logits?

Member:

Correct, it's applied here:

```python
def apply_grammar_bitmask(
    self,
    scheduler_output: "SchedulerOutput",
    logits: torch.Tensor,
):
    # Serialization of np.ndarray is much more efficient than a tensor,
    # so we receive it in that format.
    grammar_bitmask = scheduler_output.grammar_bitmask
    if grammar_bitmask is None:
        return

    # We receive the structured output bitmask from the scheduler, but the
    # indices of the requests in the batch may not match the indices of
    # the bitmask since the scheduler doesn't know how the gpu runner is
    # ordering the requests in the batch. We need to sort the bitmask to
    # match the order of the requests used here.
    struct_out_req_batch_indices: dict[str, int] = {}
    indices_match = True
    for req_id in self.input_batch.req_ids:
        mask_index = scheduler_output.structured_output_request_ids.get(req_id)
        if mask_index is None:
            # Not a structured output request.
            continue
        batch_index = self.input_batch.req_id_to_index[req_id]
        if batch_index != mask_index:
            indices_match = False
        struct_out_req_batch_indices[req_id] = batch_index

    if not indices_match:
        # Sort the bitmask to match the order of the requests.
        sorted_bitmask = np.zeros_like(grammar_bitmask)
        for req_id, batch_index in struct_out_req_batch_indices.items():
            orig_index = scheduler_output.structured_output_request_ids[req_id]
            sorted_bitmask[batch_index] = grammar_bitmask[orig_index]
        grammar_bitmask = sorted_bitmask

    grammar_bitmask = torch.from_numpy(grammar_bitmask)

    # TODO: compatibility with spec decode
    xgr.apply_token_bitmask_inplace(
        logits,
        grammar_bitmask.to(self.device, non_blocking=True),
        indices=list(struct_out_req_batch_indices.values()),
    )
```
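
The reordering step in that function can be sketched in isolation: the scheduler emits bitmask rows in its own request order, and the runner permutes them into GPU-batch order. A minimal standalone version with toy data (request ids and orderings are made up for illustration):

```python
import numpy as np

# Scheduler-side: row index of each structured-output request in the bitmask.
structured_output_request_ids = {"req-a": 0, "req-b": 1}
# Runner-side: position of each request in the GPU batch (order differs).
req_id_to_index = {"req-b": 0, "req-a": 1}

grammar_bitmask = np.array([[1, 0, 0],   # mask row for req-a
                            [0, 1, 1]])  # mask row for req-b

# Permute rows from scheduler order into batch order.
sorted_bitmask = np.zeros_like(grammar_bitmask)
for req_id, orig_index in structured_output_request_ids.items():
    batch_index = req_id_to_index[req_id]
    sorted_bitmask[batch_index] = grammar_bitmask[orig_index]

# Row 0 now holds req-b's mask and row 1 holds req-a's.
assert (sorted_bitmask == np.array([[0, 1, 1], [1, 0, 0]])).all()
```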

@robertgshaw2-redhat (Collaborator Author) commented Mar 15, 2025:

It would make sense to me that this should be the model vocab_size then, since that matches the shape of the logits.
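
The shape argument here can be made concrete: applying a token bitmask amounts to knocking out disallowed positions in the logits, so the mask width has to match the logits' vocab dimension. A pure-NumPy sketch of the idea (not xgrammar's actual packed-int32 kernel):

```python
import numpy as np

def apply_bitmask(logits: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    # Disallowed token logits are set to -inf so they can never be sampled;
    # this only works if `allowed` has one entry per logit.
    masked = logits.copy()
    masked[~allowed.astype(bool)] = -np.inf
    return masked

logits = np.array([2.0, 1.0, 3.0, 0.5])
allowed = np.array([1, 0, 1, 0])  # only tokens 0 and 2 are grammar-legal
out = apply_bitmask(logits, allowed)

assert out.argmax() == 2
assert np.isinf(out[1]) and np.isinf(out[3])
```

A mask sized from the tokenizer instead of the model would mismatch this dimension whenever the two vocab notions disagree.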

@mergify bot commented Mar 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 14, 2025
@russellb (Member):

The tests passed in CI, though the change is very suspicious. It could use more analysis and a proper explanation of the right value to use, but it's definitely better than it was.

I can try to dig into this for a proper explanation (and likely another change) to put this to bed for good next week

@russellb russellb force-pushed the v1-entrypoints-test branch from 022159e to ce6c82a Compare March 15, 2025 14:33
@russellb (Member):

I rebased this on main. I THINK all the necessary fixes are in main and we're down to the one line that turns on the tests ... let's see what CI says.

@mergify mergify bot removed the needs-rebase label Mar 15, 2025
@russellb (Member):

Looks like @aarnphm ’s last fix for the xgrammar bitmask didn’t do the trick.

@russellb (Member):

Replaced by #14903.

@russellb russellb closed this Mar 16, 2025
@robertgshaw2-redhat robertgshaw2-redhat deleted the v1-entrypoints-test branch March 24, 2025 18:04

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed v1

5 participants