[V1] Entrypoints Test - Enable #14832

robertgshaw2-redhat · 2025-03-14T17:32:30Z

SUMMARY:

enable tests

github-actions · 2025-03-14T17:32:39Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

russellb · 2025-03-14T18:52:32Z

vllm/v1/structured_output/__init__.py

        tokenizer = tokenizer_group.get_lora_tokenizer(None)
-        self.vocab_size = tokenizer.max_token_id + 1
+        self.mask_size = max(tokenizer.max_token_id,
+                             self.vllm_config.model_config.get_vocab_size())


@aarnphm @njhill FYI -- This is where I finally landed that makes our tests pass for both Qwen and Mistral

but I'm definitely not confident in this ... I don't have a complete enough understanding of the factors involved here to know we're always getting the right value ...

Do we need the +1?

in the case that broke for me, get_vocab_size() returned the same as "max_token_id + 1".

There is another case, tested in CI, that needed the value "max_token_id" to work.

I'll probably run out of time today before I can get to the bottom of this properly...

hmm, i think we should use len(tokenizer.get_vocab()) to calculate it eagerly

#14851

I rebased on that and it’s still failing
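
For readers following the thread, here is a minimal sketch of why the candidate sizes discussed above (max_token_id + 1, len(tokenizer.get_vocab()), and the model's vocab size) need not agree. It uses a toy dict-based vocabulary rather than vLLM's tokenizer, and the function name, the example vocabularies, and the idea that the model's embedding table may be padded past the largest token id are all assumptions made for illustration; the thread does not confirm the root cause.

```python
# Toy illustration only (not vLLM code). All names here are hypothetical.


def mask_size_candidates(vocab: dict[str, int],
                         model_vocab_size: int) -> dict[str, int]:
    """Return the three candidate sizes discussed in the review thread."""
    max_token_id = max(vocab.values())
    return {
        "max_token_id + 1": max_token_id + 1,
        "len(vocab)": len(vocab),
        "model_vocab_size": model_vocab_size,
    }


# Case A: contiguous token ids, but the model's logits are wider than the
# tokenizer's id range (assumed here to come from a padded embedding table).
contiguous_vocab = {f"tok{i}": i for i in range(10)}
case_a = mask_size_candidates(contiguous_vocab, model_vocab_size=16)

# Case B: token ids with gaps, so counting entries undercounts the id range.
sparse_vocab = {"a": 0, "b": 1, "c": 5}
case_b = mask_size_candidates(sparse_vocab, model_vocab_size=8)

for name, case in (("A", case_a), ("B", case_b)):
    print(name, case)
```

In both toy cases only the model's vocab size covers every column of the logits, which is consistent with the observation later in this review that the mask should match the logits shape.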

robertgshaw2-redhat · 2025-03-14T20:25:02Z

vllm/v1/structured_output/__init__.py

        self._grammar_bitmask = xgr.allocate_token_bitmask(
            self.vllm_config.scheduler_config.max_num_seqs,
-            self.vocab_size,
+            self.mask_size,


Is the bitmask applied to the logits?

correct, applied here:

vllm/vllm/v1/worker/gpu_model_runner.py

Lines 869 to 914 in 270a5da

    def apply_grammar_bitmask(
        self,
        scheduler_output: "SchedulerOutput",
        logits: torch.Tensor,
    ):
        # Serialization of np.ndarray is much more efficient than a tensor,
        # so we receive it in that format.
        grammar_bitmask = scheduler_output.grammar_bitmask
        if grammar_bitmask is None:
            return

        # We receive the structured output bitmask from the scheduler, but the
        # indices of the requests in the batch may not match the indices of
        # the bitmask since the scheduler doesn't know how the gpu runner is
        # ordering the requests in the batch. We need to sort the bitmask to
        # match the order of the requests used here.
        struct_out_req_batch_indices: dict[str, int] = {}
        indices_match = True
        for req_id in self.input_batch.req_ids:
            mask_index = scheduler_output.structured_output_request_ids.get(
                req_id)
            if mask_index is None:
                # not a structured output request
                continue
            batch_index = self.input_batch.req_id_to_index[req_id]
            if batch_index != mask_index:
                indices_match = False
            struct_out_req_batch_indices[req_id] = batch_index

        if not indices_match:
            # Sort the bitmask to match the order of the requests
            sorted_bitmask = np.zeros_like(grammar_bitmask)
            for req_id, batch_index in struct_out_req_batch_indices.items():
                orig_index = scheduler_output.structured_output_request_ids[
                    req_id]
                sorted_bitmask[batch_index] = grammar_bitmask[orig_index]
            grammar_bitmask = sorted_bitmask

        grammar_bitmask = torch.from_numpy(grammar_bitmask)

        # TODO: compatibility with spec decode
        xgr.apply_token_bitmask_inplace(
            logits,
            grammar_bitmask.to(self.device, non_blocking=True),
            indices=list(struct_out_req_batch_indices.values()),
        )
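
The row-reordering step in the quoted function can be tried in isolation. The following is a small standalone sketch using plain NumPy and dictionaries; the helper name reorder_bitmask and the example request ids are hypothetical, and it stands in for the scheduler and input-batch objects rather than reproducing them.

```python
import numpy as np


def reorder_bitmask(
    grammar_bitmask: np.ndarray,
    mask_index_by_req: dict[str, int],
    batch_index_by_req: dict[str, int],
) -> np.ndarray:
    """Copy each request's mask row from scheduler order to batch order."""
    sorted_bitmask = np.zeros_like(grammar_bitmask)
    for req_id, mask_index in mask_index_by_req.items():
        batch_index = batch_index_by_req[req_id]
        sorted_bitmask[batch_index] = grammar_bitmask[mask_index]
    return sorted_bitmask


# Three requests; each row is filled with a distinct value so that the
# movement of rows is easy to follow.
bitmask = np.array([[1, 1], [2, 2], [3, 3]], dtype=np.int32)
mask_index_by_req = {"a": 0, "b": 1, "c": 2}   # order used by the scheduler
batch_index_by_req = {"a": 2, "b": 0, "c": 1}  # order used by the runner

reordered = reorder_bitmask(bitmask, mask_index_by_req, batch_index_by_req)
print(reordered)
```

Rows are copied rather than swapped in place, so a row is never overwritten before it has been read.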

It would make sense to me that this is the model vocab_size then, since then it matches the shape of the logits
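
To make the shape argument concrete, here is a rough sketch of a packed token bitmask applied to logits. It does not call xgrammar: the helpers allocate_bitmask, allow_token, and apply_bitmask are hypothetical, and the layout (32 tokens per unsigned 32-bit word, one bit per token) is an assumption about how such masks are commonly packed, so xgrammar's actual dtype and layout may differ.

```python
import numpy as np

BITS_PER_WORD = 32  # assumed packing: one bit per token, 32 tokens per word


def allocate_bitmask(batch_size: int, mask_size: int) -> np.ndarray:
    """Allocate an all-zero (everything disallowed) packed mask."""
    num_words = (mask_size + BITS_PER_WORD - 1) // BITS_PER_WORD
    return np.zeros((batch_size, num_words), dtype=np.uint32)


def allow_token(bitmask: np.ndarray, row: int, token_id: int) -> None:
    """Set the bit for token_id in the given row."""
    word, bit = divmod(token_id, BITS_PER_WORD)
    bitmask[row, word] |= np.uint32(1 << bit)


def apply_bitmask(logits: np.ndarray, bitmask: np.ndarray) -> np.ndarray:
    """Return logits with disallowed tokens set to -inf."""
    vocab_size = logits.shape[-1]
    if bitmask.shape[-1] * BITS_PER_WORD < vocab_size:
        raise ValueError("bitmask is too small to cover every logit column")
    token_ids = np.arange(vocab_size)
    words = bitmask[:, token_ids // BITS_PER_WORD]
    shifts = (token_ids % BITS_PER_WORD).astype(np.uint32)
    bits = (words >> shifts) & np.uint32(1)
    return np.where(bits == 1, logits, -np.inf)


# Logits have 40 columns, so the mask needs two 32-bit words per row.
logits = np.zeros((2, 40), dtype=np.float32)
mask = allocate_bitmask(batch_size=2, mask_size=40)
allow_token(mask, row=0, token_id=3)
allow_token(mask, row=0, token_id=35)
allow_token(mask, row=1, token_id=0)
masked = apply_bitmask(logits, mask)
```

In this sketch a mask sized from a value smaller than the logits width (for example 32 instead of 40) is rejected outright, which mirrors the point that the mask size should follow the shape of the logits.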

mergify · 2025-03-14T21:08:34Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

russellb · 2025-03-14T22:21:03Z

The tests passed in CI, though the change is very suspicious. It could use more analysis and a proper explanation of the right thing, but it's definitely better than it was ...

I can try to dig into this for a proper explanation (and likely another change) to put this to bed for good next week

Signed-off-by: [email protected] <[email protected]>

russellb · 2025-03-15T14:34:11Z

I rebased this on main. I THINK that all fixes necessary are in main and we're down to the one line that turns on the tests ... let's see what CI says

russellb · 2025-03-15T16:04:28Z

Looks like @aarnphm's last fix for the xgrammar bitmask didn't do the trick.

russellb · 2025-03-16T20:39:30Z

replaced by #14903

robertgshaw2-redhat requested review from mgoin and russellb as code owners March 14, 2025 17:32

mergify bot added ci/build v1 labels Mar 14, 2025

russellb reviewed Mar 14, 2025

This was referenced Mar 14, 2025

[V1] [CI] Enable v1/entrypoints #14619

Closed

[V1] Fix vocab size calculation for structured output #14826

Merged

russellb added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 14, 2025

robertgshaw2-redhat commented Mar 14, 2025

mergify bot added the needs-rebase label Mar 14, 2025

updated

ce6c82a

Signed-off-by: [email protected] <[email protected]>

russellb force-pushed the v1-entrypoints-test branch from 022159e to ce6c82a Compare March 15, 2025 14:33

mergify bot removed the needs-rebase label Mar 15, 2025

simon-mo added this to the v0.8.0 milestone Mar 15, 2025

aarnphm mentioned this pull request Mar 16, 2025

[Fix][Structured Output] using vocab_size to construct matcher #14868

Merged

russellb closed this Mar 16, 2025

robertgshaw2-redhat deleted the v1-entrypoints-test branch March 24, 2025 18:04
