Conversation

@tlrmchlsmth tlrmchlsmth commented May 1, 2025

No description provided.

ApostaC and others added 30 commits April 17, 2025 13:56
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <[email protected]>
github-actions bot commented May 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, a small but essential subset of tests meant to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

robertgshaw2-redhat and others added 9 commits May 1, 2025 20:27
* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
@njhill njhill force-pushed the nixl_integration branch from db5046e to aec447c on May 2, 2025 18:40
tlrmchlsmth and others added 10 commits May 2, 2025 19:33
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
@njhill njhill left a comment

Awesome work @tlrmchlsmth (and @robertgshaw2-redhat)! Looks really good. I've reviewed everything but nixl_connector.py, will look at that next.

All my comments so far are pretty minor things / shouldn't be blockers.

Comment on lines 1038 to 1048
def maybe_wait_for_save():
    if has_kv_transfer_group():
        kv_connector = get_kv_transfer_group()
        kv_connector.wait_for_save()

def maybe_get_finished() -> tuple[set[str], set[str]]:
    if has_kv_transfer_group():
        kv_connector = get_kv_transfer_group()
        return kv_connector.get_finished()
    else:
        return set(), set()
Member

I think these could all be global functions?

Not this PR, but I also don't think the get_kv_transfer_group / has_kv_transfer_group pair is needed; we may as well keep just get_kv_transfer_group and have it return None if no connector is configured.
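
A rough sketch of what that could look like, purely as an illustration of the suggestion (the module-level singleton and helper names here are assumptions, not vLLM's actual API):

# Illustrative only: a single module-level accessor that returns None when no
# connector is configured, so callers don't need a separate has_* check.
_KV_TRANSFER_GROUP = None  # assumed to be set once at engine initialization

def get_kv_transfer_group():
    """Return the configured KV connector, or None if none is configured."""
    return _KV_TRANSFER_GROUP

def maybe_wait_for_save():
    if (kv_connector := get_kv_transfer_group()) is not None:
        kv_connector.wait_for_save()

def maybe_get_finished() -> tuple[set[str], set[str]]:
    if (kv_connector := get_kv_transfer_group()) is not None:
        return kv_connector.get_finished()
    return set(), set()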

self._update_states(scheduler_output)
if not scheduler_output.total_num_scheduled_tokens:
    # KV send/recv even if no work to do.
    with set_forward_context(None, self.vllm_config):
Member

Why do we need to set the forward context here?

Member Author

The forward context gets passed into the KV connector.

We need to change this, as it's not a stable API, and we should do so very soon, before there's more KVConnector adoption.

# These transfers are designed to be async and the requests
# involved may be disjoint from the running requests.
# Do this here to save a collective_rpc.
kv_connector.start_load_kv(get_forward_context())
Member

Maybe better to just pass None? (make forward context optional?)
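
One way that suggestion could look, sketched very loosely (the class and parameter names are stand-ins, not the actual connector interface):

from typing import Optional

class KVConnectorSketch:
    """Illustrative connector whose load path treats the forward context as optional."""

    def start_load_kv(self, forward_context: Optional[object] = None) -> None:
        # Kick off async KV transfers; only consult the forward context if provided.
        if forward_context is not None:
            pass  # e.g. read per-layer attention metadata from the context
        # ... start transfers here ...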

@njhill njhill left a comment

A few more minor things

Comment on lines +261 to +262
first_layer_name = next(iter(kv_caches))
first_kv_cache = kv_caches[first_layer_name]
Member

nit:

Suggested change
- first_layer_name = next(iter(kv_caches))
- first_kv_cache = kv_caches[first_layer_name]
+ first_layer_name, first_kv_cache = next(iter(kv_caches.items()))

# [2 (k and v), num_blocks, ...]
# TODO(tms): num_blocks will be in a different spot for MLA.
num_blocks = first_kv_cache.shape[1]
kv_elem_size = first_kv_cache[0].element_size()
Member

Suggested change
- kv_elem_size = first_kv_cache[0].element_size()
+ kv_elem_size = first_kv_cache.element_size()

# hybrid attn, etc
self.block_len = kv_elem_size * math.prod(first_kv_cache.shape[-3:])

logger.debug("Per layer kv cache size: %s", first_kv_cache[0].shape)
Member

I think?

Suggested change
- logger.debug("Per layer kv cache size: %s", first_kv_cache[0].shape)
+ logger.debug("Per layer kv cache size: %s", first_kv_cache.shape)

Comment on lines +285 to +286
for layer_name in kv_caches:
    for cache in kv_caches[layer_name]:
Member

Suggested change
- for layer_name in kv_caches:
-     for cache in kv_caches[layer_name]:
+ for caches in kv_caches.values():
+     for cache in caches:

And we are iterating over slices of the first dimension of the tensor here, is that right?

Member Author

Going to leave this as-is (along with some of the other comments around here) since I'm not sure the change is right. Order is important here, and I'm not sure the regions will end up being ordered by layer if we make this change.

@njhill njhill May 3, 2025

Sure! Though this particular change, at least, is equivalent: when you iterate over a dict, it's always in the same order whether you're iterating over the keys, values, or items. I guess I should have suggested layer rather than caches as the variable name.
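
For the record, a quick illustration of that dict-ordering point (standard Python behavior; the cache contents here are placeholders):

kv_caches = {"layer.0": "kv0", "layer.1": "kv1", "layer.2": "kv2"}

# Dicts preserve insertion order, and keys(), values(), and items() all iterate
# in that same order, so both loop styles visit the layers identically.
assert list(kv_caches) == list(kv_caches.keys())
assert [kv_caches[name] for name in kv_caches] == list(kv_caches.values())
assert list(zip(kv_caches.keys(), kv_caches.values())) == list(kv_caches.items())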

kv_caches[first_layer_name][1, b, 0, 0, 0])
remote_engine_id = None # HACK for debug send

if NIXL_ROLE == "SENDER":
Member

Should this also be nested under the dev if above?

Member Author

No, this is still needed for handshaking.

@njhill njhill left a comment

Approving to unblock if needed; I don't think any of my comments are critical.

Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
@tlrmchlsmth tlrmchlsmth merged commit f1575de into neuralmagic:disagg_pd_dev May 3, 2025
2 checks passed
# Note(tms): The remote_block_ids only contain full computed blocks,
# while the local_block_ids are all blocks allocated for this request,
# so truncate the local_block_ids to account for this.
del local_block_ids[len(remote_block_ids):]
Collaborator

Note: I'm not sure this is sound. We need to double-check this.

Also, now that num_preallocate_tokens=0, we should not have this case anymore.

Collaborator

So I think we should remove this. Just making a note here for posterity.
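
For reference, a minimal illustration of what the truncation being discussed does (the block IDs are toy values):

# All blocks allocated locally for the request vs. the remote blocks that were
# actually computed in full on the prefill side.
local_block_ids = [10, 11, 12, 13, 14]
remote_block_ids = [70, 71, 72]

# Keep only as many local blocks as there are fully computed remote blocks.
del local_block_ids[len(remote_block_ids):]
assert local_block_ids == [10, 11, 12]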
