[P/D Disagg] Direct NIXL Connector #60
Conversation
Awesome work @tlrmchlsmth (and @robertgshaw2-redhat)! Looks really good. I've reviewed everything but nixl_connector.py, will look at that next.
All my comments so far are pretty minor things / shouldn't be blockers.
vllm/v1/worker/gpu_model_runner.py (Outdated)
```python
def maybe_wait_for_save():
    if has_kv_transfer_group():
        kv_connector = get_kv_transfer_group()
        kv_connector.wait_for_save()


def maybe_get_finished() -> tuple[set[str], set[str]]:
    if has_kv_transfer_group():
        kv_connector = get_kv_transfer_group()
        return kv_connector.get_finished()
    else:
        return set(), set()
```
I think these could all be global functions?
Not this PR, but I also don't think the get_kv_transfer_group / has_kv_transfer_group pair is needed; may as well just keep get_kv_transfer_group and have it return None if no connector is configured.
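A rough sketch of what that could look like with a single module-level getter that returns None when no connector is configured (the registry variable and the KVConnectorBase name here are placeholders for illustration, not the actual vLLM symbols):

```python
from typing import Optional

# Placeholder for whatever state the kv_transfer utilities already keep.
_KV_TRANSFER_GROUP: Optional["KVConnectorBase"] = None


def get_kv_transfer_group() -> Optional["KVConnectorBase"]:
    """Return the configured KV connector, or None if there isn't one."""
    return _KV_TRANSFER_GROUP


def maybe_wait_for_save() -> None:
    # With a None-returning getter, the separate has_kv_transfer_group()
    # check becomes unnecessary.
    if (kv_connector := get_kv_transfer_group()) is not None:
        kv_connector.wait_for_save()


def maybe_get_finished() -> tuple[set[str], set[str]]:
    if (kv_connector := get_kv_transfer_group()) is not None:
        return kv_connector.get_finished()
    return set(), set()
```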
```python
self._update_states(scheduler_output)
if not scheduler_output.total_num_scheduled_tokens:
    # KV send/recv even if no work to do.
    with set_forward_context(None, self.vllm_config):
```
Why do we need to set the forward context here?
The forward context gets passed into the KV connector.
We need to change this since it's not a stable API, and we should do so soon, before there's more KVConnector adoption.
```python
# These transfers are designed to be async and the requests
# involved may be disjoint from the running requests.
# Do this here to save a collective_rpc.
kv_connector.start_load_kv(get_forward_context())
```
Maybe better to just pass None? (make forward context optional?)
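A minimal, self-contained sketch of the optional-context idea (DemoConnector and its fields are made up for illustration; this is not the vLLM connector API):

```python
from typing import Optional


class DemoConnector:
    """Hypothetical stand-in for a KV connector, not the vLLM class."""

    def __init__(self) -> None:
        # Requests the connector already knows it must load, tracked
        # independently of any forward pass.
        self._pending_load_requests = ["req-0", "req-1"]

    def start_load_kv(self, forward_context: Optional[dict] = None) -> None:
        # If the context is optional, callers outside a forward pass can
        # pass None instead of constructing a dummy forward context.
        if forward_context is None:
            requests = self._pending_load_requests
        else:
            requests = forward_context.get("requests", [])
        for req in requests:
            print(f"issuing async KV read for {req}")


DemoConnector().start_load_kv()                          # no forward context
DemoConnector().start_load_kv({"requests": ["req-2"]})   # with one
```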
njhill left a comment
A few more minor things
```python
first_layer_name = next(iter(kv_caches))
first_kv_cache = kv_caches[first_layer_name]
```
nit:
```python
first_layer_name, first_kv_cache = next(iter(kv_caches.items()))
```
```python
# [2 (k and v), num_blocks, ...]
# TODO(tms): num_blocks will be in a different spot for MLA.
num_blocks = first_kv_cache.shape[1]
kv_elem_size = first_kv_cache[0].element_size()
```
```python
kv_elem_size = first_kv_cache.element_size()
```
```python
# hybrid attn, etc
self.block_len = kv_elem_size * math.prod(first_kv_cache.shape[-3:])


logger.debug("Per layer kv cache size: %s", first_kv_cache[0].shape)
```
I think?
```python
logger.debug("Per layer kv cache size: %s", first_kv_cache.shape)
```
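For what it's worth, a worked example of the block_len computation above, under an assumed cache layout of [2 (K and V), num_blocks, block_size, num_heads, head_dim]; the real layout depends on the attention backend:

```python
import math

import torch

# Assumed example layout: [2 (K and V), num_blocks, block_size, num_heads, head_dim]
first_kv_cache = torch.zeros(2, 4, 16, 8, 128, dtype=torch.float16)

kv_elem_size = first_kv_cache.element_size()                      # 2 bytes (fp16)
block_len = kv_elem_size * math.prod(first_kv_cache.shape[-3:])   # bytes per block of K (or V)
assert block_len == 2 * 16 * 8 * 128                              # 32768 bytes
```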
```python
for layer_name in kv_caches:
    for cache in kv_caches[layer_name]:
```
```python
for caches in kv_caches.values():
    for cache in caches:
```
And we are iterating over slices of the first dimension of the tensor here, is that right?
Going to leave this as-is (along with some of the other comments around here) since I'm not sure it's right. Here, order is important and I'm not sure the regions will end up being ordered by layer if we make this change
Sure! Though this particular change at least is equivalent. When you iterate over a dict it's always in the same order whether you're iterating over the keys, values or items. I guess I should have suggested layer rather than caches as the variable name
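A quick standalone illustration of the ordering point: Python dicts preserve insertion order, and keys(), values(), and items() all walk that same order, so the suggested rewrite cannot reorder the registered regions.

```python
kv_caches = {"layers.0": "cache0", "layers.1": "cache1", "layers.2": "cache2"}

# Iterating over keys and indexing, versus iterating over values directly,
# yields the caches in the same (insertion) order.
assert [kv_caches[name] for name in kv_caches] == list(kv_caches.values())
assert list(zip(kv_caches.keys(), kv_caches.values())) == list(kv_caches.items())
```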
```python
    kv_caches[first_layer_name][1, b, 0, 0, 0])
remote_engine_id = None  # HACK for debug send


if NIXL_ROLE == "SENDER":
```
Should this also be nested under the dev if above?
No, this is still needed for handshaking
njhill left a comment
Approving to unblock if needed; I don't think any of my comments are critical.
```python
# Note(tms): The remote_block_ids only contain full computed blocks,
# while the local_block_ids are all blocks allocated for this request,
# so truncate the local_block_ids to account for this.
del local_block_ids[len(remote_block_ids):]
```
Note: I'm not sure this is sound. We need to double-check this.
Also, now that num_preallocate_tokens=0, we should not have this case anymore.
So I think we should remove this. Just making a note here for posterity
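For posterity, a small made-up example of what the truncation does (the block IDs are invented; whether the head-aligned correspondence is actually sound is the open question above):

```python
# Prefill side has 3 fully computed blocks; decode side allocated 5 blocks
# for the same request (e.g. including preallocated ones).
remote_block_ids = [10, 11, 12]
local_block_ids = [3, 7, 9, 14, 21]

# Drop the trailing local blocks with no computed remote counterpart so the
# two lists can be paired 1:1 into transfer descriptors.
del local_block_ids[len(remote_block_ids):]
assert local_block_ids == [3, 7, 9]
```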