@markmc markmc commented Nov 28, 2025

Have the prefill instance include its internal request ID, which it expects the decode instance to send back in the NIXL notification.

Right now, we rely on the proxy supplying the ID via X-Request-ID and on prefill and decode mangling this ID in identical ways. This is brittle; the prefill instance (P) should be explicit about which ID it expects from the decode instance (D).

Relates to #27987 - adding a random prefix to client-provided request IDs.
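
A minimal sketch of the idea, not the actual vLLM code. It assumes each instance prepends its own random prefix to the client-supplied ID (as in #27987), so P and D end up with different internal IDs. The helpers `make_internal_id`, `prefill_response_params`, and `notification_id` are hypothetical names, and the params dict is abbreviated; only the `remote_request_id` key and the ID layout are taken from the logs in this PR.

```python
import secrets
from typing import Optional


def make_internal_id(client_id: str, prefix: Optional[str] = None) -> str:
    """Hypothetical: prepend a random 8-hex-char prefix to the client-supplied ID."""
    prefix = prefix or secrets.token_hex(4)
    return f"cmpl-{prefix}-{client_id}-0"


def prefill_response_params(prefill_internal_id: str) -> dict:
    """Hypothetical: params P hands back, naming the ID it will wait for."""
    return {
        "do_remote_prefill": True,
        "do_remote_decode": False,
        "remote_request_id": prefill_internal_id,
    }


def notification_id(kv_transfer_params: dict) -> str:
    """Hypothetical: D notifies P with P's ID rather than its own."""
    return kv_transfer_params["remote_request_id"]


client_id = "482061a8-551f-4964-925d-fc0925e9261b"
# Prefixes fixed here only to make the example reproducible.
p_id = make_internal_id(client_id, prefix="b9d6c33a")
d_id = make_internal_id(client_id, prefix="b82f4d1c")

params = prefill_response_params(p_id)
notif = notification_id(params)
```

With independent prefixes, `d_id` no longer matches `p_id`, so a notification built from D's own ID would never be recognized by P; passing `remote_request_id` removes that dependency.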

markmc commented Nov 28, 2025

Example of it working on top of #27987

Prefill side:

$ grep cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0 nixl-prefill-1764323*
nixl-prefill-1764323995.log:(APIServer pid=3451400) DEBUG 11-28 05:02:04 [entrypoints/logger.py:37] Request cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0 details: prompt: 'Do you know the book Traction by Gino Wickman', prompt_token_ids: [128000, 5519, 499, 1440, 279, 2363, 350, 16597, 555, 480, 3394, 75206, 1543], prompt_embeds shape: None.
nixl-prefill-1764323995.log:(APIServer pid=3451400) INFO 11-28 05:02:04 [entrypoints/logger.py:47] Received request cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0: params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args={'kv_transfer_params': {'do_remote_decode': True, 'do_remote_prefill': False, 'remote_engine_id': None, 'remote_block_ids': None, 'remote_host': None, 'remote_port': None}}), lora_request: None.
nixl-prefill-1764323995.log:(APIServer pid=3451400) INFO 11-28 05:02:04 [v1/engine/async_llm.py:360] Added request cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0.
nixl-prefill-1764323995.log:(EngineCore_DP0 pid=3452005) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:620] NIXLConnector request_finished(cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0), request_status=FINISHED_LENGTH_CAPPED, kv_transfer_params={'do_remote_decode': True, 'do_remote_prefill': False, 'remote_engine_id': None, 'remote_block_ids': None, 'remote_host': None, 'remote_port': None}
nixl-prefill-1764323995.log:(EngineCore_DP0 pid=3452005) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:655] NIXLConnector request_finished(cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0) waiting for 480 seconds for remote decode to fetch blocks
nixl-prefill-1764323995.log:(EngineCore_DP0 pid=3452005) DEBUG 11-28 05:02:04 [v1/core/sched/scheduler.py:1492] Finished sending KV transfer for request cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0

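Decode side. Note that the decode request ID is cmpl-b82f4d1c-482061a8-551f-4964-925d-fc0925e9261b-0 while the prefill request ID is cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0; they differ only in the random prefix, and the decode side receives the prefill ID as remote_request_id in kv_transfer_params: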

$ grep cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0 nixl-decode-1764323*
nixl-decode-1764323996.log:(APIServer pid=3451466) INFO 11-28 05:02:04 [entrypoints/logger.py:47] Received request cmpl-b82f4d1c-482061a8-551f-4964-925d-fc0925e9261b-0: params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=120, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args={'kv_transfer_params': {'do_remote_prefill': True, 'do_remote_decode': False, 'remote_block_ids': [91], 'remote_engine_id': '75192ab7-d1c6-48cf-bd82-ccef4ba0e1d6', 'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', 'remote_host': 'localhost', 'remote_port': 5559, 'tp_size': 1}}), lora_request: None.
nixl-decode-1764323996.log:(EngineCore_DP0 pid=3452080) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:486] NIXLConnector get_num_new_matched_tokens: num_computed_tokens=0, kv_transfer_params={'do_remote_prefill': True, 'do_remote_decode': False, 'remote_block_ids': [91], 'remote_engine_id': '75192ab7-d1c6-48cf-bd82-ccef4ba0e1d6', 'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', 'remote_host': 'localhost', 'remote_port': 5559, 'tp_size': 1}
nixl-decode-1764323996.log:(EngineCore_DP0 pid=3452080) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:507] NIXLConnector update_state_after_alloc: num_external_tokens=13, kv_transfer_params={'do_remote_prefill': True, 'do_remote_decode': False, 'remote_block_ids': [91], 'remote_engine_id': '75192ab7-d1c6-48cf-bd82-ccef4ba0e1d6', 'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', 'remote_host': 'localhost', 'remote_port': 5559, 'tp_size': 1}
nixl-decode-1764323996.log:(EngineCore_DP0 pid=3452080) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:507] NIXLConnector update_state_after_alloc: num_external_tokens=0, kv_transfer_params={'do_remote_prefill': False, 'do_remote_decode': False, 'remote_block_ids': [91], 'remote_engine_id': '75192ab7-d1c6-48cf-bd82-ccef4ba0e1d6', 'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', 'remote_host': 'localhost', 'remote_port': 5559, 'tp_size': 1}
nixl-decode-1764323996.log:(EngineCore_DP0 pid=3452080) DEBUG 11-28 05:02:04 [distributed/.../v1/nixl_connector.py:620] NIXLConnector request_finished(cmpl-b82f4d1c-482061a8-551f-4964-925d-fc0925e9261b-0), request_status=FINISHED_LENGTH_CAPPED, kv_transfer_params={'do_remote_prefill': False, 'do_remote_decode': False, 'remote_block_ids': [91], 'remote_engine_id': '75192ab7-d1c6-48cf-bd82-ccef4ba0e1d6', 'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', 'remote_host': 'localhost', 'remote_port': 5559, 'tp_size': 1}
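
The correlation visible in the decode log can also be checked mechanically. The sketch below is only an illustration: the regexes and the helpers `ids_from_decode_log` and `without_prefix` are hypothetical, and they assume the `cmpl-<prefix>-<client id>-0` layout seen in the logs above. The sample line is an abbreviated copy of the decode-side "Received request" entry.

```python
import re

RECEIVED_RE = re.compile(r"Received request (cmpl-[0-9a-f-]+):")
REMOTE_RE = re.compile(r"'remote_request_id': '(cmpl-[0-9a-f-]+)'")


def ids_from_decode_log(line: str):
    """Return (decode_id, prefill_id) from a 'Received request' line, or None."""
    received = RECEIVED_RE.search(line)
    remote = REMOTE_RE.search(line)
    if not (received and remote):
        return None
    return received.group(1), remote.group(1)


def without_prefix(request_id: str) -> str:
    """Drop the leading 'cmpl-<prefix>-' part (layout assumed from the logs)."""
    return request_id.split("-", 2)[2]


# Abbreviated copy of the decode-side log line; "..." marks omitted fields.
line = (
    "INFO 11-28 05:02:04 [entrypoints/logger.py:47] Received request "
    "cmpl-b82f4d1c-482061a8-551f-4964-925d-fc0925e9261b-0: params: ... "
    "'remote_request_id': 'cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0', ..."
)

decode_id, prefill_id = ids_from_decode_log(line)
same_client_request = without_prefix(decode_id) == without_prefix(prefill_id)
```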

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces remote_request_id to make the NIXL notification more robust, which is a good improvement. The changes correctly plumb this new ID to where it's needed for constructing the notification ID. However, I've found a critical issue where direct dictionary access on kv_transfer_params could lead to a KeyError in certain paths. I've provided a suggestion to make this access safer.
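
The suggested change itself is not shown in this conversation. One common way to avoid a `KeyError` on an optional key is `dict.get` with an explicit fallback. The sketch below assumes a hypothetical helper, `resolve_remote_request_id`, and a hypothetical fallback to the local request ID; the PR may handle the missing key differently.

```python
from typing import Any, Dict, Optional


def resolve_remote_request_id(
    local_request_id: str,
    kv_transfer_params: Optional[Dict[str, Any]],
) -> str:
    """Return the ID the prefill side expects, without raising KeyError.

    Hypothetical fallback: if "remote_request_id" is missing or empty,
    use the local request ID instead.
    """
    params = kv_transfer_params or {}
    return params.get("remote_request_id") or local_request_id


# Key present: the prefill ID wins.
with_key = resolve_remote_request_id(
    "cmpl-b82f4d1c-482061a8-551f-4964-925d-fc0925e9261b-0",
    {"remote_request_id": "cmpl-b9d6c33a-482061a8-551f-4964-925d-fc0925e9261b-0"},
)

# Key absent, or no params at all: fall back instead of raising.
without_key = resolve_remote_request_id("local-id", {"do_remote_prefill": True})
no_params = resolve_remote_request_id("local-id", None)
```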

@markmc markmc added the ready label Nov 28, 2025
Signed-off-by: Mark McLoughlin <[email protected]>
@markmc markmc force-pushed the nixl-remote-request-id branch from c519e25 to 9d04ff1 Compare November 28, 2025 15:03
@mergify mergify bot added the v1 label Nov 28, 2025