Conversation

Contributor

@R3hankhan123 commented Nov 4, 2025

Purpose

This PR optimizes the s390x Dockerfile to improve build efficiency and maintainability:

  1. Simplified compiler toolchain: Replaced individual GCC packages with gcc-toolset-14 for a more streamlined installation
  2. Updated PyTorch: Removed custom PyTorch 2.7.0 build stage and switched to official PyTorch 2.8.0 CPU wheels
  3. Updated torchvision: Upgraded from v0.20.1 to v0.23.0 to match PyTorch 2.8.0
  4. Removed aws-lc-sys patches: Eliminated the patch for aws-lc, since the fix has been merged upstream
  5. Dynamic version detection: Replaced hardcoded version strings with automatic extraction from the requirements files (e.g., the outlines_core version is now read from requirements/common.txt); a sketch of this follows the list
  6. Extended platform support: Added s390x to the platform conditions for llguidance and xgrammar in requirements/common.txt, enabling these structured-output dependencies on the s390x architecture
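
For illustration, the dynamic version detection in item 5 could look roughly like the following RUN step. This is a minimal sketch only: the variable name and grep pattern are assumptions rather than the PR's exact code, and the real Dockerfile may parse the requirement line differently.

  # Read the pinned outlines_core version from requirements/common.txt instead of hardcoding it
  OUTLINES_CORE_VERSION=$(grep -i '^outlines_core' requirements/common.txt | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1)
  pip install "outlines_core==${OUTLINES_CORE_VERSION}"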

Test Plan

  1. Build the s390x Docker image: docker build -f docker/Dockerfile.s390x -t vllm-cpu-s390x .
  2. Verify all build stages complete successfully
  3. Run the container and verify basic vLLM functionality (a smoke-test sketch follows this list)
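
As a minimal smoke test for step 3 (a sketch, assuming the image tag from the Test Result section below is available locally; the /health and /v1/models routes are taken from the server's route listing in that section):

  # Start the server in the background, same invocation as in the Test Result section
  podman run -d --rm -p 8000:8000 --name vllm-gpt2 --entrypoint vllm \
    quay.io/r3hankhan/vllm:torch-2.8.0 serve gpt2 --host 0.0.0.0 --port 8000
  # Liveness check against the API server's health route
  curl -sf http://localhost:8000/health && echo "server healthy"
  # Confirm the model is registered
  curl -s http://localhost:8000/v1/models | jq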

Test Result

[root@b314lp50 ~]# podman run -it --rm   -p 8000:8000   --name vllm-gpt2   --entrypoint vllm   quay.io/r3hankhan/vllm:torch-2.8.0   serve gpt2  --host 0.0.0.0 --port 8000
INFO 11-04 05:42:44 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=1) INFO 11-04 05:42:50 [api_server.py:1952] vLLM API server version 0.11.1rc6.dev58+g7f4bdadb9.d20251103
(APIServer pid=1) INFO 11-04 05:42:50 [utils.py:253] non-default args: {'model_tag': 'gpt2', 'host': '0.0.0.0', 'model': 'gpt2', 'dtype': 'bfloat16'}
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 6.30MB/s]
(APIServer pid=1) INFO 11-04 05:43:04 [model.py:657] Resolved architecture: GPT2LMHeadModel
(APIServer pid=1) INFO 11-04 05:43:05 [model.py:1975] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=1) INFO 11-04 05:43:05 [model.py:1752] Using max model len 1024
(APIServer pid=1) WARNING 11-04 05:43:05 [cpu.py:160] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
(APIServer pid=1) INFO 11-04 05:43:05 [arg_utils.py:1358] Chunked prefill is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
(APIServer pid=1) INFO 11-04 05:43:05 [arg_utils.py:1364] Prefix caching is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 266kB/s]
vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 3.69MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 2.53MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 4.85MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 1.32MB/s]
INFO 11-04 05:43:15 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_DP0 pid=21) INFO 11-04 05:43:21 [core.py:93] Initializing a V1 LLM engine (v0.11.1rc6.dev58+g7f4bdadb9.d20251103) with config: model='gpt2', speculative_config=None, tokenizer='gpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cpu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=gpt2, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={'level': None, 'mode': 2, 'debug_dump_path': None, 'cache_dir': '', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': None, 'use_inductor': None, 'compile_sizes': None, 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'dce': True, 'size_asserts': False, 'nan_asserts': False, 'epilogue_fusion': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'use_cudagraph': True, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'full_cuda_graph': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'local_cache_dir': None}
(EngineCore_DP0 pid=21) WARNING 11-04 05:43:22 [cpu.py:404] Pin memory is not supported on CPU.
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:164] auto thread-binding list (id, physical core): [(0, 0), (1, 0), (2, 1), (3, 1), (8, 4), (9, 4), (10, 5), (11, 5), (16, 8), (17, 8), (18, 9), (19, 9)]
[W1104 05:43:23.044149155 utils.cpp:57] Warning: numa_migrate_pages failed. errno: 1 (function init_cpu_threads_env)
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] OMP threads binding of Process 21:
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 21, core 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 33, core 1
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 34, core 2
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 35, core 3
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 36, core 8
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 37, core 9
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 38, core 10
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 39, core 11
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 40, core 16
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 41, core 17
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 42, core 18
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 43, core 19
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_model_runner.py:67] Starting to load model gpt2...
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu.py:147] Using Torch SDPA backend.
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:13<00:00, 41.7MB/s]
(EngineCore_DP0 pid=21) INFO 11-04 05:43:37 [weight_utils.py:440] Time spent downloading weights for gpt2: 13.691614 seconds
(EngineCore_DP0 pid=21) INFO 11-04 05:43:37 [weight_utils.py:480] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00,  1.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00,  1.08s/it]
(EngineCore_DP0 pid=21) 
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [default_loader.py:314] Loading weights took 1.09 seconds
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [kv_cache_utils.py:1229] GPU KV cache size: 116,496 tokens
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [kv_cache_utils.py:1234] Maximum concurrency for 1,024 tokens per request: 113.77x
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [cpu_model_runner.py:77] Warming up model for the compilation...
(EngineCore_DP0 pid=21) INFO 11-04 05:44:01 [cpu_model_runner.py:87] Warming up done.
(EngineCore_DP0 pid=21) INFO 11-04 05:44:01 [core.py:258] init engine (profile, create kv cache, warmup model) took 22.86 seconds
(EngineCore_DP0 pid=21) WARNING 11-04 05:44:05 [cpu.py:160] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
(APIServer pid=1) INFO 11-04 05:44:05 [api_server.py:1717] Supported tasks: ['generate']
(APIServer pid=1) INFO 11-04 05:44:06 [api_server.py:2021] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:38] Available routes are:
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) INFO 11-04 05:44:16 [loggers.py:215] Engine 000: Avg prompt throughput: 0.3 tokens/s, Avg generation throughput: 2.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     10.88.0.1:51662 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 11-04 05:44:26 [loggers.py:215] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 11-04 05:44:36 [loggers.py:215] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%

Inference result

[root@b314lp50 ~]# curl http://localhost:8000/v1/completions   -H "Content-Type: application/json"   -d '{
    "model": "gpt2",
    "prompt": "Once upon a time",
    "max_tokens": 50,
    "temperature": 0.7
  }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   742  100   637  100   105     59      9  0:00:11  0:00:10  0:00:01   138
{
  "id": "cmpl-46bf74eae1734c67af590d166fe9abab",
  "object": "text_completion",
  "created": 1762235050,
  "model": "gpt2",
  "choices": [
    {
      "index": 0,
      "text": ", the blue paint with the lights a set the head, the state the government agencies, the pastor turned out of black people have been with cars to protect your Web.\n\"\n11: 1.5)\n\" the Eagles fans, to",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null,
      "prompt_logprobs": null,
      "prompt_token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 54,
    "completion_tokens": 50,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}

@chatgpt-codex-connector

💡 Codex Review

--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.8.0+cpu; platform_machine == "x86_64" or platform_machine == "s390x"
torch==2.8.0; platform_system == "Darwin"
torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64"

P1: Torch wheels unavailable for s390x

The Dockerfile now relies on pip install torch==2.8.0+cpu for the s390x build (the torch source build stage was removed and the requirement here now includes platform_machine == "s390x"). PyTorch does not publish prebuilt wheels for the s390x architecture and also does not ship an sdist, so pip will fail with “No matching distribution found for torch==2.8.0+cpu” when this image is built on s390x. As a result the CPU Docker image can no longer be built. The custom build step needs to be retained or a wheel must be provided for s390x.
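
If the wheel is indeed unavailable, one possible reading of this suggestion is sketched below: keep s390x out of the prebuilt-wheel marker so that torch on s390x is again supplied by a source-build stage. This is purely illustrative, not the change made by the PR, and only relevant if no suitable wheel exists.

  --extra-index-url https://download.pytorch.org/whl/cpu
  # Hypothetical rollback of the marker flagged above: s390x no longer matches the +cpu wheel
  torch==2.8.0+cpu; platform_machine == "x86_64"
  torch==2.8.0; platform_system == "Darwin"
  torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64"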

@gemini-code-assist bot left a comment

Code Review

This pull request provides a great optimization for the s390x Dockerfile. The changes, such as switching to a pre-built PyTorch wheel and simplifying the toolchain, significantly improve build efficiency and maintainability.

I have one major observation that could lead to further significant optimization. The Dockerfile includes a numba-builder stage which builds llvm, llvmlite, and numba from source. This is a very time-consuming process. However, requirements/cpu.txt explicitly excludes numba for the s390x architecture (numba == 0.61.2; platform_machine != "s390x"). This suggests that numba is not a required dependency for s390x.

If numba is indeed not needed, removing the numba-builder stage and the corresponding installation of llvmlite and numba wheels in the final vllm-cpu stage would drastically reduce build time and image size. While I cannot comment directly on the relevant lines due to review constraints, I strongly recommend investigating this discrepancy.

Apart from this point, the rest of the changes are excellent and well-implemented.
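
One inexpensive way to sanity-check this hypothesis is sketched below, assuming the image from the Test Result section is available locally and exposes python3 on its PATH; the check itself is illustrative and not part of the PR.

  # Report whether numba is installed in the image and whether vLLM still imports without it
  podman run --rm --entrypoint python3 quay.io/r3hankhan/vllm:torch-2.8.0 -c \
    "import importlib.util as u; print('numba installed:', u.find_spec('numba') is not None); import vllm; print('vllm import OK')"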

@heheda12345 enabled auto-merge (squash) November 5, 2025 07:45
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 5, 2025
@simon-mo disabled auto-merge November 5, 2025 19:25
@simon-mo merged commit e044924 into vllm-project:main Nov 5, 2025
90 of 92 checks passed
andylolu2 pushed a commit to andylolu2/vllm that referenced this pull request Nov 5, 2025
@R3hankhan123 deleted the s390x-optimzations branch November 6, 2025 02:26
zWaNg3 added a commit to fangyuchu/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)
