Conversation

Contributor

@R3hankhan123 commented Nov 4, 2025

Purpose

This PR optimizes the s390x Dockerfile to improve build efficiency and maintainability:

  1. Simplified compiler toolchain: Replaced individual GCC packages with gcc-toolset-14 for a more streamlined installation
  2. Updated PyTorch: Removed custom PyTorch 2.7.0 build stage and switched to official PyTorch 2.8.0 CPU wheels
  3. Updated torchvision: Upgraded from v0.20.1 to v0.23.0 to match PyTorch 2.8.0
  4. Removed aws-lc-sys patches: Eliminated the patch for aws-lc, since the fix has been merged upstream
  5. Dynamic version detection: Replaced hardcoded version strings with automatic extraction from the requirements files (e.g., the outlines_core version is now read from requirements/common.txt); a sketch of this follows the list
  6. Extended platform support: Added s390x to the platform conditions for llguidance and xgrammar in requirements/common.txt, enabling these structured-output dependencies on the s390x architecture
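
For illustration, the dynamic version detection in item 5 could look roughly like the following RUN step. This is a minimal sketch only: the variable name and grep pattern are assumptions rather than the PR's exact code, and the real Dockerfile may parse the requirement line differently.

  # Read the pinned outlines_core version from requirements/common.txt instead of hardcoding it
  OUTLINES_CORE_VERSION=$(grep -i '^outlines_core' requirements/common.txt | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1)
  pip install "outlines_core==${OUTLINES_CORE_VERSION}"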

Test Plan

  1. Build the s390x Docker image: docker build -f docker/Dockerfile.s390x -t vllm-cpu-s390x .
  2. Verify all build stages complete successfully
  3. Run the container and verify basic vLLM functionality (a smoke-test sketch follows this list)
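
As a minimal smoke test for step 3 (a sketch, assuming the image tag from the Test Result section below is available locally; the /health and /v1/models routes are taken from the server's route listing in that section):

  # Start the server in the background, same invocation as in the Test Result section
  podman run -d --rm -p 8000:8000 --name vllm-gpt2 --entrypoint vllm \
    quay.io/r3hankhan/vllm:torch-2.8.0 serve gpt2 --host 0.0.0.0 --port 8000
  # Liveness check against the API server's health route
  curl -sf http://localhost:8000/health && echo "server healthy"
  # Confirm the model is registered
  curl -s http://localhost:8000/v1/models | jq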

Test Result

[root@b314lp50 ~]# podman run -it --rm   -p 8000:8000   --name vllm-gpt2   --entrypoint vllm   quay.io/r3hankhan/vllm:torch-2.8.0   serve gpt2  --host 0.0.0.0 --port 8000
INFO 11-04 05:42:44 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=1) INFO 11-04 05:42:50 [api_server.py:1952] vLLM API server version 0.11.1rc6.dev58+g7f4bdadb9.d20251103
(APIServer pid=1) INFO 11-04 05:42:50 [utils.py:253] non-default args: {'model_tag': 'gpt2', 'host': '0.0.0.0', 'model': 'gpt2', 'dtype': 'bfloat16'}
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 6.30MB/s]
(APIServer pid=1) INFO 11-04 05:43:04 [model.py:657] Resolved architecture: GPT2LMHeadModel
(APIServer pid=1) INFO 11-04 05:43:05 [model.py:1975] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=1) INFO 11-04 05:43:05 [model.py:1752] Using max model len 1024
(APIServer pid=1) WARNING 11-04 05:43:05 [cpu.py:160] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
(APIServer pid=1) INFO 11-04 05:43:05 [arg_utils.py:1358] Chunked prefill is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
(APIServer pid=1) INFO 11-04 05:43:05 [arg_utils.py:1364] Prefix caching is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 266kB/s]
vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 3.69MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 2.53MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 4.85MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 1.32MB/s]
INFO 11-04 05:43:15 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_DP0 pid=21) INFO 11-04 05:43:21 [core.py:93] Initializing a V1 LLM engine (v0.11.1rc6.dev58+g7f4bdadb9.d20251103) with config: model='gpt2', speculative_config=None, tokenizer='gpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cpu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=gpt2, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={'level': None, 'mode': 2, 'debug_dump_path': None, 'cache_dir': '', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': None, 'use_inductor': None, 'compile_sizes': None, 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'dce': True, 'size_asserts': False, 'nan_asserts': False, 'epilogue_fusion': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'use_cudagraph': True, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'full_cuda_graph': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'local_cache_dir': None}
(EngineCore_DP0 pid=21) WARNING 11-04 05:43:22 [cpu.py:404] Pin memory is not supported on CPU.
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:164] auto thread-binding list (id, physical core): [(0, 0), (1, 0), (2, 1), (3, 1), (8, 4), (9, 4), (10, 5), (11, 5), (16, 8), (17, 8), (18, 9), (19, 9)]
[W1104 05:43:23.044149155 utils.cpp:57] Warning: numa_migrate_pages failed. errno: 1 (function init_cpu_threads_env)
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] OMP threads binding of Process 21:
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 21, core 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 33, core 1
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 34, core 2
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 35, core 3
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 36, core 8
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 37, core 9
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 38, core 10
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 39, core 11
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 40, core 16
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 41, core 17
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 42, core 18
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 	OMP tid: 43, core 19
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_worker.py:70] 
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu_model_runner.py:67] Starting to load model gpt2...
(EngineCore_DP0 pid=21) INFO 11-04 05:43:23 [cpu.py:147] Using Torch SDPA backend.
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:13<00:00, 41.7MB/s]
(EngineCore_DP0 pid=21) INFO 11-04 05:43:37 [weight_utils.py:440] Time spent downloading weights for gpt2: 13.691614 seconds
(EngineCore_DP0 pid=21) INFO 11-04 05:43:37 [weight_utils.py:480] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00,  1.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00,  1.08s/it]
(EngineCore_DP0 pid=21) 
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [default_loader.py:314] Loading weights took 1.09 seconds
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [kv_cache_utils.py:1229] GPU KV cache size: 116,496 tokens
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [kv_cache_utils.py:1234] Maximum concurrency for 1,024 tokens per request: 113.77x
(EngineCore_DP0 pid=21) INFO 11-04 05:43:38 [cpu_model_runner.py:77] Warming up model for the compilation...
(EngineCore_DP0 pid=21) INFO 11-04 05:44:01 [cpu_model_runner.py:87] Warming up done.
(EngineCore_DP0 pid=21) INFO 11-04 05:44:01 [core.py:258] init engine (profile, create kv cache, warmup model) took 22.86 seconds
(EngineCore_DP0 pid=21) WARNING 11-04 05:44:05 [cpu.py:160] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
(APIServer pid=1) INFO 11-04 05:44:05 [api_server.py:1717] Supported tasks: ['generate']
(APIServer pid=1) INFO 11-04 05:44:06 [api_server.py:2021] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:38] Available routes are:
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 11-04 05:44:06 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) INFO 11-04 05:44:16 [loggers.py:215] Engine 000: Avg prompt throughput: 0.3 tokens/s, Avg generation throughput: 2.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     10.88.0.1:51662 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 11-04 05:44:26 [loggers.py:215] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 11-04 05:44:36 [loggers.py:215] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%

Inference result

[root@b314lp50 ~]# curl http://localhost:8000/v1/completions   -H "Content-Type: application/json"   -d '{
    "model": "gpt2",
    "prompt": "Once upon a time",
    "max_tokens": 50,
    "temperature": 0.7
  }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   742  100   637  100   105     59      9  0:00:11  0:00:10  0:00:01   138
{
  "id": "cmpl-46bf74eae1734c67af590d166fe9abab",
  "object": "text_completion",
  "created": 1762235050,
  "model": "gpt2",
  "choices": [
    {
      "index": 0,
      "text": ", the blue paint with the lights a set the head, the state the government agencies, the pastor turned out of black people have been with cars to protect your Web.\n\"\n11: 1.5)\n\" the Eagles fans, to",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null,
      "prompt_logprobs": null,
      "prompt_token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 54,
    "completion_tokens": 50,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}

@chatgpt-codex-connector

💡 Codex Review

--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.8.0+cpu; platform_machine == "x86_64" or platform_machine == "s390x"
torch==2.8.0; platform_system == "Darwin"
torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64"

P1: Torch wheels unavailable for s390x

The Dockerfile now relies on pip install torch==2.8.0+cpu for the s390x build (the torch source build stage was removed and the requirement here now includes platform_machine == "s390x"). PyTorch does not publish prebuilt wheels for the s390x architecture and also does not ship an sdist, so pip will fail with “No matching distribution found for torch==2.8.0+cpu” when this image is built on s390x. As a result the CPU Docker image can no longer be built. The custom build step needs to be retained or a wheel must be provided for s390x.
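
If the wheel is indeed unavailable, one possible reading of this suggestion is sketched below: keep s390x out of the prebuilt-wheel marker so that torch on s390x is again supplied by a source-build stage. This is purely illustrative, not the change made by the PR, and only relevant if no suitable wheel exists.

  --extra-index-url https://download.pytorch.org/whl/cpu
  # Hypothetical rollback of the marker flagged above: s390x no longer matches the +cpu wheel
  torch==2.8.0+cpu; platform_machine == "x86_64"
  torch==2.8.0; platform_system == "Darwin"
  torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64"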

@gemini-code-assist bot left a comment

Code Review

This pull request provides a great optimization for the s390x Dockerfile. The changes, such as switching to a pre-built PyTorch wheel and simplifying the toolchain, significantly improve build efficiency and maintainability.

I have one major observation that could lead to further significant optimization. The Dockerfile includes a numba-builder stage which builds llvm, llvmlite, and numba from source. This is a very time-consuming process. However, requirements/cpu.txt explicitly excludes numba for the s390x architecture (numba == 0.61.2; platform_machine != "s390x"). This suggests that numba is not a required dependency for s390x.

If numba is indeed not needed, removing the numba-builder stage and the corresponding installation of llvmlite and numba wheels in the final vllm-cpu stage would drastically reduce build time and image size. While I cannot comment directly on the relevant lines due to review constraints, I strongly recommend investigating this discrepancy.

Apart from this point, the rest of the changes are excellent and well-implemented.
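
One inexpensive way to sanity-check this hypothesis is sketched below, assuming the image from the Test Result section is available locally and exposes python3 on its PATH; the check itself is illustrative and not part of the PR.

  # Report whether numba is installed in the image and whether vLLM still imports without it
  podman run --rm --entrypoint python3 quay.io/r3hankhan/vllm:torch-2.8.0 -c \
    "import importlib.util as u; print('numba installed:', u.find_spec('numba') is not None); import vllm; print('vllm import OK')"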

@heheda12345 enabled auto-merge (squash) November 5, 2025 07:45
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 5, 2025
@simon-mo disabled auto-merge November 5, 2025 19:25
@simon-mo merged commit e044924 into vllm-project:main Nov 5, 2025
90 of 92 checks passed
andylolu2 pushed a commit to andylolu2/vllm that referenced this pull request Nov 5, 2025
@R3hankhan123 deleted the s390x-optimzations branch November 6, 2025 02:26
zWaNg3 added a commit to fangyuchu/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)
