Commit 14d82fc

Merge branch 'main' into fa-block-size-revert
2 parents a5829a3 + b34129b

4 files changed (+25, -6 lines)

docker/Dockerfile

Lines changed: 17 additions & 0 deletions
@@ -56,6 +56,7 @@ ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
 
 # PyTorch provides its own indexes for standard and nightly builds
 ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ARG PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly
 
 # PIP supports multiple authentication schemes, including keyring
 # By parameterizing the PIP_KEYRING_PROVIDER variable and setting it to
@@ -97,6 +98,7 @@ RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \
 ARG PIP_INDEX_URL UV_INDEX_URL
 ARG PIP_EXTRA_INDEX_URL UV_EXTRA_INDEX_URL
 ARG PYTORCH_CUDA_INDEX_BASE_URL
+ARG PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL
 ARG PIP_KEYRING_PROVIDER UV_KEYRING_PROVIDER
 
 # Activate virtual environment and add uv to PATH
@@ -315,6 +317,7 @@ RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \
 ARG PIP_INDEX_URL UV_INDEX_URL
 ARG PIP_EXTRA_INDEX_URL UV_EXTRA_INDEX_URL
 ARG PYTORCH_CUDA_INDEX_BASE_URL
+ARG PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL
 ARG PIP_KEYRING_PROVIDER UV_KEYRING_PROVIDER
 
 # Install uv for faster pip installs
@@ -334,6 +337,20 @@ ENV UV_LINK_MODE=copy
 # or future versions of triton.
 RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/
 
+# arm64 (GH200) build follows the practice of "use existing pytorch" build,
+# we need to install torch and torchvision from the nightly builds first,
+# pytorch will not appear as a vLLM dependency in all of the following steps
+# after this step
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
+        uv pip install --system \
+            --index-url ${PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
+            "torch==2.8.0.dev20250318+cu128" "torchvision==0.22.0.dev20250319" ; \
+        uv pip install --system \
+            --index-url ${PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
+            --pre pytorch_triton==3.3.0+gitab727c40 ; \
+    fi
+
 # Install vllm wheel first, so that torch etc will be installed.
 RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist \
     --mount=type=cache,target=/root/.cache/uv \
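
The nightly index URL in the hunk above is derived from the image's `CUDA_VERSION` at build time. A minimal sketch of that shell derivation, runnable outside the Dockerfile (the `CUDA_VERSION` value here is an illustrative assumption, not something pinned by this commit):

```bash
#!/usr/bin/env bash
# Reproduce the nightly index-URL derivation used in the Dockerfile above.
# CUDA_VERSION=12.8.1 is an assumed example value.
CUDA_VERSION="12.8.1"
PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL="https://download.pytorch.org/whl/nightly"

# Keep the major.minor components, then drop the dot: 12.8.1 -> 12.8 -> 128
cuda_tag="cu$(echo "$CUDA_VERSION" | cut -d. -f1,2 | tr -d '.')"

# Prints: https://download.pytorch.org/whl/nightly/cu128
echo "${PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL}/${cuda_tag}"
```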

docs/deployment/docker.md

Lines changed: 4 additions & 3 deletions
@@ -82,7 +82,8 @@ DOCKER_BUILDKIT=1 docker build . \
 
 ## Building for Arm64/aarch64
 
-A docker container can be built for aarch64 systems such as the Nvidia Grace-Hopper. At time of this writing, this should be considered **experimental**. Using the flag `--platform "linux/arm64"` will attempt to build for arm64.
+A docker container can be built for aarch64 systems such as the Nvidia Grace-Hopper. At time of this writing, this requires the use
+of PyTorch Nightly and should be considered **experimental**. Using the flag `--platform "linux/arm64"` will attempt to build for arm64.
 
 !!! note
     Multiple modules must be compiled, so this process can take a while. Recommend using `--build-arg max_jobs=` & `--build-arg nvcc_threads=`
@@ -93,15 +94,15 @@ A docker container can be built for aarch64 systems such as the Nvidia Grace-Hop
 
 ```bash
 # Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
+python3 use_existing_torch.py
 DOCKER_BUILDKIT=1 docker build . \
     --file docker/Dockerfile \
     --target vllm-openai \
     --platform "linux/arm64" \
     -t vllm/vllm-gh200-openai:latest \
     --build-arg max_jobs=66 \
     --build-arg nvcc_threads=2 \
-    --build-arg torch_cuda_arch_list="9.0 10.0+PTX" \
-    --build-arg RUN_WHEEL_CHECK=false
+    --build-arg torch_cuda_arch_list="9.0 10.0+PTX"
 ```
 
 !!! note
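
Once built, the arm64 image can be launched like any other vLLM OpenAI-compatible image. A minimal sketch, assuming a GH200 host with the NVIDIA container runtime; the model name and Hugging Face cache mount are illustrative assumptions, not part of this commit:

```bash
# Run the image built above (sketch; model and mounts are assumptions).
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    vllm/vllm-gh200-openai:latest \
    --model meta-llama/Llama-3.1-8B-Instruct
```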

docs/getting_started/installation/gpu.cuda.inc.md

Lines changed: 4 additions & 1 deletion
@@ -158,7 +158,10 @@ uv pip install -e .
 
 ##### Use an existing PyTorch installation
 
-There are scenarios where the PyTorch dependency cannot be easily installed with `uv`, for example, when building vLLM with non-default PyTorch builds (like nightly or a custom build).
+There are scenarios where the PyTorch dependency cannot be easily installed with `uv`, e.g.:
+
+- Building vLLM with PyTorch nightly or a custom PyTorch build.
+- Building vLLM with aarch64 and CUDA (GH200), where the PyTorch wheels are not available on PyPI. Currently, only the PyTorch nightly has wheels for aarch64 with CUDA. You can run `uv pip install --index-url https://download.pytorch.org/whl/nightly/cu128 torch torchvision torchaudio` to [install PyTorch nightly](https://pytorch.org/get-started/locally/) and then build vLLM on top of it.
 
 To build vLLM using an existing PyTorch installation:
 
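
The hunk cuts off before the build steps themselves. A sketch of the overall "existing PyTorch" flow, assembled from commands that appear elsewhere in this commit; the clone step and the `requirements/build.txt` path are assumptions of this sketch rather than content shown in the diff:

```bash
# Install a PyTorch nightly that ships aarch64 + CUDA wheels (per the diff above).
uv pip install --index-url https://download.pytorch.org/whl/nightly/cu128 \
    torch torchvision torchaudio

# Build vLLM on top of that torch. use_existing_torch.py (also invoked in the
# docs/deployment/docker.md hunk above) strips the repo's torch version pins.
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
uv pip install -r requirements/build.txt   # assumed path for build-time deps
uv pip install --no-build-isolation -e .
```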

tests/v1/e2e/test_lora_with_spec_decode.py

Lines changed: 0 additions & 2 deletions
@@ -61,8 +61,6 @@ def test_batch_inference_correctness(
         model_setup: (method, model_name, spec_model_name, lora_path, tp_size)
     """
     with monkeypatch.context() as m:
-        m.setenv("VLLM_USE_V1", "1")
-
         # Disable randomness
         m.setenv("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
         torch.manual_seed(SEED)
