Closed
Changes from 19 commits
64 commits
1a7d826
Upgrade to `torch==2.2.0`
hmellor Feb 7, 2024
7de363f
Remove `wheel` from `requirements-dev.txt`
hmellor Feb 7, 2024
9bc921d
Revert change to `Dockerfile.rocm`
hmellor Feb 12, 2024
76ab3e7
Kick CI
hmellor Feb 15, 2024
0109fd2
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Feb 15, 2024
4c616a8
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Feb 21, 2024
922aa0c
Update requirements.txt
hmellor Feb 22, 2024
193d73a
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Feb 22, 2024
bfcc926
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Feb 22, 2024
584e6ef
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Mar 4, 2024
daca4e1
Update to 2.2.1
hmellor Mar 4, 2024
015b7d4
Revert "Update to 2.2.1"
hmellor Mar 4, 2024
fef9e03
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Mar 7, 2024
cf400cb
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Mar 12, 2024
d77e855
Merge branch 'main' into pytorch-2.2.0-upgrade
hmellor Mar 15, 2024
75f05de
Update requirements.txt
hmellor Mar 15, 2024
e82cf3a
try to test one distributed at a time
youkaichao Mar 16, 2024
6d10bf5
upgrade to pytorch 2.2.0 by merging 'graphcore/pytorch-2.2.0-upgrade'
youkaichao Mar 16, 2024
4accd02
try pytorch 2.2.1
youkaichao Mar 16, 2024
a92346f
try to fix test
youkaichao Mar 21, 2024
e7f215b
use pip install to resolve the problem
youkaichao Mar 21, 2024
f99fe2a
remove nccl version to test
youkaichao Mar 21, 2024
0f3181f
move to Dockerfile
youkaichao Mar 21, 2024
6ef3843
fix version
youkaichao Mar 21, 2024
7db0e1b
use Dockerfile
youkaichao Mar 21, 2024
62650ae
try 2.2.0 first
youkaichao Mar 21, 2024
4ed16b9
place nccl install after vllm
youkaichao Mar 21, 2024
2d215df
patchelf
youkaichao Mar 21, 2024
0f6f243
update rpath for cupy
youkaichao Mar 21, 2024
da1df5e
try to write a custom pynccl
youkaichao Mar 22, 2024
b4085a1
add wget
youkaichao Mar 22, 2024
f77c9ae
delete logging code
youkaichao Mar 22, 2024
2766418
remove some debugging print
youkaichao Mar 22, 2024
0e18aed
use nccl 2.18.3
youkaichao Mar 23, 2024
7c531b0
add test for pynccl
youkaichao Mar 23, 2024
bbe3622
Merge remote-tracking branch 'origin' into fix_parallel_distributed_test
youkaichao Mar 23, 2024
1abf38e
fix linter
youkaichao Mar 23, 2024
5d661a6
update cupy_utils to pynccl
youkaichao Mar 23, 2024
99f96d7
rename cupy_utils to pynccl_utils
youkaichao Mar 23, 2024
b567f04
update import
youkaichao Mar 23, 2024
74fcf08
update pytorch in cmake
youkaichao Mar 23, 2024
43da101
add test with cudagraph
youkaichao Mar 23, 2024
37e7425
fix test; fix TORCH_CUDA_ARCH_LIST
youkaichao Mar 23, 2024
7e983f5
fix amd tests
youkaichao Mar 23, 2024
e3f8d5f
add pynccl test
youkaichao Mar 23, 2024
4e277ae
pack up libnccl.so
youkaichao Mar 23, 2024
a20d802
add so in setup.py, and use programmatic path in pynccl
youkaichao Mar 23, 2024
dfc9d82
rename cupy --> pynccl
youkaichao Mar 23, 2024
8a5a011
rename cupy --> pynccl
youkaichao Mar 23, 2024
a009e31
rename cupy --> pynccl
youkaichao Mar 23, 2024
68e4792
rename cupy --> pynccl
youkaichao Mar 23, 2024
0a6fab1
fix wget install order
youkaichao Mar 23, 2024
a82a976
rename cupy --> pynccl
youkaichao Mar 23, 2024
1c6ec48
fix so filename and search path
youkaichao Mar 23, 2024
47ff82a
fix dockerfile
youkaichao Mar 23, 2024
b0c15c2
fix dockerfile
youkaichao Mar 23, 2024
0b4f7dd
download and use manifest in to force keeping .so file
youkaichao Mar 23, 2024
7942050
download and use manifest in to force keeping .so file
youkaichao Mar 23, 2024
20a3ec4
restore dockerfile
youkaichao Mar 23, 2024
0ca27b7
add lib file to package data
youkaichao Mar 23, 2024
a3c2340
add libnccl.so.2.18.3 via hard-coding
youkaichao Mar 23, 2024
71e2976
enable VLLM_NCCL_SO_PATH at runtime
youkaichao Mar 25, 2024
3d9332a
nit, os.makedirs(target_dir, exist_ok=True)
youkaichao Mar 25, 2024
76f46f6
upgrade to pt 2.2.1
youkaichao Mar 25, 2024
9 changes: 7 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -22,8 +22,13 @@ steps:
   working_dir: "/vllm-workspace/tests/distributed"
   num_gpus: 2 # only support 1 or 2 for now.

-- label: Distributed Correctness Test
-  command: pytest -v -s --forked test_basic_distributed_correctness.py
+- label: Distributed Correctness Test-facebook/opt-125m
+  command: TEST_DIST_MODEL=facebook/opt-125m pytest -v -s --forked test_basic_distributed_correctness.py
   working_dir: "/vllm-workspace/tests/distributed"
   num_gpus: 2 # only support 1 or 2 for now.
+
+- label: Distributed Correctness Test-meta-llama/Llama-2-7b-hf
+  command: TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest -v -s --forked test_basic_distributed_correctness.py
+  working_dir: "/vllm-workspace/tests/distributed"
+  num_gpus: 2 # only support 1 or 2 for now.

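The pipeline change above gives each model its own step, so every run starts in a fresh process with `TEST_DIST_MODEL` set. A rough local equivalent is sketched below. The helper names (`build_env`, `build_command`, `run_one_model_at_a_time`) are hypothetical and not part of vLLM or Buildkite; the sketch assumes `pytest` and the `pytest-forked` plugin are installed and that it is run from `tests/distributed`.

```python
import os
import subprocess
import sys

# Model names taken from the two pipeline steps in the diff.
MODELS = ["facebook/opt-125m", "meta-llama/Llama-2-7b-hf"]


def build_env(model, base_env=None):
    """Return a copy of the environment with TEST_DIST_MODEL set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TEST_DIST_MODEL"] = model
    return env


def build_command():
    """Same pytest flags as the pipeline steps (hypothetical local wrapper)."""
    return [sys.executable, "-m", "pytest", "-v", "-s", "--forked",
            "test_basic_distributed_correctness.py"]


def run_one_model_at_a_time(models=MODELS, runner=subprocess.run):
    """Run the test file once per model, each in its own process."""
    return_codes = []
    for model in models:
        result = runner(build_command(), env=build_env(model))
        return_codes.append(result.returncode)
    return return_codes
```

Running the models sequentially in separate processes mirrors the reason given in the test docstring: vLLM allocates all available GPU memory, so two models cannot share one test session.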
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -49,7 +49,7 @@ jobs:
 matrix:
 os: ['ubuntu-20.04']
 python-version: ['3.8', '3.9', '3.10', '3.11']
-pytorch-version: ['2.1.2'] # Must be the most recent version that meets requirements.txt.
+pytorch-version: ['2.2.1'] # Must be the most recent version that meets requirements.txt.
 cuda-version: ['11.8', '12.1']

 steps:
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ requires = [
     "ninja",
     "packaging",
     "setuptools >= 49.4.0",
-    "torch == 2.1.2",
+    "torch == 2.2.1",
     "wheel",
 ]
 build-backend = "setuptools.build_meta"
2 changes: 1 addition & 1 deletion requirements-build.txt
@@ -2,5 +2,5 @@
 ninja
 packaging
 setuptools>=49.4.0
-torch==2.1.2
+torch==2.2.1
 wheel
4 changes: 2 additions & 2 deletions requirements.txt
@@ -3,9 +3,9 @@ psutil
 ray >= 2.9
 sentencepiece # Required for LLaMA tokenizer.
 numpy
-torch == 2.1.2
+torch == 2.2.1
 transformers >= 4.38.0 # Required for Gemma.
-xformers == 0.0.23.post1 # Required for CUDA 12.1.
+xformers == 0.0.25 # Requires PyTorch 2.2.1.
 fastapi
 uvicorn[standard]
 pydantic >= 2.0 # Required for OpenAI server.
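The `torch` pin is repeated in `pyproject.toml`, `requirements-build.txt`, and `requirements.txt`, so the three have to move together. A minimal sketch of a consistency check follows. The function names are hypothetical and the regex is an assumption that only covers the `torch == X` style used in these files; it does not parse the `pytorch-version` matrix entry in `publish.yml`.

```python
import re

# Matches "torch == 2.2.1" or "torch==2.2.1", but not names that merely
# end in "torch" (for example a hypothetical "foo-torch == 1.0").
_TORCH_PIN = re.compile(r"(?<![\w-])torch\s*==\s*([0-9][\w.]*)")


def find_torch_pins(text):
    """Return every pinned torch version found in the given file contents."""
    return _TORCH_PIN.findall(text)


def pins_agree(file_texts):
    """file_texts maps a file name to its contents.

    Returns True when exactly one distinct torch version is pinned overall.
    """
    versions = {
        version
        for text in file_texts.values()
        for version in find_torch_pins(text)
    }
    return len(versions) == 1


if __name__ == "__main__":
    # Lines copied from the post-change side of the diff.
    files = {
        "pyproject.toml": '"torch == 2.2.1",',
        "requirements-build.txt": "torch==2.2.1",
        "requirements.txt": "torch == 2.2.1\nxformers == 0.0.25",
    }
    print(pins_agree(files))
```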
16 changes: 13 additions & 3 deletions tests/distributed/test_basic_distributed_correctness.py
@@ -1,13 +1,23 @@
 """Compare the outputs of HF and distributed vLLM when using greedy sampling.

-Run `pytest tests/distributed/test_basic_distributed_correctness.py --forked`.
+vLLM will allocate all the available memory, so we need to run the tests one
+by one. The solution is to pass arguments (model name) by environment
+variables.
+Run:
+
+```sh
+TEST_DIST_MODEL=facebook/opt-125m pytest \
+    test_basic_distributed_correctness.py
+TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest \
+    test_basic_distributed_correctness.py
+```
 """
+import os
 import pytest
 import torch

 MODELS = [
-    "facebook/opt-125m",
-    "meta-llama/Llama-2-7b-hf",
+    os.environ["TEST_DIST_MODEL"],
 ]

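One consequence of `os.environ["TEST_DIST_MODEL"]` is that collecting the module without the variable set raises a bare `KeyError`. A possible variant, shown only as a sketch and not what this PR does, reads the variable through a small helper and fails with a clearer message. The helper name `models_from_env` and its parameters are hypothetical.

```python
import os


def models_from_env(environ=None, var="TEST_DIST_MODEL"):
    """Return a one-element model list read from an environment variable.

    Raises RuntimeError with a usage hint when the variable is unset or empty.
    """
    environ = os.environ if environ is None else environ
    model = environ.get(var)
    if not model:
        raise RuntimeError(
            f"Set {var} to a model name, e.g. {var}=facebook/opt-125m"
        )
    return [model]
```

In the test module this would be used as `MODELS = models_from_env()`, keeping the one-model-per-process behavior while making a missing variable easier to diagnose.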