Dockerfile.ubi: speed up build #33
Conversation
Fixed upstream in vllm-project#5091
Signed-off-by: Nathan Weinberg <[email protected]>
This is odd: the smoke test failed because the server failed to come up with this exception:

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 34, in <module>
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 28, in <module>
from vllm.model_executor.guided_decoding import (
File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/__init__.py", line 6, in <module>
from vllm.model_executor.guided_decoding.lm_format_enforcer_decoding import (
File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py", line 15, in <module>
from vllm.model_executor.guided_decoding.outlines_decoding import (
File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 13, in <module>
from vllm.model_executor.guided_decoding.outlines_logits_processors import (
File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 24, in <module>
from outlines.caching import cache
File "/opt/vllm/lib64/python3.11/site-packages/outlines/__init__.py", line 2, in <module>
import outlines.generate
File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/__init__.py", line 2, in <module>
from .cfg import cfg
File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/cfg.py", line 3, in <module>
from outlines.fsm.guide import CFGGuide
File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/guide.py", line 9, in <module>
from outlines.fsm.regex import (
File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py", line 113, in <module>
@numba.njit(cache=True)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm/lib64/python3.11/site-packages/numba/core/decorators.py", line 225, in wrapper
disp.enable_caching()
File "/opt/vllm/lib64/python3.11/site-packages/numba/core/dispatcher.py", line 808, in enable_caching
self._cache = FunctionCache(self.py_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line [60](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_vllm/33/pull-ci-opendatahub-io-vllm-main-smoke-test/1818691854728695808#1:build-log.txt%3A60)1, in __init__
self._impl = self._impl_class(py_func)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line 337, in __init__
raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'create_fsm_info': no locator available for file '/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py'

but it looks to me as if there is nothing here that could cause this. This is the diff between the environment created by this PR and the one from the latest nightly:

diff --git a/deps-uv.txt b/deps-pip.txt
index 7c0e7b9..28c4653 100644
--- a/deps-uv.txt
+++ b/deps-only_pip.txt
@@ -1,4 +1,4 @@
-aiohappyeyeballs==2.3.3
+aiohappyeyeballs==2.3.2
aiohttp==3.10.0
aiosignal==1.3.1
annotated-types==0.7.0
@@ -11,23 +11,23 @@ charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.30.1
-datasets==2.14.4
-dill==0.3.7
+datasets==2.20.0
+dill==0.3.8
diskcache==5.6.3
distro==1.9.0
dnspython==2.6.1
-email-validator==2.2.0
+email_validator==2.2.0
fastapi==0.111.1
fastapi-cli==0.0.4
filelock==3.15.4
frozenlist==1.4.1
-fsspec==2024.6.1
+fsspec==2024.5.0
h11==0.14.0
hiredis==3.0.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
-huggingface-hub==0.24.5
+huggingface-hub==0.24.3
idna==3.7
interegular==0.3.3
jinja2==3.1.4
@@ -44,7 +44,7 @@ mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
-multiprocess==0.70.15
+multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.1
@@ -70,11 +70,12 @@ pandas==2.2.2
pillow==10.4.0
prometheus-client==0.20.0
prometheus-fastapi-instrumentator==7.0.0
-protobuf==5.27.3
+protobuf==5.27.2
psutil==6.0.0
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==17.0.0
+pyarrow-hotfix==0.6
pycountry==24.6.1
pydantic==2.8.2
pydantic-core==2.20.1
@@ -114,7 +115,7 @@ tzdata==2024.1
urllib3==2.2.2
uvicorn==0.30.3
uvloop==0.19.0
-vllm==0.5.3.post1+cu124 (from file:///workspace/dist/vllm-0.5.3.post1+cu124-cp311-cp311-linux_x86_64.whl)
+vllm==0.5.3.post1+cu124
vllm-flash-attn==2.5.9.post1
watchfiles==0.22.0
websockets==12.0

plus these packages installed as part of the adapter layer:

Small differences that could be accounted for by the difference in build time (~12h?)
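For what it's worth, numba's "no locator available" error typically means it either cannot map the jitted function back to a readable source file on disk, or cannot find a writable place to store the cache. A quick way to check that from outside the server (a sketch only; the image tag, container runtime, and cache path are assumptions, not anything this PR defines):

```sh
# Sketch: image tag (vllm-ubi:latest) and runtime (podman) are illustrative.
# 1. Is the file numba complains about present and readable in the image?
podman run --rm vllm-ubi:latest \
  ls -l /opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py

# 2. Reproduce the failure directly: importing outlines runs the same
#    @numba.njit(cache=True) decoration seen in the traceback above.
podman run --rm vllm-ubi:latest python3.11 -c "import outlines"

# 3. If the file exists but caching still fails, a common workaround (not a
#    root-cause fix) is to point numba at a writable cache directory.
podman run --rm -e NUMBA_CACHE_DIR=/tmp/numba_cache vllm-ubi:latest \
  python3.11 -c "import outlines"
```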
/test smoke-test
Potential issues we might hit with specific torch versions: astral-sh/uv#4303
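If that resolver behaviour does bite us, one possible mitigation is to install torch explicitly before the requirements files so uv does not pick an unexpected build. A sketch only; the version and index URL below are illustrative assumptions, not project pins:

```sh
# Illustrative only: torch version and index URL are assumptions.
uv pip install "torch==2.3.1" --index-url https://download.pytorch.org/whl/cu121
uv pip install -r requirements-cuda.txt -r requirements-build.txt -U
```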
@dtrifiro: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This is now included in #127
use uv to install dependencies:

As an example, for locally cached dependencies (i.e. no download):

pip install -r requirements-cpu.txt -r requirements-build.txt -U     28.60s user 1.54s system 41% cpu 1:12.22 total
uv pip install -r requirements-cpu.txt -r requirements-build.txt -U  0.18s user 0.36s system 21% cpu 2.470 total

a 30x speedup!
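In Dockerfile.ubi this boils down to bootstrapping uv once and letting it handle the requirements install. A rough sketch of the shell a RUN instruction would execute; the venv path and requirements files are assumptions taken from the traceback and timings above, not the literal Dockerfile contents:

```sh
# Sketch of the idea, not the actual Dockerfile.ubi change.
python3.11 -m pip install uv          # bootstrap uv with the stock pip
source /opt/vllm/bin/activate         # uv installs into the active virtualenv
uv pip install -r requirements-cpu.txt -r requirements-build.txt -U
```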