This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Conversation


@dtrifiro dtrifiro commented May 22, 2024

Use uv to install dependencies.

As an example, with locally cached dependencies (i.e. no downloads):

  • pip install -r requirements-cpu.txt -r requirements-build.txt -U: 28.60s user, 1.54s system, 41% cpu, 1:12.22 total
  • uv pip install -r requirements-cpu.txt -r requirements-build.txt -U: 0.18s user, 0.36s system, 21% cpu, 2.470s total

That is roughly a 30x speedup in wall-clock time (1:12.22 vs 2.47s).
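
For context, a minimal sketch of the equivalent uv invocation in a build step; the requirements files match the timings above, but the surrounding setup (uv installed via pip, an already-active virtualenv) is an assumption, not this PR's actual diff:

  # Sketch only: install uv once, then let it handle the heavy dependency installs.
  # Assumes a virtualenv is already active; otherwise `uv pip install` needs --system.
  python3 -m pip install uv
  uv pip install -U -r requirements-cpu.txt -r requirements-build.txt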

@openshift-ci openshift-ci bot requested review from rpancham and terrytangyuan May 22, 2024 10:58

@dtrifiro dtrifiro marked this pull request as draft May 22, 2024 10:59

@z103cb z103cb requested review from Xaenalt and heyselbi and removed request for rpancham and terrytangyuan May 23, 2024 08:53

@dtrifiro dtrifiro force-pushed the speed-up-build branch 2 times, most recently from ee3e5fb to be6db7c on May 23, 2024 15:32

@dtrifiro dtrifiro changed the base branch from ibm_main to main July 31, 2024 16:51
@dtrifiro dtrifiro marked this pull request as ready for review July 31, 2024 16:55
@openshift-ci openshift-ci bot requested a review from maxdebayser July 31, 2024 16:55
@dtrifiro (Author)

This is odd: the smoke test failed because the server failed to come up, raising this exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 34, in <module>
    from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 28, in <module>
    from vllm.model_executor.guided_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/__init__.py", line 6, in <module>
    from vllm.model_executor.guided_decoding.lm_format_enforcer_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py", line 15, in <module>
    from vllm.model_executor.guided_decoding.outlines_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 13, in <module>
    from vllm.model_executor.guided_decoding.outlines_logits_processors import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 24, in <module>
    from outlines.caching import cache
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/cfg.py", line 3, in <module>
    from outlines.fsm.guide import CFGGuide
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/guide.py", line 9, in <module>
    from outlines.fsm.regex import (
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py", line 113, in <module>
    @numba.njit(cache=True)
     ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/decorators.py", line 225, in wrapper
    disp.enable_caching()
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/dispatcher.py", line 808, in enable_caching
    self._cache = FunctionCache(self.py_func)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line [60](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_vllm/33/pull-ci-opendatahub-io-vllm-main-smoke-test/1818691854728695808#1:build-log.txt%3A60)1, in __init__
    self._impl = self._impl_class(py_func)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'create_fsm_info': no locator available for file '/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py'

but I don't see anything here that could cause it. Below is the diff between the environment created by this PR and the one from the latest nightly:

diff --git a/deps-uv.txt b/deps-pip.txt
index 7c0e7b9..28c4653 100644
--- a/deps-uv.txt
+++ b/deps-only_pip.txt
@@ -1,4 +1,4 @@
-aiohappyeyeballs==2.3.3
+aiohappyeyeballs==2.3.2
 aiohttp==3.10.0
 aiosignal==1.3.1
 annotated-types==0.7.0
@@ -11,23 +11,23 @@ charset-normalizer==3.3.2
 click==8.1.7
 cloudpickle==3.0.0
 cmake==3.30.1
-datasets==2.14.4
-dill==0.3.7
+datasets==2.20.0
+dill==0.3.8
 diskcache==5.6.3
 distro==1.9.0
 dnspython==2.6.1
-email-validator==2.2.0
+email_validator==2.2.0
 fastapi==0.111.1
 fastapi-cli==0.0.4
 filelock==3.15.4
 frozenlist==1.4.1
-fsspec==2024.6.1
+fsspec==2024.5.0
 h11==0.14.0
 hiredis==3.0.0
 httpcore==1.0.5
 httptools==0.6.1
 httpx==0.27.0
-huggingface-hub==0.24.5
+huggingface-hub==0.24.3
 idna==3.7
 interegular==0.3.3
 jinja2==3.1.4
@@ -44,7 +44,7 @@ mdurl==0.1.2
 mpmath==1.3.0
 msgpack==1.0.8
 multidict==6.0.5
-multiprocess==0.70.15
+multiprocess==0.70.16
 nest-asyncio==1.6.0
 networkx==3.3
 ninja==1.11.1.1
@@ -70,11 +70,12 @@ pandas==2.2.2
 pillow==10.4.0
 prometheus-client==0.20.0
 prometheus-fastapi-instrumentator==7.0.0
-protobuf==5.27.3
+protobuf==5.27.2
 psutil==6.0.0
 py-cpuinfo==9.0.0
 pyairports==2.1.1
 pyarrow==17.0.0
+pyarrow-hotfix==0.6
 pycountry==24.6.1
 pydantic==2.8.2
 pydantic-core==2.20.1
@@ -114,7 +115,7 @@ tzdata==2024.1
 urllib3==2.2.2
 uvicorn==0.30.3
 uvloop==0.19.0
-vllm==0.5.3.post1+cu124 (from file:///workspace/dist/vllm-0.5.3.post1+cu124-cp311-cp311-linux_x86_64.whl)
+vllm==0.5.3.post1+cu124
 vllm-flash-attn==2.5.9.post1
 watchfiles==0.22.0
 websockets==12.0

plus these packages installed as part of the adapter layer:

deprecated==1.2.14
googleapis-common-protos==1.63.2
grpcio==1.62.2
grpcio-health-checking==1.62.2
grpcio-reflection==1.62.2
hf-transfer==0.1.6
importlib-metadata==8.0.0
opentelemetry-api==1.26.0
opentelemetry-exporter-otlp==1.26.0
opentelemetry-exporter-otlp-proto-common==1.26.0
opentelemetry-exporter-otlp-proto-grpc==1.26.0
opentelemetry-exporter-otlp-proto-http==1.26.0
opentelemetry-proto==1.26.0
opentelemetry-sdk==1.26.0
opentelemetry-semantic-conventions==0.47b0
opentelemetry-semantic-conventions-ai==0.4.0
protobuf==5.27.3
protobuf==4.25.4
vllm-tgis-adapter==0.2.3
wrapt==1.16.0
zipp==3.19.2

Small differences that could be accounted for by the difference in build time (~12h?).
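
(For anyone wanting to reproduce a dependency diff like the one above, one way is to freeze the package list in each image and compare the two. The image names below are placeholders, not the actual CI images from this thread, and this assumes pip is available inside both images:)

  # Placeholder image names; assumes pip is usable directly in both images.
  podman run --rm --entrypoint pip <image-built-with-uv> freeze > deps-uv.txt
  podman run --rm --entrypoint pip <image-built-with-pip> freeze > deps-pip.txt
  diff -u deps-uv.txt deps-pip.txt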

@dtrifiro (Author)

/test smoke-test

@dtrifiro (Author)

Potential issues we might hit with specific torch versions: astral-sh/uv#4303
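
(For illustration only, without claiming this is exactly what the linked issue covers: one common source of torch-specific resolver trouble is installing torch from a dedicated index rather than PyPI. The index URL below is the public PyTorch CPU index and is just an example, not necessarily what this repo uses:)

  # Illustrative command; the index URL and unpinned torch are examples, not repo config.
  uv pip install --index-url https://download.pytorch.org/whl/cpu torch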


openshift-ci bot commented Jul 31, 2024

@dtrifiro: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  Test name           Commit   Required  Rerun command
  ci/prow/smoke-test  5cb3a9c  true      /test smoke-test


@dtrifiro (Author)

This is now included in #127

@dtrifiro dtrifiro closed this Aug 13, 2024
@dtrifiro dtrifiro deleted the speed-up-build branch August 13, 2024 11:50
Xaenalt pushed a commit that referenced this pull request Sep 18, 2024
* Bucketing/Warmup WIP

* Cleanup

* Revert "Fix model_output_idx on HPU (#27)"

This reverts commit 90dfa92.

* Rework selected_token_indices fix to also work with block_size padding

* Simple prompt attention POC

* Remove cumsum

* MQA/GQA support for simple prompt_attention

* Cleanup

* Fix typo

* Restore profiling runs
prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024
…ernel tuning script for rocm.

Merge pull request opendatahub-io#33  - tuned moe configs v2