This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Conversation


@dtrifiro dtrifiro commented May 22, 2024

Use uv to install dependencies.

As an example, with locally cached dependencies (i.e. no downloads):

  • pip install -r requirements-cpu.txt -r requirements-build.txt -U: 28.60s user, 1.54s system, 41% cpu, 1:12.22 total
  • uv pip install -r requirements-cpu.txt -r requirements-build.txt -U: 0.18s user, 0.36s system, 21% cpu, 2.470s total

That is roughly a 30x speedup in wall-clock time (1:12.22 vs 2.47s).
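
For context, a minimal sketch of the equivalent uv invocation in a build step; the requirements files match the timings above, but the surrounding setup (uv installed via pip, an already-active virtualenv) is an assumption, not this PR's actual diff:

  # Sketch only: install uv once, then let it handle the heavy dependency installs.
  # Assumes a virtualenv is already active; otherwise `uv pip install` needs --system.
  python3 -m pip install uv
  uv pip install -U -r requirements-cpu.txt -r requirements-build.txt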

@openshift-ci openshift-ci bot requested review from rpancham and terrytangyuan May 22, 2024 10:58

@dtrifiro dtrifiro marked this pull request as draft May 22, 2024 10:59

@z103cb z103cb requested review from Xaenalt and heyselbi and removed request for rpancham and terrytangyuan May 23, 2024 08:53

@dtrifiro dtrifiro force-pushed the speed-up-build branch 2 times, most recently from ee3e5fb to be6db7c on May 23, 2024 15:32

@dtrifiro dtrifiro changed the base branch from ibm_main to main July 31, 2024 16:51
@dtrifiro dtrifiro marked this pull request as ready for review July 31, 2024 16:55
@openshift-ci openshift-ci bot requested a review from maxdebayser July 31, 2024 16:55
@dtrifiro (Author)

This is odd: the smoke test failed because the server failed to come up, raising this exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 34, in <module>
    from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 28, in <module>
    from vllm.model_executor.guided_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/__init__.py", line 6, in <module>
    from vllm.model_executor.guided_decoding.lm_format_enforcer_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py", line 15, in <module>
    from vllm.model_executor.guided_decoding.outlines_decoding import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 13, in <module>
    from vllm.model_executor.guided_decoding.outlines_logits_processors import (
  File "/opt/vllm/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 24, in <module>
    from outlines.caching import cache
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/generate/cfg.py", line 3, in <module>
    from outlines.fsm.guide import CFGGuide
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/guide.py", line 9, in <module>
    from outlines.fsm.regex import (
  File "/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py", line 113, in <module>
    @numba.njit(cache=True)
     ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/decorators.py", line 225, in wrapper
    disp.enable_caching()
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/dispatcher.py", line 808, in enable_caching
    self._cache = FunctionCache(self.py_func)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line [60](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_vllm/33/pull-ci-opendatahub-io-vllm-main-smoke-test/1818691854728695808#1:build-log.txt%3A60)1, in __init__
    self._impl = self._impl_class(py_func)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm/lib64/python3.11/site-packages/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'create_fsm_info': no locator available for file '/opt/vllm/lib64/python3.11/site-packages/outlines/fsm/regex.py'

but I don't see anything here that could cause it. Below is the diff between the environment created by this PR and the one from the latest nightly:

diff --git a/deps-uv.txt b/deps-pip.txt
index 7c0e7b9..28c4653 100644
--- a/deps-uv.txt
+++ b/deps-only_pip.txt
@@ -1,4 +1,4 @@
-aiohappyeyeballs==2.3.3
+aiohappyeyeballs==2.3.2
 aiohttp==3.10.0
 aiosignal==1.3.1
 annotated-types==0.7.0
@@ -11,23 +11,23 @@ charset-normalizer==3.3.2
 click==8.1.7
 cloudpickle==3.0.0
 cmake==3.30.1
-datasets==2.14.4
-dill==0.3.7
+datasets==2.20.0
+dill==0.3.8
 diskcache==5.6.3
 distro==1.9.0
 dnspython==2.6.1
-email-validator==2.2.0
+email_validator==2.2.0
 fastapi==0.111.1
 fastapi-cli==0.0.4
 filelock==3.15.4
 frozenlist==1.4.1
-fsspec==2024.6.1
+fsspec==2024.5.0
 h11==0.14.0
 hiredis==3.0.0
 httpcore==1.0.5
 httptools==0.6.1
 httpx==0.27.0
-huggingface-hub==0.24.5
+huggingface-hub==0.24.3
 idna==3.7
 interegular==0.3.3
 jinja2==3.1.4
@@ -44,7 +44,7 @@ mdurl==0.1.2
 mpmath==1.3.0
 msgpack==1.0.8
 multidict==6.0.5
-multiprocess==0.70.15
+multiprocess==0.70.16
 nest-asyncio==1.6.0
 networkx==3.3
 ninja==1.11.1.1
@@ -70,11 +70,12 @@ pandas==2.2.2
 pillow==10.4.0
 prometheus-client==0.20.0
 prometheus-fastapi-instrumentator==7.0.0
-protobuf==5.27.3
+protobuf==5.27.2
 psutil==6.0.0
 py-cpuinfo==9.0.0
 pyairports==2.1.1
 pyarrow==17.0.0
+pyarrow-hotfix==0.6
 pycountry==24.6.1
 pydantic==2.8.2
 pydantic-core==2.20.1
@@ -114,7 +115,7 @@ tzdata==2024.1
 urllib3==2.2.2
 uvicorn==0.30.3
 uvloop==0.19.0
-vllm==0.5.3.post1+cu124 (from file:///workspace/dist/vllm-0.5.3.post1+cu124-cp311-cp311-linux_x86_64.whl)
+vllm==0.5.3.post1+cu124
 vllm-flash-attn==2.5.9.post1
 watchfiles==0.22.0
 websockets==12.0

plus these packages installed as part of the adapter layer:

deprecated==1.2.14
googleapis-common-protos==1.63.2
grpcio==1.62.2
grpcio-health-checking==1.62.2
grpcio-reflection==1.62.2
hf-transfer==0.1.6
importlib-metadata==8.0.0
opentelemetry-api==1.26.0
opentelemetry-exporter-otlp==1.26.0
opentelemetry-exporter-otlp-proto-common==1.26.0
opentelemetry-exporter-otlp-proto-grpc==1.26.0
opentelemetry-exporter-otlp-proto-http==1.26.0
opentelemetry-proto==1.26.0
opentelemetry-sdk==1.26.0
opentelemetry-semantic-conventions==0.47b0
opentelemetry-semantic-conventions-ai==0.4.0
protobuf==5.27.3
protobuf==4.25.4
vllm-tgis-adapter==0.2.3
wrapt==1.16.0
zipp==3.19.2

Small differences that could be accounted for by the difference in build time (~12h?).
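
(For anyone wanting to reproduce a dependency diff like the one above, one way is to freeze the package list in each image and compare the two. The image names below are placeholders, not the actual CI images from this thread, and this assumes pip is available inside both images:)

  # Placeholder image names; assumes pip is usable directly in both images.
  podman run --rm --entrypoint pip <image-built-with-uv> freeze > deps-uv.txt
  podman run --rm --entrypoint pip <image-built-with-pip> freeze > deps-pip.txt
  diff -u deps-uv.txt deps-pip.txt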

@dtrifiro (Author)

/test smoke-test

@dtrifiro (Author)

Potential issues we might hit with specific torch versions: astral-sh/uv#4303
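
(For illustration only, without claiming this is exactly what the linked issue covers: one common source of torch-specific resolver trouble is installing torch from a dedicated index rather than PyPI. The index URL below is the public PyTorch CPU index and is just an example, not necessarily what this repo uses:)

  # Illustrative command; the index URL and unpinned torch are examples, not repo config.
  uv pip install --index-url https://download.pytorch.org/whl/cpu torch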


openshift-ci bot commented Jul 31, 2024

@dtrifiro: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  Test name           Commit   Required  Rerun command
  ci/prow/smoke-test  5cb3a9c  true      /test smoke-test


@dtrifiro (Author)

This is now included in #127

@dtrifiro dtrifiro closed this Aug 13, 2024
@dtrifiro dtrifiro deleted the speed-up-build branch August 13, 2024 11:50
Xaenalt pushed a commit that referenced this pull request Sep 18, 2024
* Bucketing/Warmup WIP

* Cleanup

* Revert "Fix model_output_idx on HPU (#27)"

This reverts commit 90dfa92.

* Rework selected_token_indices fix to also work with block_size padding

* Simple prompt attention POC

* Remove cumsum

* MQA/GQA support for simple prompt_attention

* Cleanup

* Fix typo

* Restore profiling runs
prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024
…ernel tuning script for rocm.

Merge pull request opendatahub-io#33  - tuned moe configs v2