[torch.compile] reorganize the cache directory to support compiling multiple models #19064
Conversation
houseroad left a comment:
Overall, the idea is pretty neat. Left two more comments.
Also, please ensure the cache can still be loaded appropriately. :-)
```python
def initialize_cache(self,
                     cache_dir: str,
                     disable_cache: bool = False,
                     prefix: str = ""):
```
Since `prefix` is only used to calculate the `base_cache_dir`, why not pass in the `base_cache_dir` instead of passing in `prefix`?
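A minimal sketch of the two options being discussed (the method bodies and the derivation of `base_cache_dir` are assumptions for illustration; only the first signature comes from the diff above):

```python
class CompilerInterface:
    # Approach in the PR: the caller passes a prefix, and the method
    # derives the shared base directory from it (derivation assumed).
    def initialize_cache(self,
                         cache_dir: str,
                         disable_cache: bool = False,
                         prefix: str = ""):
        base_cache_dir = cache_dir[:-len(prefix)] if prefix else cache_dir
        ...

    # Suggested alternative (hypothetical name): the caller computes
    # base_cache_dir and passes it in directly, making the
    # relationship explicit instead of re-deriving it here.
    def initialize_cache_alt(self,
                             cache_dir: str,
                             base_cache_dir: str,
                             disable_cache: bool = False):
        ...
```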
```python
# before
def set_current_vllm_config(vllm_config: VllmConfig, check_compile=False):

# after
def set_current_vllm_config(vllm_config: VllmConfig,
                            check_compile=False,
                            prefix: Optional[str] = None):
```
Could you add a brief comment explaining the meaning of `prefix`?
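One possible wording for such a comment (a suggestion, not text from the PR):

```python
from typing import Optional

from vllm.config import VllmConfig


def set_current_vllm_config(vllm_config: VllmConfig,
                            check_compile=False,
                            prefix: Optional[str] = None):
    """Temporarily set the global vLLM config.

    Args:
        prefix: name of the model component being compiled (e.g.
            "backbone" or "eagle_head"); it selects the per-component
            subdirectory of the compilation cache. None means no
            component-specific subdirectory is used.
    """
    ...
```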
```python
from vllm.compilation.backends import set_model_tag

with set_model_tag("eagle_head"):
    self.model = get_model(vllm_config=self.vllm_config,
                           model_config=draft_model_config)
```
nit: Could we name this something like `set_compile_region` or `set_model_component` (see the other comment)? That would make it clearer that this is 1:1 with a fullgraph torch.compile region.
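For reference, a helper like `set_model_tag` is naturally a context manager; a minimal sketch of the idea (implementation details assumed, not taken from the PR):

```python
import contextlib

_current_model_tag = "backbone"  # assumed default tag


@contextlib.contextmanager
def set_model_tag(tag: str):
    """Tag the model component being compiled so that its compilation
    artifacts are stored under a dedicated cache subdirectory."""
    global _current_model_tag
    old_tag, _current_model_tag = _current_model_tag, tag
    try:
        yield
    finally:
        _current_model_tag = old_tag
```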
```python
def initialize_cache(self,
                     cache_dir: str,
                     disable_cache: bool = False,
                     prefix: str = ""):
```
nit: This is technically a subdirectory (or a suffix of the path), not a prefix. I was prototyping something like this locally and I called it the "model_component", but up to you.
zou3519 left a comment:
thank you!
Hi @youkaichao, I wonder if we can simplify the `configure_post_pass(self)` method here? I had to make some edits to make things work here, but maybe they are not necessary anymore? Thanks!
Why did you decide to keep `inductor_cache` and `triton_cache` at the upper level rather than inside `backbone` and `eagle_head`? The current organization introduces additional …
May we pass …
#17211 explores the possibility of compiling multiple models in the same process, i.e. both the main model and the eagle head model. However, it does this by extending the compilation cache directory in a tricky way. In addition, that integration can be problematic when we compile for specific shapes: the env vars are set multiple times, with the later value overriding the previous one:
vllm/vllm/compilation/compiler_interface.py, line 254 (at d32aa2e)
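To illustrate the failure mode described above (the paths are made up for the example):

```python
import os

# First compilation (main model) points the cache env var at its own
# directory ...
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/cache/main_model/inductor"

# ... then the second compilation (eagle head) sets the same env var
# again, silently overriding the first value, so both models end up
# reading and writing the same cache directory.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/cache/eagle_head/inductor"
```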
This PR re-organizes the cache directory structure, so that the same vLLM instance will use the same `TORCHINDUCTOR_CACHE_DIR` and `TRITON_CACHE_DIR`, but different storage for `vllm_compile_cache.py` etc. It also reads the `prefix` automatically, which I think will be helpful for future vision encoder compilation.

The current structure after running `examples/offline_inference/eagle.py`:
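The original listing is not reproduced here; below is an illustrative layout consistent with the description above (the hash and rank directory names are hypothetical):

```text
~/.cache/vllm/torch_compile_cache/<hash>/rank_0_0/
├── backbone/        # per-model artifacts (vllm_compile_cache.py, ...)
├── eagle_head/      # per-model artifacts for the draft model
├── inductor_cache/  # shared TORCHINDUCTOR_CACHE_DIR
└── triton_cache/    # shared TRITON_CACHE_DIR
```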