
Conversation


@anko-intel commented Jan 24, 2025

Adapt the dynamo cache size setting to the current behavior, where the number of graphs compiled in one forward pass equals:
number of LlamaDecoderLayer instances + 2 (RMSNorm, VocabParallelEmbedding)

This is an alternative approach to a hot fix for the performance regression introduced by vllm-project#11967.
The hot fix #709 restores the previous performance results for torch.compile mode.
This one partially recovers throughput, but at a large cost in warmup time, since many more graphs are compiled during warmup.
Without increasing the cache size, torch.compile hits its recompile limit and falls back to eager mode, which gives low throughput.
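
For reference, a minimal sketch of the idea described above (the function name is hypothetical and the patch's actual wiring may differ; `torch._dynamo.config.cache_size_limit` is the real Dynamo knob):

```python
import torch
import torch._dynamo


def raise_dynamo_cache_limit(num_decoder_layers: int) -> None:
    # One graph per LlamaDecoderLayer, plus two more for RMSNorm and
    # VocabParallelEmbedding, all compiled within a single forward pass.
    graphs_per_forward = num_decoder_layers + 2
    # Raise Dynamo's cache limit so torch.compile does not silently
    # fall back to eager mode once the default cache size is exhausted.
    torch._dynamo.config.cache_size_limit = max(
        torch._dynamo.config.cache_size_limit,
        graphs_per_forward,
    )
```

The trade-off noted above follows directly: a larger cache limit lets every graph stay compiled (recovering throughput), but compiling that many graphs up front is what inflates the warmup time.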

@anko-intel changed the title from "Set dynamo cache size for torch compile" to "Adopt dynamo cache size to current layer definition" on Jan 24, 2025
@anko-intel (Author) commented

After #753, this change is not needed in this version.

@anko-intel closed this on Feb 4, 2025