[torch.compile] add a flag to disable custom op #8488
Merged
Our final goal is to remove these custom ops, except for the attention op.
Many custom ops are just manual fusions, and we expect torch.compile to do a better job at fusing them.
However, torch.compile currently costs more memory.
This PR adds a new flag so we can test the behavior.
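For illustration, here is a minimal sketch of how such a flag could gate dispatch between a hand-fused custom op and plain PyTorch ops that torch.compile can fuse on its own. This is an assumption-level sketch, not vLLM's actual implementation; it only reuses the `VLLM_TEST_COMPILE_NO_CUSTOM_OPS` variable from the commands below, and the layer/function names are made up:

```python
import os
import torch
import torch.nn.functional as F

# Hypothetical flag check (the env var name matches the test commands below).
NO_CUSTOM_OPS = os.environ.get("VLLM_TEST_COMPILE_NO_CUSTOM_OPS", "0") == "1"

def custom_silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for a hand-written fused kernel (placeholder, not a real vLLM op)."""
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]

class SiluAndMul(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if NO_CUSTOM_OPS:
            # Native PyTorch path: leave the fusion to torch.compile.
            d = x.shape[-1] // 2
            return F.silu(x[..., :d]) * x[..., d:]
        # Default path: call the manually fused custom op.
        return custom_silu_and_mul(x)

# Usage: torch.compile captures whichever path the flag selects.
compiled = torch.compile(SiluAndMul())
out = compiled(torch.randn(4, 8))
```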
current

VLLM_TEST_DYNAMO_GRAPH_CAPTURE=0 pytest -v -s tests/compile/test_full_graph.py

This is the current default behavior: no torch.compile.

compile + custom op

pytest -v -s tests/compile/test_full_graph.py

This tests torch.compile with the vLLM custom ops.

compile without custom op

VLLM_TEST_COMPILE_NO_CUSTOM_OPS=1 pytest -v -s tests/compile/test_full_graph.py

This tests torch.compile without the vLLM custom ops.

summary
With compile (and custom ops), we lose 124 blocks.
With compile and without custom ops, we lose 286 blocks.
Note: 2048 CPU blocks translate into 4 GB of CPU memory, so each block is 2 MB.
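For reference, a quick back-of-the-envelope conversion of those block counts into memory, using the 2 MB/block figure above:

```python
# 4 GB spread over 2048 blocks -> 2 MB per block.
block_mb = 4 * 1024 / 2048
print(124 * block_mb)  # ~248 MB lost with compile + custom ops
print(286 * block_mb)  # ~572 MB lost with compile and no custom ops
```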