[gguf] Refactor __torch_function__ to avoid unnecessary computation #11551
DN6 merged 4 commits into huggingface:main
This helps with torch.compile compilation latency. Avoiding unnecessary computation should also slightly improve eager latency.
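For context on why this matters: a tensor subclass that carries metadata (such as a quantization type) has to re-attach that metadata in `__torch_function__` after every op. Any work done there is paid on every call, and under `torch.compile` it is typically traced as well. The sketch below is a generic illustration of the pattern, not the code from this PR; `TaggedTensor` and `_find_quant_type` are hypothetical names, and it assumes PyTorch 2.x. It delegates to the default `torch.Tensor.__torch_function__` and only scans the arguments for a tag when the op actually returned a tensor that needs one.

```python
import torch


class TaggedTensor(torch.Tensor):
    """Hypothetical tensor subclass carrying a quantization tag (illustration only)."""

    def __new__(cls, data, quant_type=None):
        # as_subclass shares storage with `data`; no copy is made.
        obj = data.as_subclass(cls)
        obj.quant_type = quant_type
        return obj

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Default handling: runs `func` with subclass dispatch disabled and
        # converts Tensor outputs (also inside tuples/lists) to `cls`.
        result = super().__torch_function__(func, types, args, kwargs)

        # Cheap type checks first: skip the argument scan when no output needs a tag.
        if isinstance(result, cls):
            outputs = [result]
        elif isinstance(result, (tuple, list)):
            outputs = [r for r in result if isinstance(r, cls)]
        else:
            return result
        if outputs:
            tag = _find_quant_type(args)
            for out in outputs:
                out.quant_type = tag
        return result


def _find_quant_type(args):
    # Return the tag of the first tagged tensor, stopping as soon as one is found.
    for arg in args:
        if isinstance(arg, TaggedTensor):
            return getattr(arg, "quant_type", None)
        if isinstance(arg, (list, tuple)):
            for item in arg:
                if isinstance(item, TaggedTensor):
                    return getattr(item, "quant_type", None)
    return None
```

With this, ops such as `x * 2`, `torch.split(x, 2)` and `torch.cat([x, x])` return `TaggedTensor` results carrying the input's tag, while non-tensor results such as `x.shape` are returned without any argument scan.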
cc @sayakpaul
sayakpaul left a comment
Nice! Thanks for this. Do you want to also include the speedups you obtained with this patch?
Along with this, do we think regional compilation (cc: huggingface/accelerate#3529) could also help reduce compilation latency?
I am going through a stack of PRs to tackle compilation time and will post an update once the stack lands. Overall, compile time is roughly 280 seconds, and so far I have been able to take off roughly 30 seconds. Regional compilation will definitely benefit this model; @StrongerXi has the latest numbers. It seems the workflow needs approval?
@bot /style
Style fixes have been applied.
Oh yeah, regional compilation would speed things up massively: when I tested it a while back, it went from 300s to 30s. Might be worth offering a similar API in diffusers and transformers?
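Regional compilation here means compiling each repeated transformer block on its own instead of the whole model, so the compiler traces one small block rather than a single very large graph. Below is a minimal sketch assuming plain `torch.compile` on a model whose blocks live in an `nn.ModuleList`; `ToyTransformer` and `compile_regionally` are hypothetical names for illustration, not the accelerate or diffusers API. On recent PyTorch releases identical blocks can often reuse the same compiled code, which is where most of the compile-time savings would come from.

```python
import torch
from torch import nn


class ToyTransformer(nn.Module):
    """Hypothetical model: a stack of identical blocks, as in most diffusion transformers."""

    def __init__(self, dim=64, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)
        return x


def compile_regionally(model, **compile_kwargs):
    # Hypothetical helper: wrap each block with torch.compile in place.
    # The Python loop in ToyTransformer.forward itself stays in eager mode.
    for i in range(len(model.blocks)):
        model.blocks[i] = torch.compile(model.blocks[i], **compile_kwargs)
    return model
```

Passing `backend="eager"` only runs Dynamo's graph capture without generating kernels, which is handy for a quick correctness check; a real compile-time benchmark would use the default Inductor backend.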
@DN6 a gentle ping in case this slipped through the cracks