PR #35132: [XLA:GPU] Update HLO cublas workspace size after autotuner select the algorithm #35342
Merged
… select the algorithm
Imported from GitHub PR #35132
📝 Summary of Changes
This PR introduces a pass that updates the workspace size for cuBLAS/cuBLASLt GEMM operations after autotuning has selected a specific algorithm. The GemmRewriter pass conservatively allocates workspace before autotuning. After autotuning, we know the exact algorithm selected and can query its actual workspace requirement, potentially reducing memory usage.
🎯 Justification
Potentially reduces memory usage.
🚀 Kind of Contribution
⚡️ Performance Improvement
🧪 Unit Tests:
Existing gemm tests should cover the workspace size config.
Copybara import of the project:
--
a6ed265 by Shawn Wang:
Update cublas workspace size with the exact size extracted from algorithm
--
d67a48a by Shawn Wang:
fix comments
--
613e090 by Shawn Wang:
add unittest
Merging this change closes #35132
COPYBARA_INTEGRATE_REVIEW=#35132 from shawnwang18:shawnw/cublas_workspace 613e090
PiperOrigin-RevId: 845601031
PR #35132: [XLA:GPU] Update HLO cublas workspace size after autotuner select the algorithm
Imported from GitHub PR #35132
📝 Summary of Changes
This PR introduces a pass that updates the workspace size for cuBLAS/cuBLASLt GEMM operations after autotuning has selected a specific algorithm. The GemmRewriter pass runs before autotuning, so it has to allocate workspace conservatively. Once autotuning has picked an algorithm, the new pass queries that algorithm's actual workspace requirement and records it in the HLO, potentially reducing memory usage.
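For reviewers who want the idea in miniature, the following is a rough, standalone Python sketch of the update step described above. It is not the XLA implementation: `GemmOp`, `update_workspace_size`, the 32 MiB default, and the per-algorithm table are all hypothetical stand-ins, and the sketch assumes the pass only ever shrinks the workspace and leaves untuned ops alone.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Optional

MIB = 1024 * 1024

# Hypothetical conservative size chosen before autotuning (not XLA's real value).
CONSERVATIVE_WORKSPACE_BYTES = 32 * MIB


@dataclass(frozen=True)
class GemmOp:
    """Toy stand-in for a GEMM custom call with a workspace buffer."""
    name: str
    workspace_bytes: int
    selected_algorithm: Optional[int] = None  # filled in by the autotuner


def update_workspace_size(op: GemmOp, required_bytes: Mapping[int, int]) -> GemmOp:
    """Shrink the workspace to what the selected algorithm needs.

    Assumption: ops without a selected algorithm, or with an algorithm whose
    requirement is unknown or not smaller, keep their conservative size.
    """
    if op.selected_algorithm is None:
        return op
    required = required_bytes.get(op.selected_algorithm)
    if required is None or required >= op.workspace_bytes:
        return op
    return replace(op, workspace_bytes=required)


# Hypothetical per-algorithm requirements; a real pass would query the library.
ALGORITHM_WORKSPACE_BYTES = {0: 0, 1: 4 * MIB, 2: 32 * MIB}

ops = [
    GemmOp("gemm.untuned", CONSERVATIVE_WORKSPACE_BYTES),
    GemmOp("gemm.small", CONSERVATIVE_WORKSPACE_BYTES, selected_algorithm=1),
    GemmOp("gemm.none", CONSERVATIVE_WORKSPACE_BYTES, selected_algorithm=0),
    GemmOp("gemm.full", CONSERVATIVE_WORKSPACE_BYTES, selected_algorithm=2),
]
updated = [update_workspace_size(op, ALGORITHM_WORKSPACE_BYTES) for op in ops]

for op in updated:
    print(op.name, op.workspace_bytes)
```

In this toy model only `gemm.small` and `gemm.none` change; the untuned op and the op whose algorithm already needs the full buffer keep the conservative size.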
🎯 Justification
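Potentially reduces memory usage: when the selected algorithm needs less workspace than the conservative pre-autotuning allocation, the difference is no longer reserved.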
🚀 Kind of Contribution
⚡️ Performance Improvement
🧪 Unit Tests:
Existing gemm tests should cover the workspace size config.
Copybara import of the project:
--
a6ed265 by Shawn Wang:
Update cublas workspace size with the exact size extracted from algorithm
--
d67a48a by Shawn Wang:
fix comments
--
613e090 by Shawn Wang:
add unittest
Merging this change closes #35132
FUTURE_COPYBARA_INTEGRATE_REVIEW=#35132 from shawnwang18:shawnw/cublas_workspace 613e090