[misc] CUDA Time Layerwise Profiler #8337
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
mgoin
left a comment
This is great! Since the script is pretty technically involved and relies on exact attributes existing, could you add a simple e2e test to run in CI so we can know if torch updates break it?
What's the easiest way to do this? Just add a pytest test, or invoke offline_profile somehow? Are there instructions on how to register something with Buildkite, or are all pytest folders already run automatically? @mgoin added examples test
mgoin
left a comment
LGTM and works well!
e88f143 to
97647e1
Compare
tlrmchlsmth
left a comment
Came in here because I wanted to use offline_profile.py. I added some minor comments, but LGTM.
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Alvant <[email protected]>
Layerwise profiler for seeing how much CUDA time (GPU kernel time) is spent in each module/layer.
Example of how to run a profile
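To illustrate the idea behind the profiler (not the PR's actual CLI or internals), here is a minimal sketch of the core aggregation step: take CUDA kernel events attributed to the module that launched them and sum their durations per module. The event tuples and module names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical kernel events: (module_path, kernel_name, cuda_time_us).
# The real profiler derives these from a torch.profiler trace.
events = [
    ("model.layers.0.self_attn", "flash_fwd_kernel", 120.0),
    ("model.layers.0.mlp", "gemm_kernel", 340.0),
    ("model.layers.1.self_attn", "flash_fwd_kernel", 118.0),
    ("model.layers.1.mlp", "gemm_kernel", 335.0),
]

def layerwise_cuda_time(events):
    """Sum CUDA time (microseconds) per module/layer."""
    totals = defaultdict(float)
    for module, _kernel, us in events:
        totals[module] += us
    return dict(totals)

print(layerwise_cuda_time(events))
```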
Then there are some utilities for looking at the profile breakdown, e.g. to get a summary table of the prefill phase you can run:
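As a rough sketch of what such a summary table contains (the module names and times below are made up, and this is not the PR's actual table utility): sort modules by CUDA time and show each one's share of the total.

```python
# Hypothetical per-module CUDA times in microseconds; the real tool
# reads these from the captured profile.
totals = {
    "model.layers.0.self_attn": 120.0,
    "model.layers.0.mlp": 340.0,
    "model.layers.1.self_attn": 118.0,
    "model.layers.1.mlp": 335.0,
}

def summary_table(totals):
    """Render a table of CUDA time and percentage per module, largest first."""
    grand_total = sum(totals.values())
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    lines = [f"{'module':<30}{'cuda_us':>10}{'pct':>8}"]
    for module, us in rows:
        lines.append(f"{module:<30}{us:>10.1f}{100 * us / grand_total:>7.1f}%")
    return "\n".join(lines)

print(summary_table(totals))
```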
Or to view it as a graph you can run:
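The graph view groups times by the module hierarchy. A simple stdlib sketch of that rollup (again with hypothetical module names, not the PR's visualization code): accumulate each leaf's time into every ancestor along its dotted path, then print an indented tree.

```python
from collections import defaultdict

# Hypothetical per-module CUDA times (us); names are illustrative only.
totals = {
    "model.layers.0.self_attn": 120.0,
    "model.layers.0.mlp": 340.0,
}

def tree_view(totals):
    """Roll leaf times up the module hierarchy and render an indented tree."""
    agg = defaultdict(float)
    for path, us in totals.items():
        parts = path.split(".")
        # Credit this leaf's time to every prefix of its module path.
        for i in range(1, len(parts) + 1):
            agg[".".join(parts[:i])] += us
    lines = []
    for path in sorted(agg):
        depth = path.count(".")
        name = path.split(".")[-1]
        lines.append(f"{'  ' * depth}{name}: {agg[path]:.1f} us")
    return "\n".join(lines)

print(tree_view(totals))
```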