refactor apply_w8a8_block_fp8_linear in fp #6545
Merged: zhyncs merged 26 commits into sgl-project:main from ChangyiYang:refactor_apply_w8a8_block_fp8_linear on May 29, 2025
Commits
47a5c62 ChangyiYang: refactor apply_w8a8_block_fp8_linear
87a5629 ChangyiYang: refactoring the dispatching logic to avoid filtering overhead
cb4909c ChangyiYang: Modify comments in fp8_kernel.py
fecf409 ChangyiYang: create w8a8_block_fp8_matmul_triton, leave w8a8_block_fp8_matmul as a…
cf851a1 ChangyiYang: fix typo
48e3781 ChangyiYang: Update kernel function ref in bench_fp8_blockwise_gemm.py
e8db7ce Alcanderian: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear
bea308a ChangyiYang: fix referenced before assignment error
2c78443 ChangyiYang: fix bug that output_dtype is not passed correctly
807fc4e ChangyiYang: fix bug that output_dtype is not passed correctly
b0e2c42 zhyncs: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear
bdab9a1 whybeyoung: [PD Perf] replace Queue to FastQueue (#6649)
279885a ShangmingCai: [Bugfix] Fix slice operation when chunk size mismatch (#6697)
8f84f9c ShangmingCai: [Bugfix] Fix ChatCompletion endpoint of mini_lb when stream is set (#…
0de019b ShangmingCai: [CI] Fix setup of disaggregation with different tp (#6706)
cbf1e96 Hongbosherlock: [PD] Remove Unnecessary Exception Handling for FastQueue.get() (#6712)
1a55a95 fzyzcjy: Fuse routed_scaling_factor in DeepSeek (#6710)
6374a27 fzyzcjy: Overlap two kernels in DeepSeek with communication (#6711)
cdedbb3 fzyzcjy: Minor refactor two-batch overlap (#6682)
93ef744 fzyzcjy: Speed up when having padding tokens two-batch overlap (#6668)
7df4699 Fridge003: [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (…
8368a5d ChangyiYang: create new dispatching function flashinfer_gemm_w8a8_block_fp8_linear
2664692 ChangyiYang: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear
4ef9146 ChangyiYang: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear
37bf9c9 ChangyiYang: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear
d4e3cf6 zhyncs: Merge branch 'main' into refactor_apply_w8a8_block_fp8_linear