-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
[CI] Add Async Eplb nightly CI tests #29385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] Add Async Eplb nightly CI tests #29385
Conversation
Signed-off-by: David Chen <[email protected]>
Signed-off-by: David Chen <[email protected]>
Signed-off-by: David Chen <[email protected]>
Signed-off-by: David Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces nightly CI tests for asynchronous EPLB for the DeepSeek-V2-Lite and Qwen3-Next models. The changes are well-structured, adding new test scripts and updating the Buildkite pipeline. However, I've identified a critical issue in the new test scripts: the use_async flag in the EPLB configuration is incorrectly passed as a string ("true") instead of a boolean (true). This will prevent the async feature from being enabled, meaning the tests would not cover the intended functionality. I've provided suggestions to correct this in the review comments.
| --data-parallel-size 2 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use_async parameter in the JSON for --eplb-config is specified as the string "true". For it to be correctly parsed as a boolean value by Pydantic, it should be the JSON boolean literal true (without quotes). With the current value, the async feature will not be enabled, which defeats the purpose of this test.
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ | |
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":true}' \ |
| --tensor-parallel-size 4 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use_async parameter in the JSON for --eplb-config is specified as the string "true". For it to be correctly parsed as a boolean value by Pydantic, it should be the JSON boolean literal true (without quotes). With the current value, the async feature will not be enabled, which defeats the purpose of this test.
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ | |
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":true}' \ |
|
@tlrmchlsmth @DarkLight1337 PTAL, thx |
Signed-off-by: David Chen <[email protected]>
Signed-off-by: David Chen <[email protected]>
|
Ci failed test_eagle_correctness is not related about this pr. |
|
Merging, sorry for the delay! |
Waiting to be merged |
|
I don't have permission to force-merge this PR, need to wait for @simon-mo @WoosukKwon @youkaichao |
Signed-off-by: WeiQing Chen <[email protected]>
Signed-off-by: WeiQing Chen <[email protected]>
|
@DarkLight1337 CI passed, PTAL, thx. |
tlrmchlsmth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding the test!
| --data-parallel-size 2 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600}' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, not sure the default step interval actually triggering eplb here
This reverts commit 7fe9c1a.
Signed-off-by: David Chen <[email protected]> Signed-off-by: WeiQing Chen <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Purpose
#22179 comment Add Async Eplb nightly CI
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.