Add Gang Scheduling Support for lws and volcano implementation example #498
Conversation
Hi @JesseStutler. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-lws canceled.
```go
func NewVolcanoProvider(client client.Client) *VolcanoProvider {
	return &VolcanoProvider{
		client: client,
	}
}
```
Missing the defaultBaseResourceProvider, right? It would be the same for every provider.
Yes, you're right. I'll add it now.
Force-pushed de5fb19 to 7a8c16a
/ok-to-test
Force-pushed 7a8c16a to 9c3d676
Force-pushed 9c3d676 to baf9e25
/cc @Edwinhr716 @kerthcet
I added a Volcano implementation for you to refer to. I can also add a coscheduling plugin implementation if we need it in v0.7.0 😄
Force-pushed baf9e25 to 251bf4b
Force-pushed 251bf4b to 6103dc2
@JesseStutler can we finish this week? I don't want this to drag into next week. Thanks.
Sorry for the late reply, I had things to do over the weekend. Can I finish it on Monday and then ask you to review again, if that's okay? @kerthcet
@kerthcet You can review the e2e testing first; I'll verify it on Monday. Is that OK?
kerthcet left a comment:
Once you fix the env error in the e2e test, I'll create a new CI workflow in test-infra so we can verify the gang scheduling tests.
Force-pushed 8ad887b to 4e6b368
Signed-off-by: JesseStutler <[email protected]>
Force-pushed 4e6b368 to 940e139
/retest
@kerthcet I have now fixed all the CI failures; please help me create a workflow, thanks. Does this require the code to be merged first?
test/testutils/util.go (outdated):
```go
func ExpectValidPodGroups(ctx context.Context, k8sClient client.Client, provider schedulerprovider.ProviderType, lws *leaderworkerset.LeaderWorkerSet, expectedCount int) {
	gvk := getPodGroupGVK(provider)
	if gvk.Empty() {
		ginkgo.Skip("Unsupported scheduler provider for PodGroup validation")
	}
	// ...
}
```
I think we should return an error here? An empty gvk is not allowed, I think.
OK, I have changed it to ginkgo.Fail, please check.
```go
// UpdatePodGroupAtIndex manually updates the PodGroup for a specific group index
func UpdatePodGroupAtIndex(ctx context.Context, k8sClient client.Client, providerType schedulerprovider.ProviderType, schedulerProvider schedulerprovider.SchedulerProvider, lws *leaderworkerset.LeaderWorkerSet, groupIndex string) {
	gvk := getPodGroupGVK(providerType)
	if gvk.Empty() {
		// ...
	}
	// ...
}
```
ditto.
OK, I have changed it to ginkgo.Fail, please check.
```go
// ExpectValidPodGroupAtIndex checks that a PodGroup exists for the specified group index and has the correct owner reference
func ExpectValidPodGroupAtIndex(ctx context.Context, k8sClient client.Client, provider schedulerprovider.ProviderType, lws *leaderworkerset.LeaderWorkerSet, groupIndex string) {
	gvk := getPodGroupGVK(provider)
	if gvk.Empty() {
		// ...
	}
	// ...
}
```
ditto
OK, I have changed it to ginkgo.Fail, please check.
```go
	return defaultNamespace
}

// CalculatePGMinResources calculates the minimum resources needed for an entire PodGroup [1 Leader + (size-1) Worker pods]
```
We need a unit test for this later; it could be a follow-up PR.
OK, I'll do it in a later PR, thanks for the reminder.
Signed-off-by: JesseStutler <[email protected]>
Force-pushed 940e139 to a2cbb68
/lgtm

Thanks @JesseStutler
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: JesseStutler, kerthcet. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@JesseStutler I was expecting changes under
Currently, I don't think gang scheduling is supported via kustomize, unless the user manually applies the required configurations.
Thanks for the reminder @ardaguclu, I was also thinking about this. I'm not very familiar with kustomize. Are the configurations under
For example, if the only requirement is to define a ClusterRole, you need to add it here: https://github.com/kubernetes-sigs/lws/tree/main/config/rbac. After that, you need to reference the new file in lws/config/rbac/kustomization.yaml (line 1 at commit b73c72c).
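As a hedged illustration of that suggestion, the kustomization.yaml under config/rbac would register the new ClusterRole manifest alongside the existing ones. The file names below are hypothetical, not the repository's actual entries:

```yaml
# config/rbac/kustomization.yaml (sketch; file names are illustrative)
resources:
- role.yaml
- role_binding.yaml
- gang_scheduling_clusterrole.yaml   # the newly added ClusterRole manifest
```

With the manifest listed under `resources`, `kustomize build config/default` would include it in the rendered output automatically.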
Good idea, thanks @ardaguclu! I will add it in later PRs.
What type of PR is this?
/kind feature
What this PR does / why we need it
Related to #496; this provides the current implementation code for gang scheduling with a Volcano implementation example. The coscheduling scheduler plugin can be separated into another PR.
Which issue(s) this PR fixes
Related to #407
Special notes for your reviewer
Does this PR introduce a user-facing change?
Validation