Skip to content

Conversation

@seanlaii
Copy link
Contributor

@seanlaii seanlaii commented Oct 26, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
Add ResourceQuota plugin for multi-component scheduling.

Which issue(s) this PR fixes:

Part of #6734

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-scheduler-estimator`: Add `ResourceQuota` plugin for multi-component scheduling.

@karmada-bot karmada-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 26, 2025
@karmada-bot karmada-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 26, 2025
@seanlaii seanlaii force-pushed the component-resource-quota branch from 65a7e6c to 46a3f5b Compare October 26, 2025 00:28
@seanlaii seanlaii changed the title [Feature] Add resourcequota plugin for EstimateComponents [Feature] Add ResourceQuota plugin for EstimateComponents Oct 26, 2025
@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 86.23188% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.33%. Comparing base (f559e3f) to head (c17f8f3).

Files with missing lines Patch % Lines
...r/framework/plugins/resourcequota/resourcequota.go 86.23% 11 Missing and 8 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6875      +/-   ##
==========================================
+ Coverage   46.26%   46.33%   +0.06%     
==========================================
  Files         697      697              
  Lines       47523    47629     +106     
==========================================
+ Hits        21988    22068      +80     
- Misses      23878    23894      +16     
- Partials     1657     1667      +10     
Flag Coverage Δ
unittests 46.33% <86.23%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@seanlaii seanlaii changed the title [Feature] Add ResourceQuota plugin for EstimateComponents [Feature] Add ResourceQuota plugin for multi-component scheduling Oct 26, 2025
@seanlaii seanlaii force-pushed the component-resource-quota branch from 46a3f5b to 39428b9 Compare October 26, 2025 23:14
@seanlaii seanlaii marked this pull request as ready for review October 26, 2025 23:18
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 26, 2025
@seanlaii
Copy link
Contributor Author

Hi @RainbowMango @zhzhuang-zju @mszacillo, please help take a look when you get a chance! Thank you!

@zhzhuang-zju
Copy link
Contributor

Thanks
/assign

@seanlaii
Copy link
Contributor Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a ResourceQuota plugin for multi-component scheduling, a significant and valuable feature. The implementation is well-structured, with clear separation of concerns and extensive test coverage that handles many edge cases.

I've identified a couple of potential nil pointer dereferences that could lead to panics if a component's ReplicaRequirements are not provided. I've also suggested a refinement to the isSupportedResource function to make it more robust and prevent incorrect resource calculations.

Overall, this is a solid contribution. Addressing these points will further improve the robustness of the new plugin.

@seanlaii seanlaii force-pushed the component-resource-quota branch 3 times, most recently from f6f7e92 to 6ca8cd5 Compare October 27, 2025 05:34
@RainbowMango RainbowMango added this to the v1.16 milestone Oct 28, 2025
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review is ongoing.

return maxSets, framework.NewResult(framework.Noopperation, fmt.Sprintf("%s received empty components list", pl.Name()))
}

namespace := components[0].ReplicaRequirements.Namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the thing that I feel is not right. Similar to #6876 (comment).

The namespace might not have to be stored inside of ReplicaRequirements. Furthermore, can we assume all components share the same namespace? If yes, can we put the namespace directly under MaxAvailableComponentSetsRequest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed the CRDs for SparkApplication and FlinkDeployment, and their different components do not explicitly support configuring the namespace. They only theoretically have the capability to configure the namespace because they include corev1.PodTemplateSpec.

From the implementation of the Spark operator, it only focuses on the namespace of the SparkApplication. Additionally, I tried deploying a CR locally with namespaces set for both the driver and executor, and it ultimately failed to deploy. Therefore, SparkApplication does not support multi-components cross-namespace configurations.

For FlinkDeployment, I believe it is the same. @mszacillo @seanlaii, you are more knowledgeable about FlinkDeployment. Could you confirm this?

Copy link
Contributor Author

@seanlaii seanlaii Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed experiment and explanation!
I am not familiar with FlinkDeployment, but for RayJob, RayCluster, and RayService, they only support deploying to a single namespace.

@seanlaii seanlaii force-pushed the component-resource-quota branch from 6ca8cd5 to 00f9489 Compare October 29, 2025 01:25
@zhzhuang-zju
Copy link
Contributor

/retest

Comment on lines 169 to 187
// Evaluate each priority class group separately and take the minimum across all groups.
for priorityClassName, groupComponents := range componentsByPriority {
klog.V(5).Infof("%s: evaluating %d components with priorityClassName %q",
pl.Name(), len(groupComponents), priorityClassName)

groupMaxSets := pl.evaluateComponentGroup(rqList, groupComponents, priorityClassName)

klog.V(5).Infof("%s: priorityClassName %q allows %d component sets",
pl.Name(), priorityClassName, groupMaxSets)

if groupMaxSets < maxSets {
maxSets = groupMaxSets
}

// Early exit if any priority group allows zero sets.
if maxSets == 0 {
break
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic in the loop might have a critical flaw in how it evaluates quota consumption across priority classes.

By evaluating each groupComponents separately against the same, unmodified rqList (the same total available quota), this can lead to over-admission.

Consider a workload with two components (A and B) under different PriorityClasses, and the total quota allows only one such component. Evaluating A against the full quota might allow 1 set. Evaluating B against the same full quota might also allow 1 set. The loop takes the minimum (maxSets = 1) and admits the workload. However, running both A and B would require resources for two components, exceeding the quota.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, the current algorithm calculates the available replicas using availableResources / requestedResource, which is convenient. However, considering that in the future we will need to pass in a batch of components requiring reservation, we will first need to subtract these reserved resources from the availableResources before performing the evaluation. Therefore, we still need to develop an algorithm that decrements the Quota accordingly. See the draft idea at #6812 (comment).

In summary, although we don't need it right now, should we prepare for future extensibility at this time?

Copy link
Contributor

@zhzhuang-zju zhzhuang-zju Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By evaluating each groupComponents separately against the same, unmodified rqList (the same total available quota), this can lead to over-admission.

+1. The workflow described in #6875 (comment) is correct in principle. However, during implementation, the first step—Determine which components match RQ's scopes—was moved before the For each ResourceQuota RQ loop, so that components are grouped by priorityClassName upfront.

Take a ResourceQuota without any scope defined as an example: it should originally compute the number of available replica sets based on the entire componentSet. But due to this change, the quota now calculates the available sets separately for each groupedComponents and then takes the minimum across groups—effectively over-constraining the result.

we will first need to subtract these reserved resources from the availableResources before performing the evaluation.

FYI, the SubResource function is specifically used to subtract these reserved resources from the availableResource. We can use it if needed.

// SubResource is used to subtract two resources, if r < rr, set r to zero.
func (r *Resource) SubResource(rr *Resource) *Resource {
if r == nil || rr == nil {
return r
}
r.MilliCPU = MaxInt64(r.MilliCPU-rr.MilliCPU, 0)
r.Memory = MaxInt64(r.Memory-rr.Memory, 0)
r.EphemeralStorage = MaxInt64(r.EphemeralStorage-rr.EphemeralStorage, 0)
r.AllowedPodNumber = MaxInt64(r.AllowedPodNumber-rr.AllowedPodNumber, 0)
for rrName, rrScalar := range rr.ScalarResources {
if lifted.IsScalarResourceName(rrName) {
rScalar, ok := r.ScalarResources[rrName]
if ok {
r.ScalarResources[rrName] = MaxInt64(rScalar-rrScalar, 0)
}
}
}
return r
}

Copy link
Contributor Author

@seanlaii seanlaii Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are totally correct! Sorry, I implemented in a wrong way. I have updated the implementation so that it will now first iterate the resource quota and then do the filtering based on the priority class name. I have also added a test case to validate this case.

Regarding the reservation, maybe we can have something like this:

func evaluateResourcesAgainstQuota(
    availableResources corev1.ResourceList,
    perSetRequirements corev1.ResourceList,
    reservedResources corev1.ResourceList) int32 {
    
    effectiveAvailable := availableResources.DeepCopy()
    if reservedResources != nil {
        effectiveAvailable.Sub(reserved)
    }
    
    filtered := filterConstrainedResources(effectiveAvailable, perSetRequirements)
    resource := util.NewResource(effectiveAvailable)
    allowed := resource.MaxDivided(filtered) 
    
    return int32(allowed)
}

Please feel free to correct me if my understanding is incorrect.
Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it sounds even better than what I proposed before.
What I was thinking is mostly for the estimation based on node resources, which can not go with the divide way.

@seanlaii seanlaii force-pushed the component-resource-quota branch from 00f9489 to f300b54 Compare October 30, 2025 21:19
@karmada-bot karmada-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 30, 2025
@seanlaii seanlaii force-pushed the component-resource-quota branch 4 times, most recently from 2a42503 to b6ccfc3 Compare October 30, 2025 22:36
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Looks pretty good.

Just left some warnings raised by my IDE, some of them are for the legacy code.
It would be great to get them fixed in this PR, but it shouldn't be a blocker.

I will wait for a few hours, and then move it forward.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2025
@seanlaii seanlaii force-pushed the component-resource-quota branch from b6ccfc3 to c17f8f3 Compare October 31, 2025 02:42
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2025
@seanlaii
Copy link
Contributor Author

Just left some warnings raised by my IDE, some of them are for the legacy code.
It would be great to get them fixed in this PR, but it shouldn't be a blocker.

Sure, rename them. Thanks!

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

I ran a quick test on my side, and it works well.
The test is similar to #6857 (comment), but I added a ResourceQuota to member1 cluster:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-high
spec:
  hard:
    cpu: "0"
    memory: "0Gi"
    pods: "0"

The ResourceQuota prevents anything from being provisioned on member1.

It works as expected:

  1. the final estimation result is 0 from karmada-scheduler-estimator-member1
{"ts":1761880763617.0396,"caller":"resourcequota/resourcequota.go:198","msg":"ResourceQuotaEstimator: final estimation result: 0 component sets, status: Unschedulable","v":5}
  1. the job is scheduled to member2 as member1 can not accommodate it.
{"ts":1761880763651.3323,"caller":"core/generic_scheduler.go:102","msg":"Selected clusters: [{member2 100 9 member2 9}]","v":4}
{"ts":1761880763651.5518,"caller":"core/generic_scheduler.go:108","msg":"Assigned Replicas: [{member2 0}]","v":4}
{"ts":1761880763651.6294,"caller":"scheduler/scheduler.go:590","msg":"ResourceBinding(default/dk-job-job) scheduled to clusters [{member2 0}]","v":4}

PS: The log format should be adjusted and is tracked by #6880

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2025
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2025
@karmada-bot karmada-bot merged commit 15641d7 into karmada-io:master Oct 31, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants