skip calculate available replicas when no need #4574

zhzhuang-zju · 2024-01-23T11:52:17Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
When scheduling, Karmada always invokes calAvailableReplicas to calculate available replicas, even when they are not needed, which adds unnecessary CPU overhead.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
NONE
Does this PR introduce a user-facing change?:

`karmada-scheduler`: Optimized the scheduling step by eliminating unnecessary `availableReplicas` calculations, improving performance.

codecov-commenter · 2024-01-23T12:03:58Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.60%. Comparing base (29f3860) to head (ee0e484).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4574      +/-   ##
==========================================
- Coverage   45.60%   45.60%   -0.01%     
==========================================
  Files         692      692              
  Lines       57678    57685       +7     
==========================================
+ Hits        26305    26307       +2     
- Misses      29728    29735       +7     
+ Partials     1645     1643       -2

Flag	Coverage Δ
unittests	`45.60% <100.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.


Garrybest · 2024-01-23T13:35:08Z

When scheduling, Karmada always invokes calAvailableReplicas to calculate available replicas, even when they are not needed, which adds unnecessary CPU overhead.

Well, I'm afraid not. This func is only invoked when the strategy is Aggregated or DynamicWeight, where it is indeed necessary.

zhzhuang-zju · 2024-01-24T01:31:35Z

Well, I'm afraid not. This func is only invoked when the strategy is Aggregated or DynamicWeight, where it is indeed necessary.

Refer to

karmada/pkg/scheduler/core/common.go

Lines 32 to 38 in d508417

    func SelectClusters(clustersScore framework.ClusterScoreList,
        placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec) ([]*clusterv1alpha1.Cluster, error) {
        startTime := time.Now()
        defer metrics.ScheduleStep(metrics.ScheduleStepSelect, startTime)
        groupClustersInfo := spreadconstraint.GroupClustersWithScore(clustersScore, placement, spec, calAvailableReplicas)
        return spreadconstraint.SelectBestClusters(placement, groupClustersInfo, spec.Replicas)

As you can see, GroupClustersWithScore runs first, followed by SelectBestClusters. GroupClustersWithScore invokes calAvailableReplicas to calculate available replicas no matter which strategy is used.
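To make the point above concrete, here is a minimal Go sketch of a grouping step that accepts an optional calculator and skips the estimation when none is passed. `ClusterInfo`, `replicaCalculator`, and `groupClusters` are hypothetical simplifications, not the real signatures of `GroupClustersWithScore` or `calAvailableReplicas`. The nil check also matters: calling a nil function value in Go panics.

```go
package main

import "fmt"

// Hypothetical, simplified model of the grouping step; names and
// signatures are illustrative, not Karmada's real API.
type ClusterInfo struct {
	Name              string
	Score             int64
	AvailableReplicas int64 // stays 0 when the estimation is skipped
}

// replicaCalculator estimates how many replicas each cluster can still host.
type replicaCalculator func(clusters []string) map[string]int64

// groupClusters builds per-cluster info from scores. calc may be nil, in
// which case the (potentially expensive) estimation is skipped entirely.
func groupClusters(names []string, scores map[string]int64, calc replicaCalculator) []ClusterInfo {
	var available map[string]int64
	if calc != nil { // guard: calling a nil func value would panic
		available = calc(names)
	}
	infos := make([]ClusterInfo, 0, len(names))
	for _, n := range names {
		// Reading from a nil map is safe in Go and yields the zero value.
		infos = append(infos, ClusterInfo{Name: n, Score: scores[n], AvailableReplicas: available[n]})
	}
	return infos
}

func main() {
	calls := 0
	calc := func(clusters []string) map[string]int64 {
		calls++
		out := make(map[string]int64, len(clusters))
		for i, c := range clusters {
			out[c] = int64(10 * (i + 1))
		}
		return out
	}
	names := []string{"member1", "member2"}
	scores := map[string]int64{"member1": 100, "member2": 50}

	needed := groupClusters(names, scores, calc)
	skipped := groupClusters(names, scores, nil)
	fmt.Println(calls, needed[1].AvailableReplicas, skipped[1].AvailableReplicas) // prints: 1 20 0
}
```

The caller decides once, up front, whether the estimation is needed, so the grouping code itself stays strategy-agnostic.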

jwcesign

Otherwise LGTM.

pkg/scheduler/core/spreadconstraint/group_clusters.go

pkg/scheduler/core/spreadconstraint/select_clusters.go

RainbowMango · 2024-02-18T08:14:01Z

/retest

pkg/scheduler/core/spreadconstraint/group_clusters_test.go

zhzhuang-zju · 2024-04-28T07:26:40Z

cc @Garrybest

RainbowMango · 2025-09-29T11:20:39Z

@zhzhuang-zju Please help resolve the conflicts; I will take a look then.

zhzhuang-zju · 2025-09-29T11:23:32Z

@zhzhuang-zju Please help resolve the conflicts; I will take a look then.

Received, I will pick it up soon.

RainbowMango · 2025-09-29T11:34:47Z

OK. No rush.

zhzhuang-zju · 2025-10-09T08:01:12Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an optimization to skip calculating available replicas during scheduling when it's not necessary. However, the implementation has a critical issue: it incorrectly skips the calculation for the Duplicated scheduling strategy, which would lead to a panic due to a division-by-zero error in the scoring function. There is also a potential nil pointer dereference. I have provided a detailed comment with a suggested fix for this function. Additionally, I've pointed out a minor discrepancy in a test case name for better clarity.

pkg/scheduler/core/spreadconstraint/select_clusters.go
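Gemini's concern is that a scoring step which divides by available replicas breaks once that value is no longer computed. The scoring code itself is not quoted in this thread, so the following Go snippet is only a hypothetical illustration: `availabilityScore` is an invented helper that falls back to the base score when the total is zero. In Go, integer division by zero panics at runtime, which matches the failure mode described.

```go
package main

import "fmt"

// availabilityScore is a hypothetical scoring helper: it scales a cluster's
// score by its share of the total available replicas. It is not Karmada's
// actual scoring function, only an illustration of the failure mode.
func availabilityScore(baseScore, available, totalAvailable int64) int64 {
	if totalAvailable == 0 {
		// Estimation was skipped (or nothing is available): fall back to the
		// base score instead of dividing by zero, which panics for integers.
		return baseScore
	}
	return baseScore * available / totalAvailable
}

func main() {
	fmt.Println(availabilityScore(100, 30, 60)) // prints: 50
	fmt.Println(availabilityScore(100, 0, 0))   // prints: 100
}
```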

RainbowMango

/assign

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

zhzhuang-zju · 2025-10-16T07:52:30Z

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

sure~

RainbowMango · 2025-10-16T08:26:06Z

/genmini review

RainbowMango · 2025-10-16T08:28:12Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an optimization to skip calculating available replicas when it's not necessary, such as for 'Duplicated' or static 'Weighted' scheduling strategies. The changes are logical and well-tested. My review includes a suggestion to improve the readability of the new logic and points out an inconsistency with existing code that should be addressed for better maintainability.

pkg/scheduler/core/spreadconstraint/select_clusters.go
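The rule described here (estimate only for Divided with Aggregated or dynamic Weighted, skip for Duplicated and static Weighted) can be sketched as a small predicate. The types below are simplified, hypothetical stand-ins for the replica scheduling fields in Karmada's policy API, and treating a nil strategy as "calculate" is an assumption chosen to be conservative.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the replica scheduling fields
// discussed in this PR; these are not Karmada's actual API types.
type ReplicaSchedulingType string
type DivisionPreference string

const (
	Duplicated ReplicaSchedulingType = "Duplicated"
	Divided    ReplicaSchedulingType = "Divided"

	Aggregated DivisionPreference = "Aggregated"
	Weighted   DivisionPreference = "Weighted"
)

type Strategy struct {
	Type          ReplicaSchedulingType
	Preference    DivisionPreference
	DynamicWeight bool // true if weights are derived from available replicas
}

// needAvailableReplicas reports whether the available-replica estimation
// must run for the given strategy.
func needAvailableReplicas(s *Strategy) bool {
	if s == nil {
		return true // assumption: unknown strategy, stay conservative
	}
	if s.Type != Divided {
		return false // Duplicated: every selected cluster gets all replicas
	}
	switch s.Preference {
	case Aggregated:
		return true
	case Weighted:
		return s.DynamicWeight // static weights need no estimation
	default:
		return true
	}
}

func main() {
	for _, s := range []*Strategy{
		{Type: Duplicated},
		{Type: Divided, Preference: Weighted},
		{Type: Divided, Preference: Weighted, DynamicWeight: true},
		{Type: Divided, Preference: Aggregated},
	} {
		fmt.Printf("%s/%s dynamic=%t -> %t\n", s.Type, s.Preference, s.DynamicWeight, needAvailableReplicas(s))
	}
}
```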

zhzhuang-zju · 2025-10-20T04:04:59Z

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

This e2e instability is caused by the behavior change in this PR. Some Prometheus metrics only appear on the metrics endpoint after they have actually been recorded. For example, karmada_scheduler_estimator_estimating_request_total appears only after the scheduler has sent the estimator a maxAvailable request. With this change, the estimator is no longer called when ReplicaSchedulingType is Duplicated, so if the metrics test runs too early, the metric is not yet exposed and the test fails.

karmada-bot · 2025-10-20T07:57:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/scheduler/OWNERS
test/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zhzhuang-zju · 2025-10-20T08:02:12Z

ERROR: failed to build: failed to solve: alpine:3.22.2: failed to resolve source metadata for docker.io/library/alpine:3.22.2: unexpected status from HEAD request to https://registry-1.docker.io/v2/library/alpine/manifests/3.22.2: 500 Internal Server Error

Downloading common from repo https://charts.bitnami.com/bitnami
Save error occurred: could not download oci://registry-1.docker.io/bitnamicharts/common: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/bitnamicharts/common/manifests/sha256:98cc992cb269f0b3f8f87c4bf933c1c0991b2d639f79e0bf62d1f35926efeb3f: 500 Internal Server Error - Server message: unknown: unknown error

Recorded for reference.

zhzhuang-zju · 2025-10-20T09:46:58Z

/retest

zhzhuang-zju · 2025-10-22T08:51:41Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a valuable optimization to skip the calculation of available replicas for Duplicated and static weighted scheduling strategies, which should help reduce CPU overhead during scheduling. The implementation looks solid, with the core logic encapsulated in new helper functions and backed by unit tests. I've identified a potential bug in an updated E2E test where a slice is not being reset, and I've also suggested adding a comment to a complex function to improve maintainability. Overall, this is a good improvement.

test/e2e/suites/base/metrics_test.go

pkg/scheduler/core/spreadconstraint/select_clusters.go
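The updated E2E test itself is not reproduced in this thread. As a hypothetical illustration of the bug class Gemini describes, the sketch below shows how a slice declared outside a retry loop accumulates entries from earlier attempts unless it is reset at the start of each attempt; `collectMetricNames` and `contains` are invented helpers.

```go
package main

import "fmt"

// contains reports whether s includes v.
func contains(s []string, v string) bool {
	for _, x := range s {
		if x == v {
			return true
		}
	}
	return false
}

// collectMetricNames is a hypothetical retry loop: each element of scrapes is
// the list of metric names seen on one attempt, and the loop stops once want
// is present. With reset=false, names from failed attempts leak into the result.
func collectMetricNames(scrapes [][]string, want string, reset bool) []string {
	var names []string
	for _, scrape := range scrapes {
		if reset {
			names = names[:0] // start each attempt from an empty slice
		}
		names = append(names, scrape...)
		if contains(scrape, want) {
			break
		}
	}
	return names
}

func main() {
	scrapes := [][]string{{"a"}, {"a", "b"}}
	fmt.Println(collectMetricNames(scrapes, "b", false)) // prints: [a a b]
	fmt.Println(collectMetricNames(scrapes, "b", true))  // prints: [a b]
}
```

Re-declaring the slice inside the loop body would work equally well; `names[:0]` simply reuses the existing backing array.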

Signed-off-by: zhzhuang-zju <[email protected]>

karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jan 23, 2024

karmada-bot requested review from Garrybest and whitewindmills January 23, 2024 11:52

karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 23, 2024

zhzhuang-zju force-pushed the schedule branch from 631ca2f to 6b84629 January 26, 2024 02:01

karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 26, 2024

jwcesign reviewed Jan 26, 2024

pkg/scheduler/core/spreadconstraint/group_clusters.go (outdated)

pkg/scheduler/core/spreadconstraint/select_clusters.go (outdated)

pkg/scheduler/core/spreadconstraint/select_clusters.go (outdated)

zhzhuang-zju force-pushed the schedule branch from 6b84629 to e1a71e5 January 27, 2024 03:37

zhzhuang-zju force-pushed the schedule branch from e1a71e5 to 8ef8ca3 April 28, 2024 02:42

karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 28, 2024

zhzhuang-zju commented Apr 28, 2024

pkg/scheduler/core/spreadconstraint/group_clusters_test.go (outdated)

zhzhuang-zju mentioned this pull request Sep 25, 2025

Unnecessary to call estimator in SelectClusters #6782

Closed

zhzhuang-zju force-pushed the schedule branch from 8ef8ca3 to 6497905 October 9, 2025 07:55

karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 9, 2025

gemini-code-assist bot reviewed Oct 9, 2025

pkg/scheduler/core/spreadconstraint/select_clusters.go

zhzhuang-zju force-pushed the schedule branch from 6497905 to 40ce389 October 9, 2025 09:36

karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 9, 2025

zhzhuang-zju force-pushed the schedule branch from 40ce389 to 1c223cf October 10, 2025 03:08

RainbowMango reviewed Oct 16, 2025

karmada-bot assigned RainbowMango Oct 16, 2025

gemini-code-assist bot reviewed Oct 16, 2025

pkg/scheduler/core/spreadconstraint/select_clusters.go (outdated)

zhzhuang-zju force-pushed the schedule branch from 1c223cf to 85e5f42 October 20, 2025 03:52

zhzhuang-zju force-pushed the schedule branch 4 times, most recently from 41dfd0c to 4eda7fb October 20, 2025 07:46

zhzhuang-zju closed this Oct 20, 2025

zhzhuang-zju force-pushed the schedule branch from 4eda7fb to b0cb266 October 20, 2025 07:55

karmada-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2025

zhzhuang-zju reopened this Oct 20, 2025

karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 20, 2025

zhzhuang-zju mentioned this pull request Oct 22, 2025

Flaky E2E tests – tracking intermittent failures #6841

Open

6 tasks

gemini-code-assist bot reviewed Oct 22, 2025

test/e2e/suites/base/metrics_test.go

pkg/scheduler/core/spreadconstraint/select_clusters.go

skip calculate available replicas when no need

ee0e484

Signed-off-by: zhzhuang-zju <[email protected]>

zhzhuang-zju force-pushed the schedule branch from aded184 to ee0e484 October 23, 2025 03:44

6 participants
