@zhzhuang-zju
Contributor

@zhzhuang-zju zhzhuang-zju commented Jan 23, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
When Karmada is scheduling, it will always invoke calAvailableReplicas to calculate available replicas, regardless of necessity, thereby increasing CPU overhead.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
NONE

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Optimized the scheduling step by eliminating unnecessary `availableReplicas` calculations, improving performance.

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jan 23, 2024
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 23, 2024
@codecov-commenter
codecov-commenter commented Jan 23, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.60%. Comparing base (29f3860) to head (ee0e484).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4574      +/-   ##
==========================================
- Coverage   45.60%   45.60%   -0.01%     
==========================================
  Files         692      692              
  Lines       57678    57685       +7     
==========================================
+ Hits        26305    26307       +2     
- Misses      29728    29735       +7     
+ Partials     1645     1643       -2     
Flag Coverage Δ
unittests 45.60% <100.00%> (-0.01%) ⬇️

@Garrybest
Member

When Karmada is scheduling, it will always invoke calAvailableReplicas to calculate available replicas, regardless of necessity, thereby increasing CPU overhead.

Well, I'm afraid not. This function is only invoked when the strategy is Aggregated or DynamicWeight, where it is indeed necessary.

@zhzhuang-zju
Contributor Author

Well, I'm afraid not. This function is only invoked when the strategy is Aggregated or DynamicWeight, where it is indeed necessary.

Please refer to:

func SelectClusters(clustersScore framework.ClusterScoreList,
	placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec) ([]*clusterv1alpha1.Cluster, error) {
	startTime := time.Now()
	defer metrics.ScheduleStep(metrics.ScheduleStepSelect, startTime)

	// calAvailableReplicas is handed to the grouping step as a callback.
	groupClustersInfo := spreadconstraint.GroupClustersWithScore(clustersScore, placement, spec, calAvailableReplicas)
	return spreadconstraint.SelectBestClusters(placement, groupClustersInfo, spec.Replicas)
}

As you can see, GroupClustersWithScore runs first, followed by SelectBestClusters. GroupClustersWithScore calls calAvailableReplicas to calculate the available replicas no matter what the strategy is.
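
To make the idea concrete, here is a minimal Go sketch of gating the calculation on the strategy. It is not the code from this PR: `Strategy`, `needsAvailableReplicas`, and `availableReplicasFor` are hypothetical names, the types are simplified stand-ins for the Karmada policy API, and the rule (only Aggregated or dynamic-weight division needs the numbers) is taken from the comment quoted above. Treating a nil or unknown strategy as "calculate anyway" is an assumption made to stay on the safe side.

```go
package main

// Simplified, hypothetical stand-ins for the Karmada policy API types.
type ReplicaSchedulingType string
type ReplicaDivisionPreference string

const (
	Duplicated ReplicaSchedulingType = "Duplicated"
	Divided    ReplicaSchedulingType = "Divided"

	Aggregated ReplicaDivisionPreference = "Aggregated"
	Weighted   ReplicaDivisionPreference = "Weighted"
)

// Strategy is a hypothetical, flattened view of a placement's
// replica scheduling settings.
type Strategy struct {
	Type          ReplicaSchedulingType
	Preference    ReplicaDivisionPreference
	DynamicWeight bool // true when weights are derived from available replicas
}

// needsAvailableReplicas reports whether the per-cluster available
// replica counts can influence the result for the given strategy.
func needsAvailableReplicas(s *Strategy) bool {
	if s == nil {
		return true // assumption: never skip when the strategy is unknown
	}
	if s.Type != Divided {
		return false // Duplicated: every selected cluster gets all replicas
	}
	switch s.Preference {
	case Aggregated:
		return true
	case Weighted:
		return s.DynamicWeight // static weights do not depend on capacity
	default:
		return true // unrecognized preference: stay conservative
	}
}

// availableReplicasFor invokes calc only when the strategy needs it and
// returns nil otherwise, so the expensive call is skipped entirely.
func availableReplicasFor(s *Strategy, clusters []string,
	calc func([]string) map[string]int32) map[string]int32 {
	if !needsAvailableReplicas(s) {
		return nil
	}
	return calc(clusters)
}
```

With this shape, a caller that receives nil has to avoid using the counts in later arithmetic, otherwise skipping the calculation changes behavior rather than just cost.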

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 26, 2024
Member

@jwcesign jwcesign left a comment

Otherwise LGTM.

@RainbowMango
Member

/retest

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 28, 2024
@zhzhuang-zju
Contributor Author

cc @Garrybest

@RainbowMango
Member

@zhzhuang-zju Please help resolve the conflicts; I will take a look then.

@zhzhuang-zju
Contributor Author

@zhzhuang-zju Please help resolve the conflicts; I will take a look then.

Received, I will pick it up soon.

@RainbowMango
Member

OK. No rush.

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 9, 2025
@zhzhuang-zju
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an optimization to skip calculating available replicas during scheduling when it's not necessary. However, the implementation has a critical issue: it incorrectly skips the calculation for the Duplicated scheduling strategy, which would lead to a panic due to a division-by-zero error in the scoring function. There is also a potential nil pointer dereference. I have provided a detailed comment with a suggested fix for this function. Additionally, I've pointed out a minor discrepancy in a test case name for better clarity.

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 9, 2025
Member

@RainbowMango RainbowMango left a comment

/assign

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

@zhzhuang-zju
Contributor Author

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

sure~

@RainbowMango
Member

/genmini review

@RainbowMango
Member

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an optimization to skip calculating available replicas when it's not necessary, such as for 'Duplicated' or static 'Weighted' scheduling strategies. The changes are logical and well-tested. My review includes a suggestion to improve the readability of the new logic and points out an inconsistency with existing code that should be addressed for better maintainability.

@zhzhuang-zju
Contributor Author

@zhzhuang-zju The failing e2e test seems unrelated, but could you please give it a look?

This instability in e2e tests is caused by feature changes. Some Prometheus metrics only appear at the metrics endpoint after actual usage. For example, karmada_scheduler_estimator_estimating_request_total requires the scheduler to invoke the estimator's maxAvailable request. Currently, when the ReplicaSchedulingType is set to Duplicated, the estimator is not called. Therefore, if metrics testing is performed too early, it will fail.
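
One way to make such a check tolerant of metrics that appear late is to poll until the metric shows up instead of asserting once. The sketch below is only an illustration under stated assumptions, not the e2e code in this repository: `waitForMetric` and `hasMetric` are hypothetical helpers, the `fetch` callback stands in for whatever reads the metrics endpoint, and the matching assumes the Prometheus text exposition format, where lines starting with `#` are HELP/TYPE comments.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// hasMetric reports whether body contains at least one sample line whose
// name starts with the given metric name. Comment lines ("# HELP", "# TYPE")
// are ignored. This is a prefix match, so suffixed series also count.
func hasMetric(body, name string) bool {
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, name) {
			return true
		}
	}
	return false
}

// waitForMetric calls fetch up to attempts times, sleeping interval between
// calls, and returns nil as soon as the metric is present in the output.
func waitForMetric(fetch func() (string, error), name string,
	attempts int, interval time.Duration) error {
	var lastErr error // most recent fetch error, if any
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		body, err := fetch()
		if err != nil {
			lastErr = err
			continue
		}
		if hasMetric(body, name) {
			return nil
		}
	}
	if lastErr != nil {
		return fmt.Errorf("metric %q not observed after %d attempts: %w",
			name, attempts, lastErr)
	}
	return fmt.Errorf("metric %q not observed after %d attempts", name, attempts)
}
```

A test would call something like `waitForMetric(fetch, "karmada_scheduler_estimator_estimating_request_total", 30, time.Second)` after creating a workload that is expected to reach the estimator; the attempt count and interval here are arbitrary.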

@zhzhuang-zju zhzhuang-zju force-pushed the schedule branch 4 times, most recently from 41dfd0c to 4eda7fb Compare October 20, 2025 07:46
@karmada-bot karmada-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2025
@zhzhuang-zju zhzhuang-zju reopened this Oct 20, 2025
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 20, 2025
@zhzhuang-zju
Contributor Author

ERROR: failed to build: failed to solve: alpine:3.22.2: failed to resolve source metadata for docker.io/library/alpine:3.22.2: unexpected status from HEAD request to https://registry-1.docker.io/v2/library/alpine/manifests/3.22.2: 500 Internal Server Error

Downloading common from repo https://charts.bitnami.com/bitnami
Save error occurred: could not download oci://registry-1.docker.io/bitnamicharts/common: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/bitnamicharts/common/manifests/sha256:98cc992cb269f0b3f8f87c4bf933c1c0991b2d639f79e0bf62d1f35926efeb3f: 500 Internal Server Error - Server message: unknown: unknown error

record

@zhzhuang-zju
Contributor Author

/retest

@zhzhuang-zju
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable optimization to skip the calculation of available replicas for Duplicated and static weighted scheduling strategies, which should help reduce CPU overhead during scheduling. The implementation looks solid, with the core logic encapsulated in new helper functions and backed by unit tests. I've identified a potential bug in an updated E2E test where a slice is not being reset, and I've also suggested adding a comment to a complex function to improve maintainability. Overall, this is a good improvement.
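
The review does not show the test in question, so the following is only a generic Go illustration of the class of bug it mentions: a slice that accumulates results inside a retry loop must be reset on each attempt, or entries from earlier attempts linger. `retryUntilComplete` and its parameters are hypothetical names, not identifiers from the PR.

```go
package main

// retryUntilComplete calls snapshot up to attempts times and returns the
// names from want that are still absent after the final attempt, or nil
// once every name has been seen in a single snapshot.
func retryUntilComplete(want []string, snapshot func() map[string]bool,
	attempts int) []string {
	var missing []string
	for i := 0; i < attempts; i++ {
		// Reset before each attempt. Without this line, a name that was
		// missing on attempt 1 but present on attempt 2 would stay in
		// the slice and the loop could never report success.
		missing = missing[:0]

		have := snapshot()
		for _, name := range want {
			if !have[name] {
				missing = append(missing, name)
			}
		}
		if len(missing) == 0 {
			return nil
		}
	}
	return missing
}
```

Declaring the slice inside the loop body would work equally well; reslicing to zero length just reuses the backing array.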

Labels

kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants