Skip to content

Conversation

@csrwng
Copy link
Contributor

@csrwng csrwng commented Nov 21, 2025

What this PR does / why we need it:

Add comprehensive documentation for resource-based control plane autoscaling feature, including prerequisites, configuration, and troubleshooting steps.

Assisted-by: Composer (via Cursor)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 21, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Adds a new how-to guide describing Resource-Based Control Plane Autoscaling for AWS HostedClusters (dedicated-request-serving-components), plus a mkdocs navigation entry to expose the guide in the How-to section.

Changes

Cohort / File(s) Summary
Documentation Addition
docs/content/how-to/resource-based-control-plane-autoscaling.md
New how-to document explaining Resource-Based Control Plane Autoscaling: platform support (AWS, dedicated-request-serving-components topology), prerequisites (VPA operator in OwnNamespace), VPA installation (namespace, OperatorGroup, Subscription), VPA verification, VPA controller configuration, required ClusterSizingConfiguration, enabling via HostedCluster annotations, annotation inputs/outputs, capacity and memory-fraction options, operational workflow, examples, and troubleshooting.
Navigation Update
docs/mkdocs.yml
Added a navigation entry under "How-to guides" pointing to how-to/resource-based-control-plane-autoscaling.md.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus areas for review:
    • Accuracy of installation steps (namespaces, OperatorGroup, Subscription) and commands.
    • Correctness of HostedCluster annotation keys/values and ClusterSizingConfiguration examples.
    • Clarity and completeness of troubleshooting and verification instructions.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/documentation Indicates the PR includes changes for documentation labels Nov 21, 2025
@csrwng
Copy link
Contributor Author

csrwng commented Nov 21, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@csrwng csrwng changed the title docs: add resource-based control plane autoscaling guide CNTRLPLANE-1015: docs: add resource-based control plane autoscaling guide Nov 21, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 21, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Nov 21, 2025

@csrwng: This pull request references CNTRLPLANE-1015 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

What this PR does / why we need it:

Add comprehensive documentation for resource-based control plane autoscaling feature, including prerequisites, configuration, and troubleshooting steps.

Assisted-by: Composer (via Cursor)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/content/how-to/resource-based-control-plane-autoscaling.md (1)

145-145: Simplify "in conjunction with" to "with".

The phrase "in conjunction with" is unnecessarily wordy. Simplify to improve readability.

-- **`hypershift.openshift.io/resource-based-cp-auto-scaling`**: Set to `"true"` to enable resource-based autoscaling for the HostedCluster. This annotation must be used in conjunction with the `dedicated-request-serving-components` topology.
+- **`hypershift.openshift.io/resource-based-cp-auto-scaling`**: Set to `"true"` to enable resource-based autoscaling for the HostedCluster. This annotation must be used with the `dedicated-request-serving-components` topology.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between dd940af and 9458d7e.

📒 Files selected for processing (2)
  • docs/content/how-to/resource-based-control-plane-autoscaling.md (1 hunks)
  • docs/mkdocs.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • docs/content/how-to/resource-based-control-plane-autoscaling.md
  • docs/mkdocs.yml
🪛 LanguageTool
docs/content/how-to/resource-based-control-plane-autoscaling.md

[style] ~145-~145: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...edCluster. This annotation must be used in conjunction with the `dedicated-request-serving-componen...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

🪛 markdownlint-cli2 (0.18.1)
docs/content/how-to/resource-based-control-plane-autoscaling.md

203-203: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
🔇 Additional comments (2)
docs/content/how-to/resource-based-control-plane-autoscaling.md (1)

1-238: Comprehensive and well-structured documentation.

The guide provides clear, actionable guidance on resource-based control plane autoscaling. Prerequisites are detailed with step-by-step instructions, configuration examples are properly formatted, and the troubleshooting section follows a logical verification flow. The content appropriately highlights AWS-only availability and the required topology annotation.

docs/mkdocs.yml (1)

69-69: Navigation entry added correctly.

The new documentation entry is properly positioned in the "How-to guides" section with consistent formatting and indentation matching the surrounding entries.

Add comprehensive documentation for resource-based control plane
autoscaling feature, including prerequisites, configuration, and
troubleshooting steps.

Signed-off-by: Cesar Wong <[email protected]>
Assisted-by: Composer (via Cursor)
@csrwng csrwng force-pushed the cp_autoscaling_doc branch from 9458d7e to 5791d01 Compare November 21, 2025 19:21
@csrwng csrwng marked this pull request as ready for review November 21, 2025 19:23
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2025
@openshift-ci openshift-ci bot requested review from devguyio and jparrill November 21, 2025 19:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/content/how-to/resource-based-control-plane-autoscaling.md (1)

145-145: Consider simplifying wordy phrasing.

Replace "in conjunction with" with a more concise alternative like "with" or "alongside" for better readability.

- **`hypershift.openshift.io/resource-based-cp-auto-scaling`**: Set to `"true"` to enable resource-based autoscaling for the HostedCluster. This annotation must be used in conjunction with the `dedicated-request-serving-components` topology.
+ **`hypershift.openshift.io/resource-based-cp-auto-scaling`**: Set to `"true"` to enable resource-based autoscaling for the HostedCluster. This annotation must be used alongside the `dedicated-request-serving-components` topology.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9458d7e and 5791d01.

📒 Files selected for processing (2)
  • docs/content/how-to/resource-based-control-plane-autoscaling.md (1 hunks)
  • docs/mkdocs.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/mkdocs.yml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • docs/content/how-to/resource-based-control-plane-autoscaling.md
🪛 LanguageTool
docs/content/how-to/resource-based-control-plane-autoscaling.md

[style] ~145-~145: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...edCluster. This annotation must be used in conjunction with the `dedicated-request-serving-componen...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

🔇 Additional comments (1)
docs/content/how-to/resource-based-control-plane-autoscaling.md (1)

1-238: Well-structured and comprehensive documentation.

The guide effectively covers the complete user journey from prerequisites through troubleshooting. The step-by-step VPA installation instructions, configuration examples, and practical monitoring/troubleshooting sections are clearly presented with actionable commands. The markdown formatting is correct (including the previous MD040 fix at line 203), and examples are reproducible. The logical flow from setup to operation makes this accessible for users implementing the feature.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 21, 2025

@csrwng: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bryan-cox
Copy link
Member

@csrwng this looks fine to me but I ran it through an AI code reviewer and it had some interesting suggestions so sharing those here. I'll tag this and put a hold on it in case you do want to make some changes based off this AI code review:

  ---
  📋 ISSUES FOUND

  HIGH - Documentation Clarity (1 issue)

  Issue #1: Ambiguous Size Capacity Configuration Behavior

  Location: docs/content/how-to/resource-based-control-plane-autoscaling.md:114-115

  Problem:
  The documentation states that capacity can be introspected from MachineSets but doesn't clearly explain:
  1. When introspection happens vs. when manual capacity specification is required
  2. What happens if neither capacity is specified NOR matching MachineSets exist
  3. The precedence order if both manual capacity and MachineSets exist

  Current Text:
  - Optional: Size configurations that include `capacity` specifications with memory values. If `capacity` is not specified for a size, the system will introspect the memory and CPU capacity from existing MachineSets in the `openshift-machine-api`
  namespace.

  Impact: Users may be confused about when they MUST specify capacity vs. when it's truly optional.

  Recommendation:
  Add a clarifying note or table explaining the behavior matrix:

  ### Size Capacity Configuration Behavior

  | Scenario | Behavior |
  |----------|----------|
  | `capacity` specified in config | Uses manual capacity values (recommended) |
  | `capacity` not specified, matching MachineSet exists | Introspects from MachineSet annotations |
  | `capacity` not specified, no matching MachineSet | ⚠️ Size cannot be recommended for this class |

  **Best Practice:** Explicitly specify `capacity` for all sizes to avoid dependency on MachineSet existence.

  Code Reference: hypershift-operator/controllers/resourcebasedcpautoscaler/machine_sizes_cache.go

  ---
  MEDIUM - Documentation Completeness (3 issues)

  Issue #2: Missing Platform Limitation Details

  Location: docs/content/how-to/resource-based-control-plane-autoscaling.md:7

  Problem:
  States "only available for HostedClusters using the request serving isolation architecture on AWS" but doesn't explain:
  - Why this limitation exists (technical reasons)
  - If other platforms are planned for future support
  - What happens if users try to enable on other platforms

  Recommendation:
  **Important**: This feature is only available for HostedClusters using the request serving isolation architecture on AWS.

  **Why AWS-only?** The feature requires the `dedicated-request-serving-components` topology which currently only supports AWS infrastructure with specific machine sizing capabilities.

  **Attempting to enable on other platforms:** The annotation will be ignored, and no VPA resources will be created.

  ---
  ---
  INFO - Enhancement Opportunities (2 suggestions)

  Enhancement #1: Add Architecture Diagram

  Location: After "Platform Support" section

  Suggestion:
  Add a diagram showing:
  - VPA monitoring kube-apiserver
  - Autoscaler controller reading VPA recommendations
  - Size recommendation flow
  - ClusterSizingConfiguration relationship

  Similar to diagrams in docs/content/how-to/distribute-hosted-cluster-workloads.md:74,81,94

  ---
  Enhancement #2: Add Example Monitoring Query

  Location: Monitoring section (line 219)

  Suggestion:
  Provide example PromQL/metrics queries for observability:

  ### Monitoring Queries

  **VPA Memory Recommendation:**
  ```promql
  vpa_status_recommendation_containerrecommendations_target{container="kube-apiserver"}

  Current vs Recommended Size:
  Check annotations on HostedCluster resource

  ---

@bryan-cox
Copy link
Member

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 26, 2025
@bryan-cox
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 26, 2025
@csrwng
Copy link
Contributor Author

csrwng commented Nov 27, 2025

/verified by @csrwng

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 27, 2025
@openshift-ci-robot
Copy link

@csrwng: This PR has been marked as verified by @csrwng.

In response to this:

/verified by @csrwng

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@csrwng
Copy link
Contributor Author

csrwng commented Nov 27, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 27, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 50304c0 into openshift:main Nov 28, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/documentation Indicates the PR includes changes for documentation jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants