-
Notifications
You must be signed in to change notification settings - Fork 428
OCPBUGS-63353: Fix ValidAWSIdentityProvider status when KAS is unavailable #7151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Skipping CI for Draft Pull Request. |
|
@muraee: This pull request references Jira Issue OCPBUGS-63353, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdded KubeAPIServer availability and EC2 client presence pre-checks to the AWS identity-provider health check; when either is unavailable the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**⚙️ CodeRabbit configuration file
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (1)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
|
@coderabbitai review |
|
/auto-cc |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
control-plane-operator/controllers/healthcheck/aws_test.go (2)
56-64: Consider verifying the expected message in the "KAS available" test case.When KAS is available but the EC2 client is nil (which will be the case in this test environment without mocking), the function sets a specific message "AWS EC2 client is not available". Adding this to
expectedMessagePartwould make the test more complete and explicit about the expected behavior.Apply this diff:
{ name: "KAS available", kasCondition: &metav1.Condition{ Type: string(hyperv1.KubeAPIServerAvailable), Status: metav1.ConditionTrue, }, - expectedStatus: metav1.ConditionUnknown, - expectedReason: hyperv1.StatusUnknownReason, - shouldReturnEarly: false, // Will proceed to check EC2 client + expectedStatus: metav1.ConditionUnknown, + expectedReason: hyperv1.StatusUnknownReason, + expectedMessagePart: "AWS EC2 client is not available", + shouldReturnEarly: false, // Will proceed to check EC2 client },
109-110: Variable naming suggests substring match but performs exact comparison.The variable is named
expectedMessagePartwhich suggests a substring check usingstrings.Contains, but the comparison on line 109 performs an exact equality check. Consider renaming toexpectedMessagefor clarity, or usestrings.Containsif partial matching is intended.For exact matching (current behavior), apply this diff:
testCases := []struct { name string kasCondition *metav1.Condition expectedStatus metav1.ConditionStatus expectedReason string - expectedMessagePart string + expectedMessage string shouldReturnEarly bool }{ { name: "KAS not available - condition missing", kasCondition: nil, expectedStatus: metav1.ConditionUnknown, expectedReason: hyperv1.StatusUnknownReason, - expectedMessagePart: "Cannot validate AWS identity provider while KubeAPIServer is not available", + expectedMessage: "Cannot validate AWS identity provider while KubeAPIServer is not available", shouldReturnEarly: true, }, // ... update remaining test cases similarly } // ... in test execution: - if tc.expectedMessagePart != "" && condition.Message != tc.expectedMessagePart { - t.Errorf("Expected message to contain %q, got %q", tc.expectedMessagePart, condition.Message) + if tc.expectedMessage != "" && condition.Message != tc.expectedMessage { + t.Errorf("Expected message %q, got %q", tc.expectedMessage, condition.Message) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
control-plane-operator/controllers/healthcheck/aws.go(1 hunks)control-plane-operator/controllers/healthcheck/aws_test.go(1 hunks)
🔇 Additional comments (2)
control-plane-operator/controllers/healthcheck/aws.go (2)
18-32: LGTM! Correct handling of KAS unavailability.The pre-check logic correctly prevents validation attempts when KAS is unavailable and appropriately sets the condition to Unknown instead of False. This prevents misleading alerts when validation cannot be performed.
34-46: LGTM! Proper handling of nil EC2 client.The EC2 client nil check now correctly sets the condition to Unknown when the client is unavailable (e.g., due to token minting failure) and returns early, consistent with the KAS availability check above.
sdminonne
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a nit
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: muraee, sdminonne The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
f57f537 to
940d047
Compare
|
@coderabbitai Thanks for the detailed review! I've addressed both nitpick suggestions:
Changes pushed in commit 940d047. All tests pass. AI-assisted response via Claude Code |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 3944 @muraee Thank you for addressing the suggestions so promptly! 🙏 I've verified the changes in the test file, and everything looks excellent:
The test coverage is now comprehensive, checking all the expected messages across all scenarios. Great work on improving the test clarity and completeness! |
sdminonne
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple question otherwise /lgtm
|
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
control-plane-operator/controllers/healthcheck/aws_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
control-plane-operator/controllers/healthcheck/aws_test.go
🔇 Additional comments (3)
control-plane-operator/controllers/healthcheck/aws_test.go (3)
1-16: LGTM! Well-documented test setup.The imports are appropriate and the test function is clearly documented, explaining its purpose of validating condition logic when KAS is unavailable.
17-66: LGTM! Comprehensive test case coverage.The test cases thoroughly cover the different KAS availability scenarios (missing, False, Unknown, and available), with clear expected values for status, reason, and message. The test structure aligns well with the PR objectives.
96-116: LGTM! Thorough condition verification.The test properly verifies all aspects of the ValidAWSIdentityProvider condition: existence, status, reason, message, and observedGeneration. The use of
t.Fatalat line 99 is appropriate for subtests where subsequent assertions depend on the condition being non-nil.
|
/retest-required |
… unavailable The ValidAWSIdentityProvider condition was incorrectly showing as False when the Kube API Server (KAS) is not available. This created misleading signals during cluster provisioning or when API server pods are down. The AWS identity provider validation requires minting tokens via KAS. When KAS is not available, we cannot determine if the AWS IDP is valid or not. Therefore, the condition should be set to Unknown rather than False. This change checks the KubeAPIServerAvailable condition before attempting AWS validation. If KAS is not available or the EC2 client cannot be created, the condition is set to Unknown with StatusUnknownReason instead of being left in an incorrect False state. Fixes: OCPBUGS-63353 🤖 Generated with [Claude Code](https://claude.com/claude-code) via /jira:solve OCPBUGS-63353 Co-Authored-By: Claude <[email protected]>
940d047 to
8d1359b
Compare
…ogic Add unit tests to validate the condition setting logic when KAS is not available. These tests ensure that the ValidAWSIdentityProvider condition is correctly set to Unknown when validation cannot be performed. Test cases cover: - KAS condition missing - KAS condition False - KAS condition Unknown - KAS condition True (proceeds to check EC2 client) The tests validate that the condition status, reason, and message are set correctly for each scenario, ensuring the fix for OCPBUGS-63353 works as expected. 🤖 Generated with [Claude Code](https://claude.com/claude-code) via /jira:solve OCPBUGS-63353 Co-Authored-By: Claude <[email protected]>
8d1359b to
09d1e48
Compare
|
/retest-required |
|
/lgtm |
|
/retest-required |
|
/verified later |
|
@muraee: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by unit-test |
|
@muraee: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/override "e2e-aks" |
|
@muraee: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override "ci/prow/e2e-aks" |
|
@muraee: Overrode contexts on behalf of muraee: ci/prow/e2e-aks DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@muraee: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@muraee: Jira Issue Verification Checks: Jira Issue OCPBUGS-63353 Jira Issue OCPBUGS-63353 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in accepted release 4.21.0-0.nightly-2025-11-22-193140 |
What this PR does / why we need it:
This PR fixes the
ValidAWSIdentityProvidercondition to showUnknownstatus instead ofFalsewhen the Kube API Server (KAS) is not available.The AWS identity provider validation requires minting tokens via KAS. When KAS is not up (during cluster provisioning or when API server pods are down), the validation cannot be performed. Previously, the condition would show as
Falsewith reasonInvalidIdentityProviderand messageWebIdentityErr, which was misleading since the actual validation never occurred.Changes:
KubeAPIServerAvailablecondition isTruenil(token minting failed)This ensures accurate status reporting and prevents misleading alerts for service providers.
Which issue(s) this PR fixes:
Fixes OCPBUGS-63353
Special notes for your reviewer:
hypershift_cluster_invalid_aws_credshandling will be addressed in a separate PRChecklist:
🤖 Generated with Claude Code via
/jira:solve OCPBUGS-63353