GCP-196: Implement GCPPrivateServiceObserver controller#7207
Conversation
|
@cristianoveiga: This pull request references GCP-196 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughAdds LoadBalancerIP (spec) and EndpointIP (status) with IP validation; makes ForwardingRuleName optional; updates CRD printer column to GCPEndpointAvailable; introduces GCPPrivateServiceObserver and tests; adds client apply helper; registers observer at operator startup; updates generated CRDs and API docs. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
Add required LoadBalancerIP field to GCPPrivateServiceConnectSpec to store the Internal Load Balancer IP address extracted by the observer controller from Kubernetes service status. Signed-off-by: Cristiano Veiga <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)
…hensive tests Add complete implementation of GCP-196 observer controller that: - Watches private-router service with Internal Load Balancer annotation - Extracts LoadBalancer IP from service status when ready - Creates GCPPrivateServiceConnect CR with proper OwnerReference lifecycle - Uses GCP project from HostedControlPlane as consumer accept list - Follows AWS PrivateServiceObserver pattern for architecture consistency Include comprehensive unit test coverage: - Controller name generation utilities - Consumer accept list extraction from HCP platform spec - Service LoadBalancer validation (Internal vs External types) - LoadBalancer IP extraction logic for various service states - HostedControlPlane owner reference detection and parsing Signed-off-by: Cristiano Veiga <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)
74124dd to
2371929
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
cmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-TechPreviewNoUpgrade.crd.yaml (1)
97-100: Same initialization concern: loadBalancerIP marked as required.This manifest has the same issue as the GCPPlatform.yaml manifest where
loadBalancerIPis required. Ensure consistency across all CRD variants and verify that the initialization flow supports required fields populated by observers.cmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-CustomNoUpgrade.crd.yaml (1)
97-100: Consistent with other CRD manifests but same initialization issue.This manifest mirrors the changes in the other CRD files. The same concern about
loadBalancerIPbeing required applies here.
🧹 Nitpick comments (4)
control-plane-operator/controllers/gcpprivateserviceconnect/observer.go (2)
50-65: Consider using the newer sets API.The code uses
sets.NewString()which may be deprecated in newer versions of k8s.io/apimachinery. Consider usingsets.New[string]()instead for future compatibility.func nameMapper(names []string) handler.MapFunc { - nameSet := sets.NewString(names...) + nameSet := sets.New(names...) return func(ctx context.Context, obj client.Object) []reconcile.Request {
129-134: Document the single ingress assumption.The code assumes the LoadBalancer has exactly one ingress entry and takes the first one (
Ingress[0].IP). While this is likely correct for GCP Internal LoadBalancers, consider adding a comment explaining this assumption or validating that there's exactly one ingress entry.// Extract LoadBalancer IP + // GCP Internal LoadBalancers have exactly one ingress entry loadBalancerIP := svc.Status.LoadBalancer.Ingress[0].IPdocs/content/reference/api.md (2)
166-176: Clarify controller-managed semantics and IP format for GCP PSC fields.
- loadBalancerIP: state whether it’s IPv4-only or IPv4/IPv6 and explicitly mark as controller-populated/read-only so users don’t try to set it.
- forwardingRuleName (Optional): likewise, note it’s set by the reconciler and should not be user-provided.
Since this page is generated, please adjust the Go type comments in api/hypershift/v1beta1/gcpprivateserviceconnect_types.go so the generated docs reflect this. Example:
-// loadBalancerIP is the IP address of the Internal Load Balancer -// Populated by the observer from service status +// loadBalancerIP is the IP address of the internal load balancer (IPv4-only/IPv4+IPv6 — clarify actual support). +// Managed by the observer and populated from Service status. Users should not set this field. +// If this resource is operator-created, this field is required on creation by the operator, not by users. -// forwardingRuleName is the name of the Internal Load Balancer forwarding rule -// Populated by the reconciler via GCP API lookup +// forwardingRuleName is the name of the internal load balancer forwarding rule. +// Managed by the reconciler via GCP API lookup. Users should not set this field.
- Confirm the intended IP family support and validations for loadBalancerIP.
- Confirm that the CR is created by the operator (so a required loadBalancerIP won’t block user creation). If users may create it, consider not marking it required in the CRD.
Also applies to: 184-186
6014-6024: Mirror the same clarifications in the Spec section.Repeat the controller-managed/read-only and IP family notes for loadBalancerIP and forwardingRuleName here by updating the source comments so both generated locations stay consistent.
Also applies to: 6032-6034
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/gcpprivateserviceconnect_types.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (9)
api/hypershift/v1beta1/gcpprivateserviceconnect_types.go(1 hunks)api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/gcpprivateserviceconnects.hypershift.openshift.io/GCPPlatform.yaml(2 hunks)client/applyconfiguration/hypershift/v1beta1/gcpprivateserviceconnectspec.go(2 hunks)cmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-CustomNoUpgrade.crd.yaml(2 hunks)cmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-TechPreviewNoUpgrade.crd.yaml(2 hunks)control-plane-operator/controllers/gcpprivateserviceconnect/observer.go(1 hunks)control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.go(1 hunks)control-plane-operator/main.go(2 hunks)docs/content/reference/api.md(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.goapi/hypershift/v1beta1/gcpprivateserviceconnect_types.gocmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-TechPreviewNoUpgrade.crd.yamlapi/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/gcpprivateserviceconnects.hypershift.openshift.io/GCPPlatform.yamldocs/content/reference/api.mdclient/applyconfiguration/hypershift/v1beta1/gcpprivateserviceconnectspec.gocontrol-plane-operator/main.gocontrol-plane-operator/controllers/gcpprivateserviceconnect/observer.gocmd/install/assets/hypershift-operator/zz_generated.crd-manifests/gcpprivateserviceconnects-CustomNoUpgrade.crd.yaml
🔇 Additional comments (12)
api/hypershift/v1beta1/gcpprivateserviceconnect_types.go (2)
49-54: LGTM!The change to make
ForwardingRuleNameoptional withomitemptyis appropriate, given that it's now populated by the reconciler via GCP API lookup. The documentation clearly indicates the population method, and the validation constraints remain appropriate.
41-47: The concern does not apply—the observer only updates the resource after validating the LoadBalancerIP is available.The code at lines 130–134 checks that the LoadBalancer IP is present in the service status before proceeding. The
reconcileGCPPrivateServiceConnectcall at line 163 only executes after this validation passes. Inside theCreateOrUpdatecallback, theLoadBalancerIPfield is populated (line 190) before the resource is saved, satisfying the+requiredconstraint. There is no "chicken and egg" problem because the observer never attempts to create or update the resource without the required field already being available.client/applyconfiguration/hypershift/v1beta1/gcpprivateserviceconnectspec.go (2)
23-23: LGTM!The generated
LoadBalancerIPfield follows the correct pattern, using a pointer type with the appropriate JSON tag.
35-41: LGTM!The
WithLoadBalancerIPmethod correctly implements the fluent builder pattern, matching the style of otherWith*methods in the file.control-plane-operator/controllers/gcpprivateserviceconnect/observer.go (3)
37-48: LGTM!The observer type definition is well-structured with appropriate fields for client access, logging, configuration, and the upsert provider pattern.
75-102: LGTM!The controller setup follows best practices with namespace-scoped informers, proper error handling, and lifecycle management through the manager.
136-157: LGTM!The OwnerReference validation and HostedControlPlane fetch logic is well-implemented with proper error handling and early exit on deletion.
control-plane-operator/main.go (2)
15-15: LGTM!The import for the new GCP Private Service Connect controller package is correctly placed alongside the similar AWS private link package.
524-539: Verify if GCP should also observe KubeAPIServer private service.The AWS implementation sets up two observers: one for
KubeAPIServerPrivateServiceand one forPrivateRouterService. The GCP implementation only sets up an observer forPrivateRouterService.However, evidence shows that KubeAPIServerPrivateService is created for all private HCPs (including GCP) when using LoadBalancer service strategy. Since the service exists for GCP but isn't observed like in AWS, confirm whether GCP Private Service Connect infrastructure should also observe and reconcile the KubeAPIServer private service, or if the architecture legitimately differs.
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/gcpprivateserviceconnects.hypershift.openshift.io/GCPPlatform.yaml (2)
79-86: LGTM: loadBalancerIP field specification is well-defined.The field includes appropriate validation constraints (IP pattern, length limits) and clear documentation indicating it's populated by the observer.
94-97: No issues identified. The review comment is based on a misunderstanding of the resource creation flow.The concern about a circular dependency does not apply here. The observer validates that
loadBalancerIPis populated from the Service status (lines 133-137 of observer.go) before callingreconcileGCPPrivateServiceConnect. TheCreateOrUpdatemethod executes the mutate closure (which populatesSpec.LoadBalancerIPandSpec.ConsumerAcceptList) before submitting the resource to the API (support/upsert/upsert.go lines 72-76). Therefore, the resource is created with all required fields already set, and makingloadBalancerIPrequired is correct and safe.Likely an incorrect or invalid review comment.
control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.go (1)
30-34: The review comment's concern is incorrect.The test case accepts an empty string, but PrivateRouterService returns a Service with hardcoded Name "private-router", and all production calls to ControllerName pass either KubeAPIServerPrivateServiceName or manifests.PrivateRouterService("").Name, which are both non-empty values. The empty string test case is defensive edge-case testing that cannot occur in production, so "-observer" will never be generated at runtime. No validation or fixes are needed.
Likely an incorrect or invalid review comment.
|
@cristianoveiga: This pull request references GCP-196 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@cristianoveiga: This pull request references GCP-196 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Remove test cases with empty project and non-GCP platform as project is mandatory for GCP PSC observer functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Extract isInternalLoadBalancer and extractLoadBalancerIP functions from inline logic in observer controller - Refactor tests to call actual validation functions instead of duplicating logic inline - This ensures tests verify the production implementation and will catch changes to controller logic 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
/uncc @bryan-cox |
- Extract extractHostedControlPlaneOwnerName function from inline logic in observer controller - Refactor TestServiceOwnerReferenceValidation to call actual function instead of duplicating owner reference lookup logic - Ensures tests verify production implementation and catch logic changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.go (1)
30-34: Consider whether empty service name is a valid scenario.The test documents that an empty service name produces "-observer". While this may be intentional for edge case handling, in practice an empty service name would be rejected by Kubernetes API validation. Consider whether this test case adds value or if it could be removed to keep tests focused on realistic scenarios.
control-plane-operator/controllers/gcpprivateserviceconnect/observer.go (1)
189-198: Consider defensive handling if GCP spec could be nil.If
hcp.Spec.Platform.GCPis nil (line 191), the function returns nil, which would be set asConsumerAcceptList(line 180) and might fail CRD validation if the schema requiresMinItems=1. While you noted that the project is required at the API level, a defensive check or comment documenting the assumption would clarify the contract.func getConsumerAcceptList(hcp *hyperv1.HostedControlPlane) []string { + // GCP platform spec is guaranteed to be non-nil for GCP HostedControlPlanes if hcp.Spec.Platform.GCP == nil { return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
control-plane-operator/controllers/gcpprivateserviceconnect/observer.go(1 hunks)control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.gocontrol-plane-operator/controllers/gcpprivateserviceconnect/observer.go
🔇 Additional comments (5)
control-plane-operator/controllers/gcpprivateserviceconnect/observer_test.go (2)
201-211: Excellent improvement - tests now call actual implementation.The tests now properly call the production helper functions (
isInternalLoadBalancer,extractLoadBalancerIP) rather than duplicating the validation logic inline. This ensures that changes to the controller logic will be caught by the tests and provides confidence in the actual implementation.
267-271: Good refactoring - owner reference extraction now uses actual function.The test correctly calls
extractHostedControlPlaneOwnerNamefrom the observer implementation, ensuring that the test validates the production code path rather than reimplementing the logic.control-plane-operator/controllers/gcpprivateserviceconnect/observer.go (3)
76-103: LGTM - Standard controller setup pattern.The controller setup follows Kubernetes best practices by using a shared informer with namespace filtering and proper event handling. The use of
namedResourceHandlerensures only the specified service triggers reconciliation.
105-158: Well-structured reconciliation loop with clear validation steps.The reconcile logic follows a clear progression: fetch service → validate internal LB type → validate IP readiness → extract HCP owner → fetch HCP → derive consumer list → reconcile CR. Early returns for transient conditions (IP not ready, deleted HCP) are appropriate since the watch will trigger re-reconciliation on updates.
200-229: Clean helper functions enable testability.Extracting these validation and extraction functions (
isInternalLoadBalancer,extractLoadBalancerIP,extractHostedControlPlaneOwnerName) into standalone functions enables the comprehensive unit tests in observer_test.go and follows good separation of concerns.
|
/retest-required |
| } | ||
|
|
||
| if hcp.Spec.Platform.Type == hyperv1.GCPPlatform && util.IsPrivateHCP(hcp) && mgmtClusterCaps.Has(capabilities.CapabilityRoute) { | ||
| controllerName := "GCPPrivateServiceObserver" |
There was a problem hiding this comment.
Would we not want to name this controller "PrivateKubeAPIServerServiceObserver" like the AWS one? Only one of them will be invoked because the platform type is different. Will we need a second one for the ingress later?
There was a problem hiding this comment.
My understanding is that we only need one Observer controller and perhaps it's a good idea to keep its name generic. Here is why:
Looking at the AWS implementation, I see that there are two observer controllers:
- PrivateKubeAPIServerServiceObserver
- PrivateIngressServiceObserver
But only the second one is actually creating a awsendpointservice CR because (the first doesn't have a load balancer associated to it):
Hence, this CR seems to be used for both. The actual configuration of PrivateIngress and happens in the AWSEndpointServiceReconciler, which we will implement the GCP equivalent in the next card, GCP-197.
Thoughts?
There was a problem hiding this comment.
What if the customer wants public ingress but a private API server? Is that why there is two for AWS?
There was a problem hiding this comment.
That should be possible, but it doesn't seem it's to be controlled by the PrivateKubeAPIServerServiceObserver. Instead, this is achieved by the combination of the endpointAccess and the servicePublishingStrategy in the HostedCluster spec.
I didn't find any kube-apiserver-private service in our ROSA Clusters (even in private clusters) - that's why I think the first observer might not be required in our GCP PSC scenario (the observer that watches the private-router service would be enough).
Please let me know if you want me to rename it to PrivateIngressServiceObserver anyway. Otherwise, we can revisit this in the future (during the implementation of GCP 'fully' Private clusters).
|
|
||
| _, err := r.CreateOrUpdate(ctx, r.Client, gcpPSC, func() error { | ||
| // Set OwnerReference to HostedControlPlane for lifecycle management | ||
| gcpPSC.OwnerReferences = []metav1.OwnerReference{{ |
There was a problem hiding this comment.
Note: it doesn't look like the AWS controller does this. This seems more correct, but could it cause problems?
There was a problem hiding this comment.
Yes, I also noticed that the AWS controller doesn't do that. A explicit link/relationship between the PSC CR and the HCP can help with lookups (and keep things clean during deletion).
but could it cause problems ?
Are you concerned with anything in particular ? I can't think of any issues...
The ultimate protection against deletion will be provided by the finalizers, which I will add in the next PRs when we create the actual GCP resources.
There was a problem hiding this comment.
No specific concerns, but I wanted to ask the question if perhaps there is a reason that there isn't an owner reference (e.g. resources being cleaned up in the wrong order on cluster uninstallation).
There was a problem hiding this comment.
I tested this and I confirmed that the deletion is going to work without any issues.
We will implement a deletion mechanism for the gcpprivateserviceconnect CRs that will be triggered by the hypershift operator, similar to the AWS - see logic here:
hypershift/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Lines 3105 to 3144 in e83d907
The HO basically deletes the awsendpointservice and waits until the resources are cleaned up before deleting the HCPs.
With that, the Owner references will not play a part on the deletion process.
|
|
||
| // Fetch HostedControlPlane for ConsumerAcceptList | ||
| hcp := &hyperv1.HostedControlPlane{} | ||
| if err := r.Get(ctx, types.NamespacedName{Namespace: r.HCPNamespace, Name: hcpName}, hcp); err != nil { |
There was a problem hiding this comment.
AWS does a list, and defends against multiple HCPs in the same namespace. Is that something we want to do too?
There was a problem hiding this comment.
I think the AWS implementation does that because they don't have the owner reference (that we have been discussing in the another thread).
In the GCP approach, we perform a direct lookup (which should perform better anyway).
I can't think in a scenario where we have multiple HCPs in the namespace - and this probably should be enforced at another level.
There was a problem hiding this comment.
No disagreement, but if it isn't enforced at another level today, could this leave us with no enforcement?
There was a problem hiding this comment.
Agreed. We could end up without this protection. But on the other hand, should this be a responsibility of the PSC observer?
I am fine replicating the AWS logic here (as I have done for most of the other parts). I just want to make sure that we keep the logic clean and avoid unnecessary checks.
|
/retest-required |
|
/lgtm |
|
/retest-required |
4 similar comments
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
/override ci/prow/e2e-aks |
|
@csrwng: Overrode contexts on behalf of csrwng: ci/prow/e2e-aks DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/verified later @cristianoveiga |
|
@csrwng: This PR has been marked to be verified later by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
/hold Revision 4e4b601 was retested 3 times: holding |
|
/hold cancel |
|
/retest-required |
1 similar comment
|
/retest-required |
|
@cristianoveiga: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/area platform/gcp |
What this PR does / why we need it:
This PR implements the GCPPrivateServiceObserver controller which watches private-router Service and creates GCPPrivateServiceConnect CR.
Commits:
Which issue(s) this PR fixes:
GCP-196
Special notes for your reviewer:
Checklist: