Skip to content

GCP-198: add PSC endpoint controller for customer-side management#7603

Merged
openshift-merge-bot[bot] merged 11 commits into
openshift:mainfrom
cristianoveiga:GCP-198-add-psc-endpoint-controller
Feb 4, 2026
Merged

GCP-198: add PSC endpoint controller for customer-side management#7603
openshift-merge-bot[bot] merged 11 commits into
openshift:mainfrom
cristianoveiga:GCP-198-add-psc-endpoint-controller

Conversation

@cristianoveiga
Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

Adds a new PSC endpoint controller in the control-plane-operator that manages Private Service Connect (PSC) endpoints in the customer's GCP project. This is the customer-side counterpart to the management-side PSC controller that creates Service Attachments.

Key features:

  • Reserves static internal IP addresses for PSC endpoints in the customer's subnet
  • Creates ForwardingRules targeting the management-side Service Attachment
  • Handles cleanup of GCP resources on deletion with proper finalizers
  • Waits for WIF token availability before adding finalizers (prevents cleanup failures)
  • Implements 30-second timeout on all GCP API calls to prevent hung reconcilers
  • Propagates GCPEndpointAvailable and GCPServiceAttachmentAvailable conditions to HostedCluster status

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/GCP-198

Special notes for your reviewer:

  • DNS implementation is intentionally excluded and will be handled separately in GCP-199
  • The controller uses WIF (Workload Identity Federation) credentials to authenticate with the customer's GCP project
  • Timeout logic matches the pattern used in the management-side PSC controller (hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go)
  • Tested end-to-end in the gcp-hcp-pilot branch

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Add a new controller in the control-plane-operator that manages PSC
endpoints in the customer's GCP project. This controller:

- Reserves static internal IP addresses for PSC endpoints
- Creates ForwardingRules targeting the management-side Service Attachment
- Handles cleanup of GCP resources on deletion with proper finalizers
- Waits for WIF token availability before adding finalizers
- Implements proper error handling with GCP-specific retry logic
- Adds 30-second timeout to all GCP API calls to prevent hung reconcilers
- Propagates GCPEndpointAvailable condition to HostedCluster status

The controller runs in the hosted control plane namespace and uses
Workload Identity Federation credentials to authenticate with the
customer's GCP project.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 29, 2026

Walkthrough

Adds a new GCP Private Service Connect (PSC) endpoint controller with tests, wires it into the control-plane manager, and updates HostedCluster reconciliation to aggregate PSC-related status conditions; a duplicate test declaration was introduced in the patch.

Changes

Cohort / File(s) Summary
GCP PSC Endpoint Controller
control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
New GCPPrivateServiceConnectReconciler with SetupWithManager and Reconcile. Implements GCP client init (WIF/service account), finalizer lifecycle and deletion cleanup for PSC endpoints and reserved IPs, service attachment readiness checks, idempotent IP reservation, ForwardingRule create/update, status condition updates, and GCP API error handling/retries.
GCP PSC Unit Tests
control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller_test.go
New tests for endpoint/IP name construction, network/subnet/address URL builders, service attachment readiness, not-found detection, and name length/uniqueness/consistency across clusters.
Manager wiring (control-plane-operator)
control-plane-operator/main.go
Wires the new PSC controller into the manager for GCP private platform; renames observer controller variable and standardizes setup error logging and exit handling.
HostedCluster status aggregation
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Adds computeGCPPSCCondition and lists GCPPrivateServiceConnect resources to derive aggregated GCPEndpointAvailable and GCPServiceAttachmentAvailable conditions for HostedCluster status.
HostedCluster tests (aggregation)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
Adds TestComputeGCPPSCCondition covering true/false/unknown aggregation scenarios. Patch includes a duplicate identical test declaration that should be removed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jan 29, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jan 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/gcp PR/issue for GCP (GCPPlatform) platform and removed do-not-merge/needs-area labels Jan 29, 2026
@cristianoveiga
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristianoveiga cristianoveiga changed the title feat(gcp): add PSC endpoint controller for customer-side management GCP-198: add PSC endpoint controller for customer-side management Jan 29, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 29, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Jan 29, 2026

@cristianoveiga: This pull request references GCP-198 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds a new PSC endpoint controller in the control-plane-operator that manages Private Service Connect (PSC) endpoints in the customer's GCP project. This is the customer-side counterpart to the management-side PSC controller that creates Service Attachments.

Key features:

  • Reserves static internal IP addresses for PSC endpoints in the customer's subnet
  • Creates ForwardingRules targeting the management-side Service Attachment
  • Handles cleanup of GCP resources on deletion with proper finalizers
  • Waits for WIF token availability before adding finalizers (prevents cleanup failures)
  • Implements 30-second timeout on all GCP API calls to prevent hung reconcilers
  • Propagates GCPEndpointAvailable and GCPServiceAttachmentAvailable conditions to HostedCluster status

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/GCP-198

Special notes for your reviewer:

  • DNS implementation is intentionally excluded and will be handled separately in GCP-199
  • The controller uses WIF (Workload Identity Federation) credentials to authenticate with the customer's GCP project
  • Timeout logic matches the pattern used in the management-side PSC controller (hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go)
  • Tested end-to-end in the gcp-hcp-pilot branch

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 354-362: The code uses
hcp.Spec.Platform.GCP.NetworkConfig.Network.Name directly when building the
ForwardingRule; add the same nil checks used for the subnet to validate
Platform, GCP, NetworkConfig and Network before accessing Network.Name (similar
to the existing ensureIPAddress/subnet checks), and if any are nil return an
appropriate error (or requeue) rather than dereferencing; locate the usage in
psc_endpoint_controller.go where endpoint is constructed and call
r.constructNetworkURL only after confirming
hcp.Spec.Platform.GCP.NetworkConfig.Network != nil (and parent fields present).
- Around line 252-256: The code reads
hcp.Spec.Platform.GCP.NetworkConfig.PrivateServiceConnectSubnet.Name into
pscSubnet without checking for nil; update the logic in the block that sets
pscSubnet (and related code in setFromHCP if applicable) to first verify that
hcp.Spec.Platform.GCP.NetworkConfig != nil and
hcp.Spec.Platform.GCP.NetworkConfig.PrivateServiceConnectSubnet != nil, and
return a clear error (or handle defaulting) if either is nil before accessing
.Name; reference the symbols hcp, setFromHCP, NetworkConfig,
PrivateServiceConnectSubnet, and pscSubnet to locate and fix the checks.
- Around line 138-166: When handling deletion of gcpPSC (the block checking
gcpPSC.DeletionTimestamp and pscEndpointFinalizer), ensure the GCP client
builder is initialized before calling r.gcpClientBuilder.getClient(ctx): call
r.initializeWithHCP(ctx, log) (or the method that seeds the builder from the
HostedControlPlane) at the start of the deletion branch, handle its error
(return or requeue if initialization fails), then call
r.gcpClientBuilder.getClient(ctx) and proceed to r.reconcileDelete(...) as
before; this prevents "client not initialized" errors and avoids orphaning GCP
resources if the controller restarts during deletion.
🧹 Nitpick comments (2)
control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go (2)

104-119: Controller setup works but has unconventional client assignment.

Setting r.Client = mgr.GetClient() at line 116 after Build() works but is atypical. Usually the client is injected when the reconciler is constructed in main.go (as done with other controllers in this codebase). Consider aligning with the pattern used by GCPPrivateServiceObserver where the client is passed during struct initialization.


554-581: Unused parameters in InitCustomerGCPClient.

The customerProject and controlPlaneOperatorGSA parameters are never used in the function body. The function relies solely on GOOGLE_APPLICATION_CREDENTIALS. Either remove these parameters or add a TODO explaining their intended future use.

♻️ Option 1: Remove unused parameters
-func InitCustomerGCPClient(ctx context.Context, customerProject string, controlPlaneOperatorGSA string) (*compute.Service, error) {
+func InitCustomerGCPClient(ctx context.Context) (*compute.Service, error) {

Then update the caller at line 94:

-	return InitCustomerGCPClient(ctx, b.customerProject, b.controlPlaneOperatorGSA)
+	return InitCustomerGCPClient(ctx)
♻️ Option 2: Add TODO for future use
 func InitCustomerGCPClient(ctx context.Context, customerProject string, controlPlaneOperatorGSA string) (*compute.Service, error) {
+	// TODO: customerProject and controlPlaneOperatorGSA will be used for explicit
+	// credential configuration when RolesRef support is added. Currently unused
+	// as we rely on GOOGLE_APPLICATION_CREDENTIALS.
 	credentialsFile := os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")

Handle the case where the controller restarts during deletion. The
in-memory client builder state is lost on restart, so we need to
reinitialize from HCP before attempting cleanup.

If the HCP is already deleted, log a debug message and continue with
cleanup attempt (finalizer will still be removed following AWS pattern).

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@cristianoveiga cristianoveiga marked this pull request as ready for review January 29, 2026 18:54
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 29, 2026
@openshift-ci openshift-ci Bot requested review from bryan-cox and jimdaga January 29, 2026 18:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 97-103: The shared gcpClientBuilder stored on
GCPPrivateServiceConnectReconciler causes races across concurrent reconciles;
remove the reconciler-level gcpClientBuilder field and instead construct a fresh
gcpClientBuilder inside the reconcile flow (e.g., at the start of Reconcile or
right after getHostedControlPlane returns) so each reconciliation uses its own
builder tied to that HCP/project; update any uses of the
reconciler.gcpClientBuilder (and its initialization guard) to use the local
builder variable and ensure both normal and deletion paths use the per-request
builder.

@bryan-cox
Copy link
Copy Markdown
Member

/unassign @bryan-cox

Reorder imports to satisfy gci linter: stdlib, third-party, then
local project imports.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@cblecker
Copy link
Copy Markdown
Member

/retest

@bryan-cox
Copy link
Copy Markdown
Member

/uncc @bryan-cox

@openshift-ci openshift-ci Bot removed the request for review from bryan-cox January 30, 2026 01:15
Copy link
Copy Markdown
Member

@cblecker cblecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary: GCP PSC Endpoint Controller

This PR adds a well-structured PSC endpoint controller following established patterns from the AWS PrivateLink implementation. The test coverage for helper functions is good, and the condition propagation to HostedCluster is cleanly implemented.

I've left inline comments comparing specific patterns to the AWS implementation in control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go. A few items warrant attention before merging.

Open Questions:

  1. Mutex protection (line 68-73): With MaxConcurrentReconciles: 10, is there a reason to omit mutex protection on gcpClientBuilder? AWS uses sync.Mutex for the equivalent struct.

  2. HCP watching (line 111-118): AWS watches HCP for endpoint access changes. Is this intentionally omitted for GCP, or should it be added?

  3. Periodic requeue (line 236): AWS requeues every 5 minutes to catch out-of-band changes. Should GCP do the same for drift detection?

  4. Endpoint adoption: AWS checks for existing VPC endpoints and adopts them (lines 617-646). Should GCP handle adoption of existing ForwardingRules/addresses?

  5. TODO tracking (line 87): Is there a Jira for completing the controlPlaneOperatorGSA extraction?

Comment on lines +68 to +73

func (b *gcpClientBuilder) initializeWithHCP(hcp *hyperv1.HostedControlPlane) {
if !b.initialized {
b.setFromHCP(hcp)
b.initialized = true
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition risk with concurrent reconciles

The gcpClientBuilder struct is accessed by multiple goroutines (MaxConcurrentReconciles: 10) but lacks mutex protection.

The AWS PrivateLink controller's equivalent clientBuilder uses a sync.Mutex:

// control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go:199-205
type clientBuilder struct {
    mu                             sync.Mutex
    initialized                    bool
    // ...
}

Consider adding mutex protection to prevent race conditions when multiple reconciles call initializeWithHCP() concurrently.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS implementation has some historical patterns that may not be optimal today. As noted in a previous review (PR #7207), the AWS controller used separate informers because the manager cache was previously scoped to the cluster - that's no longer the case since the cache is now scoped to the control plane namespace.

Similarly, the mutex in AWS's clientBuilder may be defensive for patterns that don't apply to our simpler GCP implementation:

  1. The CPO runs per-HCP namespace with a single GCPPrivateServiceConnect CR
  2. The GCP project and region are immutable fields
  3. Even concurrent reconciles for the same CR would set identical values

I would like to keep it simple without mutex for now. We can add it if the design evolves to require it.

Comment on lines +476 to +479
// Use service attachment name as base - it's unique and within GCP naming limits
return fmt.Sprintf("%s-ip", gcpPSC.Status.ServiceAttachmentName)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential invalid resource names if ServiceAttachmentName is empty

If gcpPSC.Status.ServiceAttachmentName is empty (before management-side populates it), this returns -endpoint and constructIPAddressName returns -ip, which would likely fail GCP API validation.

While isServiceAttachmentReady() should prevent this path, adding defensive validation here would be safer:

func (r *GCPPrivateServiceConnectReconciler) constructEndpointName(gcpPSC *hyperv1.GCPPrivateServiceConnect) string {
    if gcpPSC.Status.ServiceAttachmentName == "" {
        return "" // or panic, depending on how you want to handle this
    }
    return fmt.Sprintf("%s-endpoint", gcpPSC.Status.ServiceAttachmentName)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I considered this, but I think we can keep it simple here:

  1. The management-side controller populates ServiceAttachmentURI and ServiceAttachmentName atomically in the same status update
  2. isServiceAttachmentReady() blocks the normal reconciliation flow until the URI and condition are set
  3. During deletion, if ServiceAttachmentName is somehow empty, the CR was never properly initialized - indicating a larger issue that should be fixed at the source

Adding defensive validation here would mask a root cause bug in the management-side controller. If this ever fails, the GCP API error (-endpoint invalid name) would clearly point to the issue for debugging.

Let me know if you feel strongly about adding it - happy to reconsider.

Comment on lines +111 to +118
}).
Build(r)
if err != nil {
return fmt.Errorf("failed setting up with a controller manager: %w", err)
}
r.Client = mgr.GetClient()

return nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is HCP watching intentionally omitted?

AWS watches the HostedControlPlane for endpoint access changes:

// control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go:319
Watches(&hyperv1.HostedControlPlane{}, handler.Funcs{UpdateFunc: r.enqueueOnAccessChange(mgr)})

This allows re-reconciling when certain HCP fields change. Is this not needed for GCP, or should it be added for parity?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally omitted for now. The HCP watch in AWS is used to immediately react when EndpointAccess changes (to create/remove ExternalName services for private DNS routing).

I think it makes more sense add the HCP watch along with the DNS handling logic in the next PR (for https://issues.redhat.com/browse/GCP-199. I added a note on the Jira ticket about that)

Remove the controlPlaneOperatorGSA field from gcpClientBuilder and
the corresponding parameter from InitCustomerGCPClient as it was
never used.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Add periodic reconciliation every 5 minutes to detect out-of-band
changes to GCP resources. This matches the AWS private link controller
pattern and ensures the controller can identify and report on manual
or external modifications to PSC endpoints.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Check the error return from logr.FromContext to maintain consistency
with the AWS private link controller pattern.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 307-310: The current checks only requeue when op.Status ==
"RUNNING", letting PENDING operations fall through and cause premature GETs;
update the three locations that inspect the operation (the blocks that log "IP
reservation in progress" and requeue using ctrl.Result{RequeueAfter: ...}) to
instead treat any non-DONE status as in-progress by replacing the equality check
with a non-DONE check (e.g., if op.Status != "DONE") so both "PENDING" and
"RUNNING" requeue; apply this change to each spot where the local op variable
and that log/requeue pattern appear in psc_endpoint_controller.go.
- Around line 179-188: The finalizer addition unconditionally waits for
isWIFTokenAccessible() which blocks when service-account-key credentials are
used; modify the logic so the WIF token gate runs only when WIF credentials are
expected. Update the block that adds pscEndpointFinalizer for gcpPSC to first
check whether WIF is in use (introduce a small predicate like isWIFEnabled() or
reuse credential-detection from InitCustomerGCPClient), and only call
isWIFTokenAccessible() and requeue when that predicate is true; otherwise add
the finalizer immediately and call r.Update(ctx, gcpPSC). Ensure you reference
and update the same control flow around controllerutil.ContainsFinalizer,
pscEndpointFinalizer, and isWIFTokenAccessible() so the check is conditional on
chosen credential type.

Update InitCustomerGCPClient comments to explicitly state that this
controller requires Workload Identity Federation (WIF) credentials
and does not support service account keys. This aligns with the
finalizer gate that waits for WIF token availability.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 458-461: The deletion path currently only treats op.Status ==
"RUNNING" as in-progress which misses other non-completed states like PENDING;
change the check in the deletion handling (where op and op.Status are inspected
and the log "PSC endpoint deletion in progress" is emitted) to treat any
non-DONE status as not completed (e.g., op != nil && op.Status != "DONE") so the
controller waits for actual completion before proceeding and still returns
false, nil while waiting.

GCP operations can be in PENDING, RUNNING, or DONE states. Previously
the code only checked for RUNNING which could miss PENDING operations.
Now we check for any non-DONE status to properly wait for operation
completion before proceeding.

Also added status field to log messages for better debugging.

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 404-436: In updateStatusFromEndpoint, don't mark available based
solely on non-empty endpoint fields; instead compare endpoint.Target and
endpoint.IPAddress against the expected values stored on the CR (use
gcpPSC.Spec.Target (or gcpPSC.Spec.ServiceAttachment) and
gcpPSC.Spec.IP/ExpectedIP as the canonical source) and only set
GCPEndpointAvailable = True when they exactly match; if they differ set
ConditionFalse with a message that includes expected vs actual values (and keep
the existing Status().Patch and requeue behavior); if the CR doesn't specify
expected values, fall back to the current non-empty check.
- Around line 438-485: The reconcileDelete function must check GCP operation
errors and wait for async IP deletion: after ForwardingRules.Delete in
reconcileDelete, when op != nil and op.Status == "DONE" inspect op.Error and if
non-nil return an error; if op.Status != "DONE" return false to retry. For the
Addresses.Delete call, capture the returned operation (e.g., ipOp), then poll
the regional operation via
customerGCPClient.RegionOperations.Get(customerProject, region, ipOp.Name) with
a timeout (gcpAPITimeout) until its Status == "DONE" (return false while
pending) and then check that operation's Error field and return an error if
present. Use the existing symbols constructEndpointName, constructIPAddressName,
reconcileDelete, gcpAPITimeout, isNotFoundError to locate and update the code
paths.

Follow reviewer suggestion to properly handle GCP async deletion:
- Check op.Status != "DONE" to detect in-progress operations
- Check op.Error for operation failures (DONE doesn't guarantee success)
- Verify resource is actually gone via GET after successful operation
- Handle "already deleted" case (404 on delete) gracefully

Both ForwardingRule and Address deletions now:
1. Delete resource, get operation
2. If 404: already deleted, continue
3. If op.Status != "DONE": return false to retry
4. If op.Error != nil: return error with details
5. GET to verify resource is gone
6. If still exists: return false to retry

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 307-311: After detecting op.Status == "DONE" in the IP reservation
branch, also check op.Error like the deletion path does: if op.Error is non-nil
(or contains entries), log the error details and return an appropriate
error/result instead of proceeding; update the block around the existing
op.Status check (the branch that currently logs "IP reservation in progress"
using op.Name/op.Status) to perform the same op.Error inspection and handling
used by the deletion flow so failed reservations are not treated as successful.

Add op.Error checks after op.Status == "DONE" for creation operations
to match the deletion path. GCP operations can complete with DONE status
while still having errors.

Affected paths:
- IP reservation (Addresses.Insert)
- PSC endpoint creation (ForwardingRules.Insert)

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go`:
- Around line 232-247: The isServiceAttachmentReady function only checks
ServiceAttachmentURI but must also validate ServiceAttachmentName because helper
functions constructEndpointName and constructIPAddressName rely on it; update
GCPPrivateServiceConnectReconciler.isServiceAttachmentReady to return false if
either gcpPSC.Status.ServiceAttachmentURI or gcpPSC.Status.ServiceAttachmentName
is empty, then continue the existing condition checks for
GCPServiceAttachmentAvailable.

Add defensive check for ServiceAttachmentName in addition to
ServiceAttachmentURI. Both fields are required:
- URI: used as the PSC endpoint target
- Name: used by constructEndpointName and constructIPAddressName

Without this check, if Name is empty we would create GCP resources
with malformed names like "-endpoint" and "-ip".

Signed-off-by: Cristiano Veiga <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@cblecker
Copy link
Copy Markdown
Member

/lgtm
/retest-required

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2026
@cblecker
Copy link
Copy Markdown
Member

/retest-required

1 similar comment
@cristianoveiga
Copy link
Copy Markdown
Contributor Author

/retest-required

@cristianoveiga
Copy link
Copy Markdown
Contributor Author

/verified later @cristianoveiga

@openshift-ci-robot openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Feb 2, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@cristianoveiga: This PR has been marked to be verified later by @cristianoveiga.

Details

In response to this:

/verified later @cristianoveiga

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@csrwng
Copy link
Copy Markdown
Contributor

csrwng commented Feb 4, 2026

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Feb 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cristianoveiga, csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Feb 4, 2026

@cristianoveiga: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 5f9548b into openshift:main Feb 4, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/gcp PR/issue for GCP (GCPPlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria verified-later

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants