
@tenzen-y tenzen-y commented Aug 14, 2025

When users enable runLauncherAsWorker, the launcher also plays the worker role.
In that case, the Launcher deadlocks: the pod waits for its own endpoint to become ready, but the endpoint is not published until the pod is ready. Eventually, the MPIJob fails because the endpoint never becomes ready.

Setting publishNotReadyAddresses lets us expose the endpoint immediately, so scripts inside the Launcher can check for themselves whether the Launcher is ready.
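To make the deadlock concrete, here is a minimal Go sketch that *models* (it does not call Kubernetes) which pod hostnames a headless Service publishes. The `pod` type, the `publishedAddresses` and `contains` helpers, and the pod names `launcher` and `worker-0` are hypothetical. The real knob is the Service spec field `publishNotReadyAddresses` (`PublishNotReadyAddresses` in `corev1.ServiceSpec`). The sketch also assumes, for illustration, that the launcher is not yet Ready while its scripts wait for every host, itself included, to resolve.

```go
package main

import "fmt"

// pod is a hypothetical, simplified stand-in for a Kubernetes Pod,
// keeping only the fields needed to model endpoint publication.
type pod struct {
	Name  string
	Ready bool
}

// publishedAddresses models which pod hostnames a headless Service exposes.
// By default only Ready pods are published. When publishNotReadyAddresses
// is true, readiness is ignored and every selected pod is published.
func publishedAddresses(pods []pod, publishNotReadyAddresses bool) []string {
	var out []string
	for _, p := range pods {
		if p.Ready || publishNotReadyAddresses {
			out = append(out, p.Name)
		}
	}
	return out
}

// contains reports whether name appears in names.
func contains(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

func main() {
	// Assumption: with runLauncherAsWorker the launcher is also an MPI host,
	// and it is not Ready yet because it is waiting for all hosts to resolve.
	pods := []pod{
		{Name: "launcher", Ready: false},
		{Name: "worker-0", Ready: true},
	}

	for _, flag := range []bool{false, true} {
		addrs := publishedAddresses(pods, flag)
		fmt.Printf("publishNotReadyAddresses=%v: %v, launcher resolvable: %v\n",
			flag, addrs, contains(addrs, "launcher"))
	}
}
```

Without the flag the launcher never appears, so a readiness check that waits for its own hostname can never succeed. With the flag it is published right away, and the readiness decision moves into the launcher's scripts.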

@tenzen-y

/assign @terrytangyuan

@terrytangyuan terrytangyuan left a comment


Thanks!

/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Aug 14, 2025
@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit e90c176 into kubeflow:master Aug 14, 2025
19 of 21 checks passed
@tenzen-y tenzen-y deleted the enable-publishNotReadyAddresses branch August 14, 2025 19:03