[RayJob] Add token authentication support for All mode#4210

Merged
rueian merged 14 commits into ray-project:master from Future-Outlier:rayjob-sidecar-mode-auth
Nov 19, 2025
Member

@Future-Outlier Future-Outlier commented Nov 19, 2025

Why are these changes needed?

This PR is a superset of #4204

  1. Support RayJob SidecarMode
  2. Support RayJob K8sJobMode

Follow-up:

  1. The lightweight job submitter needs a code change to add the token when it establishes the WebSocket connection via an HTTP request.

How did I test it?

cd kuberay/ray-operator
IMG=kuberay/operator:nightly-5 make docker-build
kind load docker-image kuberay/operator:nightly-5
helm install kuberay-operator --set image.repository=kuberay/operator --set image.tag=nightly-5 ../helm-chart/kuberay-operator

I provide three examples in this PR.

  1. Sidecar Mode
  2. K8sJob Mode
    a. with cluster selector
    b. without cluster selector

Sidecar Mode

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample-sidecar-mode
spec:
  # submissionMode specifies how RayJob submits the Ray job to the RayCluster.
  # The default value is "K8sJobMode", meaning RayJob will submit the Ray job via a submitter Kubernetes Job.
  # The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster.
  submissionMode: "SidecarMode"
  entrypoint: python /home/ray/samples/sample_code.py
  # shutdownAfterJobFinishes specifies whether the RayCluster should be deleted after the RayJob finishes. Default is false.
  # shutdownAfterJobFinishes: false

  # ttlSecondsAfterFinished specifies the number of seconds after which the RayCluster will be deleted after the RayJob finishes.
  # ttlSecondsAfterFinished: 10

  # activeDeadlineSeconds is the duration in seconds that the RayJob may be active before
  # KubeRay actively tries to terminate the RayJob; value must be positive integer.
  # activeDeadlineSeconds: 120

  # RuntimeEnvYAML represents the runtime environment configuration provided as a multi-line YAML string.
  # See https://docs.ray.io/en/latest/ray-core/handling-dependencies.html for details.
  # (New in KubeRay version 1.0.)
  runtimeEnvYAML: |
    pip:
      - requests==2.26.0
      - pendulum==2.1.2
    env_vars:
      counter_name: "test_counter"

  # Suspend specifies whether the RayJob controller should create a RayCluster instance.
  # If a job is applied with the suspend field set to true, the RayCluster will not be created and we will wait for the transition to false.
  # If the RayCluster is already created, it will be deleted. In the case of transition to false, a new RayCluster will be created.
  # suspend: false

  # rayClusterSpec specifies the RayCluster instance to be created by the RayJob controller.
  rayClusterSpec:
    rayVersion: "2.52.0" # should match the Ray version in the image of the containers
    authOptions:
      mode: "token"
    # Ray head pod template
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:nightly-py311-cpu
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265 # Ray dashboard
              name: dashboard
            - containerPort: 10001
              name: client
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"
            volumeMounts:
            - mountPath: /home/ray/samples
              name: code-sample
          volumes:
          # You set volumes at the Pod level, then mount them into containers inside that Pod
          - name: code-sample
            configMap:
              # Provide the name of the ConfigMap you want to mount.
              name: ray-job-code-sample
              # An array of keys from the ConfigMap to create as files
              items:
              - key: sample_code.py
                path: sample_code.py
    workerGroupSpecs:
    # the pod replicas in this group typed worker
    - replicas: 1
      minReplicas: 1
      maxReplicas: 5
      # logical group name, for this called small-group, also can be functional
      groupName: small-group
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
          - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
            image: rayproject/ray:nightly-py311-cpu
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"

  # SubmitterPodTemplate is the template for the pod that will run the `ray job submit` command against the RayCluster.
  # If SubmitterPodTemplate is specified, the first container is assumed to be the submitter container.
  # submitterPodTemplate:
  #   spec:
  #     restartPolicy: Never
  #     containers:
  #     - name: my-custom-rayjob-submitter-pod
  #       image: rayproject/ray:2.46.0
  #       # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field.
  #       # Specifying Command is not recommended.
  #       # command: ["sh", "-c", "ray job submit --address=http://$RAY_DASHBOARD_ADDRESS --submission-id=$RAY_JOB_SUBMISSION_ID -- echo hello world"]


######################Ray code sample#################################
# this sample is from https://docs.ray.io/en/latest/cluster/job-submission.html#quick-start-example
# it is mounted into the container and executed to show the Ray job at work
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            # Used to verify runtimeEnv
            self.name = os.getenv("counter_name")
            assert self.name == "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())
        print(ray.get(counter.get_counter.remote()))

    # Verify that the correct runtime env was used for the job.
    assert requests.__version__ == "2.26.0"

k8s job mode (no cluster selector)

job pod's env

Lifecycle: when using `ray job submit`, Ray creates a SubmissionClient, reads the token from the environment, and adds the auth token to the HTTP headers. This is why it works.
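The lifecycle described above (read a token from the environment, then attach it to outgoing HTTP requests) can be sketched in Go as follows. This is only an illustration of the pattern, not Ray's SubmissionClient: the function name `authHeaderFromEnv` is hypothetical, and the environment variable name `RAY_AUTH_TOKEN` is an assumption made for this sketch.

```go
package main

import (
	"fmt"
	"os"
)

// tokenEnvVar is an assumed variable name for this sketch; check the Ray
// documentation for the name your Ray version actually reads.
const tokenEnvVar = "RAY_AUTH_TOKEN"

// authHeaderFromEnv returns the Authorization header value for the token
// found through getenv, or "" when no token is set.
func authHeaderFromEnv(getenv func(string) string) string {
	token := getenv(tokenEnvVar)
	if token == "" {
		return ""
	}
	return "Bearer " + token
}

func main() {
	// In the submitter pod the lookup would be os.Getenv.
	if v := authHeaderFromEnv(os.Getenv); v != "" {
		fmt.Println("token found; requests would carry an Authorization header")
	} else {
		fmt.Println("no token in environment; requests go out unauthenticated")
	}
}
```

Passing the lookup function in as a parameter keeps the helper easy to exercise without touching the real process environment.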
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  # submissionMode specifies how RayJob submits the Ray job to the RayCluster.
  # The default value is "K8sJobMode", meaning RayJob will submit the Ray job via a submitter Kubernetes Job.
  # The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster.
  submissionMode: "K8sJobMode"
  entrypoint: python /home/ray/samples/sample_code.py
  # shutdownAfterJobFinishes specifies whether the RayCluster should be deleted after the RayJob finishes. Default is false.
  # shutdownAfterJobFinishes: false

  # ttlSecondsAfterFinished specifies the number of seconds after which the RayCluster will be deleted after the RayJob finishes.
  # ttlSecondsAfterFinished: 10

  # activeDeadlineSeconds is the duration in seconds that the RayJob may be active before
  # KubeRay actively tries to terminate the RayJob; value must be positive integer.
  # activeDeadlineSeconds: 120

  # RuntimeEnvYAML represents the runtime environment configuration provided as a multi-line YAML string.
  # See https://docs.ray.io/en/latest/ray-core/handling-dependencies.html for details.
  # (New in KubeRay version 1.0.)
  runtimeEnvYAML: |
    pip:
      - requests==2.26.0
      - pendulum==2.1.2
    env_vars:
      counter_name: "test_counter"

  # Suspend specifies whether the RayJob controller should create a RayCluster instance.
  # If a job is applied with the suspend field set to true, the RayCluster will not be created and we will wait for the transition to false.
  # If the RayCluster is already created, it will be deleted. In the case of transition to false, a new RayCluster will be created.
  # suspend: false

  # rayClusterSpec specifies the RayCluster instance to be created by the RayJob controller.
  rayClusterSpec:
    rayVersion: "2.52.0" # should match the Ray version in the image of the containers
    authOptions:
      mode: "token"
    # Ray head pod template
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:nightly-py311-cpu
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265 # Ray dashboard
              name: dashboard
            - containerPort: 10001
              name: client
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"
            volumeMounts:
            - mountPath: /home/ray/samples
              name: code-sample
          volumes:
          # You set volumes at the Pod level, then mount them into containers inside that Pod
          - name: code-sample
            configMap:
              # Provide the name of the ConfigMap you want to mount.
              name: ray-job-code-sample
              # An array of keys from the ConfigMap to create as files
              items:
              - key: sample_code.py
                path: sample_code.py
    workerGroupSpecs:
    # the pod replicas in this group typed worker
    - replicas: 1
      minReplicas: 1
      maxReplicas: 5
      # logical group name, for this called small-group, also can be functional
      groupName: small-group
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
          - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
            image: rayproject/ray:nightly-py311-cpu
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "200m"

  # SubmitterPodTemplate is the template for the pod that will run the `ray job submit` command against the RayCluster.
  # If SubmitterPodTemplate is specified, the first container is assumed to be the submitter container.
  # submitterPodTemplate:
  #   spec:
  #     restartPolicy: Never
  #     containers:
  #     - name: my-custom-rayjob-submitter-pod
  #       image: rayproject/ray:2.46.0
  #       # If Command is not specified, the correct command will be supplied at runtime using the RayJob spec `entrypoint` field.
  #       # Specifying Command is not recommended.
  #       # command: ["sh", "-c", "ray job submit --address=http://$RAY_DASHBOARD_ADDRESS --submission-id=$RAY_JOB_SUBMISSION_ID -- echo hello world"]


######################Ray code sample#################################
# this sample is from https://docs.ray.io/en/latest/cluster/job-submission.html#quick-start-example
# it is mounted into the container and executed to show the Ray job at work
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            # Used to verify runtimeEnv
            self.name = os.getenv("counter_name")
            assert self.name == "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())
        print(ray.get(counter.get_counter.remote()))

    # Verify that the correct runtime env was used for the job.
    assert requests.__version__ == "2.26.0"

k8s job mode (with cluster selector)

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-use-existing-raycluster-auth-token
spec:
  clusterSelector:
    ray.io/cluster: rayjob-sample-spn4v
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"
  runtimeEnvYAML: |
    pip:
      - requests==2.26.0

Related issue number

#4203

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier changed the title [RayJob] Sidecar Mode support for Ray token auth [RayJob] K8sJob Mode and SidecarMode support for Ray token auth Nov 19, 2025
@Future-Outlier Future-Outlier changed the title [RayJob] K8sJob Mode and SidecarMode support for Ray token auth [RayJob] Add token authentication support for K8sJob and Sidecar modes Nov 19, 2025
@Future-Outlier Future-Outlier changed the title [RayJob] Add token authentication support for K8sJob and Sidecar modes [RayJob] Add token authentication support for K8sJob and Sidecar mode Nov 19, 2025
Member

@andrewsykim andrewsykim left a comment

Is there a reason we're not supporting HTTPMode?


func (r *RayDashboardClient) setAuthHeader(req *http.Request) {
	if r.authToken != "" {
		req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", r.authToken))
Member

can you add a TODO to use X-Ray-Auth based on our discussion with Sampan today? Or did you get it working with the standard header?

Member

sorry I missed the TODO in the other file, probably doesn't hurt to add it here too
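For readers following the thread, here is a minimal, self-contained sketch of the behavior under review: the header is attached only when a token is configured. The `dashboardClient` type, the `statusFor` helper, and the fake handler are hypothetical stand-ins written for this illustration; they are not the KubeRay client or the Ray dashboard.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// dashboardClient is a hypothetical stand-in for the client under review.
type dashboardClient struct {
	authToken string
}

// setAuthHeader attaches a bearer token only when one is configured.
func (c *dashboardClient) setAuthHeader(req *http.Request) {
	if c.authToken != "" {
		req.Header.Set("Authorization", "Bearer "+c.authToken)
	}
}

// statusFor runs one request through a fake handler that requires
// expectedToken and returns the status code. No network is involved.
func statusFor(clientToken, expectedToken string) int {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+expectedToken {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/api/jobs/", nil)
	(&dashboardClient{authToken: clientToken}).setAuthHeader(req)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("secret", "secret")) // 200
	fmt.Println(statusFor("", "secret"))       // 401
}
```

Skipping the header for an empty token means a cluster without auth enabled sees exactly the same requests as before.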

func GetRayDashboardClientFunc(mgr manager.Manager, useKubernetesProxy bool) func(rayCluster *rayv1.RayCluster, url string) (dashboardclient.RayDashboardClientInterface, error) {
	return func(rayCluster *rayv1.RayCluster, url string) (dashboardclient.RayDashboardClientInterface, error) {
		dashboardClient := &dashboardclient.RayDashboardClient{}
		authToken := ""
Member

nit: var authToken string
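The nit is stylistic: in Go, `var authToken string` declares the variable with its zero value, which for a string is `""`, so it behaves the same as `authToken := ""`. A small sketch, with a hypothetical `resolveToken` helper that only mirrors the shape of the reviewed code:

```go
package main

import "fmt"

// resolveToken starts from the zero value and fills the token in only
// when auth is enabled. The function is hypothetical.
func resolveToken(authEnabled bool, secretValue string) string {
	var authToken string // zero value is "", equivalent to authToken := ""
	if authEnabled {
		authToken = secretValue
	}
	return authToken
}

func main() {
	fmt.Printf("%q %q\n", resolveToken(false, "s"), resolveToken(true, "s")) // "" "s"
}
```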

		dashboardClient.InitClient(&http.Client{
			Timeout: 2 * time.Second,
		}, "http://"+url)
		if rayCluster != nil && rayCluster.Spec.AuthOptions != nil && rayCluster.Spec.AuthOptions.Mode == rayv1.AuthModeToken {
Member

the rayCluster != nil is probably not needed

Member Author

this is for api server's e2e test
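To make the exchange concrete: the guard matters whenever a caller can pass a nil cluster, which per the reply above happens in the API server's e2e test. The sketch below uses simplified, hypothetical types (`RayCluster`, `RayClusterSpec`, `AuthOptions`) rather than the real `rayv1` API, and shows how Go's short-circuit `&&` keeps the chained field access safe.

```go
package main

import "fmt"

// Simplified stand-ins for the rayv1 types; assumptions for this sketch.
type AuthOptions struct{ Mode string }
type RayClusterSpec struct{ AuthOptions *AuthOptions }
type RayCluster struct{ Spec RayClusterSpec }

const AuthModeToken = "token"

// tokenAuthEnabled is nil-safe: each && operand is evaluated only when
// the previous one was true, so no nil pointer is ever dereferenced.
func tokenAuthEnabled(rc *RayCluster) bool {
	return rc != nil && rc.Spec.AuthOptions != nil && rc.Spec.AuthOptions.Mode == AuthModeToken
}

func main() {
	fmt.Println(tokenAuthEnabled(nil))           // false
	fmt.Println(tokenAuthEnabled(&RayCluster{})) // false
	fmt.Println(tokenAuthEnabled(&RayCluster{
		Spec: RayClusterSpec{AuthOptions: &AuthOptions{Mode: AuthModeToken}},
	})) // true
}
```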


tokenBytes, exists := secret.Data[RAY_AUTH_TOKEN_SECRET_KEY]
if !exists {
	return nil, fmt.Errorf("auth token key '%s' not found in secret %s/%s", RAY_AUTH_TOKEN_SECRET_KEY, rayCluster.Namespace, secretName)
Member

nit: you can use %q instead of '%s'
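As a quick illustration of the suggestion: `%q` in Go's `fmt` package prints a double-quoted, escaped string, so hand-written quotes around `%s` are unnecessary and odd characters in the value stay visible. The helper name and the key, namespace, and secret values below are hypothetical.

```go
package main

import "fmt"

// missingKeyError builds the error with %q so the key is quoted and
// escaped by fmt itself. All argument values here are illustrative.
func missingKeyError(key, namespace, secretName string) error {
	return fmt.Errorf("auth token key %q not found in secret %s/%s", key, namespace, secretName)
}

func main() {
	fmt.Println(missingKeyError("auth_token", "default", "my-secret"))
	// auth token key "auth_token" not found in secret default/my-secret
}
```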

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
// TODO: support a fallback auth header in Ray side, something like X-Ray-Auth: Bearer <token>
func (r *RayDashboardClient) setAuthHeader(req *http.Request) {
	if r.authToken != "" {
		req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", r.authToken))
Member

In Ray the header is lower-cased authorization https://github.com/ray-project/ray/blob/master/python/ray/_private/authentication/authentication_constants.py#L25

But here we're using Authorization. Do you know why it works either way?

Member Author

since the library used in ray is case insensitive

note: this is an answer from @sampan-s-nayak

Member

Thanks, small nit to keep this as Authorization then
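On the Go side of this exchange, casing is normalized as well: `http.Header.Set` and `Get` canonicalize the key, so `authorization` and `Authorization` refer to the same entry, and HTTP field names are case-insensitive in general. The sketch below demonstrates only Go's `net/http` behavior; how Ray's server matches the header is as described in the reply above. The `lookup` helper is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// lookup stores a value under one spelling of a header name and reads it
// back under another, relying on net/http key canonicalization.
func lookup(setName, getName, value string) string {
	h := http.Header{}
	h.Set(setName, value)
	return h.Get(getName)
}

func main() {
	fmt.Println(http.CanonicalHeaderKey("authorization"))            // Authorization
	fmt.Println(lookup("authorization", "AUTHORIZATION", "Bearer t")) // Bearer t
}
```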

@Future-Outlier Future-Outlier changed the title [RayJob] Add token authentication support for K8sJob and Sidecar mode [RayJob] Add token authentication support for All mode Nov 19, 2025
Member

@andrewsykim andrewsykim left a comment

@rueian PTAL as well

@rueian rueian merged commit a739b78 into ray-project:master Nov 19, 2025
27 checks passed
@Future-Outlier
Member Author

interactive mode

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-interactive-mode
spec:
  # InteractiveMode means KubeRay doesn't submit the job for you.
  # KubeRay will create the RayJob and transition it to the Waiting state.
  # The user needs to submit the job manually via the `ray job submit` command
  # and then update the `spec.jobId` field with the job ID.
  # After that, KubeRay will handle the rest of the lifecycle for the RayJob.
  submissionMode: InteractiveMode
  # User needs to update this field with the job ID after submitting the job
  jobId: ""
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - image: rayproject/ray:2.46.0
            name: ray-head
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
              requests:
                cpu: "2"
                memory: 4Gi
    rayVersion: 2.46.0
    workerGroupSpecs:
    - groupName: default-group
      replicas: 1
      minReplicas: 1
      maxReplicas: 5
      rayStartParams: {}
      template:
        spec:
          containers:
          - image: rayproject/ray:2.46.0
            name: ray-worker
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
              requests:
                cpu: "2"
                memory: 4Gi

andrewsykim added a commit that referenced this pull request Nov 21, 2025
* [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted (#4141)

* [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted

Signed-off-by: 400Ping <[email protected]>

* [Fix] Fix e2e error

Signed-off-by: 400Ping <[email protected]>

* [Fix] fix according to rueian's comment

Signed-off-by: 400Ping <[email protected]>

* [Chore] fix ci error

Signed-off-by: 400Ping <[email protected]>

* Update ray-operator/controllers/ray/raycluster_controller.go

Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>
Signed-off-by: Ping <[email protected]>

* Update ray-operator/controllers/ray/rayjob_controller.go

Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>
Signed-off-by: Ping <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* Trigger CI

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: 400Ping <[email protected]>
Signed-off-by: Ping <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>

* fix: dashboard build for kuberay 1.5.0 (#4161)

Signed-off-by: Future-Outlier <[email protected]>

* [Feature Enhancement] Set ordered replica index label to support multi-slice (#4163)

* [Feature Enhancement] Set ordered replica index label to support multi-slice

Signed-off-by: Ryan O'Leary <[email protected]>

* rename replica-id -> replica-name

Signed-off-by: Ryan O'Leary <[email protected]>

* Separate replica index feature gate logic

Signed-off-by: Ryan O'Leary <[email protected]>

* remove index arg in createWorkerPod

Signed-off-by: Ryan O'Leary <[email protected]>

---------

Signed-off-by: Ryan O'Leary <[email protected]>

* update stale feature gate comments (#4174)

Signed-off-by: Andrew Sy Kim <[email protected]>

* [RayCluster] Add more context why we don't recreate head Pod for RayJob (#4175)

Signed-off-by: Kai-Hsun Chen <[email protected]>

* feature: Remove empty resource list initialization. (#4168)

Fixes #4142.

* [Dockerfile] [KubeRay Dashboard]: Fix Dockerfile warnings (ENV format, CMD JSON args) (#4167)

* [#4166] improvement: Fix Dockerfile warnings (ENV format, CMD JSON args)

* extract the hostname from CMD

Signed-off-by: Neo Chien <[email protected]>

---------

Signed-off-by: Neo Chien <[email protected]>
Co-authored-by: cchung100m <[email protected]>

* [Fix] Resolve int32 overflow by having the calculation in int64 and c… (#4158)

* [Fix] Resolve int32 overflow by having the calculation in int64 and cap it if the count is over math.MaxInt32

Signed-off-by: justinyeh1995 <[email protected]>

* [Test] Add unit tests for CalculateReadyReplicas

Signed-off-by: justinyeh1995 <[email protected]>

* [Fix] Add a nosec comment to pass the Lint (pre-commit) test

Signed-off-by: justinyeh1995 <[email protected]>

* [Refactor] Add CapInt64ToInt32 to replace #nosec directives

Signed-off-by: justinyeh1995 <[email protected]>

* [Refactor] Rename function to SafeInt64ToInt32 and add a underflowing prevention (it also help pass the lint test)

Signed-off-by: justinyeh1995 <[email protected]>

* [Refactor] Remove the early return as SafeInt64ToInt32 handles the int32 overflow and underflow checking.

Signed-off-by: justinyeh1995 <[email protected]>

---------

Signed-off-by: justinyeh1995 <[email protected]>

* Add RayService incremental upgrade sample for guide (#4164)

Signed-off-by: Ryan O'Leary <[email protected]>

* Edit RayCluster example config for label selectors (#4151)

Signed-off-by: Ryan O'Leary <[email protected]>

* [RayJob] update light weight submitter image from quay.io (#4181)

Signed-off-by: Future-Outlier <[email protected]>

* [flaky] RayJob fails when head Pod is deleted when job is running (#4182)

Signed-off-by: Future-Outlier <[email protected]>

* [CI] Pin Docker api version to avoid API version mismatch (#4188)

Signed-off-by: win5923 <[email protected]>

* Make replicas configurable for kuberay-operator #4180 (#4195)

* Make replicas configurable for kuberay-operator #4180

* Make replicas configurable for kuberay-operator #4180

* [Fix] rayjob update raycluster status (#4192)

* feat: check if raycluster status update in rayjob

* test: e2e test to check the rayjob raycluster status update

* fix: dashboard http client tests discovered and passing (#4173)

Signed-off-by: alimaazamat <[email protected]>

* [RayJob] Lift cluster status while initializing (#4191)

Signed-off-by: Spencer Peterson <[email protected]>

* [RayJob] Remove updateJobStatus call (#4198)

Fast follow to #4191

Signed-off-by: Spencer Peterson <[email protected]>

* Add support for Ray token auth (#4179)

* Add support for Ray token auth

Signed-off-by: Andrew Sy Kim <[email protected]>

* add e2e test for Ray cluster auth

Signed-off-by: Andrew Sy Kim <[email protected]>

* address nits from Ruiean

Signed-off-by: Andrew Sy Kim <[email protected]>

* update RAY_auth_mode -> RAY_AUTH_MODE

Signed-off-by: Andrew Sy Kim <[email protected]>

* configure auth for Ray autoscaler

Signed-off-by: Andrew Sy Kim <[email protected]>

---------

Signed-off-by: Andrew Sy Kim <[email protected]>

* Bump js-yaml from 4.1.0 to 4.1.1 in /dashboard (#4194)

Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](nodeca/js-yaml@4.1.0...4.1.1)

---
updated-dependencies:
- dependency-name: js-yaml
  dependency-version: 4.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update minimum Ray version required for token authentication to 2.52.0 (#4201)

* update minimum Ray version required for token authentication to 2.52.0

Signed-off-by: Andrew Sy Kim <[email protected]>

* update RayCluster auth e2e test to use Ray v2.52

Signed-off-by: Andrew Sy Kim <[email protected]>

---------

Signed-off-by: Andrew Sy Kim <[email protected]>

* add samples for RayCluster token auth (#4200)

Signed-off-by: Andrew Sy Kim <[email protected]>

* update (#4208)

Signed-off-by: Future-Outlier <[email protected]>

* [RayJob] Add token authentication support for All mode (#4210)

* dashboard client authentication support

Signed-off-by: Future-Outlier <[email protected]>

* support rayjob

Signed-off-by: Future-Outlier <[email protected]>

* update to fix api serverr err

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* updarte

Signed-off-by: Future-Outlier <[email protected]>

* Rayjob sidecar mode auth token mode support

Signed-off-by: Future-Outlier <[email protected]>

* RayJob support k8s job mode

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* Address Andrew's advice

Signed-off-by: Future-Outlier <[email protected]>

* add todo x-ray-authorization comments

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>

* [RayCluster] Enable Secret informer watch/list and remove unused RBAC verbs (#4202)

* Add authentication secret reconciliation support

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* fix flaky test

Signed-off-by: Future-Outlier <[email protected]>

* remove test fix

Signed-off-by: Rueian <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Rueian <[email protected]>
Co-authored-by: Rueian <[email protected]>

* [APIServer][Docs] Add user guide for retry behavior & configuration (#4144)

* [Docs] Add the draft description about feature intro, configurations, and usecases

Signed-off-by: justinyeh1995 <[email protected]>

* [Fix] Update the retry walk-through

Signed-off-by: justinyeh1995 <[email protected]>

* [Doc] rewrite the first 2 sections

Signed-off-by: justinyeh1995 <[email protected]>

* [Doc] Revise documentation wording and add Observing Retry Behavior section

Signed-off-by: justinyeh1995 <[email protected]>

* [Fix] fix linting issue by running pre-commit run berfore commiting

Signed-off-by: justinyeh1995 <[email protected]>

* [Fix] fix linting errors in the Markdown linting

Signed-off-by: justinyeh1995 <[email protected]>

* [Fix] Clean up the math equation

Signed-off-by: justinyeh1995 <[email protected]>

* Update the math formula of Backoff calculation.

Co-authored-by: Nary Yeh <[email protected]>
Signed-off-by: JustinYeh <[email protected]>

* [Fix] Explicitly mentioned exponential backoff and removed the customization parts

Signed-off-by: justinyeh1995 <[email protected]>

* [Docs] Clarify naming by replacing “APIServer” with “KubeRay APIServer”

Co-authored-by: Cheng-Yeh Chung <[email protected]>
Signed-off-by: JustinYeh <[email protected]>

* [Docs] Rename retry-configuration.md to retry-behavior.md for accuracy

Signed-off-by: justinyeh1995 <[email protected]>

* Update Title to KubeRay APIServer Retry Behavior

Co-authored-by: Cheng-Yeh Chung <[email protected]>
Signed-off-by: JustinYeh <[email protected]>

* [Docs] Add a note about the limitation of retry configuration

Signed-off-by: justinyeh1995 <[email protected]>

---------

Signed-off-by: justinyeh1995 <[email protected]>
Signed-off-by: JustinYeh <[email protected]>
Co-authored-by: Nary Yeh <[email protected]>
Co-authored-by: Cheng-Yeh Chung <[email protected]>

* Support X-Ray-Authorization fallback header for accepting auth token via proxy (#4213)

* Support X-Ray-Authorization fallback header for accepting auth token in dashboard

Signed-off-by: Future-Outlier <[email protected]>

* remove todo comment

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>

* [RayCluster] make auth token secret name consistency (#4216)

Signed-off-by: fscnick <[email protected]>

* [RayCluster] Status includes head containter status message (#4196)

* [RayCluster] Status includes head containter status message

Signed-off-by: Spencer Peterson <[email protected]>

* lint

Signed-off-by: Spencer Peterson <[email protected]>

* [RayCluster] Containers not ready status reflects structured reason

Signed-off-by: Spencer Peterson <[email protected]>

* nit

Signed-off-by: Spencer Peterson <[email protected]>

---------

Signed-off-by: Spencer Peterson <[email protected]>

* Remove erroneous  call in applyServeTargetCapacity (#4212)

Signed-off-by: Ryan O'Leary <[email protected]>

* [RayJob] Add token authentication support for light weight job submitter (#4215)

* [RayJob] light weight job submitter auth token support

Signed-off-by: Future-Outlier <[email protected]>

* X-Ray-Authorization

Signed-off-by: Rueian <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Rueian <[email protected]>
Co-authored-by: Rueian <[email protected]>

* feat: kubectl ray get token command (#4218)

* feat: kubectl ray get token command

Signed-off-by: Rueian <[email protected]>

* Update kubectl-plugin/pkg/cmd/get/get_token_test.go

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Rueian <[email protected]>

* Update kubectl-plugin/pkg/cmd/get/get_token.go

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Rueian <[email protected]>

* make sure the raycluster exists before getting the secret

Signed-off-by: Rueian <[email protected]>

* better ux

Signed-off-by: Rueian <[email protected]>

* Update kubectl-plugin/pkg/cmd/get/get_token.go

Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>
Signed-off-by: Rueian <[email protected]>

---------

Signed-off-by: Rueian <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>

---------

Signed-off-by: 400Ping <[email protected]>
Signed-off-by: Ping <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Andrew Sy Kim <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Neo Chien <[email protected]>
Signed-off-by: justinyeh1995 <[email protected]>
Signed-off-by: win5923 <[email protected]>
Signed-off-by: alimaazamat <[email protected]>
Signed-off-by: Spencer Peterson <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Rueian <[email protected]>
Signed-off-by: JustinYeh <[email protected]>
Signed-off-by: fscnick <[email protected]>
Co-authored-by: Ping <[email protected]>
Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]>
Co-authored-by: Ryan O'Leary <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kavish <[email protected]>
Co-authored-by: Neo Chien <[email protected]>
Co-authored-by: cchung100m <[email protected]>
Co-authored-by: JustinYeh <[email protected]>
Co-authored-by: Jun-Hao Wan <[email protected]>
Co-authored-by: Divyam Raj <[email protected]>
Co-authored-by: Nary Yeh <[email protected]>
Co-authored-by: Alima Azamat <[email protected]>
Co-authored-by: Spencer Peterson <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Rueian <[email protected]>
Co-authored-by: Cheng-Yeh Chung <[email protected]>
Co-authored-by: fscnick <[email protected]>
Co-authored-by: Copilot <[email protected]>
win5923 pushed a commit to win5923/kuberay that referenced this pull request Dec 13, 2025

* dashboard client authentication support

Signed-off-by: Future-Outlier <[email protected]>

* support rayjob

Signed-off-by: Future-Outlier <[email protected]>

* update to fix API server error

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* RayJob sidecar mode auth token support

Signed-off-by: Future-Outlier <[email protected]>

* RayJob support k8s job mode

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* Address Andrew's advice

Signed-off-by: Future-Outlier <[email protected]>

* add todo x-ray-authorization comments

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
win5923 pushed a commit to win5923/kuberay that referenced this pull request Dec 17, 2025

3 participants