[RayJob] Add token authentication support for All mode #4210
rueian merged 14 commits into ray-project:master
Conversation
Signed-off-by: Future-Outlier <[email protected]>
andrewsykim left a comment
Is there a reason we're not supporting HttpMode?
    func (r *RayDashboardClient) setAuthHeader(req *http.Request) {
        if r.authToken != "" {
            req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", r.authToken))
Can you add a TODO to use X-Ray-Auth based on our discussion with Sampan today? Or did you get it working with the standard header?
Sorry, I missed the TODO in the other file. It probably doesn't hurt to add it here too.
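The header logic under review can be illustrated in isolation. This is a minimal sketch, not the KubeRay implementation: dashboardClient and newRequest are hypothetical stand-ins, and the URL and token are placeholder values. It only shows that the Authorization header is attached when a token is configured and omitted otherwise.

```go
package main

import (
	"fmt"
	"net/http"
)

// dashboardClient is a hypothetical stand-in for RayDashboardClient.
type dashboardClient struct {
	authToken string
}

// setAuthHeader attaches the token as a Bearer credential when one is configured.
// With an empty token the request is left untouched.
func (c *dashboardClient) setAuthHeader(req *http.Request) {
	if c.authToken != "" {
		req.Header.Set("Authorization", "Bearer "+c.authToken)
	}
}

// newRequest builds a GET request and applies the auth header to it.
func (c *dashboardClient) newRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	c.setAuthHeader(req)
	return req, nil
}

func main() {
	// Placeholder URL and token, used only for illustration.
	const url = "http://127.0.0.1:8265/api/jobs/"

	withToken := &dashboardClient{authToken: "example-token"}
	req, err := withToken.newRequest(url)
	if err != nil {
		panic(err)
	}
	fmt.Printf("with token: %q\n", req.Header.Get("Authorization"))

	withoutToken := &dashboardClient{}
	req, err = withoutToken.newRequest(url)
	if err != nil {
		panic(err)
	}
	fmt.Printf("without token: %q\n", req.Header.Get("Authorization"))
}
```

Keeping the header logic in one helper means every request path goes through the same check, so a fallback header could later be added in a single place.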
    func GetRayDashboardClientFunc(mgr manager.Manager, useKubernetesProxy bool) func(rayCluster *rayv1.RayCluster, url string) (dashboardclient.RayDashboardClientInterface, error) {
        return func(rayCluster *rayv1.RayCluster, url string) (dashboardclient.RayDashboardClientInterface, error) {
            dashboardClient := &dashboardclient.RayDashboardClient{}
            authToken := ""
            dashboardClient.InitClient(&http.Client{
                Timeout: 2 * time.Second,
            }, "http://"+url)
            if rayCluster != nil && rayCluster.Spec.AuthOptions != nil && rayCluster.Spec.AuthOptions.Mode == rayv1.AuthModeToken {
The rayCluster != nil check is probably not needed.
It is needed for the API server's e2e test.
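To make the nil-check discussion concrete, here is a small sketch of the same guard as a standalone predicate. The types below are simplified, hypothetical copies of the rayv1 fields named in the diff, the "token" string value is an assumption, and tokenAuthEnabled is an illustrative helper rather than a function from the PR.

```go
package main

import "fmt"

// AuthMode and the structs below are simplified stand-ins for the rayv1 types.
type AuthMode string

// AuthModeToken mirrors rayv1.AuthModeToken; the string value is assumed here.
const AuthModeToken AuthMode = "token"

type AuthOptions struct {
	Mode AuthMode
}

type RayClusterSpec struct {
	AuthOptions *AuthOptions
}

type RayCluster struct {
	Spec RayClusterSpec
}

// tokenAuthEnabled reports whether token auth is configured.
// Short-circuit evaluation makes it safe for a nil cluster or nil AuthOptions.
func tokenAuthEnabled(rc *RayCluster) bool {
	return rc != nil && rc.Spec.AuthOptions != nil && rc.Spec.AuthOptions.Mode == AuthModeToken
}

func main() {
	withToken := &RayCluster{Spec: RayClusterSpec{AuthOptions: &AuthOptions{Mode: AuthModeToken}}}

	fmt.Println(tokenAuthEnabled(nil))           // false: caller passed no cluster
	fmt.Println(tokenAuthEnabled(&RayCluster{})) // false: AuthOptions is nil
	fmt.Println(tokenAuthEnabled(withToken))     // true
}
```

Without the first clause, a caller that passes a nil cluster would hit a nil pointer dereference on rc.Spec.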
    tokenBytes, exists := secret.Data[RAY_AUTH_TOKEN_SECRET_KEY]
    if !exists {
        return nil, fmt.Errorf("auth token key '%s' not found in secret %s/%s", RAY_AUTH_TOKEN_SECRET_KEY, rayCluster.Namespace, secretName)
nit: you can use %q instead of '%s'
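A short sketch of what the %q suggestion changes in practice. The key name, namespace, and secret name are hypothetical values, and missingKeyError is an illustrative helper. The %q verb produces a double-quoted, Go-escaped string, so stray whitespace or control characters in the key become visible in the error message.

```go
package main

import "fmt"

// authTokenKey is a hypothetical key name used only for this example.
const authTokenKey = "auth_token"

// missingKeyError formats the error with %q so the key is quoted and escaped.
func missingKeyError(key, namespace, secretName string) error {
	return fmt.Errorf("auth token key %q not found in secret %s/%s", key, namespace, secretName)
}

func main() {
	// prints: auth token key "auth_token" not found in secret default/my-cluster
	fmt.Println(missingKeyError(authTokenKey, "default", "my-cluster"))

	// A key with a trailing newline is escaped instead of breaking the log line.
	fmt.Println(missingKeyError("auth_token\n", "default", "my-cluster"))
}
```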
    // TODO: support a fallback auth header in Ray side, something like X-Ray-Auth: Bearer <token>
    func (r *RayDashboardClient) setAuthHeader(req *http.Request) {
        if r.authToken != "" {
            req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", r.authToken))
In Ray the header is the lower-cased authorization: https://github.com/ray-project/ray/blob/master/python/ray/_private/authentication/authentication_constants.py#L25
But here we're using Authorization. Do you know why it works either way?
It works because the library used in Ray is case-insensitive.
Note: this is an answer from @sampan-s-nayak
Thanks, small nit to keep this as Authorization then
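The Go side of this exchange can be checked with a small sketch. It only demonstrates how Go's net/http treats header names (Set and Get canonicalize the key), not how Ray's Python server parses them. lookupAuth is a hypothetical helper and the token is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// lookupAuth stores a value under one spelling of the header name and reads it
// back under another. http.Header canonicalizes names in both Set and Get.
func lookupAuth(setName, getName, value string) string {
	h := http.Header{}
	h.Set(setName, value)
	return h.Get(getName)
}

func main() {
	const value = "Bearer example-token" // placeholder token

	// Both calls return the stored value regardless of the spelling used.
	fmt.Println(lookupAuth("authorization", "Authorization", value))
	fmt.Println(lookupAuth("Authorization", "AUTHORIZATION", value))
}
```

HTTP field names are case-insensitive by specification, which is why the canonical Authorization spelling sent by the operator can match a lower-cased constant on the receiving side.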
Commit message for #4210 (each entry signed off by Future-Outlier):
* dashboard client authentication support
* support rayjob
* update to fix api serverr err
* update
* updarte
* Rayjob sidecar mode auth token mode support
* RayJob support k8s job mode
* update
* update
* update
* Address Andrew's advice
* add todo x-ray-authorization comments

Why are these changes needed?
This PR is a superset of #4204
follow-up:
How I tested it
    cd kuberay/ray-operator
    IMG=kuberay/operator:nightly-5 make docker-build
    kind load docker-image kuberay/operator:nightly-5
    helm install kuberay-operator --set image.repository=kuberay/operator --set image.tag=nightly-5 ../helm-chart/kuberay-operator
I supported 3 examples in this PR.
a. with cluster selector
b. without cluster selector
Sidecar Mode
k8s job mode (no cluster selector)
job pod's env
k8s job mode (no cluster selector)
Related issue number
#4203
Checks