[Test] Add load tests and behavioral checks to incremental upgrade E2E #4541
Open
JiangJiaWei1103 wants to merge 51 commits into ray-project:master from JiangJiaWei1103:add-load-tests-incr-upgrade-e2e
Changes from 28 commits
44c4aef docs: Add step-by-step process of the basic e2e (JiangJiaWei1103)
01bd86e chore: Align test infra setup order with Ray docs (JiangJiaWei1103)
40f1d5c test: Add locust load test for incr upgrade e2e (JiangJiaWei1103)
bb4a26f docs: Improve maintainability of locust yaml (JiangJiaWei1103)
afc9dcc refactor: Remove redundant helper (JiangJiaWei1103)
89f5478 fix: Deflake hardcoded sleep for Locust ramp up (JiangJiaWei1103)
004a757 refactor: Remove legacy and extract a helper to get rps index (JiangJiaWei1103)
f00ab71 test: Ensure remaining traffic routed to the new cluster (JiangJiaWei1103)
ef802a6 test: Support high-rps serve application with expected rps over 900 (JiangJiaWei1103)
87ca63d Test CI (JiangJiaWei1103)
1b8ab65 Test CI (JiangJiaWei1103)
504ecf5 Test CI (JiangJiaWei1103)
e64a6e7 test: Recover basic incr upgrade test (JiangJiaWei1103)
25f34ee docs: Improve maintainability (JiangJiaWei1103)
6b23507 fix: Deflake CI istio gc installation (JiangJiaWei1103)
0aa45a1 fix: Deflake istio gc installation (JiangJiaWei1103)
dc98432 revert: Use orig install order (JiangJiaWei1103)
7e4d888 test: Support diverse incr upgrade parameter combinations (JiangJiaWei1103)
8f6df3d Test CI (JiangJiaWei1103)
f71836a fix: Skip transient state check right before promotion (JiangJiaWei1103)
b1eeab8 test: Retest standard gradual incr upgrade (JiangJiaWei1103)
7c2e504 refactor: Make curl function clearer (JiangJiaWei1103)
62fa293 fix: Avoid t.FailNow from non-test goroutines (JiangJiaWei1103)
d152779 fix: Deflake by using commit hash (JiangJiaWei1103)
b9db8ee refactor: Remove redundant checks (JiangJiaWei1103)
442db2d refactor: Get rps col index without hardcoded int (JiangJiaWei1103)
2e7042e refactor: Extract locust warmup constants for tweaking (JiangJiaWei1103)
6ae50b8 Remove redundant line (JiangJiaWei1103)
9737c31 test: Split test responsibilities (JiangJiaWei1103)
c57f594 refactor: Use eg instead of wg (JiangJiaWei1103)
f183c38 refactor: Extract trigger incr upgrade helper and curl const (JiangJiaWei1103)
bf164e9 fix: Deflake waiting for upgrade complete using a longer timeout (JiangJiaWei1103)
9becace Improve readability (JiangJiaWei1103)
b485e1e fix: Remove data race on err btw goroutines (JiangJiaWei1103)
38bce3f fix: Fix last migrate time check (JiangJiaWei1103)
6acc27a Merge branch 'master' into add-load-tests-incr-upgrade-e2e (ryanaoleary)
fe092d0 Fix merge and update rollback test for changes in this PR (ryanaoleary)
067ba1f chore: Use a larger runner for higher RPS (JiangJiaWei1103)
d0025dc Revert "chore: Use a larger runner for higher RPS" (JiangJiaWei1103)
5eef493 Revert "[RayService] Rollback Support for Incremental Upgrades (#4109)" (Future-Outlier)
675d3ef codex issue (Future-Outlier)
4c38b7e chore: Deflake by increasing timeout (JiangJiaWei1103)
7e821f3 chore: Remove worker rsc limits to align with the orig example (JiangJiaWei1103)
13011d6 Revert "chore: Remove worker rsc limits to align with the orig example" (JiangJiaWei1103)
73e6637 docs: Better strategy naming (JiangJiaWei1103)
baf2d6c Revert: Remove rollback e2e (JiangJiaWei1103)
44fdb7e Revert "Revert: Remove rollback e2e" (JiangJiaWei1103)
d1591b9 chore: Use Ray project serve config example link (JiangJiaWei1103)
ffd38b6 chore: Deflake CI by lowering steady state RPS (JiangJiaWei1103)
9f745c3 fix: Make log msg accurate (JiangJiaWei1103)
fdf988f fix: Make log msg more accurate (JiangJiaWei1103)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| package e2eincrementalupgrade | ||
| | ||
| import "k8s.io/utils/ptr" | ||
| | ||
| type serveConfigV2 string | ||
| | ||
| // These parameters control capacity scaling and gradual traffic migration during the upgrade. | ||
| type incrementalUpgradeParams struct { | ||
| Name string | ||
| StepSize int32 | ||
| Interval int32 | ||
| MaxSurge int32 | ||
| } | ||
| | ||
| // incrementalUpgradeCombinations defines diverse (stepSize, interval, maxSurge) combinations | ||
| // to exercise different upgrade behaviors. Each combination targets a distinct scenario. | ||
| var incrementalUpgradeCombinations = []incrementalUpgradeParams{ | ||
| { | ||
| // Scenario: Instant cutover. | ||
| // All capacity and traffic shift in one step, which behaves like a blue/green deployment. | ||
| StepSize: 100, | ||
| Interval: 1, | ||
| MaxSurge: 100, | ||
| Name: "BlueGreen", | ||
| }, | ||
| { | ||
| // Scenario: Standard gradual upgrade. | ||
| // Scaling and migration in multiple steps. | ||
| StepSize: 25, | ||
| Interval: 5, | ||
| MaxSurge: 50, | ||
| Name: "StandardGradual", | ||
| }, | ||
| { | ||
| // Scenario: Conservative gradual upgrade. | ||
| // Low-step, long-interval scaling and migration in multiple steps. | ||
| StepSize: 5, | ||
| Interval: 10, | ||
| MaxSurge: 25, | ||
| Name: "ConservativeGradual", | ||
| }, | ||
| } | ||
| | ||
| // ptrs returns (*stepSize, *interval, *maxSurge) for use with the RayService bootstrap helper. | ||
| func (p incrementalUpgradeParams) ptrs() (*int32, *int32, *int32) { | ||
| return ptr.To(p.StepSize), ptr.To(p.Interval), ptr.To(p.MaxSurge) | ||
| } | ||
| | ||
| // The following defines the Serve configurations for different types of incremental upgrade tests, including: | ||
| // - Functional test | ||
| // - High-RPS Locust load test | ||
| // | ||
| // NOTE: working_dir is coupled with the external GitHub repos, which might lead to CI flakiness considering the | ||
| // availability and stability of these repos and specific commit hashes. | ||
| | ||
| // defaultIncrementalUpgradeServeConfigV2 configures a Serve app for functional tests. | ||
| const defaultIncrementalUpgradeServeConfigV2 serveConfigV2 = `applications: | ||
| - name: fruit_app | ||
| import_path: fruit.deployment_graph | ||
| route_prefix: /fruit | ||
| runtime_env: | ||
| working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip" | ||
| deployments: | ||
| - name: MangoStand | ||
| num_replicas: 1 | ||
| user_config: | ||
| price: 3 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| - name: OrangeStand | ||
| num_replicas: 1 | ||
| user_config: | ||
| price: 2 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| - name: FruitMarket | ||
| num_replicas: 1 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| - name: math_app | ||
| import_path: conditional_dag.serve_dag | ||
| route_prefix: /calc | ||
| runtime_env: | ||
| working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip" | ||
| deployments: | ||
| - name: Adder | ||
| num_replicas: 1 | ||
| user_config: | ||
| increment: 3 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| - name: Multiplier | ||
| num_replicas: 1 | ||
| user_config: | ||
| factor: 5 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| - name: Router | ||
| num_replicas: 1 | ||
| ray_actor_options: | ||
| num_cpus: 0.1 | ||
| ` | ||
| | ||
| // highRPSServeConfigV2 configures a minimal high-RPS Serve app (SimpleDeployment) for Locust load tests. | ||
| const highRPSServeConfigV2 serveConfigV2 = `applications: | ||
| - name: simple_app | ||
| import_path: simple_serve.app | ||
| route_prefix: /test | ||
| runtime_env: | ||
| working_dir: "https://github.com/jiangjiawei1103/incr-upgrade-locust/archive/a185bb29374388e801db4331ae73af3ad1e79a5f.zip" | ||
| deployments: | ||
| - name: SimpleDeployment | ||
| autoscaling_config: | ||
| min_replicas: 1 | ||
| max_replicas: 3 | ||
| target_ongoing_requests: 2 | ||
| max_ongoing_requests: 6 | ||
| upscale_delay_s: 0.5 | ||
| ray_actor_options: | ||
| num_cpus: 2 | ||
| ` | ||
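As a rough illustration of how the three combinations in the diff differ, the sketch below estimates how many traffic-migration steps and capacity scale-up rounds each one implies. It assumes that StepSize and MaxSurge are percentages and that every step moves exactly that percentage until 100 is reached. The controller's real schedule also depends on Interval and on cluster readiness, so the counts are best read as lower bounds. The names `upgradeParams`, `ceilDiv`, `trafficSteps`, and `capacityRounds` are hypothetical and not part of the PR.

```go
package main

import "fmt"

// upgradeParams mirrors the fields of incrementalUpgradeParams from the diff.
// The units in the comments are assumptions made for this sketch.
type upgradeParams struct {
	Name     string
	StepSize int32 // assumed: percent of traffic moved per step
	Interval int32 // assumed: seconds between steps (not used below)
	MaxSurge int32 // assumed: percent of capacity added per scale-up round
}

// ceilDiv returns ceil(a/b) for positive integers.
func ceilDiv(a, b int32) int32 {
	return (a + b - 1) / b
}

// trafficSteps estimates the steps needed to move 100% of traffic.
func trafficSteps(p upgradeParams) int32 {
	return ceilDiv(100, p.StepSize)
}

// capacityRounds estimates the scale-up rounds needed to reach 100% capacity.
func capacityRounds(p upgradeParams) int32 {
	return ceilDiv(100, p.MaxSurge)
}

func main() {
	// Values copied from incrementalUpgradeCombinations in the diff.
	combos := []upgradeParams{
		{Name: "BlueGreen", StepSize: 100, Interval: 1, MaxSurge: 100},
		{Name: "StandardGradual", StepSize: 25, Interval: 5, MaxSurge: 50},
		{Name: "ConservativeGradual", StepSize: 5, Interval: 10, MaxSurge: 25},
	}
	for _, p := range combos {
		fmt.Printf("%-20s traffic steps=%d capacity rounds=%d\n",
			p.Name, trafficSteps(p), capacityRounds(p))
	}
}
```

Under these assumptions BlueGreen finishes in a single step, StandardGradual needs four traffic steps, and ConservativeGradual needs twenty, which is why the conservative case is the most sensitive to test timeouts.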
Can we get the Locust workload merged into one of Ray's repos, such as https://github.com/ray-project/serve_config_examples?
Absolutely. I also mentioned in the Limitations section that we shouldn’t depend on a personal repo, and that the code should be migrated to an official Ray repo.
We can discuss with @Future-Outlier to determine the most appropriate place to host this simple Serve app source. Thank you!
Opened a PR here: ray-project/serve_config_examples#15
Thanks Ryan!! I'll change the URL once the PR is merged.
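To make the concern in this thread concrete, here is a minimal sketch of a check that reports which GitHub owner a `working_dir` archive URL points at and whether it is pinned to a full 40-character commit SHA. The helper `checkWorkingDir` and the pattern `pinnedArchive` are hypothetical and not part of the PR. The URL shape is inferred from the two `working_dir` values in the diff, and the `master.zip` URL in the example is made up to show the unpinned case.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// pinnedArchive matches /<owner>/<repo>/archive/<40-hex-sha>.zip.
var pinnedArchive = regexp.MustCompile(`^/[^/]+/[^/]+/archive/[0-9a-f]{40}\.zip$`)

// checkWorkingDir returns the GitHub owner of a working_dir URL and whether
// the URL is pinned to a full commit SHA. It only understands github.com URLs.
func checkWorkingDir(raw string) (owner string, pinned bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	if u.Host != "github.com" {
		return "", false, fmt.Errorf("unexpected host %q", u.Host)
	}
	parts := strings.Split(strings.TrimPrefix(u.Path, "/"), "/")
	if len(parts) < 2 || parts[0] == "" {
		return "", false, fmt.Errorf("unexpected path %q", u.Path)
	}
	return parts[0], pinnedArchive.MatchString(u.Path), nil
}

func main() {
	urls := []string{
		// The first two URLs are copied from the diff; the third is a made-up unpinned example.
		"https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip",
		"https://github.com/jiangjiawei1103/incr-upgrade-locust/archive/a185bb29374388e801db4331ae73af3ad1e79a5f.zip",
		"https://github.com/ray-project/test_dag/archive/master.zip",
	}
	for _, raw := range urls {
		owner, pinned, err := checkWorkingDir(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("owner=%s official=%t pinned=%t\n", owner, owner == "ray-project", pinned)
	}
}
```

A check along these lines could flag the personal-repo URL until it is replaced, while still accepting any commit-pinned archive under `ray-project`.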