[Feat] Support multi-process benchmark #2
Purpose
Support for Multi-Process Benchmark Operation
Benchmarks typically run in a single process on a single thread. During execution, the benchmark receives each request's output and computes the inter-token latency (ITL) from it. When many requests run simultaneously, however, output processing becomes a bottleneck, and ITL can no longer be calculated accurately.
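As a minimal sketch of what the ITL calculation amounts to, the snippet below takes the arrival timestamps of one request's output chunks and returns the gaps between them. The function name `inter_token_latencies` and the list-of-timestamps input are illustrative assumptions, not the benchmark's actual code. It also hints at why the bottleneck matters: if a busy process records a chunk's timestamp late, that delay is counted as latency.

```python
from typing import List


def inter_token_latencies(arrival_times: List[float]) -> List[float]:
    """Return the gaps (in seconds) between consecutive chunk arrivals.

    `arrival_times` is a hypothetical list of timestamps, one per output
    chunk of a single request, in the order the client observed them.
    """
    return [
        later - earlier
        for earlier, later in zip(arrival_times, arrival_times[1:])
    ]


# Chunks observed at 0.0 s, 0.5 s and 1.5 s give two ITL samples.
samples = inter_token_latencies([0.0, 0.5, 1.5])
```

A request with fewer than two chunks yields no ITL samples, so the function returns an empty list in that case.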
We added the `--max-connections-per-worker` option to the benchmark. Based on this value, multiple processes are created to run the benchmark. Each process handles num_requests / process_cnt requests and calculates ITL for them. The metrics from each process are then aggregated to produce the final benchmark result (Serving Benchmark Result).
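The split-and-aggregate flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the way the process count is derived (`worker_count`, dividing a concurrency limit by the per-worker connection limit), the handling of a remainder in `split_requests`, and the placeholder `run_worker` are all hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from statistics import mean
from typing import Dict, List


def worker_count(max_concurrency: int, max_connections_per_worker: int) -> int:
    # Assumption: one process per `max_connections_per_worker` connections.
    return max(1, math.ceil(max_concurrency / max_connections_per_worker))


def split_requests(num_requests: int, process_cnt: int) -> List[int]:
    # Each process gets num_requests / process_cnt requests; any remainder
    # is spread over the first few processes (an assumption).
    base, extra = divmod(num_requests, process_cnt)
    return [base + (1 if i < extra else 0) for i in range(process_cnt)]


def run_worker(n_requests: int) -> List[float]:
    # Placeholder for a real worker: pretend every request produced a
    # single ITL sample of 10 ms instead of talking to a server.
    return [0.01] * n_requests


def aggregate(per_worker_itls: List[List[float]]) -> Dict[str, float]:
    # Merge the ITL samples from all processes into one summary.
    merged = [itl for worker in per_worker_itls for itl in worker]
    return {
        "samples": len(merged),
        "mean_itl_s": mean(merged) if merged else 0.0,
    }


if __name__ == "__main__":
    shares = split_requests(1000, worker_count(64, 16))
    with ProcessPoolExecutor(max_workers=len(shares)) as pool:
        results = list(pool.map(run_worker, shares))
    print(aggregate(results))
```

Merging raw samples before computing statistics keeps percentiles and means consistent with a single-process run; averaging per-process means would weight uneven shares incorrectly.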
Test Plan
Test Result
Number of worker processes
`$ top` command to check multiprocess benchmarks