feat: loadgen SIGINT handler #244
Welcome @changminbark!
jjk-g left a comment
Thanks for adding!
Can you add an example of it running and handling SIGINT?
This also only handles SIGINT for a given stage. If a user cancels during stage 2/5, would they need to SIGINT each of the remaining stages?
/lgtm
/assign @terrytangyuan
Can you fix the type check issue? See https://github.com/kubernetes-sigs/inference-perf/actions/runs/18292806647/job/52485168010?pr=244
I have just fixed it. Can I also ask how I can run linting workflows on my local machine? Thank you!
Yes,
Looks like lint check passed, but typecheck failed. Please address that as well.
I just fixed it and learned about pdm (python development manager). It's a new tool for me but it seems very powerful. Thank you!
Force-pushed from b24aca5 to 86d3b21
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: achandrasekar, changminbark
PR Template
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a SIGINT handler for the loadgen phase so that, when a SIGINT is trapped, the program moves on to the reportgen phase instead of hanging.
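For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the code in this PR. It assumes a synchronous, single-process load loop; `run_stages`, `send_request`, and `generate_report` are hypothetical names made up for illustration. The handler only sets a flag, the load loop checks the flag between requests, and the report step then runs on whatever was collected. Because the flag is shared across stages, a single SIGINT also skips every remaining stage.

```python
import signal
import threading
from typing import Callable, Dict, List


def run_stages(
    num_stages: int,
    requests_per_stage: int,
    send_request: Callable[[int], None],  # hypothetical per-request callback
) -> Dict[int, int]:
    """Run stages in order; on SIGINT, stop early and skip later stages."""
    stop = threading.Event()

    def handle_sigint(signum, frame) -> None:
        # Only record the request to stop; the loops below do the unwinding.
        stop.set()

    # signal.signal may only be called from the main thread.
    previous = signal.signal(signal.SIGINT, handle_sigint)
    sent_per_stage: Dict[int, int] = {}
    try:
        for stage in range(num_stages):
            if stop.is_set():
                break  # one Ctrl+C cancels every remaining stage
            sent = 0
            for _ in range(requests_per_stage):
                if stop.is_set():
                    break
                send_request(stage)
                sent += 1
            sent_per_stage[stage] = sent
    finally:
        # Restore whatever handler was installed before the load phase.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
    return sent_per_stage


def generate_report(sent_per_stage: Dict[int, int]) -> List[str]:
    """Stand-in for the report phase: summarize only the stages that ran."""
    return [
        f"stage_{stage}: {sent} requests sent"
        for stage, sent in sorted(sent_per_stage.items())
    ]
```

Calling `generate_report(run_stages(5, 10, some_callback))` and pressing Ctrl+C partway through would produce report lines only for the stages that started. A real load generator that uses asyncio or worker processes would need to propagate the stop flag to those workers as well, which this sketch does not attempt.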
Which issue(s) this PR fixes:
Fixes #133
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Testing
Testing was done using the default config.yml file in the inference_perf directory and the necessary services (like vLLM serving HuggingFaceTB/SmolLM2-135M-Instruct and local prometheus).
Functional test output
Before change
After change
Reports generated after:
{ "load_summary": { "count": 13, "schedule_delay": { "mean": 0.0006833168680924035, "min": -0.0007300731404029648, "p0.1": -0.0007276184231595835, "p1": -0.0007055259679691517, "p5": -0.0006073372782338992, "p10": -0.0004284748230929835, "p25": 0.00026718769913713913, "median": 0.0006134590676083462, "p75": 0.001087595815079112, "p90": 0.0018773561738271388, "p95": 0.0020964901998013373, "p99": 0.0021771716267176087, "p99.9": 0.0021953249477737703, "max": 0.0021973419834466767 }, "send_duration": 13.933553711999593, "requested_rate": 1.0, "achieved_rate": 0.9329995971382652 }, "successes": { "count": 13, "latency": { "request_latency": { "mean": 1.7430407067691145, "min": 0.06072670499997912, "p0.1": 0.06099953636397913, "p1": 0.06345501863997924, "p5": 0.07436827319997974, "p10": 0.08805789179987188, "p25": 0.3781903320004858, "median": 1.2660093819995382, "p75": 2.4516203359999054, "p90": 3.8283391862001745, "p95": 5.109201258400022, "p99": 6.2777515292798105, "p99.9": 6.540675340227768, "max": 6.569889096999759 }, "normalized_time_per_output_token": { "mean": 0.012470926502922166, "min": 0.00917398102898216, "p0.1": 0.009175159876347384, "p1": 0.009185769502634401, "p5": 0.009232923397243362, "p10": 0.009364713885629417, "p25": 0.009774655890977644, "median": 0.009984633886017872, "p75": 0.015041869878779946, "p90": 0.017443585992874606, "p95": 0.01932054613711183, "p99": 0.02089432538733256, "p99.9": 0.02124842571863223, "max": 0.021287770199887746 }, "time_per_output_token": { "mean": 0.004691286935402641, "min": 0.003218188090861738, "p0.1": 0.0032228111164381104, "p1": 0.003264418346625462, "p5": 0.003449339369680357, "p10": 0.0036334146248348177, "p25": 0.004493048480143827, "median": 0.004630491446476313, "p75": 0.004954509693413679, "p90": 0.005680080064766961, "p95": 0.005993028690565533, "p99": 0.0063096841381211355, "p99.9": 0.006380931613821148, "max": 0.006388848000010037 }, "time_to_first_token": { "mean": 0.03603282338463032, "min": 0.017281655999795476, "p0.1": 0.017282495099800146, "p1": 0.01729004699984216, "p5": 0.017323611000028903, "p10": 0.017407786000148917, "p25": 0.018456763999893155, "median": 0.022838959000182513, "p75": 0.0634562069999447, "p90": 0.06582986580015131, "p95": 0.06801724380020459, "p99": 0.07008519036018697, "p99.9": 0.07055047833618301, "max": 0.07060217700018256 }, "inter_token_latency": { "mean": 0.004846513091485928, "min": 2.050999682978727e-06, "p0.1": 2.1702484373236075e-06, "p1": 1.4293440290202853e-05, "p5": 1.4872400242893491e-05, "p10": 1.516680003987858e-05, "p25": 1.5984000128810294e-05, "median": 6.18069998381543e-05, "p75": 0.009163918000012927, "p90": 0.010292426600426553, "p95": 0.011176135199639248, "p99": 0.013324490480117674, "p99.9": 0.030814539975600103, "max": 0.057281617000626284 } }, "throughput": { "input_tokens_per_sec": 177.8494350469882, "output_tokens_per_sec": 143.53764693502904, "total_tokens_per_sec": 321.38708198201726, "requests_per_sec": 0.8260245286212384 }, "prompt_len": { "mean": 215.30769230769232, "min": 7.0, "p0.1": 7.036, "p1": 7.36, "p5": 8.8, "p10": 11.200000000000001, "p25": 17.0, "median": 62.0, "p75": 420.0, "p90": 476.8, "p95": 610.3999999999995, "p99": 764.4799999999997, "p99.9": 799.1480000000005, "max": 803.0 }, "output_len": { "mean": 173.76923076923077, "min": 4.0, "p0.1": 4.012, "p1": 4.12, "p5": 4.6, "p10": 5.2, "p25": 21.0, "median": 138.0, "p75": 221.0, "p90": 410.0000000000001, "p95": 530.7999999999997, "p99": 632.5599999999998, "p99.9": 655.4560000000002, "max": 658.0 } }, "failures": { "count": 0, "request_latency": null, "prompt_len": null } }
{ "load_summary": { "count": 13, "schedule_delay": { "mean": 0.0006833168680924035, "min": -0.0007300731404029648, "p0.1": -0.0007276184231595835, "p1": -0.0007055259679691517, "p5": -0.0006073372782338992, "p10": -0.0004284748230929835, "p25": 0.00026718769913713913, "median": 0.0006134590676083462, "p75": 0.001087595815079112, "p90": 0.0018773561738271388, "p95": 0.0020964901998013373, "p99": 0.0021771716267176087, "p99.9": 0.0021953249477737703, "max": 0.0021973419834466767 } }, "successes": { "count": 13, "latency": { "request_latency": { "mean": 1.7430407067691145, "min": 0.06072670499997912, "p0.1": 0.06099953636397913, "p1": 0.06345501863997924, "p5": 0.07436827319997974, "p10": 0.08805789179987188, "p25": 0.3781903320004858, "median": 1.2660093819995382, "p75": 2.4516203359999054, "p90": 3.8283391862001745, "p95": 5.109201258400022, "p99": 6.2777515292798105, "p99.9": 6.540675340227768, "max": 6.569889096999759 }, "normalized_time_per_output_token": { "mean": 0.012470926502922166, "min": 0.00917398102898216, "p0.1": 0.009175159876347384, "p1": 0.009185769502634401, "p5": 0.009232923397243362, "p10": 0.009364713885629417, "p25": 0.009774655890977644, "median": 0.009984633886017872, "p75": 0.015041869878779946, "p90": 0.017443585992874606, "p95": 0.01932054613711183, "p99": 0.02089432538733256, "p99.9": 0.02124842571863223, "max": 0.021287770199887746 }, "time_per_output_token": { "mean": 0.004691286935402641, "min": 0.003218188090861738, "p0.1": 0.0032228111164381104, "p1": 0.003264418346625462, "p5": 0.003449339369680357, "p10": 0.0036334146248348177, "p25": 0.004493048480143827, "median": 0.004630491446476313, "p75": 0.004954509693413679, "p90": 0.005680080064766961, "p95": 0.005993028690565533, "p99": 0.0063096841381211355, "p99.9": 0.006380931613821148, "max": 0.006388848000010037 }, "time_to_first_token": { "mean": 0.03603282338463032, "min": 0.017281655999795476, "p0.1": 0.017282495099800146, "p1": 0.01729004699984216, "p5": 0.017323611000028903, "p10": 0.017407786000148917, "p25": 0.018456763999893155, "median": 0.022838959000182513, "p75": 0.0634562069999447, "p90": 0.06582986580015131, "p95": 0.06801724380020459, "p99": 0.07008519036018697, "p99.9": 0.07055047833618301, "max": 0.07060217700018256 }, "inter_token_latency": { "mean": 0.004846513091485928, "min": 2.050999682978727e-06, "p0.1": 2.1702484373236075e-06, "p1": 1.4293440290202853e-05, "p5": 1.4872400242893491e-05, "p10": 1.516680003987858e-05, "p25": 1.5984000128810294e-05, "median": 6.18069998381543e-05, "p75": 0.009163918000012927, "p90": 0.010292426600426553, "p95": 0.011176135199639248, "p99": 0.013324490480117674, "p99.9": 0.030814539975600103, "max": 0.057281617000626284 } }, "throughput": { "input_tokens_per_sec": 177.8494350469882, "output_tokens_per_sec": 143.53764693502904, "total_tokens_per_sec": 321.38708198201726, "requests_per_sec": 0.8260245286212384 }, "prompt_len": { "mean": 215.30769230769232, "min": 7.0, "p0.1": 7.036, "p1": 7.36, "p5": 8.8, "p10": 11.200000000000001, "p25": 17.0, "median": 62.0, "p75": 420.0, "p90": 476.8, "p95": 610.3999999999995, "p99": 764.4799999999997, "p99.9": 799.1480000000005, "max": 803.0 }, "output_len": { "mean": 173.76923076923077, "min": 4.0, "p0.1": 4.012, "p1": 4.12, "p5": 4.6, "p10": 5.2, "p25": 21.0, "median": 138.0, "p75": 221.0, "p90": 410.0000000000001, "p95": 530.7999999999997, "p99": 632.5599999999998, "p99.9": 655.4560000000002, "max": 658.0 } }, "failures": { "count": 0, "request_latency": null, "prompt_len": null } }
{ "load_summary": {}, "successes": { "count": 0, "rate": 0.0, "prompt_len": { "mean": 0, "rate": 0.0 }, "output_len": { "mean": 0, "rate": 0.0 }, "queue_len": { "mean": 0 }, "request_latency": { "mean": 0.0, "median": 0.0, "p90": 0.0, "p99": 0.0 }, "time_to_first_token": { "mean": 0.0, "median": 0.0, "p90": 0.0, "p99": 0.0 }, "time_per_output_token": { "mean": 0.0, "median": 0.0, "p90": 0.0, "p99": 0.0 }, "kv_cache_usage_percentage": { "mean": 0.0, "median": 0.0, "p90": 0.0, "p99": 0.0 }, "num_requests_swapped": { "mean": 0 }, "num_preemptions_total": { "mean": 0 }, "prefix_cache_hit_percent": { "mean": 0.0 } }, "failures": {} }
Multi-stage Run Report:
Canceled during stage 0:
config.yaml
summary_prometheus_metrics.json
stage_0_lifecycle_metrics.json
summary_lifecycle_metrics.json
Canceled during stage 1:
config.yaml
summary_prometheus_metrics.json
stage_1_lifecycle_metrics.json
stage_0_lifecycle_metrics.json
summary_lifecycle_metrics.json