Description
vLLM versions newer than 0.6.3.post1 no longer return usage stats in each streamed response, and this breaks llm-load-test. They also appear to have moved to cumulative usage metrics reporting (if I'm understanding it correctly, we might be able to simplify the logic in our streaming HTTP request function). A small repro sketch follows the two example responses below.
Response in 0.6.3.post1:
{'id': 'cmpl-ccb120ae389e410ebfd5505f78b90b15', 'object': 'text_completion', 'created': 1737139401, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': '.', 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}], 'usage': {'prompt_tokens': 214, 'total_tokens': 265, 'completion_tokens': 51}}
Response in 0.6.4.post1:
{'id': 'cmpl-8111903b974a4d5f8c02f9687fc3c282', 'object': 'text_completion', 'created': 1737140221, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': 'Yes', 'logprobs': None, 'finish_reason': None, 'stop_reason': None}], 'usage': None}
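If the newer vLLM only emits cumulative usage on a final chunk when it is explicitly requested, the behaviour can be checked with a plain streaming request. The sketch below is an assumption based on the OpenAI-style `stream_options` parameter, not confirmed against the vLLM change itself; the endpoint URL and model name are placeholders.

import json
import requests

# Hypothetical endpoint/model; adjust to the server under test.
URL = "http://localhost:8000/v1/completions"

payload = {
    "model": "opt-125m",
    "prompt": "Say yes or no:",
    "max_tokens": 16,
    "stream": True,
    # Assumption: newer vLLM follows the OpenAI spec and only reports
    # usage when explicitly requested; the stats on the final chunk
    # would then be cumulative for the whole completion.
    "stream_options": {"include_usage": True},
}

with requests.post(URL, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        # On >0.6.3.post1, 'usage' is expected to be None on ordinary
        # text chunks and populated only on the final usage chunk.
        print(chunk.get("usage"))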
Error seen:
sh-5.1# python3 load_test.py -c /mnt/results/config.yaml
2025-01-17 18:57:00,751 INFO root MainProcess dataset config: {'file': 'datasets/openorca_large_subset_011.jsonl', 'max_queries': 1000, 'min_input_tokens': 0, 'max_input_tokens': 1024, 'min_output_tokens': 0, 'max_output_tokens': 1024, 'max_sequence_tokens': 2048}
2025-01-17 18:57:00,751 INFO root MainProcess load_options config: {'type': 'constant', 'concurrency': 1, 'duration': 20}
2025-01-17 18:57:00,751 INFO root MainProcess Initializing dataset with {'self': <dataset.Dataset object at 0x7ff8c3d16130>, 'file': 'datasets/openorca_large_subset_011.jsonl', 'max_queries': 1000, 'min_input_tokens': 0, 'max_input_tokens': 1024, 'min_output_tokens': 0, 'max_output_tokens': 1024, 'max_sequence_tokens': 2048, 'custom_prompt_format': None}
2025-01-17 18:57:00,838 INFO root MainProcess Starting <SpawnProcess name='SpawnProcess-1' parent=403 initial>
2025-01-17 18:57:00,842 INFO root MainProcess Test from main process
2025-01-17 18:57:01,334 INFO user SpawnProcess-1 User 0 making request
{'id': 'cmpl-8111903b974a4d5f8c02f9687fc3c282', 'object': 'text_completion', 'created': 1737140221, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': 'Yes', 'logprobs': None, 'finish_reason': None, 'stop_reason': None}], 'usage': None}
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/src/llm-load-test/user.py", line 68, in run_user_process
result = self.make_request(test_end_time)
File "/src/llm-load-test/user.py", line 48, in make_request
result = self.plugin.request_func(query, self.user_id, test_end_time)
File "/src/llm-load-test/plugins/openai_plugin.py", line 321, in streaming_request_http
current_usage = deepget(message, "usage", "completion_tokens")
File "/src/llm-load-test/plugins/openai_plugin.py", line 44, in deepget
current = current[pos]
TypeError: 'NoneType' object is not subscriptable
2025-01-17 18:57:20,871 INFO root MainProcess Timer ended, stopping processes
2025-01-17 18:57:20,871 ERROR root MainProcess Unexpected exception in main process
Traceback (most recent call last):
File "/src/llm-load-test/load_test.py", line 155, in main
results_list = gather_results(results_pipes)
File "/src/llm-load-test/load_test.py", line 55, in gather_results
user_results = results_pipe.recv()
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 254, in recv
buf = self._recv_bytes()
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 418, in _recv_bytes
buf = self._recv(4)
File "/usr/lib64/python3.9/multiprocessing/connection.py", line 387, in _recv
raise EOFError
EOFError
2025-01-17 18:57:20,872 INFO root MainProcess User processes terminated succesfully
The vLLM change that seems to have broken llm-load-test: vllm-project/vllm#9357
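Until the new behaviour is handled properly, a minimal sketch of a None-tolerant lookup (a defensive replacement for the `deepget` helper shown in the traceback, with illustrative argument names) would avoid the crash by returning a default instead of indexing into `None`. Whether llm-load-test should then fall back to tokenizer-based token counting is a separate question.

def deepget(mapping, *keys, default=None):
    """Walk nested dicts, returning `default` if any level is missing or not a dict.

    Sketch of a defensive replacement for the helper in
    plugins/openai_plugin.py; argument names are illustrative.
    """
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# With a 0.6.4.post1-style message where 'usage' is None, this returns
# None instead of raising TypeError:
#   deepget(message, "usage", "completion_tokens")  ->  None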