
vllm 0.6.4.post1 and greater breaks llm-load-test #80

@thameem-abbas

Description


vLLM versions newer than 0.6.3.post1 no longer return usage stats in every streaming response, and this breaks llm-load-test. They also appear to have moved to cumulative usage reporting (if I'm understanding this correctly, we might be able to simplify the logic in our streaming HTTP request function).
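If usage really is cumulative now (an assumption based on my reading of vllm-project/vllm#9357, not confirmed against the vLLM source), the streaming loop would no longer need to sum per-chunk deltas; it could just remember the last non-`None` usage object. A minimal sketch with a hypothetical `stream_chunks` list standing in for the parsed SSE messages:

```python
# Sketch only: assumes usage, when present, is cumulative for the whole
# request, so the last non-None usage object carries the final counts.
stream_chunks = [
    {"choices": [{"text": "Yes"}], "usage": None},
    {"choices": [{"text": "."}],
     "usage": {"prompt_tokens": 214, "total_tokens": 265,
               "completion_tokens": 51}},
]

last_usage = None
for message in stream_chunks:
    usage = message.get("usage")
    if usage is not None:
        last_usage = usage  # cumulative, so the latest one wins

completion_tokens = (last_usage["completion_tokens"]
                     if last_usage is not None else None)
```

This sidesteps the `usage: None` chunks entirely instead of dereferencing them per message.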

Response in 0.6.3.post1:

{'id': 'cmpl-ccb120ae389e410ebfd5505f78b90b15', 'object': 'text_completion', 'created': 1737139401, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': '.', 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}], 'usage': {'prompt_tokens': 214, 'total_tokens': 265, 'completion_tokens': 51}}

Response in 0.6.4.post1:

{'id': 'cmpl-8111903b974a4d5f8c02f9687fc3c282', 'object': 'text_completion', 'created': 1737140221, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': 'Yes', 'logprobs': None, 'finish_reason': None, 'stop_reason': None}], 'usage': None}

Error seen:

sh-5.1# python3 load_test.py -c /mnt/results/config.yaml 
2025-01-17 18:57:00,751 INFO     root MainProcess dataset config: {'file': 'datasets/openorca_large_subset_011.jsonl', 'max_queries': 1000, 'min_input_tokens': 0, 'max_input_tokens': 1024, 'min_output_tokens': 0, 'max_output_tokens': 1024, 'max_sequence_tokens': 2048}
2025-01-17 18:57:00,751 INFO     root MainProcess load_options config: {'type': 'constant', 'concurrency': 1, 'duration': 20}
2025-01-17 18:57:00,751 INFO     root MainProcess Initializing dataset with {'self': <dataset.Dataset object at 0x7ff8c3d16130>, 'file': 'datasets/openorca_large_subset_011.jsonl', 'max_queries': 1000, 'min_input_tokens': 0, 'max_input_tokens': 1024, 'min_output_tokens': 0, 'max_output_tokens': 1024, 'max_sequence_tokens': 2048, 'custom_prompt_format': None}
2025-01-17 18:57:00,838 INFO     root MainProcess Starting <SpawnProcess name='SpawnProcess-1' parent=403 initial>
2025-01-17 18:57:00,842 INFO     root MainProcess Test from main process
2025-01-17 18:57:01,334 INFO     user SpawnProcess-1 User 0 making request
{'id': 'cmpl-8111903b974a4d5f8c02f9687fc3c282', 'object': 'text_completion', 'created': 1737140221, 'model': 'opt-125m', 'choices': [{'index': 0, 'text': 'Yes', 'logprobs': None, 'finish_reason': None, 'stop_reason': None}], 'usage': None}
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/src/llm-load-test/user.py", line 68, in run_user_process
    result = self.make_request(test_end_time)
  File "/src/llm-load-test/user.py", line 48, in make_request
    result = self.plugin.request_func(query, self.user_id, test_end_time)
  File "/src/llm-load-test/plugins/openai_plugin.py", line 321, in streaming_request_http
    current_usage = deepget(message, "usage", "completion_tokens")
  File "/src/llm-load-test/plugins/openai_plugin.py", line 44, in deepget
    current = current[pos]
TypeError: 'NoneType' object is not subscriptable
2025-01-17 18:57:20,871 INFO     root MainProcess Timer ended, stopping processes
2025-01-17 18:57:20,871 ERROR    root MainProcess Unexpected exception in main process
Traceback (most recent call last):
  File "/src/llm-load-test/load_test.py", line 155, in main
    results_list = gather_results(results_pipes)
  File "/src/llm-load-test/load_test.py", line 55, in gather_results
    user_results = results_pipe.recv()
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 254, in recv
    buf = self._recv_bytes()
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 418, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 387, in _recv
    raise EOFError
EOFError
2025-01-17 18:57:20,872 INFO     root MainProcess User processes terminated succesfully

The vLLM fix that appears to have broken llm-load-test: vllm-project/vllm#9357
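The immediate crash is `deepget` indexing into `usage` when it is `None`. A None-safe variant would at least keep the worker processes alive; this is a hypothetical sketch (the function name and call shape come from the traceback above, but the guard is my assumption, not the project's actual fix):

```python
# Hypothetical None-tolerant deepget: walks nested dicts and returns
# `default` if any step along the path is missing or None, instead of
# raising TypeError on 'usage': None chunks.
def deepget(obj, *path, default=None):
    current = obj
    for pos in path:
        if not isinstance(current, dict) or current.get(pos) is None:
            return default
        current = current[pos]
    return current

message = {"id": "cmpl-...", "usage": None}
deepget(message, "usage", "completion_tokens")  # returns None, no crash
```

The caller in `streaming_request_http` would then need to treat a `None` result as "no usage reported for this chunk" rather than a token count.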
