The SamplingParams constructor does not validate its arguments, so invalid values are accepted silently instead of raising an informative error. I ran into this when I changed top_p to top_k but left the value as a float between 0 and 1. The resulting error was misleading and time-consuming to debug.
The traceback is very general: it points to the generate call and ends in a CUDA device-side assert ("index out of bounds") inside the sampler, which makes it hard to trace back to the actual problem in a long script.
I propose adding validation to the SamplingParams constructor so that parameters are checked against their acceptable ranges and violations produce specific error messages. I am willing to make a PR for this.
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=50,
    top_k=0.5
)
model = LLM("facebook/opt-125m")
outputs = model.generate("hello", sampling_params)
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[rank0]: Traceback (most recent call last):
[rank0]: File "directory/reproduce.py", line 12, in <module>
[rank0]: outputs = model.generate("hello", sampling_params)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 895, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 330, in generate
[rank0]: outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 611, in _run_engine
[rank0]: step_outputs = self.llm_engine.step()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 919, in step
[rank0]: output = self.model_executor.execute_model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 110, in execute_model
[rank0]: output = self.driver_worker.execute_model(execute_model_req)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 273, in execute_model
[rank0]: output = self.model_runner.execute_model(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1384, in execute_model
[rank0]: output: SamplerOutput = self.model.sample(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/opt.py", line 337, in sample
[rank0]: next_tokens = self.sampler(logits, sampling_metadata)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 138, in forward
[rank0]: sample_results, maybe_sampled_tokens_tensor = _sample(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 711, in _sample
[rank0]: return _sample_with_torch(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 595, in _sample_with_torch
[rank0]: sample_results = _random_sample(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 372, in _random_sample
[rank0]: random_samples = random_samples.cpu()
[rank0]: RuntimeError: CUDA error: device-side assert triggered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
To reproduce the error, run the snippet above, which passes a float as top_k. The error output is the traceback shown above.
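As a rough illustration of the validation proposed here, below is a minimal sketch. It uses a hypothetical stand-in class, `CheckedSamplingArgs`, rather than vLLM's actual SamplingParams, and the accepted ranges and defaults are my assumptions (top_k is an integer that is either -1 for "disabled" or at least 1; top_p is in (0, 1]; temperature is non-negative; max_tokens is at least 1). The real constraints would need to be confirmed against the vLLM code before a PR.

```python
from dataclasses import dataclass


@dataclass
class CheckedSamplingArgs:
    """Hypothetical stand-in for illustration only; not vLLM's SamplingParams."""

    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1       # assumption: -1 means "top-k disabled"
    max_tokens: int = 16  # assumption: arbitrary default for this sketch

    def __post_init__(self) -> None:
        # Check the type first, so a float such as 0.5 fails here with a
        # clear message instead of reaching the GPU sampler.
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.top_k, bool) or not isinstance(self.top_k, int):
            raise TypeError(
                f"top_k must be an integer, got "
                f"{type(self.top_k).__name__} ({self.top_k!r}). "
                f"Did you mean top_p?"
            )
        # Assumed valid range for top_k: -1 or any integer >= 1.
        if self.top_k < -1 or self.top_k == 0:
            raise ValueError(
                f"top_k must be -1 (disabled) or at least 1, got {self.top_k}."
            )
        # Assumed valid range for top_p: (0, 1].
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError(f"top_p must be in (0, 1], got {self.top_p}.")
        if self.temperature < 0.0:
            raise ValueError(
                f"temperature must be non-negative, got {self.temperature}."
            )
        if self.max_tokens < 1:
            raise ValueError(
                f"max_tokens must be at least 1, got {self.max_tokens}."
            )
```

With a check like this, the call from my script (temperature=0.5, max_tokens=50, top_k=0.5) would fail immediately at construction with a TypeError that names top_k, rather than later as a CUDA device-side assert inside the sampler.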