[Bug]: Inadequate Validation and Error Handling of SamplingParams Constructor

The SamplingParams constructor does not reject invalid values or raise an informative error for them. I ran into this when I changed top_p to top_k but left the value as a float between 0 and 1. The resulting error was misleading and time-consuming to debug.

The error message is very generic: the traceback points to the generate method and ends in a CUDA device-side assert ("index out of bounds"), with nothing that mentions top_k. This makes it hard to trace back to the actual problem in a long script.

I propose adding validation to the SamplingParams constructor so that out-of-range or wrongly typed parameters are rejected up front with a specific error message. **I am willing to make a PR on this.**
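As a rough illustration of the kind of check I have in mind, here is a minimal stand-alone sketch. The function name `validate_sampling_params` is hypothetical and this is not vLLM's actual code. The accepted ranges are my assumptions, in particular that `top_k=-1` means "disabled" and that `top_p` must lie in (0, 1].

```python
from numbers import Integral, Real


def validate_sampling_params(temperature=1.0, top_p=1.0, top_k=-1, max_tokens=16):
    """Hypothetical checker, not part of vLLM; the ranges below are assumptions."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, Real) or temperature < 0:
        raise ValueError(f"temperature must be a non-negative number, got {temperature!r}.")

    if isinstance(top_p, bool) or not isinstance(top_p, Real) or not 0 < top_p <= 1:
        raise ValueError(f"top_p must be in (0, 1], got {top_p!r}.")

    # top_k counts tokens, so a float such as 0.5 can never be valid.
    if isinstance(top_k, bool) or not isinstance(top_k, Integral) or (top_k != -1 and top_k < 1):
        hint = " Did you mean top_p?" if isinstance(top_k, float) and 0 < top_k <= 1 else ""
        raise ValueError(
            f"top_k must be -1 (disabled) or an integer >= 1, got {top_k!r}.{hint}"
        )

    if max_tokens is not None and (
        isinstance(max_tokens, bool) or not isinstance(max_tokens, Integral) or max_tokens < 1
    ):
        raise ValueError(f"max_tokens must be an integer >= 1, got {max_tokens!r}.")
```

With a check like this, `validate_sampling_params(temperature=0.5, max_tokens=50, top_k=0.5)` would fail immediately with a `ValueError` that names `top_k`, instead of surfacing later as a CUDA assert inside the sampler.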

To reproduce the error, run the following:

```
from vllm import LLM, SamplingParams


sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=50,
    top_k=0.5,  # invalid on purpose: a float in (0, 1) that was meant for top_p
)

model = LLM("facebook/opt-125m")

outputs = model.generate("hello", sampling_params)
```

The error output is as follows:

```
Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[rank0]: Traceback (most recent call last):
[rank0]:   File "directory/reproduce.py", line 12, in <module>
[rank0]:     outputs = model.generate("hello", sampling_params)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 895, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 330, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 611, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 919, in step
[rank0]:     output = self.model_executor.execute_model(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 110, in execute_model
[rank0]:     output = self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 273, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1384, in execute_model
[rank0]:     output: SamplerOutput = self.model.sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/opt.py", line 337, in sample
[rank0]:     next_tokens = self.sampler(logits, sampling_metadata)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 138, in forward
[rank0]:     sample_results, maybe_sampled_tokens_tensor = _sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 711, in _sample
[rank0]:     return _sample_with_torch(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 595, in _sample_with_torch
[rank0]:     sample_results = _random_sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 372, in _random_sample
[rank0]:     random_samples = random_samples.cpu()
[rank0]: RuntimeError: CUDA error: device-side assert triggered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
```



[Bug]: Inadequate Validation and Error Handling of SamplingParams Constructor #7203
