[Bug]: Inadequate Validation and Error Handling of SamplingParams Constructor #7203

@Atllkks10

Description

The SamplingParams constructor does not validate its arguments, and it does not raise informative errors for out-of-range values. I ran into this when I changed top_p to top_k but left the value as a float between 0 and 1. The resulting error was misleading and time-consuming to debug.

The error message is very general: the traceback points to the generate method and ends in a CUDA device-side "index out of bounds" assertion, which makes it hard to trace back to the actual problem in a long script.

I propose adding validation to the SamplingParams constructor to ensure parameters are within acceptable ranges and provide more specific error messages. I am willing to make a PR on this.
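As a rough illustration of the kind of validation I have in mind, here is a minimal sketch. It uses a hypothetical stand-in class (`SketchSamplingParams`), not vLLM's actual `SamplingParams`, and the accepted ranges in it (for example, `top_k` being either -1 or an integer of at least 1) are my assumptions about the intended semantics rather than a confirmed spec.

```python
from dataclasses import dataclass


@dataclass
class SketchSamplingParams:
    """Hypothetical stand-in for SamplingParams; NOT vLLM's actual class.

    The ranges checked below are assumptions made for illustration.
    """

    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1
    max_tokens: int = 16

    def __post_init__(self) -> None:
        # Fail at construction time, before any request reaches the GPU.
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.top_k, bool) or not isinstance(self.top_k, int):
            raise TypeError(
                f"top_k must be an integer, got "
                f"{type(self.top_k).__name__} ({self.top_k!r}).")
        # Assumption: -1 means "consider all tokens"; otherwise top_k >= 1.
        if self.top_k < -1 or self.top_k == 0:
            raise ValueError(
                f"top_k must be -1 (disabled) or at least 1, "
                f"got {self.top_k}.")
        # Assumption: top_p is a probability mass in (0, 1].
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError(f"top_p must be in (0, 1], got {self.top_p}.")
        # Assumption: temperature must be non-negative.
        if self.temperature < 0.0:
            raise ValueError(
                f"temperature must be non-negative, got {self.temperature}.")
        # Assumption: at least one token must be generated.
        if self.max_tokens < 1:
            raise ValueError(
                f"max_tokens must be at least 1, got {self.max_tokens}.")
```

With checks like these, the mistake described above (`top_k=0.5`) would raise a `TypeError` that names the offending parameter as soon as the object is constructed, instead of surfacing later as a CUDA assertion inside the sampler.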

To reproduce the error message, run the following:

from vllm import LLM, SamplingParams


sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=50,
    top_k=0.5
)

model = LLM("facebook/opt-125m")

outputs = model.generate("hello", sampling_params)

The error output is as follows:

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[rank0]: Traceback (most recent call last):
[rank0]:   File "directory/reproduce.py", line 12, in <module>
[rank0]:     outputs = model.generate("hello", sampling_params)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/utils.py", line 895, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 330, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 611, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 919, in step
[rank0]:     output = self.model_executor.execute_model(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 110, in execute_model
[rank0]:     output = self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 273, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1384, in execute_model
[rank0]:     output: SamplerOutput = self.model.sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/opt.py", line 337, in sample
[rank0]:     next_tokens = self.sampler(logits, sampling_metadata)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 138, in forward
[rank0]:     sample_results, maybe_sampled_tokens_tensor = _sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 711, in _sample
[rank0]:     return _sample_with_torch(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 595, in _sample_with_torch
[rank0]:     sample_results = _random_sample(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 372, in _random_sample
[rank0]:     random_samples = random_samples.cpu()
[rank0]: RuntimeError: CUDA error: device-side assert triggered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Labels: bug (Something isn't working), good first issue (Good for newcomers)
