[Bug] CUDA error: no kernel image is available for execution on the device #18108

@xiyuanyuan0506

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Deploying the Qwen-Image model for inference with the official script/documentation fails with the following error:

[02-02 16:22:41] [DenoisingStage] Error during execution after 14353.3677 ms: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in call
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.call
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:41] [DenoisingStage] Error during execution after 14359.9784 ms: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in call
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.call
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:42] Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/managers/gpu_worker.py", line 165, in execute_forward
result = self.pipeline.forward(req, self.server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/composed_pipeline_base.py", line 356, in forward
return self.executor.execute_with_profiling(self.stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/pipeline_executor.py", line 57, in execute_with_profiling
batch = self.execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 97, in execute
batch = self._execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 88, in _execute
batch = stage(batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in call
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.call
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:42] Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/managers/gpu_worker.py", line 165, in execute_forward
result = self.pipeline.forward(req, self.server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/composed_pipeline_base.py", line 356, in forward
return self.executor.execute_with_profiling(self.stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/pipeline_executor.py", line 57, in execute_with_profiling
batch = self.execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 97, in execute
batch = self._execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 88, in _execute
batch = stage(batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in call
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.call
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:42] Failed to generate output for prompt: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/utils/logging_utils.py", line 470, in log_generation_timer
yield timer
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/utils.py", line 216, in process_generation_batch
raise RuntimeError(
RuntimeError: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[2026-02-02 16:22:42] INFO: 127.0.0.1:50870 - "POST /v1/images/generations HTTP/1.1" 500 Internal Server Error
[2026-02-02 16:22:42] ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 410, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1135, in call
await super().call(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 107, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in call
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 716, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 736, in app
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 290, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 115, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 101, in app
response = await f(request)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 355, in app
raw_response = await run_endpoint_function(
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 243, in run_endpoint_function
return await dependant.call(**values)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py", line 135, in generations
save_file_path_list, result = await process_generation_batch(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/utils.py", line 216, in process_generation_batch
raise RuntimeError(
RuntimeError: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device

Reproduction

sglang serve --model-path /datacube_nas/noao_data/Qwen-Image --trust-remote-code --tp-size 2 --port 30010 --host 0.0.0.0 --diffusers-attention-backend native
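
For completeness, the request that hits the failing /v1/images/generations endpoint is an ordinary OpenAI-style image generation call. A minimal client sketch (the base URL follows from --port 30010 above; the api_key placeholder, model name, and prompt are arbitrary, and the field names assume the endpoint mirrors the OpenAI Images API):

from openai import OpenAI

# Point the OpenAI client at the local sglang server started above; no real key is needed.
client = OpenAI(base_url="http://127.0.0.1:30010/v1", api_key="EMPTY")

# Any prompt triggers the failure; the 500 response corresponds to the
# "POST /v1/images/generations HTTP/1.1" 500 line in the log above.
result = client.images.generate(model="Qwen-Image", prompt="a cup of coffee on a wooden desk", n=1)
print(result)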

Environment

python3 -m sglang.check_env
Python: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA A100-SXM4-80GB
GPU 0,1 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.8, V12.8.61
CUDA Driver Version: 470.82.01
PyTorch: 2.9.1+cu128
sglang: 0.0.0.dev0
sgl_kernel: 0.3.21
flashinfer_python: 0.6.2
flashinfer_cubin: 0.6.2
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.2.6
aiohttp: 3.13.3
fastapi: 0.128.0
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.33.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.12.5
python-multipart: 0.0.21
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology:
GPU0 GPU1 CPU Affinity NUMA Affinity
GPU0 X NV12 0-31,64-95 0
GPU1 NV12 X 32-63,96-127 1

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

ulimit soft: 655350
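
For reference, a quick check (run in the same environment) of whether this PyTorch build ships compute-capability-8.0 kernels for the A100s. Note that torch.cuda.get_arch_list() only covers PyTorch's own binaries, not the sglang JIT qknorm kernel that raises the error, so it only helps narrow down which side is missing the sm_80 image:

import torch

# Device identity and compute capability; expected (8, 0) for A100-SXM4-80GB.
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))

# CUDA architectures this PyTorch wheel was compiled for, e.g. ['sm_80', 'sm_86', ...].
print(torch.cuda.get_arch_list())

# A trivial CUDA op; it raises the same "no kernel image is available" error
# if torch itself was built without sm_80 support.
print(torch.zeros(4, device="cuda") + 1)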
