Description
Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
Deploying the Qwen-Image model for inference with the official script/documentation fails with a CUDA error:
[02-02 16:22:41] [DenoisingStage] Error during execution after 14353.3677 ms: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:41] [DenoisingStage] Error during execution after 14359.9784 ms: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
(identical traceback from the second TP rank omitted)
[02-02 16:22:42] Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/managers/gpu_worker.py", line 165, in execute_forward
result = self.pipeline.forward(req, self.server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/composed_pipeline_base.py", line 356, in forward
return self.executor.execute_with_profiling(self.stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/pipeline_executor.py", line 57, in execute_with_profiling
batch = self.execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 97, in execute
batch = self._execute(stages, batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 88, in _execute
batch = stage(batch, server_args)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
result = self.forward(batch, server_args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1023, in forward
noise_pred = self._predict_noise_with_cfg(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1268, in _predict_noise_with_cfg
noise_pred_cond = self._predict_noise(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py", line 1214, in _predict_noise
return current_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 982, in forward
encoder_hidden_states, hidden_states = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 763, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py", line 560, in forward
img_query, img_key = apply_qk_norm(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/layers/layernorm.py", line 439, in apply_qk_norm
fused_inplace_qknorm(
File "/ossfs/workspace/sglang-main/python/sglang/jit_kernel/norm.py", line 77, in fused_inplace_qknorm
module.qknorm(q, k, q_weight, k_weight, eps)
File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
RuntimeError: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[02-02 16:22:42] Failed to generate output for prompt: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/utils/logging_utils.py", line 470, in log_generation_timer
yield timer
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/utils.py", line 216, in process_generation_batch
raise RuntimeError(
RuntimeError: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
[2026-02-02 16:22:42] INFO: 127.0.0.1:50870 - "POST /v1/images/generations HTTP/1.1" 500 Internal Server Error
[2026-02-02 16:22:42] ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 410, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1135, in __call__
await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 107, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 716, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 736, in app
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 290, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 115, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 101, in app
response = await f(request)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 355, in app
raw_response = await run_endpoint_function(
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 243, in run_endpoint_function
return await dependant.call(**values)
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py", line 135, in generations
save_file_path_list, result = await process_generation_batch(
File "/ossfs/workspace/sglang-main/python/sglang/multimodal_gen/runtime/entrypoints/openai/utils.py", line 216, in process_generation_batch
raise RuntimeError(
RuntimeError: Model generation returned no output. Error from scheduler: Error executing request 55a5e069-22c7-43bd-8860-9350ee3bb13e: Runtime check failed at :0: CUDA error: no kernel image is available for execution on the device
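For context, "no kernel image is available for execution on the device" generally means none of the architectures compiled into the kernel binary cover the GPU's compute capability (sm_80 for the A100). A minimal sketch of that coverage check, with illustrative inputs rather than the real JIT kernel's build flags:

```python
def has_kernel_image(device_cc: tuple[int, int], arch_list: list[str]) -> bool:
    """Return True if any compiled architecture covers the device.

    Simplified model: a "sm_XX" entry is a cubin, treated here as an
    exact match (in reality, within the same major version a newer
    minor can also run older cubins); a "compute_XX" entry is PTX and
    can be JIT-compiled forward to any >= architecture.
    """
    dev = device_cc[0] * 10 + device_cc[1]
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if kind == "sm" and int(num) == dev:
            return True  # matching cubin
        if kind == "compute" and int(num) <= dev:
            return True  # forward-compatible PTX
    return False

# An A100 is sm_80: a build targeting only sm_90 cannot run on it.
print(has_kernel_image((8, 0), ["sm_70", "sm_80"]))  # True
print(has_kernel_image((8, 0), ["sm_90"]))           # False
```

On a live system, `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()` report the corresponding values for the PyTorch build itself, though the failing qknorm kernel here is JIT-compiled separately.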
Reproduction
sglang serve --model-path /datacube_nas/noao_data/Qwen-Image --trust-remote-code --tp-size 2 --port 30010 --host 0.0.0.0 --diffusers-attention-backend native
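The failure pattern suggests the JIT qknorm kernel was built without an sm_80 target. As a possible workaround (an assumption, not a confirmed fix), pinning the target architecture and clearing any cached JIT artifacts before relaunching may help; the cache path below is a guess and may differ in your setup:

```shell
# Hypothetical workaround: force JIT-built extensions to target the
# A100 (compute capability 8.0), then rebuild from a clean cache.
export TORCH_CUDA_ARCH_LIST="8.0"
rm -rf ~/.cache/torch_extensions
sglang serve --model-path /datacube_nas/noao_data/Qwen-Image \
  --trust-remote-code --tp-size 2 --port 30010 --host 0.0.0.0 \
  --diffusers-attention-backend native
```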
Environment
python3 -m sglang.check_env
Python: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA A100-SXM4-80GB
GPU 0,1 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.8, V12.8.61
CUDA Driver Version: 470.82.01
PyTorch: 2.9.1+cu128
sglang: 0.0.0.dev0
sgl_kernel: 0.3.21
flashinfer_python: 0.6.2
flashinfer_cubin: 0.6.2
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.2.6
aiohttp: 3.13.3
fastapi: 0.128.0
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.33.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.12.5
python-multipart: 0.0.21
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology:
GPU0 GPU1 CPU Affinity NUMA Affinity
GPU0 X NV12 0-31,64-95 0
GPU1 NV12 X 32-63,96-127 1
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
ulimit soft: 655350