
Commit c9840ed

kaixih authored and shreyankg committed
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (vllm-project#13632)
1 parent 1830a32 commit c9840ed


1 file changed

+1 -4 lines changed


csrc/quantization/fp4/nvfp4_quant_kernels.cu

Lines changed: 1 addition & 4 deletions
@@ -348,10 +348,7 @@ void scaled_fp4_quant_sm100a(torch::Tensor const& output,
   auto sf_out = static_cast<int32_t*>(output_sf.data_ptr());
   auto output_ptr = static_cast<int64_t*>(output.data_ptr());
   at::cuda::CUDAGuard device_guard{(char)input.get_device()};
-  auto stream = at::cuda::getStreamFromPool(false, input.get_device());
-  if (stream == nullptr) {
-    std::cerr << "Warning: Null CUDA stream" << std::endl;
-  }
+  auto stream = at::cuda::getCurrentCUDAStream(input.get_device());
 
   // We don't support e8m0 scales at this moment.
   bool useUE8M0 = false;
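Why the stream matters can be sketched with the plain CUDA runtime API. This is a minimal standalone illustration, not the nvfp4 code: `scale_kernel` and `scale_on_stream` are hypothetical names, and PyTorch is left out on purpose. Work enqueued on a single stream runs in issue order, so a copy queued after the kernel on the caller's stream is guaranteed to see the kernel's output. A kernel launched on an unrelated stream (such as one taken from a stream pool) has no such ordering with the caller's stream unless events or explicit synchronization are added.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for a quantization kernel (NOT the nvfp4 kernel):
// writes in[i] * s to out[i].
__global__ void scale_kernel(const float* in, float* out, float s, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * s;
}

// Hypothetical helper: enqueues the H2D copy, the kernel, and the D2H copy on
// the caller-supplied `stream`. Operations on one stream run in issue order,
// so the D2H copy observes the kernel's writes without any extra events.
// Returns the first CUDA error encountered.
cudaError_t scale_on_stream(const std::vector<float>& host_in,
                            std::vector<float>& host_out, float s,
                            cudaStream_t stream) {
  const int n = static_cast<int>(host_in.size());
  host_out.resize(n);
  if (n == 0) return cudaSuccess;  // a zero-block launch would be invalid
  const size_t bytes = static_cast<size_t>(n) * sizeof(float);

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_in, bytes);
  if (err != cudaSuccess) return err;
  err = cudaMalloc(&d_out, bytes);
  if (err != cudaSuccess) {
    cudaFree(d_in);
    return err;
  }

  cudaMemcpyAsync(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice, stream);
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  // Launch on the same stream the caller uses (4th launch parameter).
  scale_kernel<<<blocks, threads, 0, stream>>>(d_in, d_out, s, n);
  err = cudaGetLastError();  // catches launch or earlier async-call errors
  if (err == cudaSuccess) {
    cudaMemcpyAsync(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost,
                    stream);
    // Block the host until everything queued on `stream` has finished.
    err = cudaStreamSynchronize(stream);
  }
  cudaFree(d_in);
  cudaFree(d_out);
  return err;
}
```

Inside a PyTorch extension, the `stream` argument would be the current stream, for example `at::cuda::getCurrentCUDAStream(input.get_device())`, which converts implicitly to `cudaStream_t`. That is the stream on which the caller queues the ops that consume the quantized output.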
