ci: set LD_LIBRARY_PATH in Docker images for correct cuBLAS detection (#2468)

bkryu · web-flow · commit 9b5901eb0076 · 2026-02-02T23:41:46.000-08:00
## 📌 Description

### Summary

* Prepend the directory holding the pip-installed `nvidia-cublas` libraries (`nvidia/cublas/lib/` for CUDA 12.x, `nvidia/cu13/lib/` for CUDA 13.0) to `LD_LIBRARY_PATH` in the Docker images, so that the pip-installed cuBLAS takes precedence over the system libraries.
* Fixes failures where the wrong cuBLAS version (the system copy rather than the pip-installed one) is loaded at runtime, such as the `CUBLAS_STATUS_INVALID_VALUE` error below.

Example of what happens in our cu130 containers without prepending the path to `LD_LIBRARY_PATH`:

```
$ docker run --gpus all -it flashinfer/flashinfer-ci-cu130:20260131-a52eff1
Unable to find image 'flashinfer/flashinfer-ci-cu130:20260131-a52eff1' locally
20260131-a52eff1: Pulling from flashinfer/flashinfer-ci-cu130
Digest: sha256:582aeb35289cf804735a31727abe8ff37ae722fe6c7bd7fb8ddf50654429ff7a
Status: Downloaded newer image for flashinfer/flashinfer-ci-cu130:20260131-a52eff1

==========
== CUDA ==
==========

CUDA Version 13.0.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

(py312) root@fdac9b9cd61e:/workspace# python -c "import torch; print(torch.matmul(torch.randn(128,128,device='cuda'), torch.randn(128,128,device='cuda')))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
(py312) root@fdac9b9cd61e:/workspace# export LD_LIBRARY_PATH=/opt/conda/envs/py312/lib/python3.12/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH
(py312) root@fdac9b9cd61e:/workspace# python -c "import torch; print(torch.matmul(torch.randn(128,128,device='cuda'), torch.randn(128,128,device='cuda')))"
tensor([[ 14.9044,  14.3420,  26.0861,  ..., -10.4334,  -4.5352,   4.2331],
        [  1.9701,  13.6111,   1.0954,  ...,   3.0715,  -2.9266,   7.8847],
        [  6.5089,  -7.4811, -12.6226,  ...,  -5.3695,  -4.4557, -22.4567],
        ...,
        [-12.0462,  -2.0045,  15.7295,  ...,  -4.5688,  22.5680, -11.9852],
        [ -0.4228,  10.2761,   0.1951,  ...,  16.5192,  12.7168,   0.9931],
        [ -0.2800,  -5.7174,  -2.9644,  ...,   1.8484, -10.0042,  -7.7290]], device='cuda:0')
```

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes

## Summary by CodeRabbit

* **Chores**
  * Updated Docker build configurations for CUDA 12.6, 12.8, 12.9, and 13.0 to set runtime library precedence so that the NVIDIA cuBLAS libraries pip-installed into the conda environment are favored over system libraries.
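To confirm which cuBLAS a process actually loaded (for example, when reproducing the `cublasSgemm` failure in the session above), one option is to run a GEMM and then read the process's `/proc/self/maps`. This is a minimal sketch, assuming a Linux host and the CI image's `py312` environment with `torch` installed; the helper `loaded_cublas_paths` is hypothetical and not part of FlashInfer or PyTorch.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches libcublas.so[.N...] and libcublasLt.so[.N...] at the end of a mapped path.
_CUBLAS_RE = re.compile(r"/libcublas(?:Lt)?\.so[.\d]*$")


def loaded_cublas_paths(maps_text: str) -> list[str]:
    """Return the distinct cuBLAS shared-object paths listed in /proc/<pid>/maps text."""
    found: list[str] = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]; pathname is optional.
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue
        path = parts[5]
        if _CUBLAS_RE.search(path) and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    try:
        # Assumption: run inside the CI image's py312 env on a host with a GPU.
        import torch

        a = torch.randn(128, 128, device="cuda")
        torch.matmul(a, a)  # same kind of SGEMM call that failed in the session
    except Exception as exc:  # report the failure but still list the mapped libraries
        print(f"matmul failed: {exc}")

    maps = Path("/proc/self/maps")  # Linux-only view of this process's mappings
    if maps.exists():
        for path in loaded_cublas_paths(maps.read_text()):
            print(path)
```

With the new `ENV` line in place, the printed paths would be expected to point under `site-packages/nvidia/...` rather than a system directory such as `/usr/local/cuda/lib64`.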
diff --git a/docker/Dockerfile.cu126 b/docker/Dockerfile.cu126
@@ -19,6 +19,9 @@ RUN echo "source activate py312" >> ~/.bashrc
 ENV PATH="/opt/conda/bin:$PATH"
 ENV PATH="/opt/conda/envs/py312/bin:$PATH"
 
+# Ensure pip-installed nvidia-cublas takes precedence over system libraries
+ENV LD_LIBRARY_PATH="/opt/conda/envs/py312/lib/python3.12/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH"
+
 # Install torch and other python packages
 COPY requirements.txt /install/requirements.txt
 COPY docker/install/install_python_packages.sh /install/install_python_packages.sh
diff --git a/docker/Dockerfile.cu128 b/docker/Dockerfile.cu128
@@ -19,6 +19,9 @@ RUN echo "source activate py312" >> ~/.bashrc
 ENV PATH="/opt/conda/bin:$PATH"
 ENV PATH="/opt/conda/envs/py312/bin:$PATH"
 
+# Ensure pip-installed nvidia-cublas takes precedence over system libraries
+ENV LD_LIBRARY_PATH="/opt/conda/envs/py312/lib/python3.12/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH"
+
 # Install torch and other python packages
 COPY requirements.txt /install/requirements.txt
 COPY docker/install/install_python_packages.sh /install/install_python_packages.sh
diff --git a/docker/Dockerfile.cu129 b/docker/Dockerfile.cu129
@@ -19,6 +19,9 @@ RUN echo "source activate py312" >> ~/.bashrc
 ENV PATH="/opt/conda/bin:$PATH"
 ENV PATH="/opt/conda/envs/py312/bin:$PATH"
 
+# Ensure pip-installed nvidia-cublas takes precedence over system libraries
+ENV LD_LIBRARY_PATH="/opt/conda/envs/py312/lib/python3.12/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH"
+
 # Triton
 ENV TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"
 
diff --git a/docker/Dockerfile.cu130 b/docker/Dockerfile.cu130
@@ -19,6 +19,9 @@ RUN echo "source activate py312" >> ~/.bashrc
 ENV PATH="/opt/conda/bin:$PATH"
 ENV PATH="/opt/conda/envs/py312/bin:$PATH"
 
+# Set LD_LIBRARY_PATH to ensure pip-installed nvidia-cublas takes precedence over system libraries
+ENV LD_LIBRARY_PATH="/opt/conda/envs/py312/lib/python3.12/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH"
+
 # Triton
 ENV TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"
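The `ENV` lines above work because the dynamic loader searches `LD_LIBRARY_PATH` entries in order, and before `DT_RUNPATH` entries and the `ld.so` cache, so among those entries the first directory containing a matching `libcublas` wins. A minimal Python sketch of the resulting string follows; the function names are hypothetical, and the directory layout (`nvidia/cublas/lib` for CUDA 12.x, `nvidia/cu13/lib` for CUDA 13.0) is taken from the diff rather than from any NVIDIA packaging guarantee. It also shows one edge case of the Dockerfile form: if the base image never set `LD_LIBRARY_PATH`, `"<dir>:$LD_LIBRARY_PATH"` expands to a value with a trailing empty entry, which glibc's loader treats as the current working directory.

```python
from __future__ import annotations

import os
import sysconfig
from pathlib import Path


def pip_cublas_lib_dir(cuda_major: int, site_packages: str | None = None) -> str:
    """Return the pip cuBLAS library directory, following the layout used in the diff.

    Assumption: CUDA 12.x images use nvidia/cublas/lib, while the CUDA 13.0
    image uses the shared nvidia/cu13/lib directory.
    """
    base = Path(site_packages or sysconfig.get_paths()["purelib"])
    if cuda_major >= 13:
        return str(base / "nvidia" / "cu13" / "lib")
    return str(base / "nvidia" / "cublas" / "lib")


def docker_style_prepend(lib_dir: str, current: str | None) -> str:
    """Mimic ENV LD_LIBRARY_PATH="<lib_dir>:$LD_LIBRARY_PATH".

    Docker substitutes an unset variable with an empty string, so the result
    ends with ':' when the base image did not set LD_LIBRARY_PATH.
    """
    return f"{lib_dir}:{current or ''}"


def safe_prepend(lib_dir: str, current: str | None) -> str:
    """Put lib_dir first, drop other copies of it, and never emit empty entries."""
    rest = [
        entry
        for entry in (current or "").split(":")
        if entry and os.path.normpath(entry) != os.path.normpath(lib_dir)
    ]
    return ":".join([lib_dir, *rest])


def search_order(ld_library_path: str) -> list[str]:
    """List LD_LIBRARY_PATH entries in search order; '.' marks an empty (CWD) entry."""
    if not ld_library_path:
        return []  # an unset or empty variable adds no search directories
    return [entry or "." for entry in ld_library_path.split(":")]


if __name__ == "__main__":
    lib_dir = pip_cublas_lib_dir(13)
    current = os.environ.get("LD_LIBRARY_PATH")
    print("Dockerfile form:", search_order(docker_style_prepend(lib_dir, current)))
    print("safe form:      ", search_order(safe_prepend(lib_dir, current)))
```

`safe_prepend` is one way to avoid the empty entry in Python; in a Dockerfile the same effect is commonly achieved with the `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` substitution form.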