Open
Description
I have already resolved this, but I am posting the debug report in case it helps others.
🐛 Bug
Running
python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
inside the supplied SLURM container fails with
Platform::WindowlessEglApplication::tryCreateContext(): no EGL devices found
WindowlessContext: Unable to create windowless context.
EGL enumerates zero devices even though nvidia-smi reports a healthy GPU.
Steps to Reproduce
- Use a container, since sudo is not available on the GPU cluster.
- Run
python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
- Logs show repeated warnings: EGL fails to find a CUDA device; eglQueryDevicesEXT returns 0.
- LD_DEBUG=libs reveals container-bundled NVIDIA EGL libraries (570.195.03) being loaded alongside the host driver 570.158.01.
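The version mismatch in the last step can also be spotted without reading LD_DEBUG output, by listing the version suffixes of the NVIDIA user-space libraries present in a directory. The following is a minimal sketch and not part of Habitat-Sim: the helper name nvidia_lib_versions is hypothetical, and the filename pattern assumes that NVIDIA libraries end in the full driver version.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: NVIDIA user-space libraries end in the full driver
# version, e.g. libEGL_nvidia.so.570.158.01. SONAME links such as
# libEGL_nvidia.so.0 carry no driver version and are skipped.
_VERSIONED = re.compile(
    r"^(lib(?:EGL_nvidia|GLX_nvidia|nvidia-[\w-]+|cuda)\.so)\.(\d+\.\d+(?:\.\d+)?)$"
)


def nvidia_lib_versions(lib_dir):
    """Map each driver version found in lib_dir to the library names carrying it."""
    found = defaultdict(list)
    for entry in sorted(Path(lib_dir).iterdir()):
        match = _VERSIONED.match(entry.name)
        if match:
            found[match.group(2)].append(match.group(1))
    return dict(found)
```

Calling nvidia_lib_versions("/usr/lib/x86_64-linux-gnu") inside the container and getting more than one key back would point to the kind of mix described here; the versions can then be compared with the driver version shown by nvidia-smi.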
Expected Behavior
Habitat-Sim should create a windowless EGL context and render frames without manual library surgery.
Additional Context
- Containers bundle newer NVIDIA EGL libraries (570.195.03) while the host driver is 570.158.01.
- Conda packages include Mesa’s libEGL.so/libGLdispatch.so, which load ahead of the system’s NVIDIA stack.
- The resulting symbol lookups (ErrorF, __malloc_hook, __free_hook, etc.) fail, so EGL enumerates zero devices.
- Workaround: remove the mismatched 570.195.03 binaries, disable Mesa EGL in the conda env, and set
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libc_malloc_debug.so.0:/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
export __EGL_VENDOR_LIBRARY_FILENAMES=/workspace/egl_config/10_nvidia.json
After restoring consistent driver/library versions, eglQueryDevicesEXT reports >0 devices and examples run normally.
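The report does not show the contents of /workspace/egl_config/10_nvidia.json. As a sketch, assuming the file follows the usual libglvnd EGL vendor ICD layout and points at the same libEGL_nvidia.so.0 that is preloaded above, it could be generated as follows. The helper name write_nvidia_egl_vendor_file is hypothetical.

```python
import json
from pathlib import Path


def write_nvidia_egl_vendor_file(path, library="libEGL_nvidia.so.0"):
    """Write a libglvnd EGL vendor ICD file that selects the NVIDIA EGL library."""
    # Assumed content: the standard glvnd vendor JSON layout. The reporter's
    # actual file is not shown in this issue.
    icd = {
        "file_format_version": "1.0.0",
        "ICD": {"library_path": library},
    }
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(icd, indent=4) + "\n")
    return target
```

For the setup in this report the call would be write_nvidia_egl_vendor_file("/workspace/egl_config/10_nvidia.json"), after which __EGL_VENDOR_LIBRARY_FILENAMES can point at that path as in the exports above.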
System Info
- OS: Linux (cluster node, driver-managed; containerized environment)
- PyTorch install: prebundled with conda environment (conda)
- Build: using precompiled habitat-sim binaries from repo (no custom build)
- Python: 3.9
- GPU: NVIDIA H800 80 GB (single GPU exposed as /dev/nvidia1)
- CUDA: 12.8 (per nvidia-smi)
- Key dependencies: NumPy 1.26.4, SciPy <1.11 (warns about NumPy incompatibility); Mesa EGL libraries present in env by default
- Additional note: no sudo access; container image mixes host and container NVIDIA components.
manainszu