Version: 0.6.9
Describe the Bug
On Windows systems that use unified memory (CPU RAM + GPU VRAM shared in a single pool, e.g., AMD Ryzen with integrated RTX 4070), llama.cpp only reports the dedicated GPU memory (≈8 GB) when evaluating whether a model can be loaded. This results in the UI showing a red “model won’t run” indicator even though the model loads and runs quickly using the combined memory (≈32 GB).
Steps to Reproduce
- Launch Jan in a similar setup
- Enable the "System Memory Fallback" option in the NVIDIA Control Panel if it is not already enabled
- Select a model larger than the NVIDIA GPU's dedicated VRAM
Additional Context
GGML only reports the dedicated backend memory (for CUDA, that is the NVIDIA GPU's VRAM), which is expected behaviour. Should Jan also take the overcommitted (shared system) memory into consideration when dedicated GPU memory is exhausted?
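If Jan were to account for overcommit, the fit check might look roughly like the sketch below. This is a minimal illustration under stated assumptions: the function and parameter names (`model_fits`, `dedicated_vram_bytes`, `shared_budget_bytes`, `fallback_enabled`) are hypothetical and do not reflect Jan's actual code.

```python
# Hypothetical sketch of a fit check that accounts for NVIDIA's
# "System Memory Fallback". All names here are illustrative, not Jan's API.
def model_fits(model_size_bytes: int,
               dedicated_vram_bytes: int,
               shared_budget_bytes: int,
               fallback_enabled: bool) -> bool:
    """Return True if the model can be expected to load.

    With System Memory Fallback enabled, CUDA allocations that exceed
    dedicated VRAM spill into system RAM, so the effective budget is
    dedicated + shared rather than dedicated VRAM alone.
    """
    budget = dedicated_vram_bytes
    if fallback_enabled:
        budget += shared_budget_bytes
    return model_size_bytes <= budget

GiB = 1024 ** 3
# A 13 GiB model on an 8 GiB card fails a VRAM-only check...
print(model_fits(13 * GiB, 8 * GiB, 24 * GiB, fallback_enabled=False))  # False
# ...but fits once the shared pool is counted toward the budget.
print(model_fits(13 * GiB, 8 * GiB, 24 * GiB, fallback_enabled=True))   # True
```

A check along these lines would let the UI show a "runs, but may be slow" state instead of a hard red indicator when the model exceeds dedicated VRAM but fits in the combined pool.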
Decision
TBD
Screenshots / Logs
Operating System