
bug: Unified‑memory + CUDA backend reports only GPU VRAM, causing false “model won’t run” status #6345

@qnixsynapse

Description


Version: 0.6.9

Describe the Bug

On Windows systems where CPU RAM and GPU VRAM are effectively shared as a single pool (e.g., an AMD Ryzen machine paired with an NVIDIA RTX 4070 and System Memory Fallback enabled), llama.cpp reports only the dedicated GPU VRAM (≈8 GB) when Jan evaluates whether a model can be loaded. As a result, the UI shows a red “model won’t run” indicator even though the model loads and runs quickly using the combined memory (≈32 GB).
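
For illustration, a minimal TypeScript sketch of the kind of check that would produce this false negative, assuming the runnability test compares the model size only against the backend-reported dedicated VRAM (all names and numbers below are illustrative, not Jan’s actual code):

```ts
// Hypothetical memory info as seen by the app on the machine from this report.
interface MemoryInfo {
  dedicatedVramBytes: number; // what the CUDA backend reports (~8 GB here)
  sharedSystemBytes: number;  // system RAM the driver could spill into (~24 GB here)
}

const GIB = 1024 ** 3;

// Assumed current behaviour: model size is compared only against dedicated VRAM.
function canRunModel(modelSizeBytes: number, mem: MemoryInfo): boolean {
  return modelSizeBytes <= mem.dedicatedVramBytes;
}

const mem: MemoryInfo = {
  dedicatedVramBytes: 8 * GIB,
  sharedSystemBytes: 24 * GIB,
};

// A ~12 GB model is flagged red even though ~32 GB of combined memory is available.
console.log(canRunModel(12 * GIB, mem)); // false -> red "model won't run" indicator
```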

Steps to Reproduce

  1. Launch Jan on a similar setup
  2. Enable the "System Memory Fallback" option in the NVIDIA Control Panel if it is not already enabled
  3. Select a model larger than the NVIDIA GPU's dedicated VRAM

Additional Context

GGML only reports the dedicated backend memory (for CUDA, the NVIDIA GPU's VRAM), which is expected behaviour. Should Jan also take the overcommittable system memory into account when the dedicated GPU memory is exhausted?
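
One possible direction, sketched in TypeScript purely for discussion: if Jan can detect that System Memory Fallback (or an equivalent overcommit path) is active, it could compute an effective budget from dedicated VRAM plus a conservative share of free system RAM, and downgrade the red indicator to a warning instead. Every function name and threshold below is a hypothetical assumption, not an existing Jan or GGML API:

```ts
const GIB = 1024 ** 3;

// Hypothetical helper: combine backend-reported VRAM with spillable system RAM.
function effectiveGpuBudgetBytes(
  dedicatedVramBytes: number,
  freeSystemRamBytes: number,
  systemMemoryFallbackEnabled: boolean,
): number {
  if (!systemMemoryFallbackEnabled) {
    return dedicatedVramBytes; // today's behaviour: dedicated VRAM only
  }
  // Leave a margin of system RAM for the OS and other apps (the 4 GiB margin is arbitrary).
  const usableSharedBytes = Math.max(0, freeSystemRamBytes - 4 * GIB);
  return dedicatedVramBytes + usableSharedBytes;
}

// Values from this report: ~8 GB dedicated VRAM, ~24 GB of system RAM available for spill.
const budgetBytes = effectiveGpuBudgetBytes(8 * GIB, 24 * GIB, true);
console.log(budgetBytes / GIB); // 28 -> a 12 GB model would be reported as runnable
```

With the numbers from this report, 8 GB of dedicated VRAM plus ~24 GB of spillable system RAM would comfortably cover a model that currently triggers the red state, which matches the observation that the model actually loads and runs fine (perhaps with a “slower than pure VRAM” warning rather than a hard “won’t run” status).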

Decision

TBD

Screenshots / Logs

Operating System

  • MacOS
  • Windows
  • Linux
