
bug: Unified‑memory + CUDA backend reports only GPU VRAM, causing false “model won’t run” status #6345

@qnixsynapse

Description


Version: 0.6.9

Describe the Bug

On Windows systems where CPU RAM and GPU VRAM are effectively shared as a single pool (e.g., an AMD Ryzen machine paired with an NVIDIA RTX 4070 and System Memory Fallback enabled), llama.cpp reports only the dedicated GPU VRAM (≈8 GB) when Jan evaluates whether a model can be loaded. As a result, the UI shows a red “model won’t run” indicator even though the model loads and runs quickly using the combined memory (≈32 GB).
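
For illustration, a minimal TypeScript sketch of the kind of check that would produce this false negative, assuming the runnability test compares the model size only against the backend-reported dedicated VRAM (all names and numbers below are illustrative, not Jan’s actual code):

```ts
// Hypothetical memory info as seen by the app on the machine from this report.
interface MemoryInfo {
  dedicatedVramBytes: number; // what the CUDA backend reports (~8 GB here)
  sharedSystemBytes: number;  // system RAM the driver could spill into (~24 GB here)
}

const GIB = 1024 ** 3;

// Assumed current behaviour: model size is compared only against dedicated VRAM.
function canRunModel(modelSizeBytes: number, mem: MemoryInfo): boolean {
  return modelSizeBytes <= mem.dedicatedVramBytes;
}

const mem: MemoryInfo = {
  dedicatedVramBytes: 8 * GIB,
  sharedSystemBytes: 24 * GIB,
};

// A ~12 GB model is flagged red even though ~32 GB of combined memory is available.
console.log(canRunModel(12 * GIB, mem)); // false -> red "model won't run" indicator
```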

Steps to Reproduce

  1. Launch Jan on a similar setup
  2. Enable the "System Memory Fallback" option in the NVIDIA Control Panel if it is not already enabled
  3. Select a model larger than the NVIDIA GPU's dedicated VRAM

Additional Context

GGML only reports the dedicated backend memory (for CUDA, the NVIDIA GPU's VRAM), which is expected behaviour. Should Jan also take the overcommittable system memory into account when the dedicated GPU memory is exhausted?
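
One possible direction, sketched in TypeScript purely for discussion: if Jan can detect that System Memory Fallback (or an equivalent overcommit path) is active, it could compute an effective budget from dedicated VRAM plus a conservative share of free system RAM, and downgrade the red indicator to a warning instead. Every function name and threshold below is a hypothetical assumption, not an existing Jan or GGML API:

```ts
const GIB = 1024 ** 3;

// Hypothetical helper: combine backend-reported VRAM with spillable system RAM.
function effectiveGpuBudgetBytes(
  dedicatedVramBytes: number,
  freeSystemRamBytes: number,
  systemMemoryFallbackEnabled: boolean,
): number {
  if (!systemMemoryFallbackEnabled) {
    return dedicatedVramBytes; // today's behaviour: dedicated VRAM only
  }
  // Leave a margin of system RAM for the OS and other apps (the 4 GiB margin is arbitrary).
  const usableSharedBytes = Math.max(0, freeSystemRamBytes - 4 * GIB);
  return dedicatedVramBytes + usableSharedBytes;
}

// Values from this report: ~8 GB dedicated VRAM, ~24 GB of system RAM available for spill.
const budgetBytes = effectiveGpuBudgetBytes(8 * GIB, 24 * GIB, true);
console.log(budgetBytes / GIB); // 28 -> a 12 GB model would be reported as runnable
```

With the numbers from this report, 8 GB of dedicated VRAM plus ~24 GB of spillable system RAM would comfortably cover a model that currently triggers the red state, which matches the observation that the model actually loads and runs fine (perhaps with a “slower than pure VRAM” warning rather than a hard “won’t run” status).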

Decision

TBD

Screenshots / Logs

Operating System

  • MacOS
  • Windows
  • Linux
