Replies: 1 comment
Never mind, I found what's causing it: it's mmap. KoboldAI doesn't use it by default. For Oobabooga I just need to select "no-mmap".
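For anyone else puzzled by the "invisible" RAM: memory-mapped model weights are file-backed pages, so Task Manager typically doesn't count them against any one process's private working set, even though they occupy physical RAM until the mapping is closed. A minimal sketch with plain Python `mmap` (not llama.cpp itself; the file here is a dummy stand-in for a GGUF model):

```python
import mmap
import os
import tempfile

# Create a dummy "model" file to stand in for a GGUF file.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))  # 16 MB of zeros

with open(path, "rb") as f:
    # Map the whole file read-only. The OS serves reads from the page
    # cache; the pages are file-backed and shared, so they don't appear
    # as the process's private memory, yet they still consume RAM.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_bytes = mm[:4]  # pages are faulted in lazily, on first touch
    mm.close()  # unmapping lets the OS reclaim those pages immediately
```

This also matches the symptom above: closing the mapping (unloading the model) releases the memory at once, because the OS can simply drop the file-backed pages.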
As far as I know, both Oobabooga and KoboldCpp use llama.cpp to handle GGUF files.
But Oobabooga seems to be using a lot more RAM.
For the exact same GGUF model (24B, Q4_K_L, 13.8GB on disk), with context set to 16384 and all layers loaded on the GPU:
For Oobabooga: my memory usage shoots up from 12GB to 26GB. Weirdly enough, nothing in Task Manager shows high memory usage, but unloading the model immediately releases the used memory.
For KoboldCpp: memory usage goes from 12GB to 13.5GB, and Task Manager shows KoboldCpp using ~1000MB of memory and its command prompt using an additional 300+MB.
In terms of VRAM, both are exactly the same: usage goes from 2GB to 18.5GB.
Just wondering, what's causing Oobabooga to use so much more RAM? Is it a configuration issue? For both tests, I have made no changes to the respective default config other than context size.