
[Feature]: improve sleep mode to not break torch.cuda memory counters #33625

@stas00

Description


🚀 The feature, motivation and pitch

Currently, when vllm is used in conjunction with other GPU programs, e.g. RL training (verl + vllm), and sleep mode is used, we end up with bogus torch.cuda memory counters.

The torch.cuda memory reporting is broken in that situation: vllm frees the kv-cache and weights when put to "sleep" (because the training step needs the same GPU memory to be free), but torch is none the wiser that the freeing happened. The same GPUs are shared between inference and training, with each side loading and releasing all the memory it uses, so only one of them is active at a time.

So after vllm does its unloading, torch.cuda.memory_allocated() still reports the memory from vllm's run, even though it has actually been freed, which makes memory-related problems quite difficult to debug.

Another weird, related thing: torch.cuda.memory_reserved and torch.cuda.max_memory_reserved report 198.68 GB and 199 GB on an H200, which only has ~140 GB! How can they possibly report more than the physical memory size? (In this particular use case vllm's memory usage was about 60 GB, so the diff 200 - 60 = 140 GB checks out against the physical size.)

So the workaround proposed in #11743 (comment) is to use mem_get_info() and manually calculate the used memory:

    mem_free, mem_total = get_torch_device().mem_get_info()
    mem_used = mem_total - mem_free

but all the other counters are still wrong, and having just the used memory is insufficient when dealing with memory issues.
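For comparison, here is a minimal monitoring sketch using plain torch.cuda (the snippet above goes through the RL framework's get_torch_device() helper, which is not a torch API): it prints the driver-level usage from mem_get_info() next to torch's allocator counters, so the mismatch after sleep is visible at a glance. The sleep/wake usage at the bottom is hypothetical and assumes an LLM created with enable_sleep_mode=True as introduced in #11743.

    import torch

    def report_gpu_memory(tag: str) -> None:
        # Driver-level truth: free/total bytes for the current device,
        # independent of which allocator did the freeing.
        mem_free, mem_total = torch.cuda.mem_get_info()
        mem_used = mem_total - mem_free

        # torch caching-allocator counters; these are the ones that can stay
        # stale after vllm's sleep mode releases weights and kv-cache.
        allocated = torch.cuda.memory_allocated()
        reserved = torch.cuda.memory_reserved()

        gib = 1024**3
        print(f"[{tag}] used={mem_used / gib:.2f} GiB  "
              f"allocated={allocated / gib:.2f} GiB  "
              f"reserved={reserved / gib:.2f} GiB")

    # Hypothetical usage around sleep mode (llm built with enable_sleep_mode=True):
    #   report_gpu_memory("before sleep")
    #   llm.sleep(level=1)
    #   report_gpu_memory("after sleep")  # 'used' drops; allocated/reserved may not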

Is it possible to fix sleep mode so that it correctly tells torch.cuda that the tensors used by vllm have been freed?

Thank you.

cc: @youkaichao, who created the sleep PR #11743
