When training with vllm_grpo_trainer_modified.py, memory usage (system memory, not CUDA memory) keeps growing.
It eventually leads to an OOM in the middle of training (on a machine with 640 GB of RAM).
I tried to locate the leak with tracemalloc, and it shows that transformers/models/qwen2_vl/image_processing_qwen2_vl.py:455 grows fast during training:
image_processing_qwen2_vl.py:455: pixel_values = np.array(pixel_values)
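For reference, this is roughly how I collected the traces (a minimal sketch; the snapshot interval, frame depth, and top-N count are arbitrary):

```python
import tracemalloc

# keep enough frames so the allocation site inside transformers is visible
tracemalloc.start(25)
baseline = tracemalloc.take_snapshot()

# ... run a few training steps ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, "lineno")[:10]:
    print(stat)
# the top entry points at image_processing_qwen2_vl.py:455 (pixel_values = np.array(pixel_values))
```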
It seems the pixel_values array is never released. I tried to manually del the related variables in the trainer, but it did not help, as sketched below.
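Concretely, the attempt looked roughly like this (a sketch; `prompt_inputs` is a hypothetical name for the processor output dict that holds pixel_values and may not match the trainer's actual variable names):

```python
import gc
import torch

# after the forward/backward for the step, drop references to the processed batch
del prompt_inputs  # hypothetical name for the dict returned by the image processor
gc.collect()
torch.cuda.empty_cache()  # only affects GPU memory; included for completeness
```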
Any ideas on this issue?