When I run run_pdm_score directly in the main process, everything works fine. However, when using worker_map (which uses Ray for distributed execution), the worker processes fail during model initialization with the following error:
ValueError: FlashAttention2 has been toggled on, but it cannot be used due to the following error:
Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device.
How can I make CUDA available to the worker processes that worker_map starts?
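
For context, here is a minimal sketch of what might be going on, under assumptions. Ray assigns GPUs only to tasks that request them through `num_gpus`, so a task submitted without that option may see no CUDA device even though the main process does. I do not know how worker_map submits its tasks internally, so the two helpers below, `describe_gpu_visibility` and `run_on_gpu_worker`, are hypothetical names for illustration. They are not part of worker_map or Ray. The only Ray calls used are `ray.init`, `ray.is_initialized`, `ray.remote(num_gpus=...)`, and `ray.get`.

```python
import os


def describe_gpu_visibility() -> dict:
    """Report what the current process can see (hypothetical helper).

    Call this inside a worker to compare against the main process.
    """
    info = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch  # imported lazily so the helper also works without torch

        info["torch_cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch_cuda_available"] = None
    return info


def run_on_gpu_worker(fn, *args, num_gpus: float = 1):
    """Run fn(*args) as a Ray task that reserves num_gpus GPUs (hypothetical helper).

    Assumption: fn is a plain function and the cluster has a free GPU;
    otherwise the task waits until the requested resources are available.
    """
    import ray  # imported lazily; needed only when a task is actually submitted

    if not ray.is_initialized():
        ray.init()
    # Requesting GPUs is what lets Ray expose a CUDA device to the task.
    remote_fn = ray.remote(num_gpus=num_gpus)(fn)
    return ray.get(remote_fn.remote(*args))
```

Calling `run_on_gpu_worker(describe_gpu_visibility)` and comparing the result with a direct `describe_gpu_visibility()` call in the main process should show whether the workers lose the GPU. If they do, the fix is probably to have worker_map's tasks request a GPU, assuming its configuration exposes such an option.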