So far, I've seen that there are many flags that check for torch with CUDA, and many CUDA-specific calls.
I've commented out the checks and switched the devices to "cpu", but it is very slow.
Is there a way to make this work with MPS?
I imagine we could go in and modify the code to target MPS, and cast everything to float32 instead of float64?
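
To make the question concrete, here is a rough sketch of what I have in mind. It is not taken from this repo: `pick_device` and `to_device` are hypothetical helper names, and I'm assuming PyTorch 1.12 or later, where `torch.backends.mps` is available. The idea is to replace the hard-coded CUDA checks with a single device selection, and to downcast float64 tensors before moving them to MPS, since MPS does not support float64.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps exists in PyTorch 1.12+; getattr guards older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Move a tensor to `device`, downcasting float64 only when targeting MPS."""
    if device.type == "mps" and t.dtype == torch.float64:
        t = t.to(torch.float32)
    return t.to(device)


device = pick_device()
x = to_device(torch.zeros(2, 3, dtype=torch.float64), device)
print(device, x.dtype)
```

Every `.cuda()` call and `device="cuda"` argument in the code would then need to go through something like `to_device(..., device)` instead. For ops that MPS does not implement yet, setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` lets PyTorch run them on the CPU, though I don't know how many ops in this project would hit that path.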