Hi! I found a bug while training the caster model. It is caused by the torch.eye call, which does not specify a device. Even when CUDA is available, torch.eye creates its tensor on the CPU by default, while the rest of the model is on the GPU, so the two end up on different devices.
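
A minimal sketch of the kind of fix I mean, not the actual code from the repository. The helper name `identity_like` and the tensor `x` are hypothetical and only for illustration. The idea is to pass `device` (and `dtype`) to torch.eye so the identity matrix lands on the same device as the tensor it is combined with:

```python
import torch


def identity_like(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: identity matrix sized to x's last dimension,
    created on the same device and with the same dtype as x."""
    n = x.size(-1)
    # Without device=..., torch.eye allocates on the default device (normally the CPU).
    return torch.eye(n, device=x.device, dtype=x.dtype)


# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a batch of square matrices produced inside the model.
x = torch.randn(4, 3, 3, device=device)

# Both operands are on the same device, so the addition is safe.
out = x + identity_like(x)
```

Taking the device from an existing tensor, rather than hard-coding "cuda", keeps the code working on both CPU-only and GPU machines.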