First of all, huge respect for your outstanding work!
I have a question regarding the data processing pipeline in `vggt/training/trainer.py`.
Currently, the code executes `batch = copy_data_to_device(batch, self.device, non_blocking=True)` *after* `_process_batch`. I also noticed that inside `normalize_camera_extrinsics_and_points_batch` (which is called by `_process_batch`), there is an explicit assertion:

```python
assert device == torch.device("cpu")
```
I am currently working on training acceleration on NPU devices. As an experiment, I moved `copy_data_to_device` to before `_process_batch` (and removed the CPU assertion).
The results showed that the model trains normally and the loss matches the baseline exactly. More importantly, iteration speed improved, since the preprocessing now runs on the accelerator instead of the CPU.
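To make the change concrete, here is a minimal, self-contained sketch of the two orderings. The function names mirror the trainer's API, but the bodies are hypothetical stand-ins (the "transfer" just tags values with a device, and the "processing" is device-agnostic normalization), used only to illustrate that reordering the two steps leaves the numerics unchanged:

```python
def copy_data_to_device(batch, device):
    # Stand-in for the real transfer: record which device holds each value.
    return {k: {"value": v["value"], "device": device} for k, v in batch.items()}

def process_batch(batch):
    # Stand-in for _process_batch: normalization arithmetic that does not
    # depend on where the data lives, so it can run before or after the move.
    scale = max(abs(v["value"]) for v in batch.values()) or 1.0
    return {k: {"value": v["value"] / scale, "device": v["device"]}
            for k, v in batch.items()}

def step_original(batch):
    # Current trainer order: process on CPU, then move to the device.
    return copy_data_to_device(process_batch(batch), "npu:0")

def step_reordered(batch):
    # Experimental order: move first, then process on the device.
    return process_batch(copy_data_to_device(batch, "npu:0"))

batch = {"extrinsics": {"value": 4.0, "device": "cpu"},
         "points": {"value": 2.0, "device": "cpu"}}

a = step_original(batch)
b = step_reordered(batch)
assert all(a[k]["value"] == b[k]["value"] for k in batch)   # same numerics
assert all(a[k]["device"] == b[k]["device"] == "npu:0" for k in batch)
```

In the real trainer the same reasoning would only hold if everything inside `_process_batch` is device-agnostic, which is exactly what my experiment seems to confirm.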
I am curious about the intention behind the original design. Is there a specific reason why this data processing step was strictly restricted to the CPU?
Thanks for your time and insights!