First of all, huge respect for your outstanding work!
I have a question regarding the data processing pipeline in `vggt/training/trainer.py`.
Currently, the code executes `batch = copy_data_to_device(batch, self.device, non_blocking=True)` *after* `_process_batch`. I also noticed that inside `normalize_camera_extrinsics_and_points_batch` (which is called by `_process_batch`), there is an explicit assertion:

```python
assert device == torch.device("cpu")
```
I am currently working on training acceleration on NPU devices. As an experiment, I moved `copy_data_to_device` to before `_process_batch` (and removed the CPU assertion).
The results showed that the model trains normally and the loss matches the baseline exactly. More importantly, iteration speed improved, since the preprocessing now runs on the accelerator instead of the CPU.
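To make the change concrete, here is a minimal, self-contained sketch of the two orderings. The function names mirror the trainer's API, but the bodies are hypothetical stand-ins (the "transfer" just tags values with a device, and the "processing" is device-agnostic normalization), used only to illustrate that reordering the two steps leaves the numerics unchanged:

```python
def copy_data_to_device(batch, device):
    # Stand-in for the real transfer: record which device holds each value.
    return {k: {"value": v["value"], "device": device} for k, v in batch.items()}

def process_batch(batch):
    # Stand-in for _process_batch: normalization arithmetic that does not
    # depend on where the data lives, so it can run before or after the move.
    scale = max(abs(v["value"]) for v in batch.values()) or 1.0
    return {k: {"value": v["value"] / scale, "device": v["device"]}
            for k, v in batch.items()}

def step_original(batch):
    # Current trainer order: process on CPU, then move to the device.
    return copy_data_to_device(process_batch(batch), "npu:0")

def step_reordered(batch):
    # Experimental order: move first, then process on the device.
    return process_batch(copy_data_to_device(batch, "npu:0"))

batch = {"extrinsics": {"value": 4.0, "device": "cpu"},
         "points": {"value": 2.0, "device": "cpu"}}

a = step_original(batch)
b = step_reordered(batch)
assert all(a[k]["value"] == b[k]["value"] for k in batch)   # same numerics
assert all(a[k]["device"] == b[k]["device"] == "npu:0" for k in batch)
```

In the real trainer the same reasoning would only hold if everything inside `_process_batch` is device-agnostic, which is exactly what my experiment seems to confirm.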
I am curious about the intention behind the original design. Is there a specific reason why this data processing step was strictly restricted to the CPU?
Thanks for your time and insights!