
[Question/Optimization] Inquiry regarding the timing of copy_data_to_device and CPU assertion #450

@pbhfcycssjlmm

Description


First of all, huge respect for your outstanding work!

I have a question regarding the data processing pipeline in vggt/training/trainer.py.

Currently, the code runs `batch = copy_data_to_device(batch, self.device, non_blocking=True)` *after* `_process_batch`. I also noticed that inside `normalize_camera_extrinsics_and_points_batch` (which is called by `_process_batch`) there is an explicit assertion:

```python
assert device == torch.device("cpu")
```

I am currently working on training acceleration on NPU devices. As an experiment, I moved `copy_data_to_device` to before `_process_batch` (and removed the CPU assertion).

The results show that the model trains normally and the loss matches the baseline exactly. More importantly, iteration speed improved, since the batch processing now runs on the accelerator instead of the CPU.
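To make the reordering concrete, here is a minimal, self-contained sketch. The real logic lives in `vggt/training/trainer.py`; the functions below are simplified stand-ins (the toy `process_batch` just mean-centers extrinsics), not the project's actual implementations:

```python
# Sketch of the two pipeline orderings discussed above.
# `copy_data_to_device` and `process_batch` here are simplified stand-ins
# for the trainer's helpers, written only to illustrate the reordering.
import torch

def copy_data_to_device(batch, device, non_blocking=True):
    # Recursively move all tensors in a (possibly nested) batch to `device`.
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=non_blocking)
    if isinstance(batch, dict):
        return {k: copy_data_to_device(v, device, non_blocking)
                for k, v in batch.items()}
    return batch

def process_batch(batch):
    # Stand-in for the per-batch processing step. In the real code this
    # calls normalize_camera_extrinsics_and_points_batch, which asserts
    # the data is still on the CPU.
    batch["extrinsics"] = (
        batch["extrinsics"] - batch["extrinsics"].mean(dim=0, keepdim=True)
    )
    return batch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {"extrinsics": torch.randn(4, 3, 4)}

# Original order: process on CPU, then transfer to the device.
out_cpu_first = copy_data_to_device(process_batch(dict(batch)), device)

# Proposed order: transfer first, then process on the accelerator.
out_device_first = process_batch(copy_data_to_device(dict(batch), device))
```

For elementwise operations like this, both orderings produce the same values up to floating-point rounding; the question is whether the real normalization relies on something CPU-specific (e.g. determinism or dtype behavior) that motivated the assertion.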

I am curious about the intention behind the original design. Is there a specific reason this data-processing step was strictly restricted to the CPU?

Thanks for your time and insights!
